How Occam’s Razor Provides a Neat Definition of Direct Causation

Alexander Gebharter & Gerhard Schurz
Duesseldorf Center for Logic and Philosophy of Science
University of Duesseldorf
Universitaetsstrasse 1
40225 Duesseldorf, Germany

Abstract

In this paper we show that the application of Occam’s razor to the theory of causal Bayes nets gives us a neat definition of direct causation. In particular we show that Occam’s razor implies Woodward’s (2003) definition of direct causation, provided suitable intervention variables exist and the causal Markov condition (CMC) is satisfied. We also show how Occam’s razor can account for direct causal relationships Woodward style when only stochastic intervention variables are available.

1 INTRODUCTION

Occam’s razor is typically seen as a methodological principle. There are many possible ways to apply the razor to the theory of causal Bayes nets. It could, for example, simply be interpreted to suggest preferring the simplest causal structure compatible with the given data among all compatible causal structures. The simplest causal structure could, for instance, be the one (or one of the ones) featuring the fewest causal arrows.

In this paper, however, we are interested in a slightly different application of Occam’s razor: Our interpretation of Occam’s razor asserts that a causal structure that is compatible with the data should only be chosen if it satisfies the causal minimality condition (Min) in the sense of Spirtes et al. (2000, p. 31), which requires that no causal arrow in the structure can be omitted in such a way that the resulting substructure would still be compatible with the data. When speaking of a causal structure being compatible with the data, we have a causal structure and a probability distribution satisfying the causal Markov condition (CMC) in mind. (For details, see sec. 5.) In the following, applying Occam’s razor always means to assume that the causal minimality condition is satisfied.

In this paper we give a motivation for Occam’s razor that goes beyond its merits as a methodological principle dictating that one should always decide in favor of minimal causal models. In particular, we show that Occam’s razor provides a neat definition of direct causal relatedness in the sense of Woodward (2003), provided suitable intervention variables exist and CMC is satisfied. Note the connection of this enterprise to Zhang and Spirtes’ (2011) project. Zhang and Spirtes prove that CMC and an interventionist definition of direct causation à la Woodward (2003) together imply minimality. So Occam’s razor is well-motivated within a manipulationist framework such as Woodward’s. We show, vice versa, that CMC and minimality together imply Woodward’s definition of direct causation. So if one wants a neat definition of direct causation, it is reasonable to apply Occam’s razor in the sense of assuming minimality.

The paper is structured as follows: In sec. 2 we introduce the notation we use in subsequent sections. In sec. 3 we present Woodward’s (2003) definition of direct causation and his definition of an intervention variable. In sec. 4 we give precise reconstructions of both definitions in terms of causal Bayes nets. We also provide a definition of the notion of an intervention expansion, which is needed to account for direct causal relations in terms of the existence of certain intervention variables. In sec. 5 we show that Occam’s razor gives us Woodward’s definition of direct causation if CMC is assumed and the existence of suitable intervention variables is granted (theorem 2). In sec. 6 we go a step further and show how Occam’s razor allows us to account for direct causation Woodward style when only stochastic intervention variables (cf. Korb et al., 2004, sec. 5) are available (theorem 3). We conclude in sec. 7.

Note that though the main results of the present paper (i.e., theorems 2 and 3) can be used for causal discovery, the goal of this paper is not to provide a method for uncovering direct causal connections among variables in a set of variables V of interest. The goal of this paper is to establish a connection between Woodward’s (2003) intervention-based notion of direct causation and the presence of a causal arrow in a minimal causal Bayes net, which can be interpreted as support for Occam’s razor. Because of this, the present paper does not discuss the relation of theorems 2 and 3 to results about causal discovery by means of interventions such as, e.g., (Eberhardt and Scheines, 2007) or (Nyberg and Korb, 2006).

2 NOTATION

We represent causal structures by graphs, i.e., by ordered pairs ⟨V, E⟩, where V is a set of variables and E is a binary relation on V (E ⊆ V × V). V’s elements are called the graph’s “vertices” and E’s elements are called its “edges”. “X → Y” stands short for “⟨X, Y⟩ ∈ E” and is interpreted as “X is a direct cause of Y in ⟨V, E⟩” or as “Y is a direct effect of X in ⟨V, E⟩”. Par(Y) is the set of all X ∈ V with X → Y in ⟨V, E⟩. The elements of Par(Y) are called Y’s parents. We write “X – Y” for “X → Y or X ← Y”. A path π : X – ... – Y is called a (causal) path connecting X and Y in ⟨V, E⟩. A causal path π is called a directed causal path from X to Y if and only if (“iff” for short) it has the form X → ... → Y. X is called a cause of Y and Y an effect of X in that case. A causal path π is called a common cause path iff it has the form X ← ... ← Z → ... → Y and no variable appears more often than once on π. Z is called a common cause of X and Y lying on path π in that case. A variable Z lying on a path π : X – ... → Z ← ... – Y is called a collider lying on this path. A variable X is called exogenous iff no arrow is pointing at X; it is called endogenous otherwise.

A graph ⟨V, E⟩ is called a directed graph in case all edges in E are one-headed arrows “→”. It is called cyclic iff it features a causal path of the form X → ... → X and acyclic otherwise. A causal structure ⟨V, E⟩ together with a probability distribution P over V is called a causal model ⟨V, E, P⟩. P is intended to provide information about the strengths of causal influences represented by the arrows in ⟨V, E⟩. A causal model ⟨V, E, P⟩ is called cyclic iff its graph ⟨V, E⟩ is cyclic; it is called acyclic otherwise. In the following, we will only be interested in acyclic causal models.

We use the standard notions of (conditional) probabilistic dependence and independence:

Definition 1 (conditional probabilistic (in)dependence) X and Y are probabilistically dependent conditional on Z iff there are X-, Y-, and Z-values x, y, and z, respectively, such that P(x|y, z) ≠ P(x|z) ∧ P(y, z) > 0.

X and Y are probabilistically independent conditional on Z iff X and Y are not probabilistically dependent conditional on Z.

Probabilistic independence between X and Y conditional on Z is abbreviated as “Indep(X, Y|Z)”, probabilistic dependence is abbreviated as “Dep(X, Y|Z)”. Unconditional probabilistic (in)dependence between X and Y, (In)Dep(X, Y), is defined as (In)Dep(X, Y|∅). X, Y, and Z in definition 1 can be variables or sequences of variables. When X, Y, Z, ... are sequences of variables, we write them in bold letters. We also write the values x, y, z, ... of sequences X, Y, Z, ... in bold letters. The set of values x of a sequence X of variables X_1, ..., X_n is val(X_1) × ... × val(X_n), where val(X_i) is the set of X_i’s possible values.

3 WOODWARD’S DEFINITION OF DIRECT CAUSATION

Woodward’s (2003) interventionist theory of causation aims to explicate direct causation w.r.t. a set of variables V in terms of possible interventions. Woodward (2003, p. 98) provides the following definition of an intervention variable:

Definition 2 (IV_W) I is an intervention variable for X with respect to Y if and only if I meets the following conditions:
I1. I causes X.
I2. I acts as a switch for all the other variables that cause X. That is, certain values of I are such that when I attains those values, X ceases to depend on the values of other variables that cause X and instead depends only on the value taken by I.
I3. Any directed path from I to Y [if there exists one] goes through X [...].
I4. I is (statistically) independent of any variable Z that causes Y and that is on a directed path that does not go through X.

(IV_W) is intended to single out those variables as intervention variables for X w.r.t. Y that allow for correct causal inference according to Woodward’s (2003) definition of direct causation. For I to be an intervention variable for X w.r.t. Y it is required that I is causally relevant to X (condition I1), that X is only under I’s influence when I = on (condition I2), and that a correlation between I and Y can only be due to a directed causal path from I to Y going through X (conditions I3 and I4). For a detailed motivation of I1–I4, see (Woodward, 2003, sec. 3.1.4). For problems with Woodward’s definitions, see (Gebharter and Schurz, ms).

An intervention on X w.r.t. Y (from now on we refer to X as the intervention’s “target variable” and to Y as the “test variable”) is then straightforwardly defined as an intervention variable I for X w.r.t. Y taking one of its on-values, which forces X to take a certain value x. We will call interventions whose on-values force X to take certain values x “deterministic interventions” (cf. Korb et al., 2004, sec. 5).

Note that Woodward’s (2003) notion of an intervention is, on the one hand, strong because it requires interventions to be deterministic interventions. It is, on the other hand, weak in another respect: In contrast to structural or surgical interventions (cf. Eberhardt and Scheines, 2007, p. 984; Pearl, 2009), Woodward’s interventions are allowed to be direct causes of more than one variable as long as the intervention’s direct effects which are non-target variables do not cause the test variable over a path not going through the intervention’s target variable (intervention condition I3).

Based on his notion of an intervention, Woodward (2003, p. 59) gives the following definition of direct causation w.r.t. a variable set V:

Definition 3 (DC_W) A necessary and sufficient condition for X to be a (type-level) direct cause of Y with respect to a variable set V is that there be a possible intervention on X that will change Y or the probability distribution of Y when one holds fixed at some value all other variables Z_i in V.

(DC_W) neatly explicates direct causation w.r.t. a variable set V in terms of possible interventions: X is a direct cause of Y w.r.t. V if Y can be wiggled by wiggling X; and if X is a direct cause of Y w.r.t. V, then there are possible interventions by whose means one can influence Y by manipulating X. [Footnote 1: Note that Woodward (2003) does not require the intervention variables I to be elements of the set of variables V containing the target variable X and the test variable Y.]

Note that (DC_W) may be too strong because many domains involve variables one cannot control by deterministic interventions. Scenarios of this kind include, for example, the decay of uranium or states of entangled systems in quantum mechanics. The decay of uranium can only be probabilistically influenced, and any attempt to manipulate the state of one of two entangled photons, for example, would destroy the entangled system. Glymour (2004) also considers variables for sex and race as not manipulable by means of intervention variables in the sense of (IV_W).

To avoid all problems that might arise for Woodward’s (2003) account due to variables that are not manipulable by deterministic interventions, we will reconstruct Woodward’s (DC_W) as a partial definition in sec. 4. In particular, we will define direct causation only for sets of variables V for which suitable intervention variables exist.

4 RECONSTRUCTING WOODWARD’S DEFINITION

In this section we reconstruct Woodward’s (2003) definition of direct causation in terms of causal Bayes nets. The reconstruction of (IV_W) is straightforward:

Definition 4 (IV) I_X ∈ V is an intervention variable for X ∈ V w.r.t. Y ∈ V in a causal model ⟨V, E, P⟩ iff
(a) I_X is exogenous and there is a path π : I_X → X in ⟨V, E⟩,
(b) for every on-value of I_X there is an X-value x such that P(x|I_X = on) = 1 and Dep(x, I_X = on|z) holds for every instantiation z of every Z ⊆ V\{I_X, X},
(c) all paths I_X → ... → Y in ⟨V, E⟩ have the form I_X → ... → X → ... → Y,
(d) I_X is independent of every variable C (in V or not in V) which causes Y over a path not going through X.

Note that (IV) still allows for intervention variables I_X that are common causes of their target variable X and other variables in V. Condition (a) requires I_X to be exogenous. This is, though it is a typical assumption made for intervention variables, not explicit in Woodward’s (2003) original definition (IV_W). One problem that might arise for Woodward’s account when not making this assumption is that I_X in a causal structure Y → I_X → X may turn out to be an intervention variable for X w.r.t. Y. If Y then depends on I_X = on, (DC_W) would falsely determine X to be a cause of Y (cf. Gebharter and Schurz, ms). I_X → X in condition (a) is a harmless simplification of I1. Condition (b) captures Woodward’s requirement that interventions have to be deterministic, from which I2 follows. X is assumed to be under full control of I_X when I_X is on. This does not only require that for every on-value of I_X there is an X-value x such that P(x|I_X = on) = 1, but also that I_X = on actually has an influence on x in every possible context, i.e., under conditionalization on arbitrary instantiations z of all kinds of subsets Z of V\{I_X, X}. Condition (c) directly mirrors I3. Condition (d) mirrors Woodward’s I4. Note that condition (d) requires reference to variables C possibly not contained in V (cf. Woodward, 2008, p. 202).

If we want to account for direct causal connection in a causal model ⟨V, E, P⟩ by means of interventions, we have to add intervention variables to V. In other words: We have to expand ⟨V, E, P⟩ in a certain way. But how do we have to expand ⟨V, E, P⟩? To answer this question, let us assume that we want to know whether X is a direct cause of Y in the unmanipulated model ⟨V, E, P⟩. Then the manipulated model ⟨V′, E′, P′⟩ will have to contain an intervention variable I_X for X w.r.t. Y and also intervention variables I_Z for all Z ∈ V different from X and Y by whose means these Z can be controlled. X is a direct cause of Y if I_X has some on-values such that we can influence Y by manipulating X with I_X = on when all I_Z have taken certain on-values. On the other hand, to guarantee that X is not a direct cause of Y, we have to demonstrate that none of Y’s values can be influenced by manipulating some X-value by some intervention. For establishing such a negative causal claim, we require an intervention variable I_X by whose means we can control every X-value x. (Otherwise it could be that Y depends only on X-values that are not correlated with I_X-values; then I_X = on would have no probabilistic influence on Y, though X may be a causal parent of Y.) In addition, we require for every Z ≠ X, Y an intervention variable I_Z by whose means Z can be forced to take every value z. (Otherwise it could be that we can bring about only such Z-value instantiations which screen X and Y off each other; then I_X = on would have no probabilistic influence on Y when Z’s value is fixed by interventions, though X may be a causal parent of Y.)

In the unmanipulated model ⟨V, E, P⟩, all intervention variables I are off. In the manipulated model ⟨V′, E′, P′⟩, all intervention variables’ values are realized for some but not for all individuals in the domain. This move allows us to compute probabilities for variables in V when I = off as well as probabilities for variables in V for all combinations of on-value realizations of intervention variables I, while the causal structure of the unmanipulated model will be preserved in the manipulated model. (Note that we deviate here from the typical “arrow breaking” representation of interventions in the literature which assumes that in the manipulated model all individuals get manipulated.) This amounts to the following notion of an intervention expansion (“i-expansion” for short):

Definition 5 (intervention expansion) ⟨V′, E′, P′⟩ is an intervention expansion of ⟨V, E, P⟩ w.r.t. Y ∈ V iff
(a) V′ = V ∪̇ V_I, where V_I contains for every X ∈ V different from Y an intervention variable I_X w.r.t. Y (and nothing else),
(b) for all Z_i, Z_j ∈ V: Z_i → Z_j in E′ iff Z_i → Z_j in E,
(c) for every X-value x of every X ∈ V different from Y there is an on-value of the corresponding intervention variable I_X such that P′(x|I_X = on) = 1 and Dep(x, I_X = on|z) holds for every instantiation z of every Z ⊆ V\{I_X, X},
(d) P′_{I=off} ↑ V = P,
(e) P′(I = on), P′(I = off) > 0.

I in conditions (d) and (e) is the set of all newly added intervention variables I. P′_{I=off} ↑ V in (d) is P′_{I=off} := P′(−|I = off) restricted to V. Hence, “P′_{I=off} ↑ V = P” means that P′_{I=off} coincides with P on the value space of variables in V. Condition (a) guarantees that the i-expansion contains all the intervention variables required for testing for direct causal relationships in the sense of Woodward’s (2003) definition of direct causation. The assumption that V_I contains only intervention variables for X w.r.t. Y is a harmless simplification. Thanks to condition (b), the manipulated model’s causal structure fits to the unmanipulated model’s causal structure. In particular, the i-expansion is only allowed to introduce new causal arrows going from intervention variables to variables in V. Due to condition (c), every X ∈ V different from Y can be fully controlled by means of an intervention variable I_X for X w.r.t. Y. Condition (d) explains how the manipulated model’s associated probability distribution P′ fits to the unmanipulated model’s distribution P. Condition (e) says that all values of intervention variables have to be realized by some individuals in the domain.

With the help of this notion of an i-expansion we can now reconstruct Woodward’s (2003) definition of direct causation. As already mentioned, Woodward’s definition requires the existence of suitable intervention variables. Thus, we reconstruct (DC_W) as a partial definition whose if-condition presupposes the required intervention variables:

Definition 6 (DC) If there exist i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ w.r.t. Y ∈ V, then: X ∈ V is a direct cause of Y w.r.t. V iff Dep(Y, I_X = on|I_Z = on) holds in some i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ w.r.t. Y, where I_X is an intervention variable for X w.r.t. Y in ⟨V′, E′, P′⟩ and I_Z is the set of all intervention variables in ⟨V′, E′, P′⟩ different from I_X.

(DC) mirrors Woodward’s definition restricted to cases in which the required intervention variables (more precisely: the required i-expansions) exist: In case Y can be probabilistically influenced by manipulating X by means of an intervention variable I_X for X w.r.t. Y in one of these i-expansions, X is a direct cause of Y in the unmanipulated model. And vice versa: In case X is a direct cause of Y in the unmanipulated model, there will be an intervention variable I_X for X w.r.t. Y in one of these i-expansions such that Y is probabilistically sensitive to I_X = on.

In the next section we show that (DC) can account for all direct causal dependencies in a causal model if suitable i-expansions exist and CMC and Min are assumed to be satisfied.

5 OCCAM’S RAZOR, DETERMINISTIC INTERVENTIONS, AND DIRECT CAUSATION

The theory of causal Bayes nets’ core axiom is the causal Markov condition (CMC) (cf. Spirtes et al., 2000, p. 29):

Definition 7 (causal Markov condition) A causal model ⟨V, E, P⟩ satisfies the causal Markov condition iff every X ∈ V is probabilistically independent of all its non-effects conditional on its causal parents.

CMC is assumed to hold for causal models whose variable sets are causally sufficient. A variable set V is causally sufficient iff every common cause C of variables X and Y in V is also in V or takes the same value c for all individuals in the domain (cf. Spirtes et al., 2000, p. 22). From now on we implicitly assume causal sufficiency, i.e., we only consider causal models whose variable sets are causally sufficient.

A finite causal model ⟨V, E, P⟩ satisfies the Markov condition iff P admits the following Markov factorization relative to ⟨V, E⟩ (cf. Pearl, 2009, p. 16):

P(X_1, ..., X_n) = ∏_i P(X_i | Par(X_i))    (1)

The conditional probabilities P(X_i | Par(X_i)) are called X_i’s parameters.

For acyclic causal models, CMC is equivalent to the d-separation criterion (Verma, 1986; Pearl, 1988, pp. 119f):

Definition 8 (d-separation criterion) ⟨V, E, P⟩ satisfies the d-separation criterion iff the following holds for all X, Y ∈ V and Z ⊆ V\{X, Y}: If X and Y are d-separated by Z in ⟨V, E⟩, then Indep(X, Y|Z).

Definition 9 (d-separation, d-connection) X ∈ V and Y ∈ V are d-separated by Z ⊆ V\{X, Y} in ⟨V, E⟩ iff X and Y are not d-connected given Z in ⟨V, E⟩.

X ∈ V and Y ∈ V are d-connected given Z ⊆ V\{X, Y} in ⟨V, E⟩ iff X and Y are connected by a path π in ⟨V, E⟩ such that no non-collider on π is in Z, while all colliders on π are in Z or have an effect in Z.

The equivalence between CMC and the d-separation criterion reveals the full content of CMC: If a causal model satisfies CMC, then every (conditional) probabilistic independence can be explained by missing (conditional) causal connections, and every (conditional) probabilistic dependence can be explained by some existing (conditional) causal connection.

In case there is a path π between X and Y in ⟨V, E⟩ such that no non-collider on π is in Z ⊆ V\{X, Y} and all colliders on π are in Z or have an effect in Z, π is said to be activated by Z. We also say that X and Y are d-connected given Z over path π in that case. If π is not activated by Z, π is said to be blocked by Z. We also say that X and Y are d-separated by Z over path π in that case.

Occam’s razor (as we understand it in this paper) dictates to prefer, from all those causal structures ⟨V, E⟩ which together with a given probability distribution P over V satisfy CMC, the ones which also satisfy the causal minimality condition (Min):

Definition 10 (causal minimality condition) A causal model ⟨V, E, P⟩ satisfying CMC satisfies the causal minimality condition iff no model ⟨V, E′, P⟩ with E′ ⊂ E also satisfies CMC (cf. Spirtes et al., 2000, p. 31).

For acyclic causal models satisfying CMC, the following causal productivity condition (Prod) (cf. Schurz and Gebharter, forthcoming) can be seen as a reformulation of the causal minimality condition:

Definition 11 (causal productivity condition) A causal model ⟨V, E, P⟩ satisfies the causal productivity condition iff Dep(X, Y|Par(Y)\{X}) holds for all X, Y ∈ V with X → Y in ⟨V, E⟩.

Theorem 1 For every acyclic causal model ⟨V, E, P⟩ satisfying CMC, the causal minimality condition and the causal productivity condition are equivalent.

The equivalence of Min and Prod reveals the full content of Min: In minimal causal models, no causal arrow is superfluous, i.e., every causal arrow from X to Y is productive, meaning that it is responsible for some probabilistic dependence between X and Y (when the values of all other parents of Y are fixed).

We can now prove the following theorem:

Theorem 2 If ⟨V, E, P⟩ is an acyclic causal model and for every Y ∈ V there is an i-expansion ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ w.r.t. Y satisfying CMC and Min, then for all X, Y ∈ V (with X ≠ Y) the following two statements are equivalent:
(i) X → Y in ⟨V, E⟩.
(ii) Dep(Y, I_X = on|I_Z = on) holds in some i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ w.r.t. Y, where I_X is an intervention variable for X w.r.t. Y in ⟨V′, E′, P′⟩ and I_Z is the set of all intervention variables in ⟨V′, E′, P′⟩ different from I_X.

Theorem 2 shows that direct causation à la Woodward (2003) coincides with the graph theoretical notion of direct causation in systems ⟨V, E, P⟩ with i-expansions w.r.t. every variable Y ∈ V satisfying CMC and Min. In particular, theorem 2 says the following: Assume we are interested in a causal model ⟨V, E, P⟩. Assume further that for every Y in V there is an i-expansion ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ w.r.t. Y satisfying CMC and Min. This means (among other things) that for every pair of variables ⟨X, Y⟩ there is at least one i-expansion with an intervention variable I_X for X w.r.t. Y and intervention variables I_Z for every Z ∈ V (different from X and Y) w.r.t. Y by whose means one can force the variables in V\{Y} to take any combination of value realizations. Given this setup, theorem 2 tells us for every X and Y (with X ≠ Y) in V that X is a causal parent of Y in ⟨V, E⟩ iff Dep(Y, I_X = on|I_Z = on) holds in one of the presupposed i-expansions w.r.t. Y.

6 OCCAM’S RAZOR, STOCHASTIC INTERVENTIONS, AND DIRECT CAUSATION

In this section we generalize the main finding of sec. 5 to cases in which only stochastic interventions are available. To account for direct causal relations X → Y by means of stochastic intervention variables, two intervention variables are needed, one for X and one for Y. (For details, see below.) We define a stochastic intervention variable as follows:

Definition 12 (IV_S) I_X ∈ V is a stochastic intervention variable for X ∈ V w.r.t. Y ∈ V in ⟨V, E, P⟩ iff
(a) I_X is exogenous and there is a path π : I_X → X in ⟨V, E⟩,
(b) for every on-value of I_X there is an X-value x such that Dep(x, I_X = on|z) holds for every instantiation z of every Z ⊆ V\{I_X, X},
(c) all paths I_X → ... → Y in ⟨V, E⟩ have the form I_X → ... → X → ... → Y,
(d) I_X is independent of every variable C (in V or not in V) which causes Y over a path not going through X.

The only difference between (IV_S) and (IV) is condition (b). For stochastic interventions it is not required that I_X = on determines X’s value to be x with probability 1. It suffices that I_X = on and x are correlated conditional on every value z of every Z ⊆ V\{I_X, X}. This specific constraint guarantees that X can be influenced by I_X = on under all circumstances, i.e., under all kinds of conditionalization on instantiations of remainder variables in V.

We also have to modify our notion of an intervention expansion in case we allow for stochastic interventions. We define the following notion of a stochastic intervention expansion:

Definition 13 (stochastic intervention expansion) ⟨V′, E′, P′⟩ is a stochastic intervention expansion of ⟨V, E, P⟩ for X ∈ V w.r.t. Y ∈ V iff
(a) V′ = V ∪̇ V_I, where V_I contains one stochastic intervention variable I_X for X w.r.t. Y and one stochastic intervention variable I_Y for Y w.r.t. Y which is a parent only of Y (and nothing else),
(b) for all Z_i, Z_j ∈ V: Z_i → Z_j in E′ iff Z_i → Z_j in E,
(c.1) for every X-value x there is an on-value of I_X such that Dep(x, I_X = on|z) holds for every instantiation z of every Z ⊆ V′\{I_X, X},
(c.2) for every Y-value y, every instantiation r of Par(Y), and every on-value of I_Y there is an on-value on* of I_Y such that P′(y|I_Y = on*, r) ≠ P′(y|I_Y = on, r), P′(y|I_Y = on*, r) > 0, and P′(y|I_Y = on*, r*) = P′(y|I_Y = on, r*) holds for all r* ∈ val(Par(Y)) different from r,
(d) P′_{I=off} ↑ V = P,
(e) P′(I = on), P′(I = off) > 0.

This definition differs from the definition of a (non-stochastic) i-expansion with respect to conditions (a) and (c): A stochastic i-expansion for X w.r.t. Y contains exactly two intervention variables, viz. one stochastic intervention variable I_X for X w.r.t. Y and one stochastic intervention variable I_Y for Y w.r.t. Y (which trivially satisfies conditions (c) and (d) in (IV_S)). While I_X may have more than one direct effect, the second intervention variable I_Y is assumed to be a causal parent only of Y. (This is required for accounting for direct causal connections; for details see (i) ⇒ (ii) in the proof of theorem 3 in the appendix.)

The second intervention variable I_Y is required to exclude independence between I_X and Y due to a fine-tuning of Y’s parameters. Such an independence can arise even if CMC and Min are satisfied, X is a causal parent of Y, and I_X and Y are each correlated with the same X-values x. For examples of this kind of non-faithfulness, see, e.g., (Neapolitan, 2004, p. 96) or (Naeger, forthcoming). In condition (c.2) we assume that every one of Y’s parameters can be changed independently of all other Y-parameters (to a value r ∈ ]0, 1]) by changing I_Y’s on-value. This suffices to exclude non-faithful independencies between I_X and Y of the kind described above.

When not presupposing deterministic interventions, it cannot be guaranteed anymore that the value of every variable in our model of interest different from the test variable Y can be fixed by interventions. The values of a causal model’s variables can, however, also be fixed by conditionalization. To account for direct causation between X and Y when only stochastic interventions are available, one has to conditionalize on a suitably chosen set Z ⊆ V\{X, Y} that (i) blocks all indirect causal paths between X and Y, and that (ii) fixes all X-alternative parents of Y. That Z blocks all indirect paths between X and Y is required to assure that dependence between I_X = on and Y cannot be due to an indirect path, and fixing the values of all parents of Y different from X is required to exclude independence of I_X = on and Y due to a fine-tuning of Y’s X-alternative parents that may cancel the influence of I_X = on on Y over a path I_X → X → Y. [Footnote 2: For details on such cases of non-faithfulness due to compensating parents see (Schurz and Gebharter, forthcoming; Pearl, 1988, p. 256).] Fortunately, every directed acyclic graph ⟨V, E⟩ features a set Z satisfying requirement (i), viz. Par(Y)\{X} (cf. Schurz and Gebharter, forthcoming). Trivially, Par(Y)\{X} also satisfies requirement (ii).

With the help of (IV_S) and definition 13, we can now define direct causation in terms of stochastic interventions for models for which suitable stochastic i-expansions exist:

Definition 14 (DC_S) If there exist stochastic i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ for X w.r.t. Y, then: X is a direct cause of Y w.r.t. V iff Dep(Y, I_X = on|Par(Y)\{X}, I_Y = on) holds in some i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ for X w.r.t. Y, where I_X is a stochastic intervention variable for X w.r.t. Y in ⟨V′, E′, P′⟩ and I_Y is a stochastic intervention variable for Y w.r.t. Y in ⟨V′, E′, P′⟩.

Now the following theorem can be proven:

Theorem 3 If ⟨V, E, P⟩ is an acyclic causal model and for every X, Y ∈ V (with X ≠ Y) there is a stochastic i-expansion ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ for X w.r.t. Y satisfying CMC and Min, then for all X, Y ∈ V (with X ≠ Y) the following two statements are equivalent:
(i) X → Y in ⟨V, E⟩.
(ii) Dep(Y, I_X = on|Par(Y)\{X}, I_Y = on) holds in some i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ for X w.r.t. Y, where I_X is a stochastic intervention variable for X w.r.t. Y in ⟨V′, E′, P′⟩ and I_Y is a stochastic intervention variable for Y w.r.t. Y in ⟨V′, E′, P′⟩.

Theorem 3 shows that direct causation à la Woodward (2003) coincides with the graph theoretical notion of direct causation in systems ⟨V, E, P⟩ with stochastic i-expansions for every X ∈ V w.r.t. every Y ∈ V (with X ≠ Y) satisfying CMC and Min. In particular, theorem 3 says the following: Assume we are interested in a causal model ⟨V, E, P⟩. Assume further that for every X, Y in V (with X ≠ Y) there is a stochastic i-expansion ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ for X w.r.t. Y satisfying CMC and Min. This means (among other things) that for every pair of variables ⟨X, Y⟩ there is at least one stochastic i-expansion featuring a stochastic intervention variable I_X for X w.r.t. Y and a stochastic intervention variable I_Y for Y w.r.t. Y. Given this setup, theorem 3 can account for every causal arrow between every X and Y (with X ≠ Y) in V: It says that X is a causal parent of Y in ⟨V, E⟩ iff Dep(Y, I_X = on|Par(Y)\{X}, I_Y = on) holds in some of the presupposed stochastic i-expansions for X w.r.t. Y.

7 CONCLUSION

In this paper we investigated the consequences of assuming a certain version of Occam’s razor. If one applies the razor to the theory of causal Bayes nets in such a way that it dictates to prefer only minimal causal models, one can show that Occam’s razor provides a neat definition of direct causation. In particular, we demonstrated that one gets Woodward’s (2003) definition of direct causation translated into causal Bayes nets terminology and restricted to contexts in which suitable i-expansions satisfying the causal Markov condition (CMC) exist. In the last section we showed how Occam’s razor can be used to account for direct causal connections Woodward style even if no deterministic interventions are available. These results can be seen as a motivation of Occam’s razor going beyond its merits as a methodological principle: If one wants a nice and simple interventionist definition of direct causation in the sense of Woodward (or its stochastic counterpart developed in sec. 6), then it is reasonable to apply a version of Occam’s razor that suggests eliminating non-minimal causal models.

Acknowledgements

This work was supported by DFG, research unit “Causation, Laws, Dispositions, Explanation” (FOR 1063). Our thanks go to Frederick Eberhardt and Paul Naeger for important discussions, to two anonymous referees for helpful comments on an earlier version of the paper, and to Sebastian Maaß for proofreading.

References

F. Eberhardt, and R. Scheines (2007). Interventions and causal inference. Philosophy of Science 74(5):981-995.

A. Gebharter, and G. Schurz (ms). Woodward’s interventionist theory of causation: Problems and proposed solutions.

C. Glymour (2004). Critical notice. British Journal for the Philosophy of Science 55(4):779-790.

K. B. Korb, L. R. Hope, A. E. Nicholson, and K. Axnick (2004). Varieties of causal intervention. In C. Zhang, H. W. Guesgen, W.-K. Yeap (eds.), Proceedings of the 8th Pacific Rim International Conference on AI 2004: Trends in Artificial Intelligence, 322-331. Berlin: Springer.

P. Naeger (forthcoming). The causal problem of entanglement. Synthese.

R. Neapolitan (2004). Learning Bayesian Networks. Upper Saddle River, NJ: Prentice Hall.

E. P. Nyberg, and K. B. Korb (2006). Informative interventions. Technical report 2006/204, Clayton School of Information Technology, Monash University, Melbourne.

J. Pearl (1988). Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann.

J. Pearl (2009). Causality. Cambridge: Cambridge University Press.

G. Schurz, and A. Gebharter (forthcoming). Causality as a theoretical concept: Explanatory warrant and empirical content of the theory of causal nets. Synthese.

P. Spirtes, C. Glymour, and R. Scheines (2000). Causation, Prediction, and Search. Cambridge, MA: MIT Press.

T. S. Verma (1986). Causal networks: Semantics and expressiveness. Technical report R-65, Cognitive Systems Laboratory, University of California, Los Angeles.

J. Woodward (2003). Making Things Happen. Oxford: Oxford University Press.

J. Woodward (2008). Response to Strevens. Philosophy and Phenomenological Research 77(1):193-212.

J. Zhang, and P. Spirtes (2011). Intervention, determinism, and the causal minimality condition. Synthese 182(3):335-347.

Appendix

The following proof of theorem 1 rests on the equivalence of CMC and the Markov factorization (1). It is, thus, restricted to finite causal structures.

Proof of theorem 1 Suppose ⟨V, E, P⟩ with V = {X_1, ..., X_n} to be a finite acyclic causal model satisfying CMC.

…anteed by condition (c) in definition 5.) Then we have Dep(I_X = on, x|I_Z = on, r) ∧ Dep(x, y|I_Z = on, r). From the axiom of weak union (2) (cf. Pearl, 2009, p. 11), which is probabilistically valid, we get (3) and (4) (in which s = ⟨x, r⟩ is a value realization of Par(Y)):

Indep(X, YW | Z) ⇒ Indep(X, Y | ZW)    (2)
Indep(IX = on, s = hx, ri|IZ = on) ⇒ (3) Prod ⇒ Min: Assume that hV, E, P i does not satisfy Min, Indep(IX = on, x|IZ = on, r) meaning that there are X, Y ∈ V with X → Y in hV, Ei Indep(s = hx, ri, y|IZ = on) ⇒ such that hV, E 0 , P i, which results from deleting X → Y (4) from hV, Ei, still satisfies CMC. But then P ar(Y )\{X} Indep(x, y|IZ = on, r) d-separates X and Y in hV, E 0 i, and thus, the d-separation With the contrapositions of (3) and (4) it now follows criterion implies Indep(X, Y |P ar(Y )\{X}), which vio- that Dep(IX = on, s = hx, ri|IZ = on) ∧ Dep(s = lates Prod. hx, ri, y|IZ = on). Min ⇒ Prod: Assume that hV, E, P i satisfies Min, mean- We now show that Dep(IX = on, s|IZ = on) ∧ ing that there are no X, Y ∈ V with X → Y in hV, Ei Dep(s, y|IZ = on) and the d-separation criterion imply such that hV, E 0 , P i, which results from deleting X → Y Dep(IX = on, y|IZ = on). We define P ∗ (−) as from hV, Ei, still satisfies CMC. The latter is the case P 0 (−|IZ = on) and proceed as follows: iff (*) the parent set P ar(Y ) of every Y ∈ V (with P ar(Y ) 6= ∅) is minimal in the sense that removing one P ∗ (y|IX = on) = of Y ’s parents X from P ar(Y ) would make a differ- (5) X P ∗ (y|si , IX = on) · P ∗ (si |IX = on) ence for Y , meaning that P (y|x, P ar(Y )\{X} = r) 6= i P (y|P ar(Y )\{X} = r) holds for some X-values x, some Y -values y, and some instantiations r of P ar(Y )\{X}. Equation (5) is probabilistically valid. Because P ar(Y ) Otherwise P would admit the Markov factorization rela- blocks all paths between IX and Y , we get (6) from (5): tive to hV, Ei and relative to hV, E 0 i, meaning that also hV, E 0 , P i, which results from deleting X → Y from P ∗ (y|IX = on) = (6) X hV, Ei, would satisfy CMC. But then hV, E, P i would P ∗ (y|si ) · P ∗ (si |IX = on) not be minimal, which would contradict the assumption. 
i Now (*) entails that Dep(X, Y |P ar(Y )\{X}) holds for all X, Y ∈ V with X → Y , i.e., that hV, E, P i satisfies Since IX = on forces P ar(Y ) to take value s when Prod.  IZ = on, P ∗ (si |IX = on) = 1 in case si = s, and P ∗ (si |IX = on) = 0 otherwise. Thus, we get (7) from Proof of theorem 2 Assume hV, E, P i is an acyclic (6): causal model and for every Y ∈ V there is an i-expansion P ∗ (y|IX = on) = P ∗ (y|s) · 1 (7) hV0 , E 0 , P 0 i of hV, E, P i w.r.t. Y satisfying CMC and For reductio, let us assume that Indep(IX = Min. Let X and Y be arbitrarily chosen elements of V on, y|IZ = on), meaning that P ∗ (y|IX = on) = P ∗ (y). such that X 6= Y . But then we get (8) from (7): (i) ⇒ (ii): Suppose X → Y in hV, Ei. We assumed that there exists an i-expansion hV0 , E 0 , P 0 i of hV, E, P i w.r.t. P ∗ (y) = P ∗ (y|s) · 1 (8) Y satisfying CMC and Min. From condition (b) of defi- Equation (8) contradicts Dep(s, y|IZ = on) above. nition 5 it follows that X → Y in hV0 , E 0 i. Since Min Hence, Dep(IX = on, y|IZ = on) has to hold when is equivalent to Prod, X and Y are dependent when the Dep(IX = on, s|IZ = on) ∧ Dep(s, y|IZ = on) holds. values of all parents of Y different from X are fixed to Therefore, Dep(Y, IX = on|IZ = on). certain values, meaning that there will be an X-value x and a Y -value y such that Dep(x, y|P ar(Y )\{X} = r) (ii) ⇒ (i): Suppose hV0 , E 0 , P 0 i is one of the presupposed holds for an instantiation r of P ar(Y )\{X}. Now there i-expansions such that Dep(Y, IX = on|IZ = on) holds, will also be a value of IZ that fixes the set of all parents of where IX is an intervention variable for X w.r.t. Y in Y different from X to r. Let on be this IZ -value. Thus, hV0 , E 0 , P 0 i and IZ is the set of all intervention variables also Dep(x, y|IZ = on) and also Dep(x, y|IZ = on, r) in hV0 , E 0 , P 0 i different from IX . Then the d-separation will hold. 
Now let us assume that on is one of the IX - criterion implies that there must be a causal path π d- values which are correlated with x and which force X to connecting IX and Y . π cannot be a path featuring col- take value x. (The existence of such an IX -value is guar- liders, because IX and Y would be d-separated over such a path. π also cannot have the form IX ← ... – Y . This of hV, E, P i for X w.r.t. Y satisfying CMC and Min. is excluded by condition (a) in (IV). So π must have the From condition (b) of definition 13 it follows that X → form IX → ... – Y . Since π cannot feature colliders, Y in hV0 , E 0 i. Since Min is equivalent to Prod, π must be a directed path IX → ... → Y . Now either Dep(x, y|P ar(Y )\{X} = r, IY = on) holds for some X- (A) π goes through X, or (B) π does not go through X. values x, for some Y -values y, for some of IY ’s on-values (B) is excluded by condition (c) in (IV). Hence, (A) must on, and for some instantiations r of P ar(Y )\{X}. Now let be the case. If (A) is the case, then π is a directed path us assume that on is one of the IX -values which are corre- IX → ... → X → ... → Y going through X. Now there lated with x conditional on P ar(Y )\{X} = r, IY = on. are two possible cases: Either (i) at least one of the paths π (The existence of such an IX -value on is guaranteed by d-connecting IX and Y has the form IX → ... → X → Y , condition (c.1) in definition 13.) Then we have Dep(IX = or (ii) all paths π d-connecting IX and Y have the form on, x|r, IY = on) ∧ Dep(x, y|r, IY = on). IX → ... → X → ... → C → ... → Y . We now show that Dep(IX = on, x|r, IY = on) ∧ Assume (ii) is the case, i.e., all paths π d-connecting IX Dep(x, y|r, IY = on) together with IX → X → Y and and Y have the form IX → ... → X → ... → C → the d-separation criterion implies Dep(IX = on, y|r, IY = ... → Y . Let ri be an individual variable ranging over on). We define P ∗ (−) as P 0 (−|r) and proceed as follows: val(P ar(Y )). 
We define P ∗ (−) as P 0 (−|IZ = on) and proceed as follows: P ∗ (y|IX = on, IY = on) = X P ∗ (y|xi , IX = on, IY = on) · P ∗ (xi |IX = on, IY = on) P ∗ (y|IX = on) = i (9) X P ∗ (y|ri , IX = on) · P ∗ (ri |IX = on) (13) i X P ∗ (y|IY = on) = ∗ ∗ ∗ P (y) = P (y|ri ) · P (ri ) (10) X (14) P ∗ (y|xi , IY = on) · P ∗ (xi |IY = on) i i Equations (9) and (10) are probabilistically valid. Since Equations (13) and (14) are probabilistically valid. From IZ = on forces every non-intervention variable in V0 dif- IX → X → Y and (13) we get with the d-separation crite- ferent from X and Y to take a certain value, IZ = on will rion: also force P ar(Y ) to take a certain value r, meaning that P ∗ (ri ) = 1 in case ri = r, and that P ∗ (ri ) = 0 otherwise. P ∗ (y|IX = on, IY = on) = Since probabilities of 1 do not change after conditionaliza- X tion, we get P ∗ (ri |IX = on) = 1 in case ri = r, and P ∗ (y|xi , IY = on) · P ∗ (xi |IX = on, IY = on) i P ∗ (ri |IX = on) = 0 otherwise. Thus, we get (11) from (15) (9) and (12) from (10): Since IY is exogenous and a causal parent only of Y , X P ∗ (y|IX = on) = P ∗ (y|r, IX = on) · 1 (11) and IY are d-separated by IX , and thus, we get (16) from P ∗ (y) = P ∗ (y|r) · 1 (12) (15) with the d-separation criterion. Since IY and X are d-separated (by the empty set), we get (17) from (14) with Since P ar(Y ) blocks all paths between IX and Y , we get the d-separation criterion: P ∗ (y|r, IX = on) = P ∗ (y|r) with the d-separation cri- terion, and thus, we get P ∗ (y|IX = on) = P ∗ (y) with P ∗ (y|IX = on, IY = on) = (11) and (12). Thus, Indep(Y, IX = on|IZ = on) holds, X (16) P ∗ (y|xi , IY = on) · P ∗ (xi |IX = on) which contradicts the initial assumption that Dep(Y, IX = i on|IZ = on) holds. Therefore, (i) must be the case, i.e., there must be a path π d-connecting IX and Y that has the P ∗ (y|IY = on) = form IX → ... → X → Y . 
From hV0 , E 0 , P 0 i being an X (17) i-expansion of hV, E, P i it now follows that X → Y in P ∗ (y|xi , IY = on) · P ∗ (xi ) i hV, Ei.  Now either (A) P ∗ (y|IX = on, IY = on) 6= Proof of theorem 3 Assume hV, E, P i is an acyclic P ∗ (y|IY = on), or (B) P ∗ (y|IX = on, IY = on) = causal model and for every X, Y ∈ V (with X 6= Y ) there P ∗ (y|IY = on). If (A) is the case, then Dep(Y, IX = is a stochastic i-expansion hV0 , E 0 , P 0 i of hV, E, P i for X on|P ar(Y )\{X}, IY = on). w.r.t. Y satisfying CMC and Min. Let X and Y be arbitrar- If (B) is the case, then P ∗ (y|IX = on, IY = on) ily chosen elements of V such that X 6= Y . can only equal P ∗ (y|IY = on) due to a fine-tuning of (i) ⇒ (ii): Suppose X → Y in hV, Ei. We assumed P ∗ (xi |IY = on) and P ∗ (xi ) in equations (16) and (17), that there exists a stochastic i-expansion hV0 , E 0 , P 0 i respectively. We already know that X’s value x and IX = on are dependent conditional on P ar(Y )\{X} = dict the assumption of acyclicity. Hence, π must have the r, IY = on, meaning that P ∗ (x|IX = on, IY = on) 6= form IX → ... – X – ... – C → Y (where C and X are P ∗ (x|IY = on) holds. Since X and IY are d-separated possibly identical). Now either (i) C = X or (ii) C 6= X. by IX , P ∗ (x|IX = on, IY = on) = P ∗ (x|IX = on) If (ii) is the case, then C ∈ (P ar(Y )\{X}) ∪ {IY }, and holds. Since X and IY are d-separeted (by the empty thus, (P ar(Y )\{X}) ∪ {IY } blocks π. But then IX and set), P ∗ (x|IY = on) = P ∗ (x) holds. It follows that Y cannot be d-connected given (P ar(Y )\{X}) ∪ {IY } P ∗ (x|IX = on) 6= P ∗ (x) holds. So (i) P ∗ (x|IX = over path π. Hence, (i) must be the case. Then π has the on) > 0 or (ii) P ∗ (x) > 0. Thanks to condition (c.2) form IX → ... – X → Y and from hV0 , E 0 , P 0 i being a in definition 13, every one of the conditional probabili- stochastic i-expansion of hV, E, P i it follows that X → Y ties P ∗ (y|xi , IY = on) can be changed independently in hV, Ei.  
by replacing “on” in “P ∗ (y|xi , IY = on)” by some IY - value “on∗ ” (with on∗ 6= on) such that P ∗ (y|xi , IY = on∗ ) > 0. Thus, in both cases ((i) and (ii)) it holds that P ∗ (y|x, IY = on∗ ) · P ∗ (x|IX = on∗ ) 6= P ∗ (y|x, IY = on∗ ) · P ∗ (x), while P ∗ (y|xi , IY = on∗ ) · P ∗ (xi |IX = on∗ ) = P ∗ (y|xi , IY = on∗ ) · P ∗ (xi ) holds for all xi 6= x. It follows that P ∗ (y|IX = on, IY = on∗ ) 6= P ∗ (y|IY = on∗ ). (ii) ⇒ (i): Suppose hV0 , E 0 , P 0 i is one of the above as- sumed stochastic i-expansions for X w.r.t. Y and that Dep(Y, IX = on|P ar(Y )\{X}, IY = on) holds in this stochastic i-expansion. The d-separation criterion and Dep(Y, IX = on|P ar(Y )\{X}, IY = on) imply that IX and Y are d-connected given (P ar(Y )\{X}) ∪ {IY } by a causal path π : IX – ... – Y . π cannot have the form IX ← ... – Y . This is excluded by condition (a) in (IVS ). Thus, π must have the form IX → ... – Y . Now either (A) π goes through X, or (B) π does not go through X. Suppose (B) is the case. Then, because of condition (c) in (IVS ), π cannot be a directed path IX → ... → Y . Thus, π must either (i) have the form IX → ... – C → Y (with a collider on π), or it (ii) must have the form IX → ... – C ← Y . If (i) is the case, then C must be in (P ar(Y )\{X}) ∪ {IY } (since C cannot be X). Hence, π would be blocked by (P ar(Y )\{X}) ∪ {IY } and, thus, would not d-connect IX and Y given (P ar(Y )\{X}) ∪ {IY }. Thus, (ii) must be the case. If (ii) is the case, then there has to be a col- lider C ∗ on π that either is C or that is an effect of C, and thus, also an effect of Y . But then IX and Y can only be d-connected given (P ar(Y )\{X}) ∪ {IY } over π if C ∗ is in (P ar(Y )\{X}) ∪ {IY } or has an effect in (P ar(Y )\{X}) ∪ {IY }. But this would mean that Y is a cause of Y , what is excluded by the initial assumption of acyclicity. Thus, (A) has to be the case. If (A) is the case, then π must have the form IX → ... – X – ... – Y . If π would have the form IX → ... – X – ... 
– C ← Y (where C and X are possi- bly identical), then there is at least one collider C ∗ ly- ing on π that is an effect of Y . For IX and Y to be d-connected given (P ar(Y )\{X}) ∪ {IY } over path π, (P ar(Y )\{X}) ∪ {IY } must activate π, meaning that C ∗ has to be in (P ar(Y )\{X}) ∪ {IY } or has to have an ef- fect in (P ar(Y )\{X}) ∪ {IY }. But then we would end up with a causal cycle Y → ... → Y , which would contra-