How Occam's Razor Provides a Neat Definition of Direct Causation

Alexander Gebharter

Gerhard Schurz

Duesseldorf Center for Logic and Philosophy of Science, University of Duesseldorf, Universitaetsstrasse 1, 40225 Duesseldorf, Germany

In this paper we show that the application of Occam's razor to the theory of causal Bayes nets gives us a neat definition of direct causation. In particular we show that Occam's razor implies Woodward's (2003) definition of direct causation, provided suitable intervention variables exist and the causal Markov condition (CMC) is satisfied. We also show how Occam's razor can account for direct causal relationships Woodward style when only stochastic intervention variables are available.

1 INTRODUCTION

Occam’s razor is typically seen as a methodological principle. There are many possible ways to apply the razor to the theory of causal Bayes nets. It could, for example, simply be interpreted to suggest preferring the simplest causal structure compatible with the given data among all compatible causal structures. The simplest causal structure could, for instance, be the one (or one of the ones) featuring the fewest causal arrows.

In this paper, however, we are interested in a slightly different application of Occam’s razor: Our interpretation of Occam’s razor asserts that, given that a causal structure is compatible with the data, it should only be chosen if it satisfies the causal minimality condition (Min) in the sense of Spirtes et al. (2000, p. 31), which requires that no causal arrow in the structure can be omitted in such a way that the resulting substructure would still be compatible with the data. When speaking of a causal structure being compatible with the data, we have in mind a causal structure and a probability distribution that together satisfy the causal Markov condition (CMC). (For details, see sec. 5.) In the following, applying Occam’s razor always means assuming that the causal minimality condition is satisfied.

In this paper we give a motivation for Occam’s razor that goes beyond its merits as a methodological principle dictating that one should always decide in favor of minimal causal models. In particular, we show that Occam’s razor provides a neat definition of direct causal relatedness in the sense of Woodward (2003), provided suitable intervention variables exist and CMC is satisfied. Note the connection of this enterprise to Zhang and Spirtes’ (2011) project. Zhang and Spirtes prove that CMC and an interventionist definition of direct causation a la Woodward (2003) together imply minimality. So Occam’s razor is well-motivated within a manipulationist framework such as Woodward’s. We show, vice versa, that CMC and minimality together imply Woodward’s definition of direct causation. So if one wants a neat definition of direct causation, it is reasonable to apply Occam’s razor in the sense of assuming minimality.

The paper is structured as follows: In sec. 2 we introduce the notation we use in subsequent sections. In sec. 3 we present Woodward’s (2003) definition of direct causation and his definition of an intervention variable. In sec. 4 we give precise reconstructions of both definitions in terms of causal Bayes nets. We also provide a definition of the notion of an intervention expansion, which is needed to account for direct causal relations in terms of the existence of certain intervention variables. In sec. 5 we show that Occam’s razor gives us Woodward’s definition of direct causation if CMC is assumed and the existence of suitable intervention variables is granted (theorem 2). In sec. 6 we go a step further and show how Occam’s razor allows us to account for direct causation Woodward style when only stochastic intervention variables (cf. Korb et al., 2004, sec. 5) are available (theorem 3). We conclude in sec. 7.

Note that though the main results of the present paper (i.e., theorems 2 and 3) can be used for causal discovery, the goal of this paper is not to provide a method for uncovering direct causal connections among variables in a set of variables $\mathbf{V}$ of interest. The goal of this paper is to establish a connection between Woodward’s (2003) intervention-based notion of direct causation and the presence of a causal arrow in a minimal causal Bayes net, which can be interpreted as support for Occam’s razor. Because of this, the present paper does not discuss the relation of theorems 2 and 3 to results about causal discovery by means of interventions such as, e.g., (Eberhardt and Scheines, 2007) or (Nyberg and Korb, 2007).

2 NOTATION

We represent causal structures by graphs, i.e., by ordered pairs $\langle \mathbf{V}, E \rangle$, where $\mathbf{V}$ is a set of variables and $E$ is a binary relation on $\mathbf{V}$ ($E \subseteq \mathbf{V} \times \mathbf{V}$). $\mathbf{V}$’s elements are called the graph’s “vertices” and $E$’s elements are called its “edges”. “$X \to Y$” is short for “$\langle X, Y \rangle \in E$” and is interpreted as “X is a direct cause of Y in $\langle \mathbf{V}, E \rangle$” or as “Y is a direct effect of X in $\langle \mathbf{V}, E \rangle$”. $Par(Y)$ is the set of all $X \in \mathbf{V}$ with $X \to Y$ in $\langle \mathbf{V}, E \rangle$. The elements of $Par(Y)$ are called Y’s parents. We write “$X - Y$” for “$X \to Y$ or $X \leftarrow Y$”. A path $\pi: X - \dots - Y$ is called a (causal) path connecting X and Y in $\langle \mathbf{V}, E \rangle$. A causal path $\pi$ is called a directed causal path from X to Y if and only if (“iff” for short) it has the form $X \to \dots \to Y$. X is called a cause of Y and Y an effect of X in that case. A causal path $\pi$ is called a common cause path iff it has the form $X \leftarrow \dots \leftarrow Z \to \dots \to Y$ and no variable appears more than once on $\pi$. Z is called a common cause of X and Y lying on path $\pi$ in that case. A variable Z lying on a path $\pi: X - \dots \to Z \leftarrow \dots - Y$ is called a collider lying on this path. A variable X is called exogenous iff no arrow is pointing at X; it is called endogenous otherwise. A graph $\langle \mathbf{V}, E \rangle$ is called a directed graph in case all edges in $E$ are one-headed arrows “$\to$”. It is called cyclic iff it features a causal path of the form $X \to \dots \to X$ and acyclic otherwise.

A causal structure $\langle \mathbf{V}, E \rangle$ together with a probability distribution $P$ over $\mathbf{V}$ is called a causal model $\langle \mathbf{V}, E, P \rangle$. $P$ is intended to provide information about the strengths of causal influences represented by the arrows in $\langle \mathbf{V}, E \rangle$. A causal model $\langle \mathbf{V}, E, P \rangle$ is called cyclic iff its graph $\langle \mathbf{V}, E \rangle$ is cyclic; it is called acyclic otherwise. In the following, we will only be interested in acyclic causal models.
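The graph-theoretic notions above can be made concrete with a small sketch. The following Python fragment is only an illustration: the function names (`parents`, `is_exogenous`, `is_acyclic`) and the example graph are our own assumptions and are not part of the formalism; edges are encoded as ordered pairs, mirroring the definition of a graph as an ordered pair of a variable set and an edge relation.

```python
from collections import defaultdict

def parents(edges, y):
    """Par(Y): the set of all X with an arrow X -> Y in E."""
    return {x for (x, target) in edges if target == y}

def is_exogenous(edges, x):
    """A variable is exogenous iff no arrow points at it."""
    return not parents(edges, x)

def is_acyclic(vertices, edges):
    """True iff there is no directed path X -> ... -> X (Kahn's algorithm)."""
    indegree = {v: 0 for v in vertices}
    children = defaultdict(list)
    for x, y in edges:
        children[x].append(y)
        indegree[y] += 1
    queue = [v for v in vertices if indegree[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Every vertex can be removed exactly when the graph has no cycle.
    return removed == len(vertices)

# Illustrative structure: Z is a common cause of X and Y, and X -> Y.
V = {"X", "Y", "Z"}
E = {("Z", "X"), ("Z", "Y"), ("X", "Y")}
```

Here `parents(E, "Y")` collects Z and X, `is_exogenous(E, "Z")` holds, and adding the arrow `("Y", "Z")` would make the structure cyclic.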

We use the standard notions of (conditional) probabilistic dependence and independence:

Definition 1 (conditional probabilistic (in)dependence)

X and Y are probabilistically dependent conditional on Z iff there are X-, Y-, and Z-values x, y, and z, respectively, such that $P(x \mid y, z) \neq P(x \mid z) \wedge P(y, z) > 0$.

X and Y are probabilistically independent conditional on Z iff X and Y are not probabilistically dependent conditional on Z.
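Definition 1 can be checked mechanically for finite distributions. The sketch below assumes a joint distribution stored as a dictionary from value tuples to probabilities; the helper names (`marginal`, `dep`) and the two example tables are hypothetical and serve only to illustrate the definition.

```python
from itertools import product

def marginal(joint, names, assignment):
    """P(assignment): sum over all rows agreeing with `assignment` (dict name -> value)."""
    idx = {n: i for i, n in enumerate(names)}
    return sum(p for row, p in joint.items()
               if all(row[idx[n]] == v for n, v in assignment.items()))

def dep(joint, names, x, y, zs=(), tol=1e-12):
    """Dep(X, Y | Z): some x, y, z with P(y, z) > 0 and P(x | y, z) != P(x | z)."""
    idx = {n: i for i, n in enumerate(names)}
    vals = lambda n: sorted({row[idx[n]] for row in joint})
    for xv, yv in product(vals(x), vals(y)):
        for zv in product(*(vals(z) for z in zs)):
            z_assign = dict(zip(zs, zv))
            p_yz = marginal(joint, names, {y: yv, **z_assign})
            if p_yz <= 0:
                continue  # Definition 1 only considers cases with P(y, z) > 0
            p_z = marginal(joint, names, z_assign)
            p_x_given_yz = marginal(joint, names, {x: xv, y: yv, **z_assign}) / p_yz
            p_x_given_z = marginal(joint, names, {x: xv, **z_assign}) / p_z
            if abs(p_x_given_yz - p_x_given_z) > tol:
                return True
    return False

names = ("X", "Y")
correlated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```

Independence is then simply the negation, `not dep(...)`, as in the second clause of the definition.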

Probabilistic independence between X and Y conditional on Z is abbreviated as “$Indep(X, Y \mid Z)$”, probabilistic dependence is abbreviated as “$Dep(X, Y \mid Z)$”. Unconditional probabilistic (in)dependence between X and Y, $(In)Dep(X, Y)$, is defined as $(In)Dep(X, Y \mid \emptyset)$. X, Y, and Z in definition 1 can be variables or sequences of variables. When $\mathbf{X}, \mathbf{Y}, \mathbf{Z}, \dots$ are sequences of variables, we write them in bold letters. We also write the values $\mathbf{x}, \mathbf{y}, \mathbf{z}, \dots$ of sequences $\mathbf{X}, \mathbf{Y}, \mathbf{Z}, \dots$ in bold letters. The set of values $\mathbf{x}$ of a sequence $\mathbf{X}$ of variables $X_1, \dots, X_n$ is $val(X_1) \times \dots \times val(X_n)$, where $val(X_i)$ is the set of $X_i$’s possible values.

3 WOODWARD’S DEFINITION OF DIRECT CAUSATION

Woodward’s (2003) interventionist theory of causation aims to explicate direct causation w.r.t. a set of variables $\mathbf{V}$ in terms of possible interventions. Woodward (2003, p. 98) provides the following definition of an intervention variable:

Definition 2 ($IV_W$) I is an intervention variable for X with respect to Y if and only if I meets the following conditions:

I1. I causes X.

I2. I acts as a switch for all the other variables that cause X. That is, certain values of I are such that when I attains those values, X ceases to depend on the values of other variables that cause X and instead depends only on the value taken by I.

I3. Any directed path from I to Y [if there exists one] goes through X [...].

I4. I is (statistically) independent of any variable Z that causes Y and that is on a directed path that does not go through X.

($IV_W$) is intended to single out those variables as intervention variables for X w.r.t. Y that allow for correct causal inference according to Woodward’s (2003) definition of direct causation. For I to be an intervention variable for X w.r.t. Y it is required that I is causally relevant to X (condition I1), that X is only under I’s influence when $I = on$ (condition I2), and that a correlation between I and Y can only be due to a directed causal path from I to Y going through X (conditions I3 and I4). For a detailed motivation of I1-I4, see (Woodward, 2003, sec. 3.1.4). For problems with Woodward’s definitions, see (Gebharter and Schurz, ms).

An intervention on X w.r.t. Y (from now on we refer to X as the intervention’s “target variable” and to Y as the “test variable”) is then straightforwardly defined as an intervention variable I for X w.r.t. Y taking one of its on-values, which forces X to take a certain value x. We will call interventions whose on-values force X to take certain values x “deterministic interventions” (cf. Korb et al., 2004, sec. 5).

Note that Woodward’s (2003) notion of an intervention is, on the one hand, strong because it requires interventions to be deterministic interventions. It is, on the other hand, weak in another respect: In contrast to structural or surgical interventions (cf. Eberhardt and Scheines, 2007, p. 984; Pearl, 2009), Woodward’s interventions are allowed to be direct causes of more than one variable as long as the intervention’s direct effects which are non-target variables do not cause the test variable over a path not going through the intervention’s target variable (intervention condition I3). Based on his notion of an intervention, Woodward (2003, p. 59) gives the following definition of direct causation w.r.t. a variable set $\mathbf{V}$:

Definition 3 ($DC_W$) A necessary and sufficient condition for X to be a (type-level) direct cause of Y with respect to a variable set $\mathbf{V}$ is that there be a possible intervention on X that will change Y or the probability distribution of Y when one holds fixed at some value all other variables $Z_i$ in $\mathbf{V}$.

($DC_W$) neatly explicates direct causation w.r.t. a variable set $\mathbf{V}$ in terms of possible interventions: X is a direct cause of Y w.r.t. $\mathbf{V}$ if Y can be wiggled by wiggling X; and if X is a direct cause of Y w.r.t. $\mathbf{V}$, then there are possible interventions by whose means one can influence Y by manipulating X.¹ Note that ($DC_W$) may be too strong because many domains involve variables one cannot control by deterministic interventions. Scenarios of this kind include, for example, the decay of uranium or states of entangled systems in quantum mechanics. The decay of uranium can only be probabilistically influenced, and any attempt to manipulate the state of one of two entangled photons, for example, would destroy the entangled system. Glymour (2004) also considers variables for sex and race as not manipulable by means of intervention variables in the sense of ($IV_W$).

To avoid all problems that might arise for Woodward’s (2003) account due to variables that are not manipulable by deterministic interventions, we will reconstruct Woodward’s ($DC_W$) as a partial definition in sec. 4. In particular, we will define direct causation only for sets of variables $\mathbf{V}$ for which suitable intervention variables exist.

4 RECONSTRUCTING WOODWARD’S DEFINITION

In this section we reconstruct Woodward’s (2003) definition of direct causation in terms of causal Bayes nets. The reconstruction of ($IV_W$) is straightforward:

¹ Note that Woodward (2003) does not require the intervention variables I to be elements of the set of variables $\mathbf{V}$ containing the target variable X and the test variable Y.

Definition 4 (IV) $I_X \in \mathbf{V}$ is an intervention variable for $X \in \mathbf{V}$ w.r.t. $Y \in \mathbf{V}$ in a causal model $\langle \mathbf{V}, E, P \rangle$ iff

(a) $I_X$ is exogenous and there is a path $\pi: I_X \to X$ in $\langle \mathbf{V}, E \rangle$,

(b) for every on-value of $I_X$ there is an X-value x such that $P(x \mid I_X = on) = 1$ and $Dep(x, I_X = on \mid \mathbf{z})$ holds for every instantiation $\mathbf{z}$ of every $\mathbf{Z} \subseteq \mathbf{V} \setminus \{I_X, X\}$,

(c) all paths $I_X \to \dots \to Y$ in $\langle \mathbf{V}, E \rangle$ have the form $I_X \to \dots \to X \to \dots \to Y$,

(d) $I_X$ is independent from every variable C (in $\mathbf{V}$ or not in $\mathbf{V}$) which causes Y over a path not going through X.

Note that (IV) still allows for intervention variables $I_X$ that are common causes of their target variable X and other variables in $\mathbf{V}$. Condition (a) requires $I_X$ to be exogenous. Though this is a typical assumption made for intervention variables, it is not explicit in Woodward’s (2003) original definition ($IV_W$). One problem that might arise for Woodward’s account when not making this assumption is that $I_X$ in a causal structure $Y \to I_X \to X$ may turn out to be an intervention variable for X w.r.t. Y. If Y then depends on $I_X = on$, ($DC_W$) would falsely determine X to be a cause of Y (cf. Gebharter and Schurz, ms). $I_X \to X$ in condition (a) is a harmless simplification of I1. Condition (b) captures Woodward’s requirement that interventions have to be deterministic, from which I2 follows. X is assumed to be under full control of $I_X$ when $I_X$ is on. This requires not only that for every on-value of $I_X$ there is an X-value x such that $P(x \mid I_X = on) = 1$, but also that $I_X = on$ actually has an influence on x in every possible context, i.e., under conditionalization on arbitrary instantiations $\mathbf{z}$ of all kinds of subsets $\mathbf{Z}$ of $\mathbf{V} \setminus \{I_X, X\}$. Condition (c) directly mirrors I3. Condition (d) mirrors Woodward’s I4. Note that condition (d) requires reference to variables C possibly not contained in $\mathbf{V}$ (cf. Woodward, 2008, p. 202).

If we want to account for direct causal connection in a causal model $\langle \mathbf{V}, E, P \rangle$ by means of interventions, we have to add intervention variables to $\mathbf{V}$. In other words: We have to expand $\langle \mathbf{V}, E, P \rangle$ in a certain way. But how do we have to expand $\langle \mathbf{V}, E, P \rangle$? To answer this question, let us assume that we want to know whether X is a direct cause of Y in the unmanipulated model $\langle \mathbf{V}, E, P \rangle$. Then the manipulated model $\langle \mathbf{V}', E', P' \rangle$ will have to contain an intervention variable $I_X$ for X w.r.t. Y and also intervention variables $I_Z$ for all $Z \in \mathbf{V}$ different from X and Y by whose means these Z can be controlled. X is a direct cause of Y if $I_X$ has some on-values such that we can influence Y by manipulating X with $I_X = on$ when all $I_Z$ have taken certain on-values. On the other hand, to guarantee that X is not a direct cause of Y, we have to demonstrate that none of Y’s values can be influenced by manipulating some X-value by some intervention. For establishing such a negative causal claim, we require an intervention variable $I_X$ by whose means we can control every X-value x. (Otherwise it could be that Y depends only on X-values that are not correlated with $I_X$-values; then $I_X = on$ would have no probabilistic influence on Y, though X may be a causal parent of Y.) In addition, we require for every $Z \neq X, Y$ an intervention variable $I_Z$ by whose means Z can be forced to take every value z. (Otherwise it could be that we can bring about only such Z-value instantiations which screen X and Y off each other; then $I_X = on$ would have no probabilistic influence on Y when Z’s value is fixed by interventions, though X may be a causal parent of Y.)

In the unmanipulated model $\langle \mathbf{V}, E, P \rangle$, all intervention variables I are off. In the manipulated model $\langle \mathbf{V}', E', P' \rangle$, all intervention variables’ values are realized for some but not for all individuals in the domain. This move allows us to compute probabilities for variables in $\mathbf{V}$ when $\mathbf{I} = off$ as well as probabilities for variables in $\mathbf{V}$ for all combinations of on-value realizations of intervention variables I, while the causal structure of the unmanipulated model will be preserved in the manipulated model. (Note that we deviate here from the typical “arrow breaking” representation of interventions in the literature, which assumes that in the manipulated model all individuals get manipulated.) This amounts to the following notion of an intervention expansion (“i-expansion” for short):

Definition 5 (intervention expansion) $\langle \mathbf{V}', E', P' \rangle$ is an intervention expansion of $\langle \mathbf{V}, E, P \rangle$ w.r.t. $Y \in \mathbf{V}$ iff

(a) $\mathbf{V}' = \mathbf{V} \,\dot\cup\, \mathbf{V_I}$, where $\mathbf{V_I}$ contains for every $X \in \mathbf{V}$ different from Y an intervention variable $I_X$ w.r.t. Y (and nothing else),

(b) for all $Z_i, Z_j \in \mathbf{V}$: $Z_i \to Z_j$ in $E'$ iff $Z_i \to Z_j$ in $E$,

(c) for every X-value x of every $X \in \mathbf{V}$ different from Y there is an on-value of the corresponding intervention variable $I_X$ such that $P'(x \mid I_X = on) = 1$ and $Dep(x, I_X = on \mid \mathbf{z})$ holds for every instantiation $\mathbf{z}$ of every $\mathbf{Z} \subseteq \mathbf{V} \setminus \{I_X, X\}$,

(d) $P'_{\mathbf{I} = off} \uparrow \mathbf{V} = P$,

(e) $P'(\mathbf{I} = on), P'(\mathbf{I} = off) > 0$.

$\mathbf{I}$ in conditions (d) and (e) is the set of all newly added intervention variables I. $P'_{\mathbf{I} = off} \uparrow \mathbf{V}$ in (d) is $P'_{\mathbf{I} = off} := P'(\cdot \mid \mathbf{I} = off)$ restricted to $\mathbf{V}$. Hence, “$P'_{\mathbf{I} = off} \uparrow \mathbf{V} = P$” means that $P'_{\mathbf{I} = off}$ coincides with $P$ on the value space of variables in $\mathbf{V}$. Condition (a) guarantees that the i-expansion contains all the intervention variables required for testing for direct causal relationships in the sense of Woodward’s (2003) definition of direct causation. The assumption that $\mathbf{V_I}$ contains only intervention variables for X w.r.t. Y is a harmless simplification. Thanks to condition (b), the manipulated model’s causal structure fits the unmanipulated model’s causal structure. In particular, the i-expansion is only allowed to introduce new causal arrows going from intervention variables to variables in $\mathbf{V}$. Due to condition (c), every $X \in \mathbf{V}$ different from Y can be fully controlled by means of an intervention variable $I_X$ for X w.r.t. Y. Condition (d) explains how the manipulated model’s associated probability distribution $P'$ fits the unmanipulated model’s distribution $P$. Condition (e) says that all values of intervention variables have to be realized by some individuals in the domain.

With the help of this notion of an i-expansion we can now reconstruct Woodward’s (2003) definition of direct causation. As already mentioned, Woodward’s definition requires the existence of suitable intervention variables. Thus, we reconstruct ($DC_W$) as a partial definition whose if-condition presupposes the required intervention variables:

Definition 6 (DC) If there exist i-expansions $\langle \mathbf{V}', E', P' \rangle$ of $\langle \mathbf{V}, E, P \rangle$ w.r.t. $Y \in \mathbf{V}$, then: $X \in \mathbf{V}$ is a direct cause of Y w.r.t. $\mathbf{V}$ iff $Dep(Y, I_X = on \mid \mathbf{I_Z} = on)$ holds in some i-expansions $\langle \mathbf{V}', E', P' \rangle$ of $\langle \mathbf{V}, E, P \rangle$ w.r.t. Y, where $I_X$ is an intervention variable for X w.r.t. Y in $\langle \mathbf{V}', E', P' \rangle$ and $\mathbf{I_Z}$ is the set of all intervention variables in $\langle \mathbf{V}', E', P' \rangle$ different from $I_X$.

(DC) mirrors Woodward’s definition restricted to cases in which the required intervention variables (more precisely: the required i-expansions) exist: In case Y can be probabilistically influenced by manipulating X by means of an intervention variable $I_X$ for X w.r.t. Y in one of these i-expansions, X is a direct cause of Y in the unmanipulated model. And vice versa: In case X is a direct cause of Y in the unmanipulated model, there will be an intervention variable $I_X$ for X w.r.t. Y in one of these i-expansions such that Y is probabilistically sensitive to $I_X = on$. In the next section we show that (DC) can account for all direct causal dependencies in a causal model if suitable i-expansions exist and CMC and Min are assumed to be satisfied.

5 OCCAM’S RAZOR, DETERMINISTIC INTERVENTIONS, AND DIRECT CAUSATION

The theory of causal Bayes nets’ core axiom is the causal Markov condition (CMC) (cf. Spirtes et al., 2000, p. 29):

Definition 7 (causal Markov condition) A causal model $\langle \mathbf{V}, E, P \rangle$ satisfies the causal Markov condition iff every $X \in \mathbf{V}$ is probabilistically independent of all its non-effects conditional on its causal parents.

CMC is assumed to hold for causal models whose variable sets are causally sufficient. A variable set $\mathbf{V}$ is causally sufficient iff every common cause C of variables X and Y in $\mathbf{V}$ is also in $\mathbf{V}$ or takes the same value c for all individuals in the domain (cf. Spirtes et al., 2000, p. 22). From now on we implicitly assume causal sufficiency, i.e., we only consider causal models whose variable sets are causally sufficient.

A finite causal model $\langle \mathbf{V}, E, P \rangle$ satisfies the Markov condition iff $P$ admits the following Markov factorization relative to $\langle \mathbf{V}, E \rangle$ (cf. Pearl, 2009, p. 16):

$$P(X_1, \dots, X_n) = \prod_i P(X_i \mid Par(X_i)) \qquad (1)$$

The conditional probabilities $P(X_i \mid Par(X_i))$ are called $X_i$’s parameters.
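To see the factorization (1) at work, consider a hypothetical binary chain $X \to Y \to Z$. The following sketch builds the joint distribution from the parameters and numerically confirms the independence that CMC demands for Z. All numbers and names are illustrative assumptions, not taken from the paper.

```python
from itertools import product

# Parameters of a hypothetical binary chain X -> Y -> Z (illustrative numbers).
p_x = {0: 0.7, 1: 0.3}                                    # P(X)
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P(Y | X), indexed [x][y]
p_z_given_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.3, 1: 0.7}}  # P(Z | Y), indexed [y][z]

# Markov factorization (1): P(x, y, z) = P(x) * P(y | x) * P(z | y)
joint = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
         for x, y, z in product((0, 1), repeat=3)}

def prob(**fixed):
    """Marginal probability of the given values, e.g. prob(x=1, y=0)."""
    pos = {"x": 0, "y": 1, "z": 2}
    return sum(p for row, p in joint.items()
               if all(row[pos[k]] == v for k, v in fixed.items()))

# X is a non-effect of Z and Par(Z) = {Y}, so CMC demands Indep(Z, X | Y):
# P(z | x, y) must equal P(z | y) for all value combinations.
max_gap = max(abs(prob(x=x, y=y, z=z) / prob(x=x, y=y) - prob(y=y, z=z) / prob(y=y))
              for x, y, z in product((0, 1), repeat=3))
```

Up to floating-point error, `max_gap` vanishes, which is the numerical counterpart of Z being independent of its non-effect X conditional on its parent Y.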

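For acyclic causal models, CMC is equivalent to the d-separation criterion (Verma, 1986; Pearl, 1988, pp. 119f):

Definition 8 (d-separation criterion) $\langle \mathbf{V}, E, P \rangle$ satisfies the d-separation criterion iff the following holds for all $X, Y \in \mathbf{V}$ and $\mathbf{Z} \subseteq \mathbf{V} \setminus \{X, Y\}$: If X and Y are d-separated by $\mathbf{Z}$ in $\langle \mathbf{V}, E \rangle$, then $Indep(X, Y \mid \mathbf{Z})$.

Definition 9 (d-separation, d-connection) $X \in \mathbf{V}$ and $Y \in \mathbf{V}$ are d-separated by $\mathbf{Z} \subseteq \mathbf{V} \setminus \{X, Y\}$ in $\langle \mathbf{V}, E \rangle$ iff X and Y are not d-connected given $\mathbf{Z}$ in $\langle \mathbf{V}, E \rangle$. $X \in \mathbf{V}$ and $Y \in \mathbf{V}$ are d-connected given $\mathbf{Z} \subseteq \mathbf{V} \setminus \{X, Y\}$ in $\langle \mathbf{V}, E \rangle$ iff X and Y are connected by a path $\pi$ in $\langle \mathbf{V}, E \rangle$ such that no non-collider on $\pi$ is in $\mathbf{Z}$, while all colliders on $\pi$ are in $\mathbf{Z}$ or have an effect in $\mathbf{Z}$.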
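Definition 9 translates fairly directly into code. The sketch below is a naive illustration for small graphs only: it enumerates all paths without repeated variables and tests each for activation. The function names and the example graph are our own assumptions; the path enumeration is exponential in the worst case and is not meant as an efficient algorithm.

```python
def descendants(edges, v):
    """All effects of v: variables reachable from v by a directed path."""
    out, stack = set(), [v]
    while stack:
        n = stack.pop()
        for a, b in edges:
            if a == n and b not in out:
                out.add(b)
                stack.append(b)
    return out

def simple_paths(edges, x, y, path=None):
    """Yield every path x - ... - y visiting no variable twice, ignoring arrow direction."""
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for a, b in edges:
        for here, nxt in ((a, b), (b, a)):
            if here == path[-1] and nxt not in path:
                yield from simple_paths(edges, x, y, path + [nxt])

def activated(edges, path, z):
    """No non-collider on the path is in z; every collider is in z or has an effect in z."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider:
            if mid not in z and not (descendants(edges, mid) & z):
                return False
        elif mid in z:
            return False
    return True

def d_connected(edges, x, y, z=frozenset()):
    """X and Y are d-connected given z iff some path between them is activated by z."""
    return any(activated(edges, p, set(z)) for p in simple_paths(edges, x, y))

# Illustrative graph: Z is a common cause and W a common effect (collider) of X and Y.
E = {("Z", "X"), ("Z", "Y"), ("X", "W"), ("Y", "W")}
```

In this graph X and Y are d-connected given the empty set (via the common cause Z), d-separated by {Z}, and d-connected again given {Z, W}, because conditioning on the collider W activates the path through it.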

The equivalence between CMC and the d-separation criterion reveals the full content of CMC: If a causal model satisfies CMC, then every (conditional) probabilistic independence can be explained by missing (conditional) causal connections, and every (conditional) probabilistic dependence can be explained by some existing (conditional) causal connection.

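In case there is a path $\pi$ between X and Y in $\langle \mathbf{V}, E \rangle$ such that no non-collider on $\pi$ is in $\mathbf{Z} \subseteq \mathbf{V} \setminus \{X, Y\}$ and all colliders on $\pi$ are in $\mathbf{Z}$ or have an effect in $\mathbf{Z}$, $\pi$ is said to be activated by $\mathbf{Z}$. We also say that X and Y are d-connected given $\mathbf{Z}$ over path $\pi$ in that case. If $\pi$ is not activated by $\mathbf{Z}$, $\pi$ is said to be blocked by $\mathbf{Z}$. We also say that X and Y are d-separated by $\mathbf{Z}$ over path $\pi$ in that case.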

Occam’s razor (as we understand it in this paper) dictates preferring, from all those causal structures $\langle \mathbf{V}, E \rangle$ which together with a given probability distribution $P$ over $\mathbf{V}$ satisfy CMC, the ones which also satisfy the causal minimality condition (Min):

Definition 10 (causal minimality condition) A causal model $\langle \mathbf{V}, E, P \rangle$ satisfying CMC satisfies the causal minimality condition iff no model $\langle \mathbf{V}, E', P \rangle$ with $E' \subset E$ also satisfies CMC (cf. Spirtes et al., 2000, p. 31).

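For acyclic causal models satisfying CMC, the following causal productivity condition (Prod) (cf. Schurz and Gebharter, forthcoming) can be seen as a reformulation of the causal minimality condition:

Definition 11 (causal productivity condition) A causal model $\langle \mathbf{V}, E, P \rangle$ satisfies the causal productivity condition iff $Dep(X, Y \mid Par(Y) \setminus \{X\})$ holds for all $X, Y \in \mathbf{V}$ with $X \to Y$ in $\langle \mathbf{V}, E \rangle$.

Theorem 1 For every acyclic causal model $\langle \mathbf{V}, E, P \rangle$ satisfying CMC, the causal minimality condition and the causal productivity condition are equivalent.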

The equivalence of Min and Prod reveals the full content of Min: In minimal causal models, no causal arrow is superfluous, i.e., every causal arrow from X to Y is productive, meaning that it is responsible for some probabilistic dependence between X and Y (when the values of all other parents of Y are fixed).
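A small numerical illustration may help here. In the hypothetical model below, Y is generated from Z alone, yet the graph also contains an arrow from X to Y. The sketch checks the productivity condition arrow by arrow; the helper names, the restriction to binary variables, and all numbers are our own assumptions.

```python
from itertools import product

def prob(joint, names, **fixed):
    """Marginal probability of the given variable values."""
    pos = {n: i for i, n in enumerate(names)}
    return sum(p for row, p in joint.items()
               if all(row[pos[k]] == v for k, v in fixed.items()))

def dep(joint, names, x, y, cond):
    """Dep(X, Y | cond) in the sense of definition 1, for binary variables."""
    for xv, yv in product((0, 1), repeat=2):
        for cv in product((0, 1), repeat=len(cond)):
            c = dict(zip(cond, cv))
            p_yc = prob(joint, names, **{y: yv}, **c)
            if p_yc > 0:
                lhs = prob(joint, names, **{x: xv, y: yv}, **c) / p_yc
                rhs = prob(joint, names, **{x: xv}, **c) / prob(joint, names, **c)
                if abs(lhs - rhs) > 1e-12:
                    return True
    return False

def productive(joint, names, edges):
    """Prod: Dep(X, Y | Par(Y) minus {X}) holds for every arrow X -> Y."""
    def par(y):
        return sorted(a for a, b in edges if b == y)
    return all(dep(joint, names, x, y, [p for p in par(y) if p != x])
               for x, y in edges)

# X and Z are independent fair coins; Y depends on Z only.
names = ("X", "Z", "Y")
p_y1_given_z = {0: 0.2, 1: 0.9}
joint = {(x, z, y): 0.25 * (p_y1_given_z[z] if y == 1 else 1 - p_y1_given_z[z])
         for x, z, y in product((0, 1), repeat=3)}

with_superfluous_arrow = {("X", "Y"), ("Z", "Y")}
minimal_structure = {("Z", "Y")}
```

The structure with the extra arrow fails Prod (the arrow from X to Y carries no dependence once Z is fixed), while the structure containing only the arrow from Z to Y satisfies it. This matches theorem 1: the larger structure is compatible with the distribution but not minimal.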

We can now prove the following theorem:

Theorem 2 If $\langle \mathbf{V}, E, P \rangle$ is an acyclic causal model and for every $Y \in \mathbf{V}$ there is an i-expansion $\langle \mathbf{V}', E', P' \rangle$ of $\langle \mathbf{V}, E, P \rangle$ w.r.t. Y satisfying CMC and Min, then for all $X, Y \in \mathbf{V}$ (with $X \neq Y$) the following two statements are equivalent:

(i) $X \to Y$ in $\langle \mathbf{V}, E \rangle$.

(ii) $Dep(Y, I_X = on \mid \mathbf{I_Z} = on)$ holds in some i-expansions $\langle \mathbf{V}', E', P' \rangle$ of $\langle \mathbf{V}, E, P \rangle$ w.r.t. Y, where $I_X$ is an intervention variable for X w.r.t. Y in $\langle \mathbf{V}', E', P' \rangle$ and $\mathbf{I_Z}$ is the set of all intervention variables in $\langle \mathbf{V}', E', P' \rangle$ different from $I_X$.
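As a toy illustration of the equivalence stated in theorem 2, consider the simplest possible case with only two variables X and Y, so that the set of further intervention variables is empty. The sketch below constructs a hypothetical i-expansion w.r.t. Y by adding an intervention variable for X with one off-value and one on-value per X-value. The encoding of on-values as pairs, the function names, and all numbers are assumptions made purely for illustration.

```python
from itertools import product

def expand(p_x, p_y_given_x, p_i):
    """Joint P'(I_X, X, Y) of a hypothetical i-expansion w.r.t. Y.
    I_X = 'off' leaves X's distribution untouched; I_X = ('on', v) forces X = v."""
    joint = {}
    for i, x, y in product(p_i, p_x, (0, 1)):
        if i == "off":
            px = p_x[x]
        else:
            px = 1.0 if x == i[1] else 0.0
        joint[(i, x, y)] = p_i[i] * px * p_y_given_x[x][y]
    return joint

def y_depends_on_intervention(joint):
    """Dep(Y, I_X = on): some on-value i and Y-value y with P'(y | i) != P'(y)."""
    def p(cond):
        return sum(q for row, q in joint.items() if cond(row))
    for (i, _, y) in joint:
        if i != "off":
            p_i = p(lambda r: r[0] == i)
            p_y_given_i = p(lambda r: r[0] == i and r[2] == y) / p_i
            if abs(p_y_given_i - p(lambda r: r[2] == y)) > 1e-12:
                return True
    return False

p_x = {0: 0.6, 1: 0.4}
p_i = {"off": 0.5, ("on", 0): 0.25, ("on", 1): 0.25}
arrow = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}     # Y depends on X
no_arrow = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}  # Y ignores X

with_arrow = expand(p_x, arrow, p_i)
without_arrow = expand(p_x, no_arrow, p_i)
```

With the arrow present, Y is probabilistically sensitive to the on-values of the intervention variable; without it, no on-value makes a difference. Conditioning on the off-value recovers the unmanipulated distribution, in line with condition (d) of definition 5.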

Theorem 2 shows that direct causation à la Woodward (2003) coincides with the graph-theoretical notion of direct causation in systems $\langle \mathbf{V}, E, P \rangle$ with i-expansions w.r.t. every variable $Y \in \mathbf{V}$ satisfying CMC and Min. In particular, theorem 2 says the following: Assume we are interested in a causal model $\langle \mathbf{V}, E, P \rangle$. Assume further that for every Y in $\mathbf{V}$ there is an i-expansion $\langle \mathbf{V}', E', P' \rangle$ of $\langle \mathbf{V}, E, P \rangle$ w.r.t. Y satisfying CMC and Min. This means (among other things) that for every pair of variables $\langle X, Y \rangle$ there is at least one i-expansion with an intervention variable $I_X$ for X w.r.t. Y and intervention variables $I_Z$ for every $Z \in \mathbf{V}$ (different from X and Y) w.r.t. Y by whose means one can force the variables in $\mathbf{V} \setminus \{Y\}$ to take any combination of value realizations. Given this setup, theorem 2 tells us for every X and Y (with $X \neq Y$) in $\mathbf{V}$ that X is a causal parent of Y in $\langle \mathbf{V}, E \rangle$ iff $Dep(Y, I_X = on \mid \mathbf{I_Z} = on)$ holds in one of the presupposed i-expansions w.r.t. Y.

6 OCCAM’S RAZOR, STOCHASTIC INTERVENTIONS, AND DIRECT CAUSATION

In this section we generalize the main finding of sec. 5 to cases in which only stochastic interventions are available. To account for direct causal relations $X \to Y$ by means of stochastic intervention variables, two intervention variables are needed, one for X and one for Y. (For details, see below.) We define a stochastic intervention variable as follows:

Definition 12 ($IV_S$) $I_X \in \mathbf{V}$ is a stochastic intervention variable for $X \in \mathbf{V}$ w.r.t. $Y \in \mathbf{V}$ in $\langle \mathbf{V}, E, P \rangle$ iff

(a) $I_X$ is exogenous and there is a path $\pi: I_X \to X$ in $\langle \mathbf{V}, E \rangle$,

(b) for every on-value of $I_X$ there is an X-value x such that $Dep(x, I_X = on \mid \mathbf{z})$ holds for every instantiation $\mathbf{z}$ of every $\mathbf{Z} \subseteq \mathbf{V} \setminus \{I_X, X\}$,

(c) all paths $I_X \to \dots \to Y$ in $\langle \mathbf{V}, E \rangle$ have the form $I_X \to \dots \to X \to \dots \to Y$,

(d) $I_X$ is independent from every variable C (in $\mathbf{V}$ or not in $\mathbf{V}$) which causes Y over a path not going through X.

The only difference between ($IV_S$) and (IV) is condition (b). For stochastic interventions it is not required that $I_X = on$ determines X’s value to be x with probability 1. It suffices that $I_X = on$ and x are correlated conditional on every value $\mathbf{z}$ of every $\mathbf{Z} \subseteq \mathbf{V} \setminus \{I_X, X\}$. This specific constraint guarantees that X can be influenced by $I_X = on$ under all circumstances, i.e., under all kinds of conditionalization on instantiations of remainder variables in $\mathbf{V}$.

We also have to modify our notion of an intervention expansion in case we allow for stochastic interventions. We define the following notion of a stochastic intervention expansion:

Definition 13 (stochastic intervention expansion) $\langle \mathbf{V}', E', P' \rangle$ is a stochastic intervention expansion of $\langle \mathbf{V}, E, P \rangle$ for $X \in \mathbf{V}$ w.r.t. $Y \in \mathbf{V}$ iff

(a) $\mathbf{V}' = \mathbf{V} \,\dot\cup\, \mathbf{V_I}$, where $\mathbf{V_I}$ contains one stochastic intervention variable $I_X$ for X w.r.t. Y and one stochastic intervention variable $I_Y$ for Y w.r.t. Y which is a parent only of Y (and nothing else),

(b) for all $Z_i, Z_j \in \mathbf{V}$: $Z_i \to Z_j$ in $E'$ iff $Z_i \to Z_j$ in $E$,

(c.1) for every X-value x there is an on-value of $I_X$ such that $Dep(x, I_X = on \mid \mathbf{z})$ holds for every instantiation $\mathbf{z}$ of every $\mathbf{Z} \subseteq \mathbf{V}' \setminus \{I_X, X\}$,

(c.2) for every Y-value y, every instantiation $\mathbf{r}$ of $Par(Y)$, and every on-value $on$ of $I_Y$ there is an on-value $on^*$ of $I_Y$ such that $P'(y \mid I_Y = on^*, \mathbf{r}) \neq P'(y \mid I_Y = on, \mathbf{r})$, $P'(y \mid I_Y = on^*, \mathbf{r}) > 0$, and $P'(y \mid I_Y = on^*, \mathbf{r}^*) = P'(y \mid I_Y = on, \mathbf{r}^*)$ holds for all $\mathbf{r}^* \in val(Par(Y))$ different from $\mathbf{r}$,

(d) $P'_{\mathbf{I} = off} \uparrow \mathbf{V} = P$,

(e) $P'(\mathbf{I} = on), P'(\mathbf{I} = off) > 0$.

This definition differs from the definition of a (non-stochastic) i-expansion with respect to conditions (a) and (c): A stochastic i-expansion for X w.r.t. Y contains exactly two intervention variables, viz. one stochastic intervention variable $I_X$ for X w.r.t. Y and one stochastic intervention variable $I_Y$ for Y w.r.t. Y (which trivially satisfies conditions (c) and (d) in ($IV_S$)). While $I_X$ may have more than one direct effect, the second intervention variable $I_Y$ is assumed to be a causal parent only of Y. (This is required for accounting for direct causal connections; for details see (i) $\Rightarrow$ (ii) in the proof of theorem 3 in the appendix.) The second intervention variable $I_Y$ is required to exclude independence between $I_X$ and Y due to a fine-tuning of Y’s parameters. Such an independence can arise even if CMC and Min are satisfied, X is a causal parent of Y, and $I_X$ and Y are each correlated with the same X-values x. For examples of this kind of non-faithfulness, see, e.g., (Neapolitan, 2004, p. 96) or (Naeger, forthcoming). In condition (c.2) we assume that every one of Y’s parameters can be changed independently of all other Y-parameters (to a value $r \in {]0, 1]}$) by changing $I_Y$’s on-value. This suffices to exclude non-faithful independencies between $I_X$ and Y of the kind described above.

When not presupposing deterministic interventions, it can no longer be guaranteed that the value of every variable in our model of interest different from the test variable Y can be fixed by interventions. The values of a causal model’s variables can, however, also be fixed by conditionalization. To account for direct causation between X and Y when only stochastic interventions are available, one has to conditionalize on a suitably chosen set $\mathbf{Z} \subseteq \mathbf{V} \setminus \{X, Y\}$ that (i) blocks all indirect causal paths between X and Y, and that (ii) fixes all X-alternative parents of Y. That $\mathbf{Z}$ blocks all indirect paths between X and Y is required to assure that dependence between $I_X = on$ and Y cannot be due to an indirect path, and fixing the values of all parents of Y different from X is required to exclude independence of $I_X = on$ and Y due to a fine-tuning of Y’s X-alternative parents that may cancel the influence of $I_X = on$ on Y over a path $I_X \to X \to Y$.² Fortunately, every directed acyclic graph $\langle \mathbf{V}, E \rangle$ features a set $\mathbf{Z}$ satisfying requirement (i), viz. $Par(Y) \setminus \{X\}$ (cf. Schurz and Gebharter, forthcoming). Trivially, $Par(Y) \setminus \{X\}$ also satisfies requirement (ii).
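The cancellation worry behind requirement (ii) can be illustrated with a simplified numerical example that leaves the intervention variables aside. In the hypothetical model below, X influences Y both directly and via Z, and the parameters are deliberately fine-tuned so that the two paths cancel. All numbers and names are illustrative assumptions.

```python
from itertools import product

# Hypothetical structure: X -> Z -> Y and X -> Y, with X a fair coin.
p_z_given_x = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}              # P(Z | X), indexed [x][z]
p_y1_given_xz = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.4}  # P(Y = 1 | X, Z)

joint = {}
for x, z, y in product((0, 1), repeat=3):
    p_y = p_y1_given_xz[(x, z)] if y == 1 else 1 - p_y1_given_xz[(x, z)]
    joint[(x, z, y)] = 0.5 * p_z_given_x[x][z] * p_y

def p_y1(x, z=None):
    """P(Y = 1 | X = x) if z is None, else P(Y = 1 | X = x, Z = z)."""
    rows = [(r, q) for r, q in joint.items() if r[0] == x and (z is None or r[1] == z)]
    return sum(q for r, q in rows if r[2] == 1) / sum(q for _, q in rows)

# Unconditionally, the direct and the indirect influence of X on Y cancel ...
marginal_gap = abs(p_y1(0) - p_y1(1))
# ... but conditioning on Par(Y) \ {X} = {Z} reveals the direct influence.
conditional_gap = abs(p_y1(0, z=0) - p_y1(1, z=0))
```

Here `marginal_gap` is zero up to rounding, whereas `conditional_gap` equals 0.4: X and Y look independent until the X-alternative parent Z is fixed by conditionalization.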

With the help of ($IV_S$) and definition 13, we can now define direct causation in terms of stochastic interventions for models for which suitable stochastic i-expansions exist:

Definition 14 ($DC_S$) If there exist stochastic i-expansions $\langle \mathbf{V}', E', P' \rangle$ of $\langle \mathbf{V}, E, P \rangle$ for X w.r.t. Y, then: X is a direct cause of Y w.r.t. $\mathbf{V}$ iff $Dep(Y, I_X = on \mid Par(Y) \setminus \{X\}, I_Y = on)$ holds in some i-expansions $\langle \mathbf{V}', E', P' \rangle$ of $\langle \mathbf{V}, E, P \rangle$ for X w.r.t. Y, where $I_X$ is a stochastic intervention variable for X w.r.t. Y in $\langle \mathbf{V}', E', P' \rangle$ and $I_Y$ is a stochastic intervention variable for Y w.r.t. Y in $\langle \mathbf{V}', E', P' \rangle$.

Now the following theorem can be proven:

² For details on such cases of non-faithfulness due to compensating parents see (Schurz and Gebharter, forthcoming; Pearl, 1988, p. 256).

Theorem 3 If $\langle \mathbf{V}, E, P \rangle$ is an acyclic causal model and for every $X, Y \in \mathbf{V}$ (with $X \neq Y$) there is a stochastic i-expansion $\langle \mathbf{V}', E', P' \rangle$ of $\langle \mathbf{V}, E, P \rangle$ for X w.r.t. Y satisfying CMC and Min, then for all $X, Y \in \mathbf{V}$ (with $X \neq Y$) the following two statements are equivalent:

(i) $X \to Y$ in $\langle \mathbf{V}, E \rangle$.

(ii) $Dep(Y, I_X = on \mid Par(Y) \setminus \{X\}, I_Y = on)$ holds in some i-expansions $\langle \mathbf{V}', E', P' \rangle$ of $\langle \mathbf{V}, E, P \rangle$ for X w.r.t. Y, where $I_X$ is a stochastic intervention variable for X w.r.t. Y in $\langle \mathbf{V}', E', P' \rangle$ and $I_Y$ is a stochastic intervention variable for Y w.r.t. Y in $\langle \mathbf{V}', E', P' \rangle$.

Theorem 3 shows that direct causation à la Woodward (2003) coincides with the graph-theoretical notion of direct causation in systems $\langle \mathbf{V}, E, P \rangle$ with stochastic i-expansions for every $X \in \mathbf{V}$ w.r.t. every $Y \in \mathbf{V}$ (with $X \neq Y$) satisfying CMC and Min. In particular, theorem 3 says the following: Assume we are interested in a causal model $\langle \mathbf{V}, E, P \rangle$. Assume further that for every X, Y in $\mathbf{V}$ (with $X \neq Y$) there is a stochastic i-expansion $\langle \mathbf{V}', E', P' \rangle$ of $\langle \mathbf{V}, E, P \rangle$ for X w.r.t. Y satisfying CMC and Min. This means (among other things) that for every pair of variables $\langle X, Y \rangle$ there is at least one stochastic i-expansion featuring a stochastic intervention variable $I_X$ for X w.r.t. Y and a stochastic intervention variable $I_Y$ for Y w.r.t. Y. Given this setup, theorem 3 can account for every causal arrow between every X and Y (with $X \neq Y$) in $\mathbf{V}$: It says that X is a causal parent of Y in $\langle \mathbf{V}, E \rangle$ iff $Dep(Y, I_X = on \mid Par(Y) \setminus \{X\}, I_Y = on)$ holds in some of the presupposed stochastic i-expansions for X w.r.t. Y.

7 CONCLUSION

In this paper we investigated the consequences of assuming a certain version of Occam’s razor. If one applies the razor to the theory of causal Bayes nets in such a way that it dictates preferring only minimal causal models, one can show that Occam’s razor provides a neat definition of direct causation. In particular, we demonstrated that one gets Woodward’s (2003) definition of direct causation translated into causal Bayes nets terminology and restricted to contexts in which suitable i-expansions satisfying the causal Markov condition (CMC) exist. In the last section we showed how Occam’s razor can be used to account for direct causal connections Woodward style even if no deterministic interventions are available. These results can be seen as a motivation for Occam’s razor that goes beyond its merits as a methodological principle: If one wants a nice and simple interventionist definition of direct causation in the sense of Woodward (or its stochastic counterpart developed in sec. 6), then it is reasonable to apply a version of Occam’s razor that suggests eliminating non-minimal causal models.

Acknowledgements

This work was supported by DFG, research unit “Causation, Laws, Dispositions, Explanation” (FOR 1063). Our thanks go to Frederick Eberhardt and Paul Naeger for important discussions, to two anonymous referees for helpful comments on an earlier version of the paper, and to Sebastian Maaß for proofreading.

The following proof of theorem 1 rests on the equivalence of CMC and the Markov factorization (1). It is, thus, restricted to finite causal structures.

Proof of theorem 1 Suppose $\langle V, E, P \rangle$ with $V = \{X_1, \ldots, X_n\}$ to be a finite acyclic causal model satisfying CMC.

Prod $\Rightarrow$ Min: Assume that $\langle V, E, P \rangle$ does not satisfy Min, meaning that there are $X, Y \in V$ with $X \to Y$ in $\langle V, E \rangle$ such that $\langle V, E', P \rangle$, which results from deleting $X \to Y$ from $\langle V, E \rangle$, still satisfies CMC. But then $\mathrm{Par}(Y) \setminus \{X\}$ d-separates $X$ and $Y$ in $\langle V, E' \rangle$, and thus, the d-separation criterion implies $\mathrm{Indep}(X, Y \mid \mathrm{Par}(Y) \setminus \{X\})$, which violates Prod.

Min $\Rightarrow$ Prod: Assume that $\langle V, E, P \rangle$ satisfies Min, meaning that there are no $X, Y \in V$ with $X \to Y$ in $\langle V, E \rangle$ such that $\langle V, E', P \rangle$, which results from deleting $X \to Y$ from $\langle V, E \rangle$, still satisfies CMC. The latter is the case iff (*) the parent set $\mathrm{Par}(Y)$ of every $Y \in V$ (with $\mathrm{Par}(Y) \neq \emptyset$) is minimal in the sense that removing one of $Y$'s parents $X$ from $\mathrm{Par}(Y)$ would make a difference for $Y$, meaning that $P(y \mid x, \mathrm{Par}(Y) \setminus \{X\} = r) \neq P(y \mid \mathrm{Par}(Y) \setminus \{X\} = r)$ holds for some $X$-values $x$, some $Y$-values $y$, and some instantiations $r$ of $\mathrm{Par}(Y) \setminus \{X\}$. Otherwise $P$ would admit the Markov factorization relative to $\langle V, E \rangle$ and relative to $\langle V, E' \rangle$, meaning that also $\langle V, E', P \rangle$, which results from deleting $X \to Y$ from $\langle V, E \rangle$, would satisfy CMC. But then $\langle V, E, P \rangle$ would not be minimal, which would contradict the assumption. Now (*) entails that $\mathrm{Dep}(X, Y \mid \mathrm{Par}(Y) \setminus \{X\})$ holds for all $X, Y \in V$ with $X \to Y$, i.e., that $\langle V, E, P \rangle$ satisfies Prod.

Proof of theorem 2 Assume $\langle V, E, P \rangle$ is an acyclic causal model and for every $Y \in V$ there is an i-expansion $\langle V', E', P' \rangle$ of $\langle V, E, P \rangle$ w.r.t. $Y$ satisfying CMC and Min. Let $X$ and $Y$ be arbitrarily chosen elements of $V$ such that $X \neq Y$.

(i) $\Rightarrow$ (ii): Suppose $X \to Y$ in $\langle V, E \rangle$. We assumed that there exists an i-expansion $\langle V', E', P' \rangle$ of $\langle V, E, P \rangle$ w.r.t. $Y$ satisfying CMC and Min. From condition (b) of definition 5 it follows that $X \to Y$ in $\langle V', E' \rangle$. Since Min is equivalent to Prod, $X$ and $Y$ are dependent when the values of all parents of $Y$ different from $X$ are fixed to certain values, meaning that there will be an $X$-value $x$ and a $Y$-value $y$ such that $\mathrm{Dep}(x, y \mid \mathrm{Par}(Y) \setminus \{X\} = r)$ holds for an instantiation $r$ of $\mathrm{Par}(Y) \setminus \{X\}$. Now there will also be a value of $I_Z$ that fixes the set of all parents of $Y$ different from $X$ to $r$. Let $on$ be this $I_Z$-value. Thus, also $\mathrm{Dep}(x, y \mid I_Z = on)$ and also $\mathrm{Dep}(x, y \mid I_Z = on, r)$ will hold. Now let us assume that $on$ is one of the $I_X$-values which are correlated with $x$ and which force $X$ to take value $x$. (The existence of such an $I_X$-value is guaranteed by condition (c) in definition 5.) Then we have $\mathrm{Dep}(I_X = on, x \mid I_Z = on, r) \wedge \mathrm{Dep}(x, y \mid I_Z = on, r)$.

From the axiom of weak union (2) (cf. Pearl, 2009, p. 11), which is probabilistically valid, we get (3) and (4) (in which $s = \langle x, r \rangle$ is a value realization of $\mathrm{Par}(Y)$):

$$\mathrm{Indep}(X, YW \mid Z) \Rightarrow \mathrm{Indep}(X, Y \mid ZW) \tag{2}$$

$$\mathrm{Indep}(I_X = on, s = \langle x, r \rangle \mid I_Z = on) \Rightarrow \mathrm{Indep}(I_X = on, x \mid I_Z = on, r) \tag{3}$$

$$\mathrm{Indep}(s = \langle x, r \rangle, y \mid I_Z = on) \Rightarrow \mathrm{Indep}(x, y \mid I_Z = on, r) \tag{4}$$

With the contrapositions of (3) and (4) it now follows that $\mathrm{Dep}(I_X = on, s = \langle x, r \rangle \mid I_Z = on) \wedge \mathrm{Dep}(s = \langle x, r \rangle, y \mid I_Z = on)$.

We now show that $\mathrm{Dep}(I_X = on, s \mid I_Z = on) \wedge \mathrm{Dep}(s, y \mid I_Z = on)$ and the d-separation criterion imply $\mathrm{Dep}(I_X = on, y \mid I_Z = on)$. We define $P^*(\cdot)$ as $P'(\cdot \mid I_Z = on)$ and proceed as follows:

$$P^*(y \mid I_X = on) = \sum_i P^*(y \mid s_i, I_X = on) \cdot P^*(s_i \mid I_X = on) \tag{5}$$

Equation (5) is probabilistically valid. Because $\mathrm{Par}(Y)$ blocks all paths between $I_X$ and $Y$, we get (6) from (5):

$$P^*(y \mid I_X = on) = \sum_i P^*(y \mid s_i) \cdot P^*(s_i \mid I_X = on) \tag{6}$$

Since $I_X = on$ forces $\mathrm{Par}(Y)$ to take value $s$ when $I_Z = on$, $P^*(s_i \mid I_X = on) = 1$ in case $s_i = s$, and $P^*(s_i \mid I_X = on) = 0$ otherwise. Thus, we get (7) from (6):

$$P^*(y \mid I_X = on) = P^*(y \mid s) \cdot 1 \tag{7}$$

For reductio, let us assume that $\mathrm{Indep}(I_X = on, y \mid I_Z = on)$, meaning that $P^*(y \mid I_X = on) = P^*(y)$. But then we get (8) from (7):

$$P^*(y) = P^*(y \mid s) \cdot 1 \tag{8}$$

Equation (8) contradicts $\mathrm{Dep}(s, y \mid I_Z = on)$ above. Hence, $\mathrm{Dep}(I_X = on, y \mid I_Z = on)$ has to hold when $\mathrm{Dep}(I_X = on, s \mid I_Z = on) \wedge \mathrm{Dep}(s, y \mid I_Z = on)$ holds. Therefore, $\mathrm{Dep}(Y, I_X = on \mid I_Z = on)$.

(ii) $\Rightarrow$ (i): Suppose $\langle V', E', P' \rangle$ is one of the presupposed i-expansions such that $\mathrm{Dep}(Y, I_X = on \mid I_Z = on)$ holds, where $I_X$ is an intervention variable for $X$ w.r.t. $Y$ in $\langle V', E', P' \rangle$ and $I_Z$ is the set of all intervention variables in $\langle V', E', P' \rangle$ different from $I_X$. Then the d-separation criterion implies that there must be a causal path $\pi$ d-connecting $I_X$ and $Y$. $\pi$ cannot be a path featuring colliders, because $I_X$ and $Y$ would be d-separated over such a path. $\pi$ also cannot have the form $I_X \leftarrow \ldots - Y$. This is excluded by condition (a) in (IV). So $\pi$ must have the form $I_X \to \ldots - Y$. Since $\pi$ cannot feature colliders, $\pi$ must be a directed path $I_X \to \ldots \to Y$. Now either (A) $\pi$ goes through $X$, or (B) $\pi$ does not go through $X$. (B) is excluded by condition (c) in (IV). Hence, (A) must be the case. If (A) is the case, then $\pi$ is a directed path $I_X \to \ldots \to X \to \ldots \to Y$ going through $X$. Now there are two possible cases: Either (i) at least one of the paths d-connecting $I_X$ and $Y$ has the form $I_X \to \ldots \to X \to Y$, or (ii) all paths d-connecting $I_X$ and $Y$ have the form $I_X \to \ldots \to X \to \ldots \to C \to \ldots \to Y$.

Assume (ii) is the case, i.e., all paths d-connecting $I_X$ and $Y$ have the form $I_X \to \ldots \to X \to \ldots \to C \to \ldots \to Y$. Let $r_i$ be an individual variable ranging over $\mathrm{val}(\mathrm{Par}(Y))$. We define $P^*(\cdot)$ as $P'(\cdot \mid I_Z = on)$ and proceed as follows:

$$P^*(y \mid I_X = on) = \sum_i P^*(y \mid r_i, I_X = on) \cdot P^*(r_i \mid I_X = on) \tag{9}$$

$$P^*(y) = \sum_i P^*(y \mid r_i) \cdot P^*(r_i) \tag{10}$$

Equations (9) and (10) are probabilistically valid. Since $I_Z = on$ forces every non-intervention variable in $V'$ different from $X$ and $Y$ to take a certain value, $I_Z = on$ will also force $\mathrm{Par}(Y)$ to take a certain value $r$, meaning that $P^*(r_i) = 1$ in case $r_i = r$, and that $P^*(r_i) = 0$ otherwise. Since probabilities of 1 do not change after conditionalization, we get $P^*(r_i \mid I_X = on) = 1$ in case $r_i = r$, and $P^*(r_i \mid I_X = on) = 0$ otherwise. Thus, we get (11) from (9) and (12) from (10):

$$P^*(y \mid I_X = on) = P^*(y \mid r, I_X = on) \cdot 1 \tag{11}$$

$$P^*(y) = P^*(y \mid r) \cdot 1 \tag{12}$$

Since $\mathrm{Par}(Y)$ blocks all paths between $I_X$ and $Y$, we get $P^*(y \mid r, I_X = on) = P^*(y \mid r)$ with the d-separation criterion, and thus, we get $P^*(y \mid I_X = on) = P^*(y)$ with (11) and (12). Thus, $\mathrm{Indep}(Y, I_X = on \mid I_Z = on)$ holds, which contradicts the initial assumption that $\mathrm{Dep}(Y, I_X = on \mid I_Z = on)$ holds. Therefore, (i) must be the case, i.e., there must be a path d-connecting $I_X$ and $Y$ that has the form $I_X \to \ldots \to X \to Y$. From $\langle V', E', P' \rangle$ being an i-expansion of $\langle V, E, P \rangle$ it now follows that $X \to Y$ in $\langle V, E \rangle$.

Proof of theorem 3 Assume $\langle V, E, P \rangle$ is an acyclic causal model and for every $X, Y \in V$ (with $X \neq Y$) there is a stochastic i-expansion $\langle V', E', P' \rangle$ of $\langle V, E, P \rangle$ for $X$ w.r.t. $Y$ satisfying CMC and Min. Let $X$ and $Y$ be arbitrarily chosen elements of $V$ such that $X \neq Y$.

(i) $\Rightarrow$ (ii): Suppose $X \to Y$ in $\langle V, E \rangle$. We assumed that there exists a stochastic i-expansion $\langle V', E', P' \rangle$ of $\langle V, E, P \rangle$ for $X$ w.r.t. $Y$ satisfying CMC and Min. From condition (b) of definition 13 it follows that $X \to Y$ in $\langle V', E' \rangle$. Since Min is equivalent to Prod, $\mathrm{Dep}(x, y \mid \mathrm{Par}(Y) \setminus \{X\} = r, I_Y = on)$ holds for some $X$-values $x$, for some $Y$-values $y$, for some of $I_Y$'s on-values $on$, and for some instantiations $r$ of $\mathrm{Par}(Y) \setminus \{X\}$. Now let us assume that $on$ is one of the $I_X$-values which are correlated with $x$ conditional on $\mathrm{Par}(Y) \setminus \{X\} = r, I_Y = on$. (The existence of such an $I_X$-value $on$ is guaranteed by condition (c.1) in definition 13.) Then we have $\mathrm{Dep}(I_X = on, x \mid r, I_Y = on) \wedge \mathrm{Dep}(x, y \mid r, I_Y = on)$.

We now show that $\mathrm{Dep}(I_X = on, x \mid r, I_Y = on) \wedge \mathrm{Dep}(x, y \mid r, I_Y = on)$ together with $I_X \to X \to Y$ and the d-separation criterion implies $\mathrm{Dep}(I_X = on, y \mid r, I_Y = on)$. We define $P^*(\cdot)$ as $P'(\cdot \mid r)$ and proceed as follows:

$$P^*(y \mid I_X = on, I_Y = on) = \sum_i P^*(y \mid x_i, I_X = on, I_Y = on) \cdot P^*(x_i \mid I_X = on, I_Y = on) \tag{13}$$

$$P^*(y \mid I_Y = on) = \sum_i P^*(y \mid x_i, I_Y = on) \cdot P^*(x_i \mid I_Y = on) \tag{14}$$

Equations (13) and (14) are probabilistically valid. From $I_X \to X \to Y$ and (13) we get with the d-separation criterion:

$$P^*(y \mid I_X = on, I_Y = on) = \sum_i P^*(y \mid x_i, I_Y = on) \cdot P^*(x_i \mid I_X = on, I_Y = on) \tag{15}$$

Since $I_Y$ is exogenous and a causal parent only of $Y$, $X$ and $I_Y$ are d-separated by $I_X$, and thus, we get (16) from (15) with the d-separation criterion. Since $I_Y$ and $X$ are d-separated (by the empty set), we get (17) from (14) with the d-separation criterion:

$$P^*(y \mid I_X = on, I_Y = on) = \sum_i P^*(y \mid x_i, I_Y = on) \cdot P^*(x_i \mid I_X = on) \tag{16}$$

$$P^*(y \mid I_Y = on) = \sum_i P^*(y \mid x_i, I_Y = on) \cdot P^*(x_i) \tag{17}$$

Now either (A) $P^*(y \mid I_X = on, I_Y = on) \neq P^*(y \mid I_Y = on)$, or (B) $P^*(y \mid I_X = on, I_Y = on) = P^*(y \mid I_Y = on)$. If (A) is the case, then $\mathrm{Dep}(Y, I_X = on \mid \mathrm{Par}(Y) \setminus \{X\}, I_Y = on)$.

If (B) is the case, then $P^*(y \mid I_X = on, I_Y = on)$ can only equal $P^*(y \mid I_Y = on)$ due to a fine-tuning of $P^*(x_i \mid I_X = on)$ and $P^*(x_i)$ in equations (16) and (17), respectively. We already know that $X$'s value $x$ and $I_X = on$ are dependent conditional on $\mathrm{Par}(Y) \setminus \{X\} = r, I_Y = on$, meaning that $P^*(x \mid I_X = on, I_Y = on) \neq P^*(x \mid I_Y = on)$ holds. Since $X$ and $I_Y$ are d-separated by $I_X$, $P^*(x \mid I_X = on, I_Y = on) = P^*(x \mid I_X = on)$ holds. Since $X$ and $I_Y$ are d-separated (by the empty set), $P^*(x \mid I_Y = on) = P^*(x)$ holds. It follows that $P^*(x \mid I_X = on) \neq P^*(x)$ holds. So (i) $P^*(x \mid I_X = on) > 0$ or (ii) $P^*(x) > 0$. Thanks to condition (c.2) in definition 13, every one of the conditional probabilities $P^*(y \mid x_i, I_Y = on)$ can be changed independently by replacing "$on$" in "$P^*(y \mid x_i, I_Y = on)$" by some $I_Y$-value "$on^*$" (with $on^* \neq on$) such that $P^*(y \mid x_i, I_Y = on^*) > 0$. Thus, in both cases ((i) and (ii)) it holds that $P^*(y \mid x, I_Y = on^*) \cdot P^*(x \mid I_X = on) \neq P^*(y \mid x, I_Y = on^*) \cdot P^*(x)$, while $P^*(y \mid x_i, I_Y = on^*) \cdot P^*(x_i \mid I_X = on) = P^*(y \mid x_i, I_Y = on^*) \cdot P^*(x_i)$ holds for all $x_i \neq x$. It follows that $P^*(y \mid I_X = on, I_Y = on^*) \neq P^*(y \mid I_Y = on^*)$.

(ii) $\Rightarrow$ (i): Suppose $\langle V', E', P' \rangle$ is one of the above assumed stochastic i-expansions for $X$ w.r.t. $Y$ and that $\mathrm{Dep}(Y, I_X = on \mid \mathrm{Par}(Y) \setminus \{X\}, I_Y = on)$ holds in this stochastic i-expansion. The d-separation criterion and $\mathrm{Dep}(Y, I_X = on \mid \mathrm{Par}(Y) \setminus \{X\}, I_Y = on)$ imply that $I_X$ and $Y$ are d-connected given $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$ by a causal path $\pi$: $I_X - \ldots - Y$. $\pi$ cannot have the form $I_X \leftarrow \ldots - Y$. This is excluded by condition (a) in (IV$_S$). Thus, $\pi$ must have the form $I_X \to \ldots - Y$. Now either (A) $\pi$ goes through $X$, or (B) $\pi$ does not go through $X$.

Suppose (B) is the case. Then, because of condition (c) in (IV$_S$), $\pi$ cannot be a directed path $I_X \to \ldots \to Y$. Thus, $\pi$ must either (i) have the form $I_X \to \ldots - C \to Y$ (with a collider on $\pi$), or (ii) have the form $I_X \to \ldots - C \leftarrow Y$. If (i) is the case, then $C$ must be in $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$ (since $C$ cannot be $X$). Hence, $\pi$ would be blocked by $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$ and, thus, would not d-connect $I_X$ and $Y$ given $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$. Thus, (ii) must be the case. If (ii) is the case, then there has to be a collider $C^*$ on $\pi$ that either is $C$ or that is an effect of $C$, and thus, also an effect of $Y$. But then $I_X$ and $Y$ can only be d-connected given $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$ over $\pi$ if $C^*$ is in $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$ or has an effect in $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$. But this would mean that $Y$ is a cause of $Y$, which is excluded by the initial assumption of acyclicity. Thus, (A) has to be the case.

If (A) is the case, then $\pi$ must have the form $I_X \to \ldots - X - \ldots - Y$. If $\pi$ had the form $I_X \to \ldots - X - \ldots - C \leftarrow Y$ (where $C$ and $X$ are possibly identical), then there would be at least one collider $C^*$ lying on $\pi$ that is an effect of $Y$. For $I_X$ and $Y$ to be d-connected given $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$ over path $\pi$, $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$ must activate $\pi$, meaning that $C^*$ has to be in $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$ or has to have an effect in $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$. But then we would end up with a causal cycle $Y \to \ldots \to Y$, which would contradict the assumption of acyclicity. Hence, $\pi$ must have the form $I_X \to \ldots - X - \ldots - C \to Y$ (where $C$ and $X$ are possibly identical). Now either (i) $C = X$ or (ii) $C \neq X$. If (ii) is the case, then $C \in (\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$, and thus, $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$ blocks $\pi$. But then $I_X$ and $Y$ cannot be d-connected given $(\mathrm{Par}(Y) \setminus \{X\}) \cup \{I_Y\}$ over path $\pi$. Hence, (i) must be the case. Then $\pi$ has the form $I_X \to \ldots - X \to Y$, and from $\langle V', E', P' \rangle$ being a stochastic i-expansion of $\langle V, E, P \rangle$ it follows that $X \to Y$ in $\langle V, E \rangle$.

Eberhardt and Scheines (2007). Interventions and causal inference. Philosophy of Science 74(5): 981-995.

Glymour (2004). Critical notice. British Journal for the Philosophy of Science 55(4): 779-790.

K. B. Korb, L. R. Hope, A. E. Nicholson, and K. Axnick (2004). Varieties of causal intervention. In C. Zhang, H. W. Guesgen, W.-K. Yeap (eds.), Proceedings of the 8th Pacific Rim International Conference on AI 2004: Trends in Artificial Intelligence, 322-331. Berlin: Springer.

Neapolitan (2004). Learning Bayesian Networks. Upper Saddle River, NJ: Prentice Hall.

E. P. Nyberg and K. B. Korb (2006). Informative interventions. Technical report 2006/204, Clayton School of Information Technology, Monash University, Melbourne.

Pearl (1988). Probabilistic Reasoning in Intelligent Systems.

Pearl (2009). Causality. Cambridge: Cambridge University Press.

Spirtes, Glymour, and Scheines (2000). Causation, Prediction, and Search. Cambridge, MA: MIT Press.

T. S. Verma (1986). Causal networks: Semantics and expressiveness. Technical report R-65, Cognitive Systems Laboratory, University of California, Los Angeles.

Woodward (2003). Making Things Happen. Oxford: Oxford University Press.

Woodward (2008). Response to Strevens. Philosophy and Phenomenological Research 77(1): 193-212.

Zhang and Spirtes (2011). Intervention, determinism, and the causal minimality condition. Synthese 182(3): 335-347.