How Occam’s Razor Provides a Neat Definition of Direct Causation

Alexander Gebharter & Gerhard Schurz
Duesseldorf Center for Logic and Philosophy of Science
University of Duesseldorf
Universitaetsstrasse 1
40225 Duesseldorf, Germany

Abstract

In this paper we show that the application of Occam’s razor to the theory of causal Bayes nets gives us a neat definition of direct causation. In particular we show that Occam’s razor implies Woodward’s (2003) definition of direct causation, provided suitable intervention variables exist and the causal Markov condition (CMC) is satisfied. We also show how Occam’s razor can account for direct causal relationships Woodward style when only stochastic intervention variables are available.

1 INTRODUCTION

Occam’s razor is typically seen as a methodological principle. There are many possible ways to apply the razor to the theory of causal Bayes nets. It could, for example, simply be interpreted to suggest preferring the simplest causal structure compatible with the given data among all compatible causal structures. The simplest causal structure could, for instance, be the one (or one of the ones) featuring the fewest causal arrows.

In this paper, however, we are interested in a slightly different application of Occam’s razor: Our interpretation of Occam’s razor asserts that a causal structure that is compatible with the data should only be chosen if it satisfies the causal minimality condition (Min) in the sense of Spirtes et al. (2000, p. 31), which requires that no causal arrow in the structure can be omitted in such a way that the resulting substructure would still be compatible with the data. When speaking of a causal structure being compatible with the data, we have a causal structure and a probability distribution satisfying the causal Markov condition (CMC) in mind. (For details, see sec. 5.) In the following, applying Occam’s razor always means to assume that the causal minimality condition is satisfied.

In this paper we give a motivation for Occam’s razor that goes beyond its merits as a methodological principle dictating that one should always decide in favor of minimal causal models. In particular, we show that Occam’s razor provides a neat definition of direct causal relatedness in the sense of Woodward (2003), provided suitable intervention variables exist and CMC is satisfied. Note the connection of this enterprise to Zhang and Spirtes’ (2011) project. Zhang and Spirtes prove that CMC and an interventionist definition of direct causation à la Woodward (2003) together imply minimality. So Occam’s razor is well-motivated within a manipulationist framework such as Woodward’s. We show, vice versa, that CMC and minimality together imply Woodward’s definition of direct causation. So if one wants a neat definition of direct causation, it is reasonable to apply Occam’s razor in the sense of assuming minimality.

The paper is structured as follows: In sec. 2 we introduce the notation we use in subsequent sections. In sec. 3 we present Woodward’s (2003) definition of direct causation and his definition of an intervention variable. In sec. 4 we give precise reconstructions of both definitions in terms of causal Bayes nets. We also provide a definition of the notion of an intervention expansion, which is needed to account for direct causal relations in terms of the existence of certain intervention variables. In sec. 5 we show that Occam’s razor gives us Woodward’s definition of direct causation if CMC is assumed and the existence of suitable intervention variables is granted (theorem 2). In sec. 6 we go a step further and show how Occam’s razor allows us to account for direct causation Woodward style when only stochastic intervention variables (cf. Korb et al., 2004, sec. 5) are available (theorem 3). We conclude in sec. 7.

Note that though the main results of the present paper (i.e., theorems 2 and 3) can be used for causal discovery, the goal of this paper is not to provide a method for uncovering direct causal connections among variables in a set of variables V of interest. The goal of this paper is to establish a connection between Woodward’s (2003) intervention-based notion of direct causation and the presence of a causal arrow in a minimal causal Bayes net, which can be interpreted as support for Occam’s razor. Because of this, the present paper does not discuss the relation of theorems 2 and 3 to results about causal discovery by means of interventions such as, e.g., (Eberhardt and Scheines, 2007) or (Nyberg and Korb, 2006).

2 NOTATION

We represent causal structures by graphs, i.e., by ordered pairs ⟨V, E⟩, where V is a set of variables and E is a binary relation on V (E ⊆ V × V). V’s elements are called the graph’s “vertices” and E’s elements are called its “edges”. “X → Y” stands short for “⟨X, Y⟩ ∈ E” and is interpreted as “X is a direct cause of Y in ⟨V, E⟩” or as “Y is a direct effect of X in ⟨V, E⟩”. Par(Y) is the set of all X ∈ V with X → Y in ⟨V, E⟩. The elements of Par(Y) are called Y’s parents. We write “X – Y” for “X → Y or X ← Y”. A path π : X – ... – Y is called a (causal) path connecting X and Y in ⟨V, E⟩. A causal path π is called a directed causal path from X to Y if and only if (“iff” for short) it has the form X → ... → Y. X is called a cause of Y and Y an effect of X in that case. A causal path π is called a common cause path iff it has the form X ← ... ← Z → ... → Y and no variable appears more often than once on π. Z is called a common cause of X and Y lying on path π in that case. A variable Z lying on a path π : X – ... → Z ← ... – Y is called a collider lying on this path. A variable X is called exogenous iff no arrow is pointing at X; it is called endogenous otherwise.

A graph ⟨V, E⟩ is called a directed graph in case all edges in E are one-headed arrows “→”. It is called cyclic iff it features a causal path of the form X → ... → X and acyclic otherwise. A causal structure ⟨V, E⟩ together with a probability distribution P over V is called a causal model ⟨V, E, P⟩. P is intended to provide information about the strengths of causal influences represented by the arrows in ⟨V, E⟩. A causal model ⟨V, E, P⟩ is called cyclic iff its graph ⟨V, E⟩ is cyclic; it is called acyclic otherwise. In the following, we will only be interested in acyclic causal models.

We use the standard notions of (conditional) probabilistic dependence and independence:

Definition 1 (conditional probabilistic (in)dependence) X and Y are probabilistically dependent conditional on Z iff there are X-, Y-, and Z-values x, y, and z, respectively, such that P(x|y, z) ≠ P(x|z) ∧ P(y, z) > 0.

X and Y are probabilistically independent conditional on Z iff X and Y are not probabilistically dependent conditional on Z.

Probabilistic independence between X and Y conditional on Z is abbreviated as “Indep(X, Y|Z)”, probabilistic dependence is abbreviated as “Dep(X, Y|Z)”. Unconditional probabilistic (in)dependence between X and Y, (In)Dep(X, Y), is defined as (In)Dep(X, Y|∅). X, Y, and Z in definition 1 can be variables or sequences of variables. When X, Y, Z, ... are sequences of variables, we write them in bold letters. We also write the values x, y, z, ... of sequences X, Y, Z, ... in bold letters. The set of values x of a sequence X of variables X_1, ..., X_n is val(X_1) × ... × val(X_n), where val(X_i) is the set of X_i’s possible values.

3 WOODWARD’S DEFINITION OF DIRECT CAUSATION

Woodward’s (2003) interventionist theory of causation aims to explicate direct causation w.r.t. a set of variables V in terms of possible interventions. Woodward (2003, p. 98) provides the following definition of an intervention variable:

Definition 2 (IV_W) I is an intervention variable for X with respect to Y if and only if I meets the following conditions:
I1. I causes X.
I2. I acts as a switch for all the other variables that cause X. That is, certain values of I are such that when I attains those values, X ceases to depend on the values of other variables that cause X and instead depends only on the value taken by I.
I3. Any directed path from I to Y [if there exists one] goes through X [...].
I4. I is (statistically) independent of any variable Z that causes Y and that is on a directed path that does not go through X.

(IV_W) is intended to single out those variables as intervention variables for X w.r.t. Y that allow for correct causal inference according to Woodward’s (2003) definition of direct causation. For I to be an intervention variable for X w.r.t. Y it is required that I is causally relevant to X (condition I1), that X is only under I’s influence when I = on (condition I2), and that a correlation between I and Y can only be due to a directed causal path from I to Y going through X (conditions I3 and I4). For a detailed motivation of I1–I4, see (Woodward, 2003, sec. 3.1.4). For problems with Woodward’s definitions, see (Gebharter and Schurz, ms).

An intervention on X w.r.t. Y (from now on we refer to X as the intervention’s “target variable” and to Y as the “test variable”) is then straightforwardly defined as an intervention variable I for X w.r.t. Y taking one of its on-values, which forces X to take a certain value x. We will call interventions whose on-values force X to take certain values x “deterministic interventions” (cf. Korb et al., 2004, sec. 5).

Note that Woodward’s (2003) notion of an intervention is, on the one hand, strong because it requires interventions to be deterministic interventions. It is, on the other hand, weak in another respect: In contrast to structural or surgical interventions (cf. Eberhardt and Scheines, 2007, p. 984; Pearl, 2009), Woodward’s interventions are allowed to be direct causes of more than one variable as long as the intervention’s direct effects which are non-target variables do not cause the test variable over a path not going through the intervention’s target variable (intervention condition I3).

Based on his notion of an intervention, Woodward (2003, p. 59) gives the following definition of direct causation w.r.t. a variable set V:

Definition 3 (DC_W) A necessary and sufficient condition for X to be a (type-level) direct cause of Y with respect to a variable set V is that there be a possible intervention on X that will change Y or the probability distribution of Y when one holds fixed at some value all other variables Z_i in V.

(DC_W) neatly explicates direct causation w.r.t. a variable set V in terms of possible interventions: X is a direct cause of Y w.r.t. V if Y can be wiggled by wiggling X; and if X is a direct cause of Y w.r.t. V, then there are possible interventions by whose means one can influence Y by manipulating X. [Footnote 1: Note that Woodward (2003) does not require the intervention variables I to be elements of the set of variables V containing the target variable X and the test variable Y.]

Note that (DC_W) may be too strong because many domains involve variables one cannot control by deterministic interventions. Scenarios of this kind include, for example, the decay of uranium or states of entangled systems in quantum mechanics. The decay of uranium can only be probabilistically influenced, and any attempt to manipulate the state of one of two entangled photons, for example, would destroy the entangled system. Glymour (2004) also considers variables for sex and race as not manipulable by means of intervention variables in the sense of (IV_W).

To avoid all problems that might arise for Woodward’s (2003) account due to variables that are not manipulable by deterministic interventions, we will reconstruct Woodward’s (DC_W) as a partial definition in sec. 4. In particular, we will define direct causation only for sets of variables V for which suitable intervention variables exist.

4 RECONSTRUCTING WOODWARD’S DEFINITION

In this section we reconstruct Woodward’s (2003) definition of direct causation in terms of causal Bayes nets. The reconstruction of (IV_W) is straightforward:

Definition 4 (IV) I_X ∈ V is an intervention variable for X ∈ V w.r.t. Y ∈ V in a causal model ⟨V, E, P⟩ iff
(a) I_X is exogenous and there is a path π : I_X → X in ⟨V, E⟩,
(b) for every on-value of I_X there is an X-value x such that P(x|I_X = on) = 1 and Dep(x, I_X = on|z) holds for every instantiation z of every Z ⊆ V\{I_X, X},
(c) all paths I_X → ... → Y in ⟨V, E⟩ have the form I_X → ... → X → ... → Y,
(d) I_X is independent of every variable C (in V or not in V) which causes Y over a path not going through X.

Note that (IV) still allows for intervention variables I_X that are common causes of their target variable X and other variables in V. Condition (a) requires I_X to be exogenous. This is, though it is a typical assumption made for intervention variables, not explicit in Woodward’s (2003) original definition (IV_W). One problem that might arise for Woodward’s account when not making this assumption is that I_X in a causal structure Y → I_X → X may turn out to be an intervention variable for X w.r.t. Y. If Y then depends on I_X = on, (DC_W) would falsely determine X to be a cause of Y (cf. Gebharter and Schurz, ms). I_X → X in condition (a) is a harmless simplification of I1. Condition (b) captures Woodward’s requirement that interventions have to be deterministic, from which I2 follows. X is assumed to be under full control of I_X when I_X is on. This does not only require that for every on-value of I_X there is an X-value x such that P(x|I_X = on) = 1, but also that I_X = on actually has an influence on x in every possible context, i.e., under conditionalization on arbitrary instantiations z of all kinds of subsets Z of V\{I_X, X}. Condition (c) directly mirrors I3. Condition (d) mirrors Woodward’s I4. Note that condition (d) requires reference to variables C possibly not contained in V (cf. Woodward, 2008, p. 202).

If we want to account for direct causal connection in a causal model ⟨V, E, P⟩ by means of interventions, we have to add intervention variables to V. In other words: We have to expand ⟨V, E, P⟩ in a certain way. But how do we have to expand ⟨V, E, P⟩? To answer this question, let us assume that we want to know whether X is a direct cause of Y in the unmanipulated model ⟨V, E, P⟩. Then the manipulated model ⟨V′, E′, P′⟩ will have to contain an intervention variable I_X for X w.r.t. Y and also intervention variables I_Z for all Z ∈ V different from X and Y by whose means these Z can be controlled. X is a direct cause of Y if I_X has some on-values such that we can influence Y by manipulating X with I_X = on when all I_Z have taken certain on-values. On the other hand, to guarantee that X is not a direct cause of Y, we have to demonstrate that none of Y’s values can be influenced by manipulating some X-value by some intervention. For establishing such a negative causal claim, we require an intervention variable I_X by whose means we can control every X-value x. (Otherwise it could be that Y depends only on X-values that are not correlated with I_X-values; then I_X = on would have no probabilistic influence on Y, though X may be a causal parent of Y.) In addition, we require for every Z ≠ X, Y an intervention variable I_Z by whose means Z can be forced to take every value z. (Otherwise it could be that we can bring about only such Z-value instantiations which screen X and Y off each other; then I_X = on would have no probabilistic influence on Y when Z’s value is fixed by interventions, though X may be a causal parent of Y.)

In the unmanipulated model ⟨V, E, P⟩, all intervention variables I are off. In the manipulated model ⟨V′, E′, P′⟩, all intervention variables’ values are realized for some but not for all individuals in the domain. This move allows us to compute probabilities for variables in V when I = off as well as probabilities for variables in V for all combinations of on-value realizations of intervention variables I, while the causal structure of the unmanipulated model will be preserved in the manipulated model. (Note that we deviate here from the typical “arrow breaking” representation of interventions in the literature which assumes that in the manipulated model all individuals get manipulated.) This amounts to the following notion of an intervention expansion (“i-expansion” for short):

Definition 5 (intervention expansion) ⟨V′, E′, P′⟩ is an intervention expansion of ⟨V, E, P⟩ w.r.t. Y ∈ V iff
(a) V′ = V ∪̇ V_I, where V_I contains for every X ∈ V different from Y an intervention variable I_X w.r.t. Y (and nothing else),
(b) for all Z_i, Z_j ∈ V: Z_i → Z_j in E′ iff Z_i → Z_j in E,
(c) for every X-value x of every X ∈ V different from Y there is an on-value of the corresponding intervention variable I_X such that P′(x|I_X = on) = 1 and Dep(x, I_X = on|z) holds for every instantiation z of every Z ⊆ V\{I_X, X},
(d) P′_{I=off} ↑ V = P,
(e) P′(I = on), P′(I = off) > 0.

I in conditions (d) and (e) is the set of all newly added intervention variables I. P′_{I=off} ↑ V in (d) is P′_{I=off} := P′(−|I = off) restricted to V. Hence, “P′_{I=off} ↑ V = P” means that P′_{I=off} coincides with P on the value space of variables in V. Condition (a) guarantees that the i-expansion contains all the intervention variables required for testing for direct causal relationships in the sense of Woodward’s (2003) definition of direct causation. The assumption that V_I contains only intervention variables for X w.r.t. Y is a harmless simplification. Thanks to condition (b), the manipulated model’s causal structure fits to the unmanipulated model’s causal structure. In particular, the i-expansion is only allowed to introduce new causal arrows going from intervention variables to variables in V. Due to condition (c), every X ∈ V different from Y can be fully controlled by means of an intervention variable I_X for X w.r.t. Y. Condition (d) explains how the manipulated model’s associated probability distribution P′ fits to the unmanipulated model’s distribution P. Condition (e) says that all values of intervention variables have to be realized by some individuals in the domain.

With the help of this notion of an i-expansion we can now reconstruct Woodward’s (2003) definition of direct causation. As already mentioned, Woodward’s definition requires the existence of suitable intervention variables. Thus, we reconstruct (DC_W) as a partial definition whose if-condition presupposes the required intervention variables:

Definition 6 (DC) If there exist i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ w.r.t. Y ∈ V, then: X ∈ V is a direct cause of Y w.r.t. V iff Dep(Y, I_X = on|I_Z = on) holds in some i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ w.r.t. Y, where I_X is an intervention variable for X w.r.t. Y in ⟨V′, E′, P′⟩ and I_Z is the set of all intervention variables in ⟨V′, E′, P′⟩ different from I_X.

(DC) mirrors Woodward’s definition restricted to cases in which the required intervention variables (more precisely: the required i-expansions) exist: In case Y can be probabilistically influenced by manipulating X by means of an intervention variable I_X for X w.r.t. Y in one of these i-expansions, X is a direct cause of Y in the unmanipulated model. And vice versa: In case X is a direct cause of Y in the unmanipulated model, there will be an intervention variable I_X for X w.r.t. Y in one of these i-expansions such that Y is probabilistically sensitive to I_X = on.

In the next section we show that (DC) can account for all direct causal dependencies in a causal model if suitable i-expansions exist and CMC and Min are assumed to be satisfied.

5 OCCAM’S RAZOR, DETERMINISTIC INTERVENTIONS, AND DIRECT CAUSATION

The theory of causal Bayes nets’ core axiom is the causal Markov condition (CMC) (cf. Spirtes et al., 2000, p. 29):

Definition 7 (causal Markov condition) A causal model ⟨V, E, P⟩ satisfies the causal Markov condition iff every X ∈ V is probabilistically independent of all its non-effects conditional on its causal parents.

CMC is assumed to hold for causal models whose variable sets are causally sufficient. A variable set V is causally sufficient iff every common cause C of variables X and Y in V is also in V or takes the same value c for all individuals in the domain (cf. Spirtes et al., 2000, p. 22). From now on we implicitly assume causal sufficiency, i.e., we only consider causal models whose variable sets are causally sufficient.

A finite causal model ⟨V, E, P⟩ satisfies the Markov condition iff P admits the following Markov factorization relative to ⟨V, E⟩ (cf. Pearl, 2009, p. 16):

P(X_1, ..., X_n) = ∏_i P(X_i | Par(X_i))    (1)

The conditional probabilities P(X_i | Par(X_i)) are called X_i’s parameters.

For acyclic causal models, CMC is equivalent to the d-separation criterion (Verma, 1986; Pearl, 1988, pp. 119f):

Definition 8 (d-separation criterion) ⟨V, E, P⟩ satisfies the d-separation criterion iff the following holds for all X, Y ∈ V and Z ⊆ V\{X, Y}: If X and Y are d-separated by Z in ⟨V, E⟩, then Indep(X, Y|Z).

Definition 9 (d-separation, d-connection) X ∈ V and Y ∈ V are d-separated by Z ⊆ V\{X, Y} in ⟨V, E⟩ iff X and Y are not d-connected given Z in ⟨V, E⟩.

X ∈ V and Y ∈ V are d-connected given Z ⊆ V\{X, Y} in ⟨V, E⟩ iff X and Y are connected by a path π in ⟨V, E⟩ such that no non-collider on π is in Z, while all colliders on π are in Z or have an effect in Z.

The equivalence between CMC and the d-separation criterion reveals the full content of CMC: If a causal model satisfies CMC, then every (conditional) probabilistic independence can be explained by missing (conditional) causal connections, and every (conditional) probabilistic dependence can be explained by some existing (conditional) causal connection.

In case there is a path π between X and Y in ⟨V, E⟩ such that no non-collider on π is in Z ⊆ V\{X, Y} and all colliders on π are in Z or have an effect in Z, π is said to be activated by Z. We also say that X and Y are d-connected given Z over path π in that case. If π is not activated by Z, π is said to be blocked by Z. We also say that X and Y are d-separated by Z over path π in that case.

Occam’s razor (as we understand it in this paper) dictates to prefer, from all those causal structures ⟨V, E⟩ which together with a given probability distribution P over V satisfy CMC, the ones which also satisfy the causal minimality condition (Min):

Definition 10 (causal minimality condition) A causal model ⟨V, E, P⟩ satisfying CMC satisfies the causal minimality condition iff no model ⟨V, E′, P⟩ with E′ ⊂ E also satisfies CMC (cf. Spirtes et al., 2000, p. 31).

For acyclic causal models satisfying CMC, the following causal productivity condition (Prod) (cf. Schurz and Gebharter, forthcoming) can be seen as a reformulation of the causal minimality condition:

Definition 11 (causal productivity condition) A causal model ⟨V, E, P⟩ satisfies the causal productivity condition iff Dep(X, Y|Par(Y)\{X}) holds for all X, Y ∈ V with X → Y in ⟨V, E⟩.

Theorem 1 For every acyclic causal model ⟨V, E, P⟩ satisfying CMC, the causal minimality condition and the causal productivity condition are equivalent.

The equivalence of Min and Prod reveals the full content of Min: In minimal causal models, no causal arrow is superfluous, i.e., every causal arrow from X to Y is productive, meaning that it is responsible for some probabilistic dependence between X and Y (when the values of all other parents of Y are fixed).

We can now prove the following theorem:

Theorem 2 If ⟨V, E, P⟩ is an acyclic causal model and for every Y ∈ V there is an i-expansion ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ w.r.t. Y satisfying CMC and Min, then for all X, Y ∈ V (with X ≠ Y) the following two statements are equivalent:
(i) X → Y in ⟨V, E⟩.
(ii) Dep(Y, I_X = on|I_Z = on) holds in some i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ w.r.t. Y, where I_X is an intervention variable for X w.r.t. Y in ⟨V′, E′, P′⟩ and I_Z is the set of all intervention variables in ⟨V′, E′, P′⟩ different from I_X.

Theorem 2 shows that direct causation à la Woodward (2003) coincides with the graph theoretical notion of direct causation in systems ⟨V, E, P⟩ with i-expansions w.r.t. every variable Y ∈ V satisfying CMC and Min. In particular, theorem 2 says the following: Assume we are interested in a causal model ⟨V, E, P⟩. Assume further that for every Y in V there is an i-expansion ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ w.r.t. Y satisfying CMC and Min. This means (among other things) that for every pair of variables ⟨X, Y⟩ there is at least one i-expansion with an intervention variable I_X for X w.r.t. Y and intervention variables I_Z for every Z ∈ V (different from X and Y) w.r.t. Y by whose means one can force the variables in V\{Y} to take any combination of value realizations. Given this setup, theorem 2 tells us for every X and Y (with X ≠ Y) in V that X is a causal parent of Y in ⟨V, E⟩ iff Dep(Y, I_X = on|I_Z = on) holds in one of the presupposed i-expansions w.r.t. Y.

6 OCCAM’S RAZOR, STOCHASTIC INTERVENTIONS, AND DIRECT CAUSATION

In this section we generalize the main finding of sec. 5 to cases in which only stochastic interventions are available. To account for direct causal relations X → Y by means of stochastic intervention variables, two intervention variables are needed, one for X and one for Y. (For details, see below.) We define a stochastic intervention variable as follows:

Definition 12 (IV_S) I_X ∈ V is a stochastic intervention variable for X ∈ V w.r.t. Y ∈ V in ⟨V, E, P⟩ iff
(a) I_X is exogenous and there is a path π : I_X → X in ⟨V, E⟩,
(b) for every on-value of I_X there is an X-value x such that Dep(x, I_X = on|z) holds for every instantiation z of every Z ⊆ V\{I_X, X},
(c) all paths I_X → ... → Y in ⟨V, E⟩ have the form I_X → ... → X → ... → Y,
(d) I_X is independent of every variable C (in V or not in V) which causes Y over a path not going through X.

The only difference between (IV_S) and (IV) is condition (b). For stochastic interventions it is not required that I_X = on determines X’s value to be x with probability 1. It suffices that I_X = on and x are correlated conditional on every value z of every Z ⊆ V\{I_X, X}. This specific constraint guarantees that X can be influenced by I_X = on under all circumstances, i.e., under all kinds of conditionalization on instantiations of remainder variables in V.

We also have to modify our notion of an intervention expansion in case we allow for stochastic interventions. We define the following notion of a stochastic intervention expansion:

Definition 13 (stochastic intervention expansion) ⟨V′, E′, P′⟩ is a stochastic intervention expansion of ⟨V, E, P⟩ for X ∈ V w.r.t. Y ∈ V iff
(a) V′ = V ∪̇ V_I, where V_I contains one stochastic intervention variable I_X for X w.r.t. Y and one stochastic intervention variable I_Y for Y w.r.t. Y which is a parent only of Y (and nothing else),
(b) for all Z_i, Z_j ∈ V: Z_i → Z_j in E′ iff Z_i → Z_j in E,
(c.1) for every X-value x there is an on-value of I_X such that Dep(x, I_X = on|z) holds for every instantiation z of every Z ⊆ V′\{I_X, X},
(c.2) for every Y-value y, every instantiation r of Par(Y), and every on-value of I_Y there is an on-value on* of I_Y such that P′(y|I_Y = on*, r) ≠ P′(y|I_Y = on, r), P′(y|I_Y = on*, r) > 0, and P′(y|I_Y = on*, r*) = P′(y|I_Y = on, r*) holds for all r* ∈ val(Par(Y)) different from r,
(d) P′_{I=off} ↑ V = P,
(e) P′(I = on), P′(I = off) > 0.

This definition differs from the definition of a (non-stochastic) i-expansion with respect to conditions (a) and (c): A stochastic i-expansion for X w.r.t. Y contains exactly two intervention variables, viz. one stochastic intervention variable I_X for X w.r.t. Y and one stochastic intervention variable I_Y for Y w.r.t. Y (which trivially satisfies conditions (c) and (d) in (IV_S)). While I_X may have more than one direct effect, the second intervention variable I_Y is assumed to be a causal parent only of Y. (This is required for accounting for direct causal connections; for details see (i) ⇒ (ii) in the proof of theorem 3 in the appendix.)

The second intervention variable I_Y is required to exclude independence between I_X and Y due to a fine-tuning of Y’s parameters. Such an independence can arise even if CMC and Min are satisfied, X is a causal parent of Y, and I_X and Y are each correlated with the same X-values x. For examples of this kind of non-faithfulness, see, e.g., (Neapolitan, 2004, p. 96) or (Naeger, forthcoming). In condition (c.2) we assume that every one of Y’s parameters can be changed independently of all other Y-parameters (to a value r ∈ ]0, 1]) by changing I_Y’s on-value. This suffices to exclude non-faithful independencies between I_X and Y of the kind described above.

When not presupposing deterministic interventions, it cannot be guaranteed anymore that the value of every variable in our model of interest different from the test variable Y can be fixed by interventions. The values of a causal model’s variables can, however, also be fixed by conditionalization. To account for direct causation between X and Y when only stochastic interventions are available, one has to conditionalize on a suitably chosen set Z ⊆ V\{X, Y} that (i) blocks all indirect causal paths between X and Y, and that (ii) fixes all X-alternative parents of Y. That Z blocks all indirect paths between X and Y is required to assure that dependence between I_X = on and Y cannot be due to an indirect path, and fixing the values of all parents of Y different from X is required to exclude independence of I_X = on and Y due to a fine-tuning of Y’s X-alternative parents that may cancel the influence of I_X = on on Y over a path I_X → X → Y. [Footnote 2: For details on such cases of non-faithfulness due to compensating parents see (Schurz and Gebharter, forthcoming; Pearl, 1988, p. 256).] Fortunately, every directed acyclic graph ⟨V, E⟩ features a set Z satisfying requirement (i), viz. Par(Y)\{X} (cf. Schurz and Gebharter, forthcoming). Trivially, Par(Y)\{X} also satisfies requirement (ii).

With the help of (IV_S) and definition 13, we can now define direct causation in terms of stochastic interventions for models for which suitable stochastic i-expansions exist:

Definition 14 (DC_S) If there exist stochastic i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ for X w.r.t. Y, then: X is a direct cause of Y w.r.t. V iff Dep(Y, I_X = on|Par(Y)\{X}, I_Y = on) holds in some i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ for X w.r.t. Y, where I_X is a stochastic intervention variable for X w.r.t. Y in ⟨V′, E′, P′⟩ and I_Y is a stochastic intervention variable for Y w.r.t. Y in ⟨V′, E′, P′⟩.

Now the following theorem can be proven:

Theorem 3 If ⟨V, E, P⟩ is an acyclic causal model and for every X, Y ∈ V (with X ≠ Y) there is a stochastic i-expansion ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ for X w.r.t. Y satisfying CMC and Min, then for all X, Y ∈ V (with X ≠ Y) the following two statements are equivalent:
(i) X → Y in ⟨V, E⟩.
(ii) Dep(Y, I_X = on|Par(Y)\{X}, I_Y = on) holds in some i-expansions ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ for X w.r.t. Y, where I_X is a stochastic intervention variable for X w.r.t. Y in ⟨V′, E′, P′⟩ and I_Y is a stochastic intervention variable for Y w.r.t. Y in ⟨V′, E′, P′⟩.

Theorem 3 shows that direct causation à la Woodward (2003) coincides with the graph theoretical notion of direct causation in systems ⟨V, E, P⟩ with stochastic i-expansions for every X ∈ V w.r.t. every Y ∈ V (with X ≠ Y) satisfying CMC and Min. In particular, theorem 3 says the following: Assume we are interested in a causal model ⟨V, E, P⟩. Assume further that for every X, Y in V (with X ≠ Y) there is a stochastic i-expansion ⟨V′, E′, P′⟩ of ⟨V, E, P⟩ for X w.r.t. Y satisfying CMC and Min. This means (among other things) that for every pair of variables ⟨X, Y⟩ there is at least one stochastic i-expansion featuring a stochastic intervention variable I_X for X w.r.t. Y and a stochastic intervention variable I_Y for Y w.r.t. Y. Given this setup, theorem 3 can account for every causal arrow between every X and Y (with X ≠ Y) in V: It says that X is a causal parent of Y in ⟨V, E⟩ iff Dep(Y, I_X = on|Par(Y)\{X}, I_Y = on) holds in some of the presupposed stochastic i-expansions for X w.r.t. Y.

7 CONCLUSION

In this paper we investigated the consequences of assuming a certain version of Occam’s razor. If one applies the razor to the theory of causal Bayes nets in such a way that it dictates to prefer only minimal causal models, one can show that Occam’s razor provides a neat definition of direct causation. In particular, we demonstrated that one gets Woodward’s (2003) definition of direct causation translated into causal Bayes nets terminology and restricted to contexts in which suitable i-expansions satisfying the causal Markov condition (CMC) exist. In the last section we showed how Occam’s razor can be used to account for direct causal connections Woodward style even if no deterministic interventions are available. These results can be seen as a motivation of Occam’s razor going beyond its merits as a methodological principle: If one wants a nice and simple interventionist definition of direct causation in the sense of Woodward (or its stochastic counterpart developed in sec. 6), then it is reasonable to apply a version of Occam’s razor that suggests eliminating non-minimal causal models.

Acknowledgements

This work was supported by DFG, research unit “Causation, Laws, Dispositions, Explanation” (FOR 1063). Our thanks go to Frederick Eberhardt and Paul Naeger for important discussions, to two anonymous referees for helpful comments on an earlier version of the paper, and to Sebastian Maaß for proofreading.

References

F. Eberhardt, and R. Scheines (2007). Interventions and causal inference. Philosophy of Science 74(5):981-995.

A. Gebharter, and G. Schurz (ms). Woodward’s interventionist theory of causation: Problems and proposed solutions.

C. Glymour (2004). Critical notice. British Journal for the Philosophy of Science 55(4):779-790.

K. B. Korb, L. R. Hope, A. E. Nicholson, and K. Axnick (2004). Varieties of causal intervention. In C. Zhang, H. W. Guesgen, W.-K. Yeap (eds.), Proceedings of the 8th Pacific Rim International Conference on AI 2004: Trends in Artificial Intelligence, 322-331. Berlin: Springer.

P. Naeger (forthcoming). The causal problem of entanglement. Synthese.

R. Neapolitan (2004). Learning Bayesian Networks. Upper Saddle River, NJ: Prentice Hall.

E. P. Nyberg, and K. B. Korb (2006). Informative interventions. Technical report 2006/204, Clayton School of Information Technology, Monash University, Melbourne.

J. Pearl (1988). Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann.

J. Pearl (2009). Causality. Cambridge: Cambridge University Press.

G. Schurz, and A. Gebharter (forthcoming). Causality as a theoretical concept: Explanatory warrant and empirical content of the theory of causal nets. Synthese.

P. Spirtes, C. Glymour, and R. Scheines (2000). Causation, Prediction, and Search. Cambridge, MA: MIT Press.

T. S. Verma (1986). Causal networks: Semantics and expressiveness. Technical report R-65, Cognitive Systems Laboratory, University of California, Los Angeles.

J. Woodward (2003). Making Things Happen. Oxford: Oxford University Press.

J. Woodward (2008). Response to Strevens. Philosophy and Phenomenological Research 77(1):193-212.

J. Zhang, and P. Spirtes (2011). Intervention, determinism, and the causal minimality condition. Synthese 182(3):335-347.

Appendix

The following proof of theorem 1 rests on the equivalence of CMC and the Markov factorization (1). It is, thus, restricted to finite causal structures.

Proof of theorem 1 Suppose ⟨V, E, P⟩ with V = {X_1, ..., X_n} to be a finite acyclic causal model satisfying CMC.

…anteed by condition (c) in definition 5.) Then we have Dep(I_X = on, x|I_Z = on, r) ∧ Dep(x, y|I_Z = on, r). From the axiom of weak union (2) (cf. Pearl, 2009, p. 11), which is probabilistically valid, we get (3) and (4) (in which s = ⟨x, r⟩ is a value realization of Par(Y)):

Indep(X, YW | Z) ⇒ Indep(X, Y | ZW)    (2)
Indep(IX = on, s = hx, ri|IZ = on) ⇒ (3) Prod ⇒ Min: Assume that hV, E, P i does not satisfy Min, Indep(IX = on, x|IZ = on, r) meaning that there are X, Y ∈ V with X → Y in hV, Ei Indep(s = hx, ri, y|IZ = on) ⇒ such that hV, E 0 , P i, which results from deleting X → Y (4) from hV, Ei, still satisfies CMC. But then P ar(Y )\{X} Indep(x, y|IZ = on, r) d-separates X and Y in hV, E 0 i, and thus, the d-separation With the contrapositions of (3) and (4) it now follows criterion implies Indep(X, Y |P ar(Y )\{X}), which vio- that Dep(IX = on, s = hx, ri|IZ = on) ∧ Dep(s = lates Prod. hx, ri, y|IZ = on). Min ⇒ Prod: Assume that hV, E, P i satisfies Min, mean- We now show that Dep(IX = on, s|IZ = on) ∧ ing that there are no X, Y ∈ V with X → Y in hV, Ei Dep(s, y|IZ = on) and the d-separation criterion imply such that hV, E 0 , P i, which results from deleting X → Y Dep(IX = on, y|IZ = on). We define P ∗ (−) as from hV, Ei, still satisfies CMC. The latter is the case P 0 (−|IZ = on) and proceed as follows: iff (*) the parent set P ar(Y ) of every Y ∈ V (with P ar(Y ) 6= ∅) is minimal in the sense that removing one P ∗ (y|IX = on) = of Y ’s parents X from P ar(Y ) would make a differ- (5) X P ∗ (y|si , IX = on) · P ∗ (si |IX = on) ence for Y , meaning that P (y|x, P ar(Y )\{X} = r) 6= i P (y|P ar(Y )\{X} = r) holds for some X-values x, some Y -values y, and some instantiations r of P ar(Y )\{X}. Equation (5) is probabilistically valid. Because P ar(Y ) Otherwise P would admit the Markov factorization rela- blocks all paths between IX and Y , we get (6) from (5): tive to hV, Ei and relative to hV, E 0 i, meaning that also hV, E 0 , P i, which results from deleting X → Y from P ∗ (y|IX = on) = (6) X hV, Ei, would satisfy CMC. But then hV, E, P i would P ∗ (y|si ) · P ∗ (si |IX = on) not be minimal, which would contradict the assumption. 
i Now (*) entails that Dep(X, Y |P ar(Y )\{X}) holds for all X, Y ∈ V with X → Y , i.e., that hV, E, P i satisfies Since IX = on forces P ar(Y ) to take value s when Prod.  IZ = on, P ∗ (si |IX = on) = 1 in case si = s, and P ∗ (si |IX = on) = 0 otherwise. Thus, we get (7) from Proof of theorem 2 Assume hV, E, P i is an acyclic (6): causal model and for every Y ∈ V there is an i-expansion P ∗ (y|IX = on) = P ∗ (y|s) · 1 (7) hV0 , E 0 , P 0 i of hV, E, P i w.r.t. Y satisfying CMC and For reductio, let us assume that Indep(IX = Min. Let X and Y be arbitrarily chosen elements of V on, y|IZ = on), meaning that P ∗ (y|IX = on) = P ∗ (y). such that X 6= Y . But then we get (8) from (7): (i) ⇒ (ii): Suppose X → Y in hV, Ei. We assumed that there exists an i-expansion hV0 , E 0 , P 0 i of hV, E, P i w.r.t. P ∗ (y) = P ∗ (y|s) · 1 (8) Y satisfying CMC and Min. From condition (b) of defi- Equation (8) contradicts Dep(s, y|IZ = on) above. nition 5 it follows that X → Y in hV0 , E 0 i. Since Min Hence, Dep(IX = on, y|IZ = on) has to hold when is equivalent to Prod, X and Y are dependent when the Dep(IX = on, s|IZ = on) ∧ Dep(s, y|IZ = on) holds. values of all parents of Y different from X are fixed to Therefore, Dep(Y, IX = on|IZ = on). certain values, meaning that there will be an X-value x and a Y -value y such that Dep(x, y|P ar(Y )\{X} = r) (ii) ⇒ (i): Suppose hV0 , E 0 , P 0 i is one of the presupposed holds for an instantiation r of P ar(Y )\{X}. Now there i-expansions such that Dep(Y, IX = on|IZ = on) holds, will also be a value of IZ that fixes the set of all parents of where IX is an intervention variable for X w.r.t. Y in Y different from X to r. Let on be this IZ -value. Thus, hV0 , E 0 , P 0 i and IZ is the set of all intervention variables also Dep(x, y|IZ = on) and also Dep(x, y|IZ = on, r) in hV0 , E 0 , P 0 i different from IX . Then the d-separation will hold. 
Now let us assume that on is one of the IX - criterion implies that there must be a causal path π d- values which are correlated with x and which force X to connecting IX and Y . π cannot be a path featuring col- take value x. (The existence of such an IX -value is guar- liders, because IX and Y would be d-separated over such a path. π also cannot have the form IX ← ... – Y . This of hV, E, P i for X w.r.t. Y satisfying CMC and Min. is excluded by condition (a) in (IV). So π must have the From condition (b) of definition 13 it follows that X → form IX → ... – Y . Since π cannot feature colliders, Y in hV0 , E 0 i. Since Min is equivalent to Prod, π must be a directed path IX → ... → Y . Now either Dep(x, y|P ar(Y )\{X} = r, IY = on) holds for some X- (A) π goes through X, or (B) π does not go through X. values x, for some Y -values y, for some of IY ’s on-values (B) is excluded by condition (c) in (IV). Hence, (A) must on, and for some instantiations r of P ar(Y )\{X}. Now let be the case. If (A) is the case, then π is a directed path us assume that on is one of the IX -values which are corre- IX → ... → X → ... → Y going through X. Now there lated with x conditional on P ar(Y )\{X} = r, IY = on. are two possible cases: Either (i) at least one of the paths π (The existence of such an IX -value on is guaranteed by d-connecting IX and Y has the form IX → ... → X → Y , condition (c.1) in definition 13.) Then we have Dep(IX = or (ii) all paths π d-connecting IX and Y have the form on, x|r, IY = on) ∧ Dep(x, y|r, IY = on). IX → ... → X → ... → C → ... → Y . We now show that Dep(IX = on, x|r, IY = on) ∧ Assume (ii) is the case, i.e., all paths π d-connecting IX Dep(x, y|r, IY = on) together with IX → X → Y and and Y have the form IX → ... → X → ... → C → the d-separation criterion implies Dep(IX = on, y|r, IY = ... → Y . Let ri be an individual variable ranging over on). We define P ∗ (−) as P 0 (−|r) and proceed as follows: val(P ar(Y )). 
We define P ∗ (−) as P 0 (−|IZ = on) and proceed as follows: P ∗ (y|IX = on, IY = on) = X P ∗ (y|xi , IX = on, IY = on) · P ∗ (xi |IX = on, IY = on) P ∗ (y|IX = on) = i (9) X P ∗ (y|ri , IX = on) · P ∗ (ri |IX = on) (13) i X P ∗ (y|IY = on) = ∗ ∗ ∗ P (y) = P (y|ri ) · P (ri ) (10) X (14) P ∗ (y|xi , IY = on) · P ∗ (xi |IY = on) i i Equations (9) and (10) are probabilistically valid. Since Equations (13) and (14) are probabilistically valid. From IZ = on forces every non-intervention variable in V0 dif- IX → X → Y and (13) we get with the d-separation crite- ferent from X and Y to take a certain value, IZ = on will rion: also force P ar(Y ) to take a certain value r, meaning that P ∗ (ri ) = 1 in case ri = r, and that P ∗ (ri ) = 0 otherwise. P ∗ (y|IX = on, IY = on) = Since probabilities of 1 do not change after conditionaliza- X tion, we get P ∗ (ri |IX = on) = 1 in case ri = r, and P ∗ (y|xi , IY = on) · P ∗ (xi |IX = on, IY = on) i P ∗ (ri |IX = on) = 0 otherwise. Thus, we get (11) from (15) (9) and (12) from (10): Since IY is exogenous and a causal parent only of Y , X P ∗ (y|IX = on) = P ∗ (y|r, IX = on) · 1 (11) and IY are d-separated by IX , and thus, we get (16) from P ∗ (y) = P ∗ (y|r) · 1 (12) (15) with the d-separation criterion. Since IY and X are d-separated (by the empty set), we get (17) from (14) with Since P ar(Y ) blocks all paths between IX and Y , we get the d-separation criterion: P ∗ (y|r, IX = on) = P ∗ (y|r) with the d-separation cri- terion, and thus, we get P ∗ (y|IX = on) = P ∗ (y) with P ∗ (y|IX = on, IY = on) = (11) and (12). Thus, Indep(Y, IX = on|IZ = on) holds, X (16) P ∗ (y|xi , IY = on) · P ∗ (xi |IX = on) which contradicts the initial assumption that Dep(Y, IX = i on|IZ = on) holds. Therefore, (i) must be the case, i.e., there must be a path π d-connecting IX and Y that has the P ∗ (y|IY = on) = form IX → ... → X → Y . 
From hV0 , E 0 , P 0 i being an X (17) i-expansion of hV, E, P i it now follows that X → Y in P ∗ (y|xi , IY = on) · P ∗ (xi ) i hV, Ei.  Now either (A) P ∗ (y|IX = on, IY = on) 6= Proof of theorem 3 Assume hV, E, P i is an acyclic P ∗ (y|IY = on), or (B) P ∗ (y|IX = on, IY = on) = causal model and for every X, Y ∈ V (with X 6= Y ) there P ∗ (y|IY = on). If (A) is the case, then Dep(Y, IX = is a stochastic i-expansion hV0 , E 0 , P 0 i of hV, E, P i for X on|P ar(Y )\{X}, IY = on). w.r.t. Y satisfying CMC and Min. Let X and Y be arbitrar- If (B) is the case, then P ∗ (y|IX = on, IY = on) ily chosen elements of V such that X 6= Y . can only equal P ∗ (y|IY = on) due to a fine-tuning of (i) ⇒ (ii): Suppose X → Y in hV, Ei. We assumed P ∗ (xi |IY = on) and P ∗ (xi ) in equations (16) and (17), that there exists a stochastic i-expansion hV0 , E 0 , P 0 i respectively. We already know that X’s value x and IX = on are dependent conditional on P ar(Y )\{X} = dict the assumption of acyclicity. Hence, π must have the r, IY = on, meaning that P ∗ (x|IX = on, IY = on) 6= form IX → ... – X – ... – C → Y (where C and X are P ∗ (x|IY = on) holds. Since X and IY are d-separated possibly identical). Now either (i) C = X or (ii) C 6= X. by IX , P ∗ (x|IX = on, IY = on) = P ∗ (x|IX = on) If (ii) is the case, then C ∈ (P ar(Y )\{X}) ∪ {IY }, and holds. Since X and IY are d-separeted (by the empty thus, (P ar(Y )\{X}) ∪ {IY } blocks π. But then IX and set), P ∗ (x|IY = on) = P ∗ (x) holds. It follows that Y cannot be d-connected given (P ar(Y )\{X}) ∪ {IY } P ∗ (x|IX = on) 6= P ∗ (x) holds. So (i) P ∗ (x|IX = over path π. Hence, (i) must be the case. Then π has the on) > 0 or (ii) P ∗ (x) > 0. Thanks to condition (c.2) form IX → ... – X → Y and from hV0 , E 0 , P 0 i being a in definition 13, every one of the conditional probabili- stochastic i-expansion of hV, E, P i it follows that X → Y ties P ∗ (y|xi , IY = on) can be changed independently in hV, Ei.  
by replacing “on” in “P ∗ (y|xi , IY = on)” by some IY - value “on∗ ” (with on∗ 6= on) such that P ∗ (y|xi , IY = on∗ ) > 0. Thus, in both cases ((i) and (ii)) it holds that P ∗ (y|x, IY = on∗ ) · P ∗ (x|IX = on∗ ) 6= P ∗ (y|x, IY = on∗ ) · P ∗ (x), while P ∗ (y|xi , IY = on∗ ) · P ∗ (xi |IX = on∗ ) = P ∗ (y|xi , IY = on∗ ) · P ∗ (xi ) holds for all xi 6= x. It follows that P ∗ (y|IX = on, IY = on∗ ) 6= P ∗ (y|IY = on∗ ). (ii) ⇒ (i): Suppose hV0 , E 0 , P 0 i is one of the above as- sumed stochastic i-expansions for X w.r.t. Y and that Dep(Y, IX = on|P ar(Y )\{X}, IY = on) holds in this stochastic i-expansion. The d-separation criterion and Dep(Y, IX = on|P ar(Y )\{X}, IY = on) imply that IX and Y are d-connected given (P ar(Y )\{X}) ∪ {IY } by a causal path π : IX – ... – Y . π cannot have the form IX ← ... – Y . This is excluded by condition (a) in (IVS ). Thus, π must have the form IX → ... – Y . Now either (A) π goes through X, or (B) π does not go through X. Suppose (B) is the case. Then, because of condition (c) in (IVS ), π cannot be a directed path IX → ... → Y . Thus, π must either (i) have the form IX → ... – C → Y (with a collider on π), or it (ii) must have the form IX → ... – C ← Y . If (i) is the case, then C must be in (P ar(Y )\{X}) ∪ {IY } (since C cannot be X). Hence, π would be blocked by (P ar(Y )\{X}) ∪ {IY } and, thus, would not d-connect IX and Y given (P ar(Y )\{X}) ∪ {IY }. Thus, (ii) must be the case. If (ii) is the case, then there has to be a col- lider C ∗ on π that either is C or that is an effect of C, and thus, also an effect of Y . But then IX and Y can only be d-connected given (P ar(Y )\{X}) ∪ {IY } over π if C ∗ is in (P ar(Y )\{X}) ∪ {IY } or has an effect in (P ar(Y )\{X}) ∪ {IY }. But this would mean that Y is a cause of Y , what is excluded by the initial assumption of acyclicity. Thus, (A) has to be the case. If (A) is the case, then π must have the form IX → ... – X – ... – Y . If π would have the form IX → ... – X – ... 
– C ← Y (where C and X are possi- bly identical), then there is at least one collider C ∗ ly- ing on π that is an effect of Y . For IX and Y to be d-connected given (P ar(Y )\{X}) ∪ {IY } over path π, (P ar(Y )\{X}) ∪ {IY } must activate π, meaning that C ∗ has to be in (P ar(Y )\{X}) ∪ {IY } or has to have an ef- fect in (P ar(Y )\{X}) ∪ {IY }. But then we would end up with a causal cycle Y → ... → Y , which would contra-