Constructing Separators and
                                Adjustment Sets in Ancestral Graphs


            Benito van der Zander, Maciej Liśkiewicz                         Johannes Textor
                  Theoretical Computer Science                      Theoretical Biology & Bioinformatics
                 University of Lübeck, Germany                       Utrecht University, The Netherlands
              {benito,liskiewi}@tcs.uni-luebeck.de                        johannes.textor@gmx.de


                        Abstract                               tistical properties of e.g. front-door estimation is still con-
                                                               siderably lacking (VanderWeele, 2009; Glynn and Kashin,
     Ancestral graphs (AGs) are graphical causal               2013)1 . Unfortunately, the back-door criterion is not com-
     models that can represent uncertainty about the           plete, i.e., it does not find all possible options for covari-
     presence of latent confounders, and can be in-            ate adjustment that are allowed by a given graphical causal
     ferred from data. Here, we present an algo-               model.
     rithmic framework for efficiently testing, con-           In this paper, we aim to efficiently find a definitive an-
     structing, and enumerating m-separators in AGs.           swer for the following question: Given a causal graph G,
     Moreover, we present a new constructive crite-            which covariates Z do we need to adjust for to estimate the
     rion for covariate adjustment in directed acyclic         causal effect of the exposures X on the outcomes Y? To our
     graphs (DAGs) and maximal ancestral graphs                knowledge, no efficient algorithm has been shown to an-
     (MAGs) that characterizes adjustment sets as m-           swer this question, not even when G is a directed acyclic
     separators in a subgraph. Jointly, these results          graph (DAG), though constructive solutions do exist for
     allow to find all adjustment sets that can iden-          special cases like singleton X = {X} (Pearl, 2009), and a
     tify a desired causal effect with multivariate ex-        subclass of DAGs (Textor and Liśkiewicz, 2011). Here, we
     posures and outcomes in the presence of latent            provide algorithms for adjustment sets in DAGs as well as
     confounding. Our results generalize and improve           in maximal ancestral graphs (MAGs), which extend DAGs
     upon several existing solutions for special cases         allowing to account for unspecified latent variables. Our
     of these problems.                                        algorithms are guaranteed to find all valid adjustment sets
                                                               for a given DAG or MAG with polynomial delay, and we
                                                               also provide variants to list only those sets that minimize a
1   INTRODUCTION                                               user-supplied cost function or to quickly construct a sim-
                                                               ple adjustment set if one exists. Modelling multiple, pos-
Graphical causal models endow researchers with a lan-          sibly interrelated exposures X is important e.g. in case-
guage to codify assumptions about a data generating pro-       control studies that screen several putative causes of a dis-
cess (Pearl, 2009; Elwert, 2013). Using graphical criteria,    ease (Greenland, 1994). Likewise, the presence of unspeci-
one can asses whether the assumptions encoded in such a        fied latent variables often cannot be excluded in real-world
model allow estimation of a causal effect from observa-        settings, and the causal structure between the observed
tional data, which is a key issue in Epidemiology (Roth-       variables may not be completely known. We hope that
man et al., 2008), the Social Sciences (Elwert, 2013) and      the ability to quickly deduce from a given DAG or MAG
other fields where controlled experimentation is typically     whether and how covariate adjustment can render a causal
impossible. Specifically, the famous back-door criterion by    effect identifiable will benefit researchers in such areas.
Pearl (2009) can identify cases where causal effect identi-
                                                               We have two main contributions. First, in Section 3, we
fication is possible by standard covariate adjustment, and
                                                               present algorithms for verifying, constructing, and listing
other methods like the front-door criterion or do-calculus
                                                               m-separating sets in AGs. This subsumes a number of
can even permit identification even if the back-door crite-
                                                               earlier solutions for special cases of these problems, e.g.
rion fails (Pearl, 2009). In current practice, however, co-
variate adjustment is highly preferred to such alternatives       1
                                                                    Quoting VanderWeele (2009), “Time will perhaps tell
because its statistical properties are well understood, giv-   whether results like Pearl’s front-door path adjustment theorem
ing access to useful methodology like robust estimators and    and its generalizations are actually useful for epidemiologic re-
confidence intervals. In contrast, knowledge about the sta-    search or whether the results are simply of theoretical interest.”
the Bayes-Ball algorithm for verification of d-separating          from U to V, then U is called an anterior of V. All ances-
sets (Shachter, 1998), the use of network flow calculations        tors of V are anteriors of V. Every node is its own ancestor,
to find minimal d-separating sets in DAGs (Tian et al.,            descendant, and anterior. For a node set X, the set of all of
1998; Acid and de Campos, 2003), and an algorithm to               its ancestors is written as An(X). The descendant and ante-
list minimal adjustment sets for a certain subclass of DAGs        rior sets De(X), Ant(X) are analogously defined. Also, we
(Textor and Liśkiewicz, 2011). Our verification and con-          denote by Pa(X), (Ch(X)), the set of parents (children) of
struction algorithms for single separators are asymptoti-          X.
cally runtime-optimal. Although we apply our algorithms
only to adjustment set construction, they are likely useful in     m-Separation. A node V on a walk w is called a collider
other settings as separating sets are involved in most graph-      if two arrowheads of w meet at V, e.g. if w contains U ↔
ical criteria for causal effect identification. Moreover, the      V ← Q. There can be no collider if w is shorter than
separators themselves constitute statistically testable impli-     2. Two nodes U, V are called collider connected if there
cations of the causal assumptions encoded in the graph.            is a path between them on which all nodes except U and
Second, we give a graphical criterion that characterizes           V are colliders. Adjacent vertices are collider connected.
adjustment sets in terms of separating sets, and is sound          Two nodes U, V are called m-connected by a set Z if there
and complete for DAGs and MAGs without selection vari-             is a path π between them on which every node that is a
ables. This generalizes the sound and complete criterion           collider is in An(Z) and every node that is not a collider
for DAGs by Shpitser et al. (2010), and the sound but in-          is not in Z. Then π is called an m-connecting path. The
complete adjustment criterion for MAGs without selection           same definition can be stated simpler using walks: U, V are
variables by Maathuis and Colombo (2013). Our criterion            called m-connected by Z if there is a walk between them
exhaustively addresses adjustment set construction in the          on which all colliders and only colliders are in Z. If U, V
presence of latent covariates and with incomplete knowl-           are m-connected by the empty set, we simply say they are
edge of causal structure if at least a MAG can be specified.       m-connected. If U, V are not m-connected by Z, we say
We give the criterion separately for DAGs (Section 4) and          that Z m-separates them or blocks all paths between them.
MAGs (Section 5) because the same graph usually admits             Two node sets X, Y are m-separated by Z if all their nodes
more adjustment options if viewed as a DAG than if viewed          are pairwise m-separated by Z. In DAGs, m-separation is
as a MAG.                                                          equivalent to the well-known d-separation criterion (Pearl,
                                                                   2009).

2   PRELIMINARIES                                                  Ancestral graphs and DAGs. A mixed graph G = (V, E)
                                                                   is called an ancestral graph (AG) if the following two con-
We denote sets by bold upper case letters (S), and some-           ditions hold: (1) For each edge A ← B or A ↔ B, A is
times abbreviate singleton sets as {S} = S. Graphs are writ-       not an ancestor of B. (2) For each edge A − B, there are no
ten calligraphically (G), and variables in upper-case (X).         edges A ← C, A ↔ C, B ← C or B ↔ C. There can be
                                                                   at most one edge between two nodes in an AG (Richard-
Mixed graphs and paths. We consider mixed graphs                   son and Spirtes, 2002). Syntactically, all DAGs are AGs
G = (V, E) with nodes (vertices, variables) V and directed         and all AGs containing only directed edges are DAGs. An
(A → B), undirected (A−B), and bidirected (A ↔ B) edges            AG G = (V, E) is a maximal ancestral graph (MAG) if
E. Nodes linked by an edge are adjacent. A walk of length          every non-adjacent pair of nodes U, V can be m-separated
n is a node sequence V1 , . . . , Vn+1 such that there exists an   by some Z ⊆ V \ {U, V}. Every AG G can be turned into
edge sequence E1 , E2 , . . . , En for which every edge Ei con-    a MAG M by adding bidirected edges between node pairs
nects Vi , Vi+1 . Then V1 is called the start node and Vn+1        that cannot be m-separated (Richardson and Spirtes, 2002).
the end node of the walk. A path is a walk in which no node
occurs more than once. Given a node set X and a node set
Y, a walk from X ∈ X to Y ∈ Y is called proper if only its         3   ALGORITHMS FOR M-SEPARATION
start node is in X. Given a graph G = (V, E) and a node
set V0 , the induced subgraph GV0 = (V0 , E0 ) contains the        In this section, we compile an algorithmic framework for
edges E0 from G that are adjacent only to nodes in V0 .            solving a host of problems related to verification, con-
                                                                   struction, and enumeration of m-separating sets in AGs.
Ancestry. A walk of the form V1 → . . . → Vn is di-                The problems are defined in Fig. 1, which also shows
rected, or causal. If there is a directed walk from U to V,        the asymptotic runtime of their solutions. Throughout, n
then U is called an ancestor of V and V a descendant of U.         stands for the number of nodes and m for the number of
A graph is acyclic if no directed walk from a node to itself       edges in a graph. All of these problems except L IST S EP
is longer than 0. All directed walks in an acyclic graph are       can be solved by rather straightforward modifications of
paths. A walk is anterior if it were directed after replacing      existing algorithms (Acid and de Campos, 1996; Shachter,
all edges U − V by U → V. If there is an anterior path             1998; Tian et al., 1998; Textor and Liśkiewicz, 2011).
Pseudocodes of these algorithms are shown for reference          m-separators containing I is a subset of Ant(X ∪ Y ∪ I).
and implementation in the Appendix of this paper, as are
proof details omitted from the main text.                        Proof. Assume there is a minimal separator Z with Z *
An important tool for solving similar problems for d-            Ant(X ∪ Y ∪ I). According to Lemma 3.4 we have that
separation is moralization, by which d-separation can be         Z0 = Ant(X ∪ Y ∪ I) ∩ Z is a separator with I ⊆ Z0 . But
reduced to a vertex cut in an undirected graph. This re-         Z0 ⊆ Ant(X ∪ Y ∪ I) and Z0 ⊆ Z, so Z , Z0 and Z is not a
duction allows to solve problems like F IND M IN S EP using      minimal separator.                                    
standard network flow algorithms (Acid and de Campos,
1996). Moralization can be generalized to AGs in the fol-        Corollary 3.5 applies to minimum-cost separators as well
lowing manner.                                                   because every minimum-cost separator must be minimal.
Definition 3.1 (Moralization of AGs (Richardson and              Now we can solve F IND M IN C OST S EP and F IND M IN -
Spirtes, 2002)). Given an AG G, the augmented graph (G)a         S IZE S EP by using weighted min-cut, which takes time
is an undirected graph with the same node set as G such          O(n3 ) using practical algorithms, and L IST M IN S EP by us-
that X − Y is an edge in (G)a if and only if X and Y are         ing Takata’s algorithm to enumerate minimal vertex cuts
collider connected in G.                                         with delay O(n3 ) (Takata, 2010).

Theorem 3.2 (Reduction of m-Separation to vertex cuts            However, for F IND M IN S EP and T EST M IN S EP, we can do
(Richardson and Spirtes, 2002)). Given an AG G and three         better than using standard vertex cuts.
node sets X,Y and Z, Z m-separates X and Y if and only if        Proposition 3.6. The task F IND M IN S EP can be solved in
Z is an X-Y node cut in (GAnt(X∪Y∪Z )a .                         time O(n2 ).
A direct implementation of Definition 3.1 would lead to a
                                                                 Proof. Two algorithms are given in the appendix, one with
suboptimal algorithm. Therefore, we first give an asymp-
                                                                 runtime O(nm) (Algorithm 8) and one with runtime O(n2 )
totically optimal (linear time in output size) moralization
                                                                 (Algorithm 9).                                         
algorithm for AGs. We then solve T EST M IN S EP, F IND -
M IN S EP, F IND M IN C OST S EP and L IST M IN S EP by gener-   Corollary 3.7. The task T EST M IN S EP can be solved in
alizing existing correctness proofs of the moralization ap-      time O(n2 ).
proach for d-separation (Tian et al., 1998).
Not all our solutions are based on moralization, however.        Proof. First verify whether Z is an m-separator using mor-
Moralization takes time O(n2 ), and T EST S EP and F IND -       alization. If not, return “no”. Otherwise, set S = Z and
S EP can be solved faster, i.e. in asymptotically optimal        solve F IND M IN S EP. Return “yes” if the output is Z and
time O(n + m).                                                   “no”, otherwise.                                         
Lemma 3.3 (Efficient AG moralization). Given an AG G,
the augmented graph (G)a can be computed in time O(n2 ).         Moralization can in the worst case quadratically increase
                                                                 the size of a graph. Therefore, in some cases, it may be
Proof. The algorithm proceeds in four steps. (1) Start by        preferable to avoid moralization if the task at hand is rather
setting (G)a to G replacing all edges by undirected ones.        simple, as are the two tasks considered below.
(2) Identify all connected components in G with respect          Proposition 3.8. The task F IND S EP can be solved in time
to bidirected edges (two nodes are in the same such com-         O(n + m).
ponent if they are connected by a path consisting only of
bidirected edges). Nodes without adjacent bidirected edges
                                                                 Proof. This follows directly from Lemma 3.4, and the fact
form singleton components. (3) For each pair U, V of nodes
                                                                 that the set Ant(X ∪ Y ∪ I) ∩ R can be found in linear
from the same component, add the edge U − V to (G)a if it
                                                                 time from the MAG without moralization. Note that un-
did not exist already. (4) For each component, identify all
                                                                 like in DAGs, two non-adjacent nodes cannot always be
its parents (nodes U with an edge U → V where U is in the
                                                                 m-separated in ancestral graphs.                       
component) and link them all by undirected edges in (G)a .
Now two nodes are adjacent in (G)a if and only if they are
collider connected in G. All four steps can be performed in      By modifying the Bayes-Ball algorithm (Shachter, 1998)
time O(n2 ).                                                    appropriately, we get the following.
                                                                 Proposition 3.9. The task T EST S EP can be solved in time
Lemma 3.4. Let X, Y, I, R be sets of nodes with I ⊆ R,
                                                                 O(n + m).
R ∩ (X ∪ Y) = ∅. If there exists an m-separator Z0 , with
I ⊆ Z0 ⊆ R then Z = Ant(X∪Y∪I)∩R is an m-separator.              Lastly, we consider the problem of listing all m-separators.
Corollary 3.5 (Ancestry of minimal separators). Given an         Here is an algorithm to solve that problem with polynomial
AG G, and three sets X, Y, I, every minimal set Z over all       delay.
             Verification: For given X, Y and Z decide if . . .
                    T EST S EP             Z m-separates X, Y                                   O(n + m)
                    T EST M IN S EP        Z m-separates X, Y but no Z0 ( Z does                O(n2 )
             Construction: For given X, Y and auxiliary I, R, output . . .
                    F IND S EP             an m-separator Z with I ⊆ Z ⊆ R                      O(n + m)
                    F IND M IN S EP        a minimal m-separator Z with I ⊆ Z ⊆ R               O(n2 )
                    F IND M IN C OST S EP a minimum-cost m-separator Z with I ⊆ Z ⊆ R           O(n3 )
             Enumeration: For given X, Y, I, R enumerate all . . .
                    L IST S EP             m-separators Z with I ⊆ Z ⊆ R                        O(n(n + m)) delay
                    L IST M IN S EP        minimal m-separators Z with I ⊆ Z ⊆ R                O(n3 ) delay

Table 1: Definitions of algorithmic tasks related to m-separation. Throughout, X, Y, R are pairwise disjoint node sets, Z is
disjoint with X, Y which are nonempty, and I, R, Z can be empty. By a minimal m-separator Z, with I ⊆ Z ⊆ R, we mean a
set such that no proper subset Z0 of Z, with I ⊆ Z0 , m-separates the pair X and Y. Analogously, we define a minimal and a
minimum-cost m-separator. The construction algorithms will output ⊥ if no set fulfilling the listed condition exists. Delay
complexity for e.g. L IST M IN S EP refers to the time needed to output one solution when there can be exponentially many
solutions (see Takata (2010)).

    function L IST S EP(G, X, Y, I, R)                           ables V = {X1 , . . . , Xn } as p(v) = nj=1 p(x j |pa j ), where
                                                                                                           Q
       if F IND S EP(G, X, Y, I, R) , ⊥ then                     pa j denotes a particular realization of the parent variables
           if I = R then Output I                                of X j in G. When interpreted causally, an edge Xi → X j
           else                                                  is taken to represent a direct causal effect of Xi on X j . For
                V ← an arbitrary node of R \ I                   disjoint X, Y ⊆ V, the (total) causal effect of X on Y is
                L IST S EP(G, X, Y, I ∪ {V}, R)                  p(y|do(x)) where do(x) represents an intervention that sets
                L IST S EP(G, X, Y, I, R \ {V})                  X = x. In a DAG, this intervention corresponds to remov-
                      Figure 1: ListSep                          ing all edges into X, disconnecting X from its parents. We
                                                                 denote the resulting graph as GX . Given DAG G and a joint
                                                                 probability density p for V the post-intervention distribu-
Proposition 3.10. The task L IST S EP can be solved with         tion can be expressed in a truncated factorization formula:
polynomial delay O(n(n + m)).                                                   Y
                                                                               
                                                                                         p(x j |pa j ) for V consistent with x
                                                                 p(v|do(x)) = 
                                                                               
                                                                               
Proof. Algorithm L IST S EP performs backtracking to enu-                         X ∈V\X
                                                                                j
                                                                               
merate all Z with I ⊆ Z ⊆ R aborting branches that will not
                                                                                 0                     otherwise.
find a valid separator. Since every leaf will output a sepa-     Definition 4.1 (Adjustment (Pearl, 2009)). Given a DAG
rator, the tree height is at most n and the existence check      G = (V, E) and pairwise disjoint X, Y, Z ⊆ V, Z is called
needs O(n + m), the delay time is O(n(n + m)). The al-           covariate adjustment for estimating the causal effect of X
gorithm generates every separator exactly once: if initially
I ( R, with V ∈ R \ I, then the first recursive call returns
                                                                 on Y, or simply adjustment, if for everyP distribution p con-
                                                                 sistent with G we have p(y | do(x)) = z p(y | x, z)p(z).
all separators Z with V ∈ Z and the second call returns all
                                                                 Definition 4.2 (Adjustment criterion (Shpitser et al., 2010;
Z0 with V < Z0 . Thus the generated separators are pairwise
                                                                 Shpitser, 2012)). Let G = (V, E) be a DAG, and X, Y, Z ⊆
disjoint. This is a modification of the enumeration algo-
                                                                 V be pairwise disjoint subsets of variables. The set Z sat-
rithm for minimal vertex separators (Takata, 2010).       
                                                                 isfies the adjustment criterion relative to (X, Y) in G if

4     ADJUSTMENT IN DAGS                                          (a) no element in Z is a descendant in G of any W ∈ V\X
                                                                      which lies on a proper causal path from X to Y and
In this section, we leverage the algorithmic framework of
                                                                  (b) all proper non-causal paths in G from X to Y are
the last section together with a new constructive, sound
                                                                      blocked by Z.
and complete criterion for covariate adjustment in DAGs
to solve all problems listed in Table 1 for adjustment sets      Remark 4.3. In (Shpitser et al., 2010; Shpitser, 2012) the
instead of m-separators in the same asymptotic time. First,      criterion is stated in a slightly different way, namely using
however, we need to introduce some more notation pertain-        in the condition (a) GX instead of G. However, the two
ing to the causal interpretation DAGs.                           statements are equivalent.

Do-operator and adjustment sets. A DAG G encodes                 Proof. First note that if Z satisfies the condition (a) then
the factorization of joint distribution p for the set of vari-   Z satisfies (a) with GX instead of G, too. Since condi-
tions (b) in Definition 4.2 and in (Shpitser et al., 2010; Sh-   Theorem 4.6. The constructive back-door criterion is
pitser, 2012) are identical, the adjustment criterion above      equivalent to the adjustment criterion.
implies the criterion of Shpitser et al.
Now assume Z satisfies the condition (a) with GX instead of      Proof. First observe that the conditions (a) of both criteria
G and the condition (b). We show that Z then satisfies the       are identical. Assume conditions (a) and (b) of the adjust-
condition (a), or there must exist some W ∈ V \ X, which         ment criterion hold. We show that (b) of the constructive
lies on a proper causal path from X to Y, and a causal path      back-door criterion follows. Let π be any proper path from
                                                                             pbd            pbd
from W to Z which intersects X.                                  X to Y in GXY . Because GXY does not contain causal paths
                                                                 from X to Y, π is not causal and has to be blocked by Z in
Let W → . . . → Y denote the suffix of the path from X to        G by the assumption. Since removing edges cannot open
Y starting in W. Note that this path can consist only of the                                    pbd
                                                                 paths, π is blocked by Z in GXY as well.
vertex W. Additionally, for the causal path from W to Z,
let W → . . . → X be its shortest prefix which intersects        Now we show that (a) and (b) of the constructive back-door
X. Then, from the condition (a), with GX instead of G,           criterion together imply (b) of the adjustment criterion. If
we know that no vertex of W → . . . → X belongs to Z.            that were not the case, then there could exist a proper non-
                                                                                                                  pbd
This leads to a contradiction with the condition (b) since       causal path π from X to Y that is blocked in GXY but open
X ← . . . ← W → . . . → Y is a proper non-causal path in         in G. There can be two reasons why π is blocked in GXY :
                                                                                                                           pbd
G from X to Y that is not blocked by Z.                         (1) The path starts with an edge X → D that does not exist
                                                                      pbd
                                                                 in GXY . Then we have D ∈ PCP(X, Y). For π to be non-
Analogously to GX , by GX we denote a DAG obtained from          causal, it would have to contain a collider C ∈ An(Z) ∩
G by removing all edges leaving X.                               De(D) ⊆ An(Z) ∩ Dpcp(X, Y). But because of (a), An(Z) ∩
                                                                 Dpcp(X, Y) is empty. (2) A collider C on π is an ancestor
4.1   CONSTRUCTIVE BACK-DOOR CRITERION                                                    pbd
                                                                 of Z in G, but not in GXY . Then there must be a directed
                                                                 path from C to Z via an edge X → D with D ∈ An(Z) ∩
Definition 4.4 (Proper back-door graph). Let G = (V, E)
                                                                 PCP(X, Y), contradicting (a).                           
be a DAG, and X, Y ⊆ V be pairwise disjoint subsets of
                                                     pbd
variables. The proper back-door graph, denoted as GXY , is
obtained from G by removing the first edge of every proper       4.2   ADJUSTING FOR MULTIPLE EXPOSURES
causal path form X to Y.
                                                                 For a singleton set X = {X} of exposures we know that if
Note the difference between the back-door graph GX and           a set of variables Y is disjoint from {X} ∪ Pa(X) then one
                                 pbd                             obtains easily an adjustment set with respect to X and Y
the proper back-door graph GXY : in GX all edges leaving         as Z = Pa(X) (Pearl, 2009, Theorem 3.2.2). The situation
                           pbd
X are removed while in GXY only those that lie on a proper       changes drastically if the effect of multiple exposures is es-
                                       pbd                       timated. Theorem 3.2.5 in Pearl (2009) claims that the ex-
causal path. However, to construct GXY still only elemen-
tary operations are sufficient. Indeed, we remove all edges      pression for P(y | do(x)) is obtained by adjusting for Pa(X)
X → D in E such that X ∈ X and D is in the subset, which         if Y is disjoint from X ∪ Pa(X), but, as the DAG in Fig. 2
we call PCP(X, Y), obtained as follows:                          shows, this is not true: the set Z = Pa(X1 , X2 ) = {Z2 }
                                                                 is not an adjustment set according to {X1 , X2 } and Y. In
           PCP(X, Y) = (DeX (X) \ X) ∩ AnX (Y)            (1)    this case one can identify the causal effect by adjusting for
                                                                 Z = {Z1 , Z2 } only. Indeed, for more than one exposure, no
where DeX (W) denotes descendants of W in GX . AnX (W)           adjustment set may exist at all even without latent covari-
is defined analogously for GX . Hence, the proper back-door      ates and even though Y ∩ (X ∪ Pa(X)) = ∅, e.g. in the DAG
graph can be constructed from G in linear time O(m + n).         X1     X2     Z       Y.
Now we propose the following adjustment criterion. For
short, we will denote the set De(PCP(X, Y)) as Dpcp(X, Y).       Using our criterion, we can construct a simple adjustment
                                                                 set explicitly if one exists. For a DAG G = (V, E) we define
Definition 4.5 (Constructive back-door criterion (CBC)).
                                                                 the set
Let G = (V, E) be a DAG, and let X, Y, Z ⊆ V be pair-
wise disjoint subsets of variables. The set Z satisfies the            Adj(X, Y) = An(X ∪ Y) \ (X ∪ Y ∪ Dpcp(X, Y)).
constructive back-door criterion relative to (X, Y) in G if
                                                                 Theorem 4.7. Let G = (V, E) be a DAG and let X, Y ⊆ V
 (a) Z ⊆ V \ Dpcp(X, Y) and                                      be distinct node sets. Then the following statements are
                                                                 equivalent:
(b) Z d-separates X and Y in the proper back-door graph
     pbd
    GXY .                                                         1. There exists an adjustment in G w.r.t. X and Y.
                                                 pbd
              G              GX                 GXY               T EST M INA DJ can be solved with an algorithm that itera-
      X1           Y1   X1         Y1    X1            Y1         tively removes nodes from Z and tests if the resulting set
                                                                  remains an adjustment set w.r.t. X and Y. This can be done
              Z1              Z1                 Z1               in time O(n(n + m)). Alternatively, one can construct the
                                                                                              pbd
                                                                  proper back-door graph GXY from G and test if Z is a min-
           Z2                Z2                 Z2                imal d-separator, with Z ⊆ V \ Dpcp(X, Y) between X and
                                                                  Y. This can be computed in time O(n2 ). The correctness of
      X2           Y2   X2         Y2    X2            Y2         these algorithms follows from the proposition below, which
                                                                  is a generalization of the result in Tian et al. (1998).
                                                                  Proposition 4.9. If no single node Z can be removed from
Figure 2: A DAG where for X = {X1 , X2 } and Y = {Y1 , Y2 },
                                                                  an adjustment set Z such that the resulting set Z0 = Z \ Z
Z = {Z1 , Z2 } is a valid and minimal adjustment, but no
                                                                  is no longer an adjustment set, then Z is minimal.
set fulfills the back-door criterion (Pearl, 2009), and the
parents of X are not a valid adjustment set either.               The remaining problems like F INDA DJ, F IND M INA DJ etc.
                                                                  can be solved using corresponding algorithms for finding,
 2. Adj(X, Y) is an adjustment w.r.t. X and Y.                    resp. listing m-separations applied for proper back-door
                                                                  graphs. Since the proper back-door graph can be con-
 3. Adj(X, Y) d-separates X and Y in the proper back-             structed in linear time the time complexities to solve the
                 pbd
    door graph GXY .                                              problems above are as listed in Table 1.


Proof. The implication (3) ⇒ (2) follows directly from            5 ADJUSTMENT IN MAGS
the criterion Def. 4.5 and the definition of Adj(X, Y). Since
the implication (2) ⇒ (1) is obvious, it remains to prove         We now generalize the results from the previous section
(1) ⇒ (3).                                                        to MAGs. Two examples may illustrate why this gener-
                                                                  alization is not trivial. First, take G = X → Y. If G is
Assume there exists an adjustment set Z0 w.r.t. X and Y.          interpreted as a DAG, then the empty set is valid for adjust-
From Theorem 4.6 we know that Z0 ∩ Dpcp(X, Y) = ∅ and             ment. If G is however taken as a MAG, then there exists
                                pbd
that Z0 d-separates X and Y in GXY . Our task is to show          no adjustment set as G represents among others the DAG
                                          pbd
that Adj(X, Y) d-separates X and Y in GXY . This follows           U X Y where U is an unobserved confounder. Sec-
from Lemma 3.4 used for the proper back-door graph GXY
                                                            pbd
                                                                  ond, take G = A → X → Y. In that case, the empty set
if we take I = ∅, R = V \ (X ∪ Y ∪ Dpcp(X, Y)).                  is an adjustment set regardless of whether G is interpreted
                                                                  as a DAG or a MAG. The reasons will become clear as we
                                                                  move on. First, let us recall the semantics of a MAG. The
From Equation 1 and the definition Dpcp(X, Y)                =
                                                                  following definition can easily be given for AGs in general,
De(PCP(X, Y)) we then obtain immediately:
                                                                  but we do not need this generality for our purpose.
Corollary 4.8. Given two distinct sets X, Y ⊆ V, Adj(X, Y)
                                                                  Definition 5.1 (DAG representation by MAGs (Richardson
can be found in O(n + m) time.
                                                                  and Spirtes, 2002)). Let G = (V, E) be a DAG, and let
                                                                  S, L ⊆ V. The MAG M = G[LS is a graph with nodes
4.3   TESTING, COMPUTING, AND                                     V \ (S ∪ L) and defined as follows. (1) Two nodes U and V
      ENUMERATING ADJUSTMENT SETS                                 are adjacent in G[LS if they cannot be m-separated by any
                                                                  Z with S ⊆ Z ⊆ V \ L in G. (2) The edge between U and
Using our criterion, every algorithm for m-separating sets
                                                                  V is
Z between X and Y can be used for adjustment sets with
respect to X and Y, by requiring that Z not contain any
node in Dpcp(X, Y). This allows solving all problems               U − V if U ∈ An(S ∪ V) and V ∈ An(S ∪ U);
listed in Table 1 for adjustment sets in DAGs instead of m-        U → V if U ∈ An(S ∪ V) and V < An(S ∪ U);
separators. Below, we name those problems analogously as
for m-separation, e.g. the problem to decide whether Z is          U ↔ V if U < An(S ∪ V) and V < An(S ∪ U).
an adjustment set w.r.t. X, Y is named T ESTA DJ in analogy
to T EST S EP.                                                    We call L latent variables and S selection variables. We
T ESTA DJ can be solved by testing if Z ∩ Dpcp(X, Y) = ∅          say there is selection bias if S , ∅.
                                                      pbd
and Z is a d-separator in the proper back-door graph GXY .        Hence, every MAG represents an infinite set of underlying
        pbd
Since GXY can be constructed from G in linear time, the           DAGs that all share the same ancestral relationships. For a
total time complexity of this algorithm is O(n + m).              given MAG M, we can construct a represented DAG G by
replacing every edge X − Y by a path X → S ← Y, and                   there is a collider path A ↔ V1 ↔ . . . ↔ Vn ↔ X or
every edge X ↔ Y by X ← L → Y, where S and L are new                  A → V1 ↔ . . . ↔ Vn ↔ X where all Vi are parents of D.
nodes; then M = G[LS where S and L are all new nodes. G
is called the canonical DAG of M (Richardson and Spirtes,             Definition 5.6. We call a MAG M = (V, E) adjustment
2002), which we write as C(M).                                        amenable w.r.t. X, Y ⊆ V if all proper causal paths from X
Lemma 5.2 (Preservation of separating sets (Richardson                to Y start with a visible directed edge.
and Spirtes, 2002)). Z m-separates X, Y in G[LS if and only           Lemma 5.7. If a MAG M = (V, E) is not adjustment
if Z ∪ S m-separates X, Y in G.                                       amenable w.r.t. X, Y ⊆ V then there exists no adjustment
                                                                      set W for X, Y in M.
We now extend the concept of adjustment to MAGs in the
usual way (Maathuis and Colombo, 2013).
                                                                      Proof. If the first edge X → D on some causal path to
Definition 5.3 (Adjustment in MAGs). Given a MAG M =                  Y in M is not visible, then there exists a consistent DAG
(V, E) and two variable sets X, Y ⊆ V, Z ⊆ V is an adjust-            G where there is a non-causal path between X and Y via
ment set for X, Y in M if for every probability distribu-             V that could only be blocked in M by conditioning on D
tion p(v0 ) consistent with a DAG G = (V0 , E0 ) for which            or some of its descendants. But such conditioning would
G[LS = M for some S, L ⊆ V0 \ V, we have                              violate the adjustment criterion in G.                  
                          X
           p(y | do(x)) =     p(y | x, z, s)p(z | s) . (2)            5.2   ADJUSTMENT CRITERION FOR MAGS
                             z
                                                                      We now show that DAG adjustment criterion generalizes to
Selection bias (i.e., S , ∅) substantially complicates ad-            adjustment amenable MAGs. The adjustment criterion and
justment, and in fact nonparametric causal inference in gen-          the constructive back-door criterion are defined like their
eral (Zhang, 2008)2 . Due to these limitations, we restrict           DAG counterparts (Definitions 4.2 and 4.4), replacing d-
ourselves to the case S = ∅ in the rest of this section.              with m-separation for the latter.
Note however that recovery from selection bias is some-               Theorem 5.8. Given an adjustment amenable MAG M =
times possible with additional population data, and graphi-           (V, E) and three disjoint node sets X, Y, Z ⊆ V, the follow-
cal conditions exist to identify such cases (Barenboim et al.,        ing statements are equivalent:
2014).
                                                                       (i) Z is an adjustment relative to X, Y in M.
5.1   ADJUSTMENT AMENABILITY
                                                                      (ii) Z fulfills the adjustment criterion (AC) w.r.t. (X, Y) in
In this section we first identify a class of MAGs in which                 M.
adjustment is impossible because of causal ambiguities –
e.g., the simple MAG X → Y falls into this class, but the             (iii) Z fulfills the constructive backdoor criterion (CBC)
larger MAG A → X → Y does not.                                              w.r.t. (X, Y) in M.
Definition 5.4 (Visible edge (Zhang, 2008)). Given a MAG
M = (V, E), an edge X → Y ∈ E is called visible if in all             Proof. The equivalence of (ii) and (iii) is established by
DAGs G = (V0 , E0 ) with G[LS = M for some S, L ⊆ V0 , all            observing that the proof of Theorem 4.6 generalizes to m-
d-connected walks between X and Y in G that contain only              separation. Below we establish equivalence of (i) and (ii).
nodes of S ∪ L ∪ X ∪ Y are directed paths.
                                                                      ¬(ii) ⇒ ¬(i): If Z violates the adjustment criterion in M,
Intuitively, an invisible directed edge X → Y means that              it does so in the canonical DAG C(M), and thus is not an
there may still hidden confounding factors between X and              adjustment in M.
Y, which is guaranteed not to be the case if the edge is
                                                                      ¬(i) ⇒ ¬(ii): Let G be a DAG with G[L∅ = M in which Z
visible.
                                                                      violates the AC. We show that (a) if Z ∩ Dpcp(X, Y) , ∅ in
Lemma 5.5 (Graphical conditions for edge visibility                   G then Z ∩ Dpcp(X, Y) , ∅ in M as well, or there exists
(Zhang, 2008)). In a MAG M = (V, E), an edge X → D                    a proper non-causal path in M that cannot be m-separated;
is visible if and only if there is a node A not adjacent              and (b) if Z ∩ Dpcp(X, Y) = ∅ in G and Z d-connects a
to D where (1) A → X ∈ E or A ↔ X ∈ E, or (2)                         proper non-causal path in G, then it m-connects a proper
    2
      A counterexample is the graph A ← X → Y, where we can           non-causal path in M.
safely assume that A is the ancestor of a selection variable. A       (a) Suppose that in G, Z contains a node Z in Dpcp(X, Y),
sufficient and necessary condition for adjustment under selection
bias is Y y S | X (Barenboim et al., 2014), which is so restrictive   and let W = PCP(X, Y)∩An(Z). If M still contains at least
that most statisticians would probably not even speak of “selec-      one node W1 ∈ W, then W1 lies on a proper causal path
tion bias” anymore in such a case.                                    in M and Z is a descendant of W1 in M. Otherwise, M
                                                                                                             1 ,L2 }
             DAG G               MAG M = G[W
                                           ∅
                                             1
                                                                               DAG G             MAG M = G[{L
                                                                                                           ∅
       X                 Y      X            Y                            L1       Z     L2                 Z

       Z       W1       W2       Z               W2                     X                  Y      X                  Y
                                                                               A                        A
Figure 3: Illustration of the case in the proof of Theorem
5.8 where Z descends from W1 which in a DAG G is on a           Figure 4: Case (b) in the proof of Theorem 5.8: A proper
proper causal path from X to Y, but is not a descendant of      non-causal path wG = X ← L1 → Z ← Ls → Y in a
a node on a proper causal path from X to Y in the MAG M         DAG is d-connected by Z, but the corresponding proper
after marginalizing W1 . In such cases, conditioning on Z       non-casual path wM = X ← Z → Y is not m-connected in
will m-connect X and Y in M via a proper non-causal path.       the MAG, and its m-connected subpath w0M = X → Y is
                                                                proper causal. However, this also renders the edge X → Y
                                                                invisible, because otherwise A could be m-separated from
must contain a node W2 ∈ PCPG (X, Y) \ An(Z) (possibly          Y by U = {X, Z} in M but not in G.
W2 ∈ Y) such that W2 ↔ A, X → W2 , and X → A are
edges in M, where A ∈ An(Z) (possibly A = Z; see Fig. 3).
Then M contains an m-connected proper non-causal path           arrow out of X, and must contain a collider Z ∈ De(X)
X → A ↔ W → W2 → . . . → Y.                                     because wG is not causal. (a) Z ∈ De(D). This contra-
                                                                dicts Z ∩ Dpcp(X, Y) = ∅. So (b) Z < De(D). Then
(b) Suppose that in G, Z ∩ Dpcp(X, Y) = ∅, and there exists
                                                                by construction of w0M (Lemma A.9), wM must start with
an open proper non-causal path from X to Y. Then there
                                                                an inducing Z-trail X → Z, Z1 , . . . , Zn , D, which is also
must then also be a proper non-causal walk wG from some
                                                                an inducing path from X to D in G w.r.t. ∅, L. Then
X ∈ X to some Y ∈ Y (Lemma A.1), which is d-connected
                                                                Z, Z1 , . . . , Zn , D must also be an inducing path in G w.r.t.
by Z in G. Let wM denote the subsequence of wG formed
                                                                ∅, L because An(X) ⊆ An(Z). Hence Z and D are adjacent.
by nodes in M, which includes all colliders on wG . The
                                                                We distinguish cases on the path X → Z, D in M. (i) If
sequence wM is a path in M, but is not necessarily m-
                                                                X → Z → D, then Z lies on a proper causal path, con-
connected by Z; all colliders on wM are in Z because every
                                                                tradicting Z ∩ Dpcp(X, Y) = ∅. (ii) If X → Z ↔ D, or
non-Z must be a parent of at least one of its neighbours, but
                                                                X → Z ← D, then we get an m-connected proper non-
there can subsequences U, Z1 , . . . , Zk , V on wM where all
                                                                causal walk along Z and D.                                   
Zi ∈ Z but some of the Zi are not colliders on wM . How-
ever, then we can form from wM an m-connected walk by
bypassing some sequences of Z-nodes (Lemma A.9). Let
w0M be the resulting walk.                                      5.3   ADJUSTMENT SET CONSTRUCTION
If w0M is a proper non-causal walk, then there must also ex-    In the previous section, we have already shown that the
ist a proper non-causal path in M (Lemma A.1), violating        CBC is equivalent to the AC for MAGs as well; hence, ad-
the AC. It therefore remains to show that w0M is not a proper   justment sets for a given MAG M can be found by forming
causal path. This must be the case if wG does not contain                                     pbd
                                                                the proper back-door graph MXY and then applying the al-
colliders, because then the first edge of wM = w0M cannot       gorithms from the previous section. In principle, care must
be a visible directed edge out of X. Otherwise, the only        be taken when removing edges from MAGs as the result
way for w0M to be proper causal is if all Z-nodes in wM         might not be a MAG; however, this is not the case when
have been bypassed in w0M by edges pointing away from           removing only directed edges.
X. In that case, one can show by several case distinctions
that the first edge X → D of w0M , where D < Z, cannot be       Lemma 5.9 (Closure of maximality under removal of di-
visible (see Figure 4 for an example of such a case).           rected edges). Given a MAG M, every graph M0 formed
                                                                by removing only directed edges from M is also a MAG.
For simplicity, assume that M contains a subpath A →
X → D where A is not adjacent to D; the other cases
of edge visibility like A ↔ X → D (Lemma 5.5). are              Proof. Suppose the converse, i.e. M is no longer a MAG
treated analogously. In G, there are inducing paths (pos-       after removal of some edge X → D. Then X and D cannot
sibly several) πAX from A to X and πXD from X to D              be m-separated even after the edge is removed because X
w.r.t ∅, L; πAX must have an arrowhead at X. We dis-            and D are collider connected via a path whose nodes are all
tinguish several cases on the shape of πXD . (1) A path         ancestors of X or D (Richardson and Spirtes, 2002). The
πXD has an arrowhead at X as well. Then A, D are adja-          last edge on this path must be C ↔ D or C ← D, hence C <
cent (Lemma A.13), a contradiction. (2) No inducing path        An(D), and thus we must have C ∈ An(X). But then we get
πXD has an arrowhead at X. Then wG must start with an           C ∈ An(D) in M via the edge X → V, a contradiction. 
Corollary 5.10. For every MAG M, the proper back-door            References
        pbd
graph MXY is also a MAG.
                                                                 Silvia Acid and Luis M. de Campos. An algorithm for find-
                                                                    ing minimum d-separating sets in belief networks. In
For MAGs that are not adjustment amenable, the CBC                  Proceedings of UAI 1996, pages 3–10, 1996.
might falsely indicate that an adjustment set exists even        Silvia Acid and Luis M. de Campos. Searching for
though that set may not be valid for some represented               bayesian network structures in the space of restricted
graph. Fortunately, adjustment amenability is easily tested         acyclic partially directed graphs. Journal of Artificial
using the graphical criteria of Lemma 5.5. For each child           Intelligence Research (JAIR), 18:445–490, 2003.
D of X in PCP(X, Y), we can test the visibility of all edges
X → D simultaneously using depth first search. This              Elias Barenboim, Jin Tian, and Judea Pearl. Recovering
means that we can check all potentially problematic edges          from selection bias in causal and statistical inference. In
in time O(n + m). If all tests pass, we are licensed to apply      Proceedings of AAAI-14, 2014.
the CBC, as shown above. Hence, we can solve all algo-           Thomas H. Cormen, Charles E. Leiserson, Ronald L.
rithmic tasks in Table 1 for MAGs in the same way as for           Rivest, and Clifford Stein. Introduction to Algorithms,
DAGs after an O(k(n + m)) check of adjustment amenabil-            Second Edition. The MIT Press, 2nd edition, September
ity, where k ≤ |Ch(X)|.                                            2001. ISBN 0262032937.
                                                                 Felix Elwert. Graphical Causal Models, pages 245–273.
                                                                   Handbooks of Sociology and Social Research. Springer,
                                                                   2013.
6   DISCUSSION
                                                                 Adam Glynn and Konstantin Kashin. Front-door ver-
                                                                   sus back-door adjustment with unmeasured confound-
We have compiled efficient algorithms for solving several          ing: Bias formulas for front-door and hybrid adjust-
tasks related to m-separators in ancestral graphs, and ap-         ments. Technical report, Harvard University, 2013.
plied those together with a new, constructive adjustment         Sander Greenland. Hierarchical regression for epidemi-
criterion to provide a complete and informative answer to          ologic analyses of multiple exposures. Environmental
the question when, and how, a desired causal effect can be         Health Perspectives, 102 Suppl 8:33–39, Nov 1994.
estimated by covariate adjustment. Our results fully gener-
alize to MAGs in the absence of selection bias. One may ar-      Marloes H. Maathuis and Diego Colombo. A generalized
gue that the MAG result is more useful for exploratory ap-        backdoor criterion. arXiv:1307.5636, 2013.
plications (inferring a graph from data) than confirmatory       Judea Pearl. Causality. Cambridge University Press, 2009.
ones (drawing a graph based on theory), as researchers will        ISBN 0-521-77362-8.
prefer drawing DAGs instead of MAGs due to the easier            Thomas Richardson and Peter Spirtes. Ancestral graph
causal interpretation of the former. Nevertheless, in such         markov models. Annals of Statistics, 30:927–1223,
settings the results can provide a means to construct more         2002.
“robust” adjustment sets: If there are several options for co-
variate adjustment in a DAG, then one can by interpreting        Kenneth J. Rothman, Sander Greenland, and Timothy L.
the same graph as a MAG possibly generate an adjustment            Lash. Modern Epidemiology. Wolters Kluwer, 2008.
set that is provably valid for a much larger class of DAGs.        ISBN 0781755646.
This might partially address the typical criticism that com-     Ross D. Shachter. Bayes-ball: The rational pastime. In
plete knowledge of the causal structure is unrealistic.            Proceedings of UAI 1998, pages 480–487, 1998.
Our adjustment criterion generalizes the work of Shpitser        Ilya Shpitser. Appendum to on the validity of covariate
et al. (2010) to MAGs and therefore now completely char-            adjustment for estimating causal effects, 2012. unpub-
acterizes when causal effects are estimable by covariate ad-        lished manuscript.
justment in the presence of unmeasured confounders with          Ilya Shpitser, Tyler VanderWeele, and James Robins. On
multivariate exposures and outcomes. This also general-             the validity of covariate adjustment for estimating causal
izes recent work by Maathuis and Colombo (2013) who                 effects. In Proceedings of UAI 2010, pages 527–536.
provide a criterion which, for DAGs and MAGs without                AUAI Press, 2010.
selection bias, is stronger than the back-door criterion but
weaker than ours. They moreover show their criterion to          Ken Takata. Space-optimal, backtracking algorithms to list
hold also for CPDAGs and PAGs, which represent equiva-             the minimal vertex separators of a graph. Discrete Ap-
lence classes of DAGs and MAGs as they are constructed             plied Mathematics, 158:1660–1667, 2010.
by causal discovery algorithms. It is possible that the con-     Johannes Textor and Maciej Liśkiewicz. Adjustment crite-
structive back-door criterion could be generalized further         ria in causal diagrams: An algorithmic perspective. In
to those cases, which we leave for future work.                    Proceedings of UAI, pages 681–688, 2011.
Jin Tian, Azaria Paz, and Judea Pearl. Finding min-                 . . . → Zi ← . . . ← Ci for every collider Ci on π and a
   imal d-separators.    Technical Report R-254, Uni-               corresponding Zi ∈ Z. Since no edges are removed from
   versity of California, Los Angeles, 1998.     URL                π, w is non-causal, but not necessarily proper, since the
   ftp.cs.ucla.edu/pub/stat_ser/r254.pdf.                           inserted walks might contain nodes of X. However, in that
Tyler J. VanderWeele. On the relative nature of overadjust-         case, w can be truncated to a proper walk w0 starting at
  ment and unnecessary adjustment. Epidemiology, 20(4):             the last node of X on w. Then w0 is non-causal, since it
  496–499, Jul 2009.                                                contains the subpath X ← . . . ← Ci .                   

Jiji Zhang. Causal reasoning with ancestral graphs. Journal         In all of the below, G = (V, E) is a DAG, Z, L ⊆ V are
   of Machine Learning Research, 9:1437–1474, 2008.                 disjoint, and M = G[L∅ .
                                                                    Definition A.2 (Inducing path (Richardson and Spirtes,
                                                                    2002)). A path π = V1 , . . . , Vn+1 is called inducing with
A     APPENDIX                                                      respect to Z, L if all non-colliders on π except V1 and Vn+1
                                                                    are in L, and all colliders on π are in An({V1 , Vn+1 } ∪ Z).
A.1   AUXILIARY LEMMAS AND PROOFS
                                                                    Every inducing path w.r.t. Z, L is m-connected by Z.
In this section, we prove Lemma 3.4 and several auxiliary
Lemmas that are necessary for the proof of Theorem 5.8.             Lemma A.3 (Richardson and Spirtes (2002)). If there is
                                                                    an inducing path w from U ∈ V to V ∈ V with respect to
                                                                    Z, L, then there exists no set Z0 with Z ⊆ Z0 ⊆ (V \ L) such
Proof of Lemma 3.4. Let us consider a proper walk w =
                                                                    that Z0 d-separates U and V in G or m-separates U and V
X, V1 , . . . , Vn , Y with X ∈ X, Y ∈ Y. If w does not con-
                                                                    in G[L∅ .
tain a collider, all nodes Vi are in Ant(X ∪ Y) and the walk
is blocked by Z, unless {V1 , . . . , Vn } ∩ R = ∅ in which
                                                                    Proof. This is Theorem 4.2, cases (v) and (vi), in Richard-
case the walk is not blocked by Z0 either. If the walk
                                                                    son and Spirtes (2002).                                  
contains colliders C, it is blocked, unless C ⊆ Z ⊆ R.
Then all nodes Vi are in Ant(X ∪ Y ∪ I) and the walk is             Lemma A.4. Two nodes U, V are adjacent in G[L∅ if and
blocked, unless {V1 , . . . , Vn } ∩ R = C. Since C ⊆ Z is a        only if G contains an inducing path π between U and V
set of anteriors, there exists a shortest (possible containing      with respect to ∅, L. Moreover, the edge between U, V in
0 edges) path π j = V j → . . . → W j for each V j ∈ C with         G[L∅ can only have an arrowhead at U (V) if all such π
W j ∈ X ∪ Y ∪ I (it cannot contain an undirected edge, since        have an arrowhead at U (V) in G.
there is an arrow pointing to V j ). Let π0j = V j → . . . → W 0j
be the shortest subpath of π j that is not blocked by Z0 .          Proof. The first part on adjacency is proved in (Richardson
Let w0 be the walk w after replacing each V j by the walk           and Spirtes, 2002). For the second part on arrowheads, sup-
V j → . . . → W 0j ← . . . ← V j . If any of the W j is in          pose π does not have an arrowhead at U, then π starts with
X ∪ Y we truncate the walk, such that we get the shortest           an edge U → D. Hence D < An(U), so D ∈ An(V) be-
walk between nodes of X and Y. Since π0j is not blocked,            cause π is an inducing path and therefore also U ∈ An(V).
w0 contains no colliders except w0j and all other nodes of w0       Hence, the edge between U and V in G[L∅ must be U → V.
are not in R, w0 is not blocked and Z0 is not a separator.         The argument for V is identical.                          

Lemma A.1. Given a DAG G and sets X, Y, Z ⊆ V satisfy-              Lemma A.5. Suppose Z0 , Z1 , Z2 is a path in G[L∅ on which
ing Z∩Dpcp(X, Y) = ∅, Z m-connects a proper non-causal              Z1 is a non-collider. Suppose an inducing path π01 from Z0
path between X and Y if and only if it m-connects a proper          to Z1 w.r.t. ∅, L in G has an arrowhead at Z1 , and an in-
non-causal walk between X and Y.                                    ducing path π12 from Z1 to Z2 w.r.t. ∅, L has an arrowhead
                                                                    at Z1 . Then the walk w012 = π01 π12 can be truncated to an
                                                                    inducing path from Z0 to Z2 w.r.t. ∅, L in G.
Proof. ⇐: Let w be the m-connected proper non-causal
walk. It can be transformed to an m-connected path π by
                                                                    Proof. The walk w012 does not contain more non-colliders
removing loops of nodes that are visited multiple times.
                                                                    than those on π01 or π12 , so they must all be in L. It re-
Since no nodes have been added, π remains proper, and
                                                                    mains to show that the colliders on w012 are in An(Z0 ∪ Z2 ).
the first edges of π and w are the same. So if w does not
                                                                    Because Z1 is not a collider on Z0 , Z1 , Z2 , at least one of
start with a → edge, π is non-causal. If w starts with an
                                                                    the edges Z0 , Z1 and Z1 , Z2 must be a directed edge point-
edge X → D, there exists a collider with a descendant in Z
                                                                    ing away from Z1 . Assume without loss of generality that
which is in De(D). So π has to be non-causal, or it would
                                                                    Z0 ← Z1 is that edge. Then all colliders on π01 are in
contradict Z ∩ Dpcp(X, Y) = ∅.
                                                                    An(Z0 ∪ Z1 ) = An(Z0 ) ⊆ An(Z0 ∪ Z2 ), and all colliders on
⇒: Let π be an m-connected proper non-causal path. It can           π12 are in An(Z1 ∪ Z2 ) ⊆ An(Z0 ∪ Z2 ). Z1 itself is a col-
be changed to an m-connected walk w by inserting Ci →               lider on w012 and is also in An(Z0 ). Hence, the walk w012
is d-connected, and can be truncated to an inducing path            Corollary A.10. Each edge on w0M as defined above cor-
that starts with the first arrow of π01 and ends with the last      responds to an inducing path w.r.t ∅, L in G along nodes on
arrow of π12 .                                                     wG .
                                                                    Lemma A.11. Suppose there exists an inducing path π01
Definition A.6 (Inducing Z-trail). Let π = V1 , . . . , Vn+1
                                                                    from Z0 to Z1 w.r.t. S, L with an arrowhead at Z1 and an
be a path in G[L∅ such that V2 , . . . , Vn ∈ Z, V1 , Vn+1 < Z,
                                                                    inducing path from Z1 to Z2 w.r.t. S0 , L with an arrowhead
and for each i ∈ {1, . . . , n}, there is an inducing path w.r.t.
                                                                    at Z1 . Then the walk w012 = π01 π12 can be truncated to an
∅, L linking Vi , Vi+1 that has an arrowhead at Vi (Vi+1 ) if
                                                                    inducing path from Z0 to Z2 w.r.t. S ∪ S0 ∪ {Z1 }, L in G.
Vi ∈ Z (Vi+1 ∈ Z). Then π is called an inducing Z-trail.
Lemma A.7. Let π = V1 , . . . , Vn+1 be an inducing Z-trail,        Proof. The walk w012 does not contain more non-colliders
and let π0 be a subsequence of π formed by removing one             than those on π01 or π12 , so they must all be in L.
node Vi of π such that Vi ∈ Z is a non-collider on π. Then          All colliders on π0,1 and π1,2 as well as Z1 are in
π0 is an inducing Z-trail.                                          An(Z0 , Z1 , Z2 , S, S0 ), and therefore also all colliders of
                                                                    w012 .
Proof. According to Lemma A.5, if Vi is a non-collider on           Hence, the walk w012 is d-connected, and can be truncated
π, then Vi−1 and Vi+1 are linked by an inducing path π that         to an inducing path that starts with the first arrow of π01
contains an arrowhead at Vi−1 (Vi+1 ) if Vi−1 ∈ Z (Vi+1 ∈           and ends with the last arrow of π12 .                    
Z). Therefore, Vi−1 and Vi+1 are themselves adjacent, π0 is
a path, and is a Z-trail.                                          Lemma A.12. Suppose Z0 , Z1 , . . . , Zk+1 is a path in G[L∅
                                                                    with an arrowhead at Zk+1 on which all Z1 , . . . , Zk are col-
Corollary A.8. Every inducing Z-trail π = V1 , . . . , Vn+1         liders. Then there exists an inducing path from Z0 to Zk+1
has a subpath π0 that is m-connected by Z.                          w.r.t. {Z1 , . . . , Zk }, L with an arrowhead at Zk+1 .

Proof. Transform π into π0 by replacing non-collider                Proof. Because all Zi , Zi+1 are adjacent and all Z1 , . . . , Zk
nodes in Z by the direct edge linking their neighbours un-          are colliders there exist inducing paths πi,i+1 w.r.t. ∅, L from
til no such node exists anymore. By inductively applying            Zi to Zi+1 that have arrowheads at Z1 , . . . , Zk (Lemma A.4).
Lemma A.7, we see that π0 is also an inducing Z-trail, and          The claim follows by repeatedly applying Lemma A.11 to
every node in Z is a collider because otherwise we would            the πi,i+1 ’s.                                                  
have continued transforming. So π0 must be m-connected
by Z.                                                              Lemma A.13. Suppose A → V1 ↔ . . . ↔ Vk ↔ X →
                                                                    D or A ↔ V1 ↔ . . . ↔ Vk ↔ X → D is a path in
Lemma A.9. Let wG be a walk from X to Y in G, X, Y < L,             G[L∅ (possibly k = 0), each Vi is a parent of D and there
that is d-connected by Z. Let wM = V1 , . . . , Vn+1 be the         exists an inducing path πXD from X to D w.r.t ∅, L that has
subsequence of wG consisting only of the nodes in M =               arrowheads on both ends. Then A and D cannot be m-
G[L∅ . Then Z m-connects X and Y in M via a path along a            separated in G[L∅ .
subsequence w0M formed from wM by removing some nodes
in Z (possibly w0M = wM ).                                          Proof. Assume the path is A → V1 ↔ . . . ↔ Vk ↔ X →
                                                                    D. The case where the path starts with A ↔ V1 can be
Proof. First, truncate from wM all subwalks between                 handled identically, since the first arrowhead does not af-
nodes in Z that occur more than once. Now consider                  fect m-separation.
all subsequences V1 , . . . , Vn+1 , n > 1, of wM where             Assume A and D can be m-separated in G[L∅ , and let Z
V2 , . . . , Vn ∈ Z, V1 , Vn+1 < Z, which now are all paths         be such a separator. If V1 is not in Z then the path A →
in wM . On those subsequences, every Vi must be adjacent            V1 → D is not blocked, so V1 ∈ Z. Inductively it follows,
in G to Vi+1 via a path containing no colliders, and all non-       if Vi is not in Z, but all ∀j < i : V j ∈ Z then the path
endpoints on that path must be in L. So there are inducing          A → V1 ↔ . . . ↔ Vi−1 ↔ Vi → D is not blocked, so
paths w.r.t. ∅, L between all Vi , Vi+1 , which have arrow-         Vi ∈ Z for all i.
heads at Vi (Vi+1 ) if Vi ∈ Z (Vi+1 ∈ Z). So V1 , . . . , Vn+1
is an inducing Z-trail, and has a subpath which m-connects          There exist an inducing path πAX from A to X with an ar-
V1 , Vn+1 given Z. Transform wM to w0M by replacing all             rowhead at X w.r.t. to {V1 , . . . , Vk }, L (Lemma A.12) which
inducing Z-trails by their m-connected subpaths. Accord-            can be combined with πXD to an inducing path from A to
ing to Lemma A.4, non-colliders on wM cannot be collid-             D w.r.t. to {V1 , . . . , Vk , X}, L (Lemma A.11).
ers on w0M , as bypassing inducing paths can remove but             Hence no m-separator of A, D can contain {X, V1 , . . . , Vk }
not create arrowheads. Moreover, all nodes in Z on w0M are          (Lemma A.3). Then there cannot exist an m-separator, be-
colliders. Hence w0M is m-connected by Z.                          cause every separator must include V1 , . . . , Vk and the path
A → V1 ↔ V2 ↔ . . . ↔ Vk ↔ X → D is open without                 function F IND S EP(G, X, Y, I, R)
X ∈ Z.                                                             R0 ← R \ (X ∪ Y)
                                                                    Z ← Ant(X, Y, I) ∩ R0
                                                                    if T EST S EP(G, X, Y, Z) then
A.2     ALGORITHMS                                                      return Z
                                                                    else
This section contains algorithm pseudocodes and parts of
                                                                        return ⊥
their correctness proofs that were omitted from the main
text for space reasons.                                                            Figure 7: FindSep


A.2.1    TESTING                                               A.2.3   FINDING A MINIMAL M-SEPARATOR
For a given ancestral graph G the problem T EST S EP can       For a given AG G the problem F IND M IN S EP can
be solved with a modified Bayes-Ball algorithm in time         be solved with algorithm 8 F IND M IN S EP NAIVE in
O(n+m). In the algorithm every bi-directed edge A ↔ B is       O(|Ant(X ∪ Y)||EAn |) = O(n(n + m)) or algorithm 9
considered as a pair of edges A ← · → B and an undirected      F IND M IN S EP M ORAL in O(|Em  |) = O(n2 ) time.
                                                                                             An
edge A − B as a directed edge pointing to the currently vis-
ited node.                                                       function F IND M IN S EP NAIVE(G, X, Y, I, R)
                                                                    G0 ← GAnt(X∪Y∪I)
  function T EST S EP(G, X, Y, Z)
                                                                    Z ← R ∩ Ant(X ∪ Y ∪ I)
     Run Bayes-Ball from X
                                                                    if not T EST S EP(G0 , X, Y, Z) then
     return (Y not reachable)
                                                                        return ⊥
                       Figure 5: TestSep                            for all U in Z \ I do
                                                                        if T EST S EP(G0 , X, Y, Z \ {U}) then
The problem T EST M IN S EP can be solved using Algorithm                   Z ← Z \ {U}
6 T EST M IN S EP in O(|Em
                         An
                            |) = O(n2 ) time. Alternatively,        return Z
the problem can be solved with an algorithm that iteratively                  Figure 8: FindMinSepNaive
removes from Z nodes and tests if the resulting set remains
an m-separator. This can be done in time O(n(n + m)).          Algorithm 8 F IND M IN S EP NAIVE depends on an implicit
The correctness of the algorithms for T EST M IN S EP can      moral graph and the fact that in an undirected graph every
be shown by generalizing the results presented in (Tian        node that cannot be removed from a separating set has to
et al., 1998) for m-separation. 6 T EST M IN S EP, runs in     be in separating subsets, and runs in O(|Ant(X ∪ Y)||EAn |).
O(|EmAn
        |) because Rx and R y can be computed with an ordi-
nary search that aborts when a node in Z is reached.             function F IND M IN S EP M ORAL(G, X, Y, I, R)
                                                                     G0 ← GAnt(X∪Y∪I)
  function T EST M IN S EP(G, X, Y, Z)                               G0a ← GaAnt(X∪Y∪I)
     if Z \ Ant(X ∪ Y) , ∅ then return false                         Z0 ← R ∩ Ant(X ∪ Y)
     if not T EST S EP(G, X, Y, Z) then                              Remove from G0a all nodes of I
         return false                                                if not T EST S EP(G0 , X, Y, Z) then
     G0a ← GaAnt(X∪Y)                                                    return ⊥
     Rx ← {Z ∈ Z | ∃ path X − Z in G0a                               Run BFS from X. Whenever a node in Z0 is met,
                 not intersecting Z \ {Z}}                       mark it, if it is not already marked and do not continue
     if Z * Rx then return false                                 along the path. When BFS stops, let Z00 be the set of all
     R y ← {Z ∈ Z | ∃ path Y − Z in G0a                          marked nodes. Remove all markings
                 not intersecting Z \ {Z}}                           Run BFS from Y. Whenever a node in Z00 is met,
     if Z * R y then return false                                mark it, if it is not already marked and do not continue
        return true                                              along the path. When BFS stops, let Z be the set of all
                                                                 marked nodes.
                      Figure 6: TestMinSep                           return Z ∪ I
                                                                              Figure 9: FindMinSepMoral
A.2.2    FINDING AN M-SEPARATOR
                                                               Algorithm 9 F IND M IN S EP M ORAL begins with the sep-
The problem can be solved using Algorithm 7 F IND S EP         arating set R ∩ Ant(X ∪ Y) and finds a subset satisfy-
in O(n + m) time. The correctness follows directly from        ing the conditions tested by algorithm 6 T EST M IN S EP, in
Lemma 3.4.                                                     O(|EmAn
                                                                       |).
A.2.4   FINDING A MINIMUM COST                                A.2.6   TESTING FOR ADJUSTMENT
        M-SEPARATOR                                                   AMENABILITY

The problem M IN C OST S EP can be solved with algorithm      Let N(V) denote all nodes adjacent to V, and Sp(V) denote
10 F IND M IN C OST S EP in O(n3 ).                           all spouses of V, i.e., nodes W such that W ↔ V ∈ E. The
                                                              adjustment amenability of a graph G w.r.t sets X, Y can be
                                                              tested with the following algorithm:
  function F IND M IN C OST S EP(G, X, Y, I, R, w)
     G0 ← GAnt(X∪Y∪I)                                           function T ESTA DJUSTMENTA MENABILITY(G, X, Y)
     G0a ← GaAnt(X∪Y∪I)                                            for all D in Ch(X) ∩ PCP(X, Y) do
     Add a node Xm connected to all nodes in                           C←∅
     X, and a node Ym connected to all nodes                           A←∅
     in Y.                                                             function CHECK(V)
     Assign infinite cost to all nodes in                                  if C[V] then return A[V]
     X ∪ Y ∪ (V \ R) and cost w(Z) to every                                C[V] ← true
     other node Z.                                                         A[V] ← ((Pa(V) ∪ Sp(V)) \ N(D) , ∅)
     Remove all nodes of I from G0a .                                      for all W ∈ Sp(V) ∩ Pa(D) do
     Change the graph to a flow network as                                     if CHECK(W) then A[V] ← true
     described in Cormen et al. (2001) and return a                        return A[V]
     minimum cutset Z.                                                 for all X in X ∩ Pa(D) do
               Figure 10: FindMinCostSep                                   if ¬CHECK(X) then
                                                                               return false
The correctness without I follows from the fact that a min-             Figure 12: TestAdjustmentAmenability
imum set is a minimal set and the minimal cut found in the
ancestor moral graph is therefore the minimal separating      The algorithm checks for every edge X → D on a proper
set. The handling of I is shown in Acid and de Campos         causal path to Y whether it satisfies the amenability condi-
(1996).                                                       tions of Lemma 5.5 by searching a collider path through the
                                                              parents of D to a node Z not connected to D; note that con-
                                                              dition (1) of Lemma 5.5 is identical to condition (2) with an
A.2.5   ENUMERATING ALL MINIMAL                               empty collider path. Since CHECK performs a depth-first-
        M-SEPARATORS                                          search by checking every node only once and then contin-
                                                              uing to its neighbors, each iteration of the outer for-loop in
The problem L IST M IN S EP can be solved with algorithm      the algorithm runs in linear time O(n + m). Therefore, the
11 L IST M IN S EP with O(n3 ) delay between every out-       entire algorithm runs in O(k(n + m)) where k ≤ |Ch(X)|.
putted Z.

  function L IST M IN S EP(G, X, Y, I, R)
     G0 ← GAnt(X∪Y∪I)
     G0a ← GaAnt(X∪Y∪I)
     Add a node Xm connected to all X nodes.
     Add a node Ym connected to all Y nodes.
     Remove all nodes of I.
     Remove all nodes of V \ R, but insert
     additional edges connecting the neighbours.
     of all removed nodes.
     Use the algorithm in Takata (2010) to list all sets
     separating Xm and Ym .
                 Figure 11: ListMinSep

The correctness is shown by Textor and Liśkiewicz
(2011) for adjustment sets and generalizes directly to m-
separators, because after moralization, both problems are
equivalent to enumerating vertex cuts of an undirected
graph. The handling of I is shown by Acid and de Cam-
pos (1996).