Constructing Separators and Adjustment Sets in Ancestral Graphs Benito van der Zander, Maciej Liśkiewicz Johannes Textor Theoretical Computer Science Theoretical Biology & Bioinformatics University of Lübeck, Germany Utrecht University, The Netherlands {benito,liskiewi}@tcs.uni-luebeck.de johannes.textor@gmx.de Abstract tistical properties of e.g. front-door estimation is still con- siderably lacking (VanderWeele, 2009; Glynn and Kashin, Ancestral graphs (AGs) are graphical causal 2013)1 . Unfortunately, the back-door criterion is not com- models that can represent uncertainty about the plete, i.e., it does not find all possible options for covari- presence of latent confounders, and can be in- ate adjustment that are allowed by a given graphical causal ferred from data. Here, we present an algo- model. rithmic framework for efficiently testing, con- In this paper, we aim to efficiently find a definitive an- structing, and enumerating m-separators in AGs. swer for the following question: Given a causal graph G, Moreover, we present a new constructive crite- which covariates Z do we need to adjust for to estimate the rion for covariate adjustment in directed acyclic causal effect of the exposures X on the outcomes Y? To our graphs (DAGs) and maximal ancestral graphs knowledge, no efficient algorithm has been shown to an- (MAGs) that characterizes adjustment sets as m- swer this question, not even when G is a directed acyclic separators in a subgraph. Jointly, these results graph (DAG), though constructive solutions do exist for allow to find all adjustment sets that can iden- special cases like singleton X = {X} (Pearl, 2009), and a tify a desired causal effect with multivariate ex- subclass of DAGs (Textor and Liśkiewicz, 2011). Here, we posures and outcomes in the presence of latent provide algorithms for adjustment sets in DAGs as well as confounding. Our results generalize and improve in maximal ancestral graphs (MAGs), which extend DAGs upon several existing solutions for special cases allowing to account for unspecified latent variables. Our of these problems. algorithms are guaranteed to find all valid adjustment sets for a given DAG or MAG with polynomial delay, and we also provide variants to list only those sets that minimize a 1 INTRODUCTION user-supplied cost function or to quickly construct a sim- ple adjustment set if one exists. Modelling multiple, pos- Graphical causal models endow researchers with a lan- sibly interrelated exposures X is important e.g. in case- guage to codify assumptions about a data generating pro- control studies that screen several putative causes of a dis- cess (Pearl, 2009; Elwert, 2013). Using graphical criteria, ease (Greenland, 1994). Likewise, the presence of unspeci- one can asses whether the assumptions encoded in such a fied latent variables often cannot be excluded in real-world model allow estimation of a causal effect from observa- settings, and the causal structure between the observed tional data, which is a key issue in Epidemiology (Roth- variables may not be completely known. We hope that man et al., 2008), the Social Sciences (Elwert, 2013) and the ability to quickly deduce from a given DAG or MAG other fields where controlled experimentation is typically whether and how covariate adjustment can render a causal impossible. Specifically, the famous back-door criterion by effect identifiable will benefit researchers in such areas. Pearl (2009) can identify cases where causal effect identi- We have two main contributions. First, in Section 3, we fication is possible by standard covariate adjustment, and present algorithms for verifying, constructing, and listing other methods like the front-door criterion or do-calculus m-separating sets in AGs. This subsumes a number of can even permit identification even if the back-door crite- earlier solutions for special cases of these problems, e.g. rion fails (Pearl, 2009). In current practice, however, co- variate adjustment is highly preferred to such alternatives 1 Quoting VanderWeele (2009), “Time will perhaps tell because its statistical properties are well understood, giv- whether results like Pearl’s front-door path adjustment theorem ing access to useful methodology like robust estimators and and its generalizations are actually useful for epidemiologic re- confidence intervals. In contrast, knowledge about the sta- search or whether the results are simply of theoretical interest.” the Bayes-Ball algorithm for verification of d-separating from U to V, then U is called an anterior of V. All ances- sets (Shachter, 1998), the use of network flow calculations tors of V are anteriors of V. Every node is its own ancestor, to find minimal d-separating sets in DAGs (Tian et al., descendant, and anterior. For a node set X, the set of all of 1998; Acid and de Campos, 2003), and an algorithm to its ancestors is written as An(X). The descendant and ante- list minimal adjustment sets for a certain subclass of DAGs rior sets De(X), Ant(X) are analogously defined. Also, we (Textor and Liśkiewicz, 2011). Our verification and con- denote by Pa(X), (Ch(X)), the set of parents (children) of struction algorithms for single separators are asymptoti- X. cally runtime-optimal. Although we apply our algorithms only to adjustment set construction, they are likely useful in m-Separation. A node V on a walk w is called a collider other settings as separating sets are involved in most graph- if two arrowheads of w meet at V, e.g. if w contains U ↔ ical criteria for causal effect identification. Moreover, the V ← Q. There can be no collider if w is shorter than separators themselves constitute statistically testable impli- 2. Two nodes U, V are called collider connected if there cations of the causal assumptions encoded in the graph. is a path between them on which all nodes except U and Second, we give a graphical criterion that characterizes V are colliders. Adjacent vertices are collider connected. adjustment sets in terms of separating sets, and is sound Two nodes U, V are called m-connected by a set Z if there and complete for DAGs and MAGs without selection vari- is a path π between them on which every node that is a ables. This generalizes the sound and complete criterion collider is in An(Z) and every node that is not a collider for DAGs by Shpitser et al. (2010), and the sound but in- is not in Z. Then π is called an m-connecting path. The complete adjustment criterion for MAGs without selection same definition can be stated simpler using walks: U, V are variables by Maathuis and Colombo (2013). Our criterion called m-connected by Z if there is a walk between them exhaustively addresses adjustment set construction in the on which all colliders and only colliders are in Z. If U, V presence of latent covariates and with incomplete knowl- are m-connected by the empty set, we simply say they are edge of causal structure if at least a MAG can be specified. m-connected. If U, V are not m-connected by Z, we say We give the criterion separately for DAGs (Section 4) and that Z m-separates them or blocks all paths between them. MAGs (Section 5) because the same graph usually admits Two node sets X, Y are m-separated by Z if all their nodes more adjustment options if viewed as a DAG than if viewed are pairwise m-separated by Z. In DAGs, m-separation is as a MAG. equivalent to the well-known d-separation criterion (Pearl, 2009). 2 PRELIMINARIES Ancestral graphs and DAGs. A mixed graph G = (V, E) is called an ancestral graph (AG) if the following two con- We denote sets by bold upper case letters (S), and some- ditions hold: (1) For each edge A ← B or A ↔ B, A is times abbreviate singleton sets as {S} = S. Graphs are writ- not an ancestor of B. (2) For each edge A − B, there are no ten calligraphically (G), and variables in upper-case (X). edges A ← C, A ↔ C, B ← C or B ↔ C. There can be at most one edge between two nodes in an AG (Richard- Mixed graphs and paths. We consider mixed graphs son and Spirtes, 2002). Syntactically, all DAGs are AGs G = (V, E) with nodes (vertices, variables) V and directed and all AGs containing only directed edges are DAGs. An (A → B), undirected (A−B), and bidirected (A ↔ B) edges AG G = (V, E) is a maximal ancestral graph (MAG) if E. Nodes linked by an edge are adjacent. A walk of length every non-adjacent pair of nodes U, V can be m-separated n is a node sequence V1 , . . . , Vn+1 such that there exists an by some Z ⊆ V \ {U, V}. Every AG G can be turned into edge sequence E1 , E2 , . . . , En for which every edge Ei con- a MAG M by adding bidirected edges between node pairs nects Vi , Vi+1 . Then V1 is called the start node and Vn+1 that cannot be m-separated (Richardson and Spirtes, 2002). the end node of the walk. A path is a walk in which no node occurs more than once. Given a node set X and a node set Y, a walk from X ∈ X to Y ∈ Y is called proper if only its 3 ALGORITHMS FOR M-SEPARATION start node is in X. Given a graph G = (V, E) and a node set V0 , the induced subgraph GV0 = (V0 , E0 ) contains the In this section, we compile an algorithmic framework for edges E0 from G that are adjacent only to nodes in V0 . solving a host of problems related to verification, con- struction, and enumeration of m-separating sets in AGs. Ancestry. A walk of the form V1 → . . . → Vn is di- The problems are defined in Fig. 1, which also shows rected, or causal. If there is a directed walk from U to V, the asymptotic runtime of their solutions. Throughout, n then U is called an ancestor of V and V a descendant of U. stands for the number of nodes and m for the number of A graph is acyclic if no directed walk from a node to itself edges in a graph. All of these problems except L IST S EP is longer than 0. All directed walks in an acyclic graph are can be solved by rather straightforward modifications of paths. A walk is anterior if it were directed after replacing existing algorithms (Acid and de Campos, 1996; Shachter, all edges U − V by U → V. If there is an anterior path 1998; Tian et al., 1998; Textor and Liśkiewicz, 2011). Pseudocodes of these algorithms are shown for reference m-separators containing I is a subset of Ant(X ∪ Y ∪ I). and implementation in the Appendix of this paper, as are proof details omitted from the main text. Proof. Assume there is a minimal separator Z with Z * An important tool for solving similar problems for d- Ant(X ∪ Y ∪ I). According to Lemma 3.4 we have that separation is moralization, by which d-separation can be Z0 = Ant(X ∪ Y ∪ I) ∩ Z is a separator with I ⊆ Z0 . But reduced to a vertex cut in an undirected graph. This re- Z0 ⊆ Ant(X ∪ Y ∪ I) and Z0 ⊆ Z, so Z , Z0 and Z is not a duction allows to solve problems like F IND M IN S EP using minimal separator.  standard network flow algorithms (Acid and de Campos, 1996). Moralization can be generalized to AGs in the fol- Corollary 3.5 applies to minimum-cost separators as well lowing manner. because every minimum-cost separator must be minimal. Definition 3.1 (Moralization of AGs (Richardson and Now we can solve F IND M IN C OST S EP and F IND M IN - Spirtes, 2002)). Given an AG G, the augmented graph (G)a S IZE S EP by using weighted min-cut, which takes time is an undirected graph with the same node set as G such O(n3 ) using practical algorithms, and L IST M IN S EP by us- that X − Y is an edge in (G)a if and only if X and Y are ing Takata’s algorithm to enumerate minimal vertex cuts collider connected in G. with delay O(n3 ) (Takata, 2010). Theorem 3.2 (Reduction of m-Separation to vertex cuts However, for F IND M IN S EP and T EST M IN S EP, we can do (Richardson and Spirtes, 2002)). Given an AG G and three better than using standard vertex cuts. node sets X,Y and Z, Z m-separates X and Y if and only if Proposition 3.6. The task F IND M IN S EP can be solved in Z is an X-Y node cut in (GAnt(X∪Y∪Z )a . time O(n2 ). A direct implementation of Definition 3.1 would lead to a Proof. Two algorithms are given in the appendix, one with suboptimal algorithm. Therefore, we first give an asymp- runtime O(nm) (Algorithm 8) and one with runtime O(n2 ) totically optimal (linear time in output size) moralization (Algorithm 9).  algorithm for AGs. We then solve T EST M IN S EP, F IND - M IN S EP, F IND M IN C OST S EP and L IST M IN S EP by gener- Corollary 3.7. The task T EST M IN S EP can be solved in alizing existing correctness proofs of the moralization ap- time O(n2 ). proach for d-separation (Tian et al., 1998). Not all our solutions are based on moralization, however. Proof. First verify whether Z is an m-separator using mor- Moralization takes time O(n2 ), and T EST S EP and F IND - alization. If not, return “no”. Otherwise, set S = Z and S EP can be solved faster, i.e. in asymptotically optimal solve F IND M IN S EP. Return “yes” if the output is Z and time O(n + m). “no”, otherwise.  Lemma 3.3 (Efficient AG moralization). Given an AG G, the augmented graph (G)a can be computed in time O(n2 ). Moralization can in the worst case quadratically increase the size of a graph. Therefore, in some cases, it may be Proof. The algorithm proceeds in four steps. (1) Start by preferable to avoid moralization if the task at hand is rather setting (G)a to G replacing all edges by undirected ones. simple, as are the two tasks considered below. (2) Identify all connected components in G with respect Proposition 3.8. The task F IND S EP can be solved in time to bidirected edges (two nodes are in the same such com- O(n + m). ponent if they are connected by a path consisting only of bidirected edges). Nodes without adjacent bidirected edges Proof. This follows directly from Lemma 3.4, and the fact form singleton components. (3) For each pair U, V of nodes that the set Ant(X ∪ Y ∪ I) ∩ R can be found in linear from the same component, add the edge U − V to (G)a if it time from the MAG without moralization. Note that un- did not exist already. (4) For each component, identify all like in DAGs, two non-adjacent nodes cannot always be its parents (nodes U with an edge U → V where U is in the m-separated in ancestral graphs.  component) and link them all by undirected edges in (G)a . Now two nodes are adjacent in (G)a if and only if they are collider connected in G. All four steps can be performed in By modifying the Bayes-Ball algorithm (Shachter, 1998) time O(n2 ).  appropriately, we get the following. Proposition 3.9. The task T EST S EP can be solved in time Lemma 3.4. Let X, Y, I, R be sets of nodes with I ⊆ R, O(n + m). R ∩ (X ∪ Y) = ∅. If there exists an m-separator Z0 , with I ⊆ Z0 ⊆ R then Z = Ant(X∪Y∪I)∩R is an m-separator. Lastly, we consider the problem of listing all m-separators. Corollary 3.5 (Ancestry of minimal separators). Given an Here is an algorithm to solve that problem with polynomial AG G, and three sets X, Y, I, every minimal set Z over all delay. Verification: For given X, Y and Z decide if . . . T EST S EP Z m-separates X, Y O(n + m) T EST M IN S EP Z m-separates X, Y but no Z0 ( Z does O(n2 ) Construction: For given X, Y and auxiliary I, R, output . . . F IND S EP an m-separator Z with I ⊆ Z ⊆ R O(n + m) F IND M IN S EP a minimal m-separator Z with I ⊆ Z ⊆ R O(n2 ) F IND M IN C OST S EP a minimum-cost m-separator Z with I ⊆ Z ⊆ R O(n3 ) Enumeration: For given X, Y, I, R enumerate all . . . L IST S EP m-separators Z with I ⊆ Z ⊆ R O(n(n + m)) delay L IST M IN S EP minimal m-separators Z with I ⊆ Z ⊆ R O(n3 ) delay Table 1: Definitions of algorithmic tasks related to m-separation. Throughout, X, Y, R are pairwise disjoint node sets, Z is disjoint with X, Y which are nonempty, and I, R, Z can be empty. By a minimal m-separator Z, with I ⊆ Z ⊆ R, we mean a set such that no proper subset Z0 of Z, with I ⊆ Z0 , m-separates the pair X and Y. Analogously, we define a minimal and a minimum-cost m-separator. The construction algorithms will output ⊥ if no set fulfilling the listed condition exists. Delay complexity for e.g. L IST M IN S EP refers to the time needed to output one solution when there can be exponentially many solutions (see Takata (2010)). function L IST S EP(G, X, Y, I, R) ables V = {X1 , . . . , Xn } as p(v) = nj=1 p(x j |pa j ), where Q if F IND S EP(G, X, Y, I, R) , ⊥ then pa j denotes a particular realization of the parent variables if I = R then Output I of X j in G. When interpreted causally, an edge Xi → X j else is taken to represent a direct causal effect of Xi on X j . For V ← an arbitrary node of R \ I disjoint X, Y ⊆ V, the (total) causal effect of X on Y is L IST S EP(G, X, Y, I ∪ {V}, R) p(y|do(x)) where do(x) represents an intervention that sets L IST S EP(G, X, Y, I, R \ {V}) X = x. In a DAG, this intervention corresponds to remov- Figure 1: ListSep ing all edges into X, disconnecting X from its parents. We denote the resulting graph as GX . Given DAG G and a joint probability density p for V the post-intervention distribu- Proposition 3.10. The task L IST S EP can be solved with tion can be expressed in a truncated factorization formula: polynomial delay O(n(n + m)).  Y   p(x j |pa j ) for V consistent with x p(v|do(x)) =    Proof. Algorithm L IST S EP performs backtracking to enu- X ∈V\X  j  merate all Z with I ⊆ Z ⊆ R aborting branches that will not  0 otherwise. find a valid separator. Since every leaf will output a sepa- Definition 4.1 (Adjustment (Pearl, 2009)). Given a DAG rator, the tree height is at most n and the existence check G = (V, E) and pairwise disjoint X, Y, Z ⊆ V, Z is called needs O(n + m), the delay time is O(n(n + m)). The al- covariate adjustment for estimating the causal effect of X gorithm generates every separator exactly once: if initially I ( R, with V ∈ R \ I, then the first recursive call returns on Y, or simply adjustment, if for everyP distribution p con- sistent with G we have p(y | do(x)) = z p(y | x, z)p(z). all separators Z with V ∈ Z and the second call returns all Definition 4.2 (Adjustment criterion (Shpitser et al., 2010; Z0 with V < Z0 . Thus the generated separators are pairwise Shpitser, 2012)). Let G = (V, E) be a DAG, and X, Y, Z ⊆ disjoint. This is a modification of the enumeration algo- V be pairwise disjoint subsets of variables. The set Z sat- rithm for minimal vertex separators (Takata, 2010).  isfies the adjustment criterion relative to (X, Y) in G if 4 ADJUSTMENT IN DAGS (a) no element in Z is a descendant in G of any W ∈ V\X which lies on a proper causal path from X to Y and In this section, we leverage the algorithmic framework of (b) all proper non-causal paths in G from X to Y are the last section together with a new constructive, sound blocked by Z. and complete criterion for covariate adjustment in DAGs to solve all problems listed in Table 1 for adjustment sets Remark 4.3. In (Shpitser et al., 2010; Shpitser, 2012) the instead of m-separators in the same asymptotic time. First, criterion is stated in a slightly different way, namely using however, we need to introduce some more notation pertain- in the condition (a) GX instead of G. However, the two ing to the causal interpretation DAGs. statements are equivalent. Do-operator and adjustment sets. A DAG G encodes Proof. First note that if Z satisfies the condition (a) then the factorization of joint distribution p for the set of vari- Z satisfies (a) with GX instead of G, too. Since condi- tions (b) in Definition 4.2 and in (Shpitser et al., 2010; Sh- Theorem 4.6. The constructive back-door criterion is pitser, 2012) are identical, the adjustment criterion above equivalent to the adjustment criterion. implies the criterion of Shpitser et al. Now assume Z satisfies the condition (a) with GX instead of Proof. First observe that the conditions (a) of both criteria G and the condition (b). We show that Z then satisfies the are identical. Assume conditions (a) and (b) of the adjust- condition (a), or there must exist some W ∈ V \ X, which ment criterion hold. We show that (b) of the constructive lies on a proper causal path from X to Y, and a causal path back-door criterion follows. Let π be any proper path from pbd pbd from W to Z which intersects X. X to Y in GXY . Because GXY does not contain causal paths from X to Y, π is not causal and has to be blocked by Z in Let W → . . . → Y denote the suffix of the path from X to G by the assumption. Since removing edges cannot open Y starting in W. Note that this path can consist only of the pbd paths, π is blocked by Z in GXY as well. vertex W. Additionally, for the causal path from W to Z, let W → . . . → X be its shortest prefix which intersects Now we show that (a) and (b) of the constructive back-door X. Then, from the condition (a), with GX instead of G, criterion together imply (b) of the adjustment criterion. If we know that no vertex of W → . . . → X belongs to Z. that were not the case, then there could exist a proper non- pbd This leads to a contradiction with the condition (b) since causal path π from X to Y that is blocked in GXY but open X ← . . . ← W → . . . → Y is a proper non-causal path in in G. There can be two reasons why π is blocked in GXY : pbd G from X to Y that is not blocked by Z.  (1) The path starts with an edge X → D that does not exist pbd in GXY . Then we have D ∈ PCP(X, Y). For π to be non- Analogously to GX , by GX we denote a DAG obtained from causal, it would have to contain a collider C ∈ An(Z) ∩ G by removing all edges leaving X. De(D) ⊆ An(Z) ∩ Dpcp(X, Y). But because of (a), An(Z) ∩ Dpcp(X, Y) is empty. (2) A collider C on π is an ancestor 4.1 CONSTRUCTIVE BACK-DOOR CRITERION pbd of Z in G, but not in GXY . Then there must be a directed path from C to Z via an edge X → D with D ∈ An(Z) ∩ Definition 4.4 (Proper back-door graph). Let G = (V, E) PCP(X, Y), contradicting (a).  be a DAG, and X, Y ⊆ V be pairwise disjoint subsets of pbd variables. The proper back-door graph, denoted as GXY , is obtained from G by removing the first edge of every proper 4.2 ADJUSTING FOR MULTIPLE EXPOSURES causal path form X to Y. For a singleton set X = {X} of exposures we know that if Note the difference between the back-door graph GX and a set of variables Y is disjoint from {X} ∪ Pa(X) then one pbd obtains easily an adjustment set with respect to X and Y the proper back-door graph GXY : in GX all edges leaving as Z = Pa(X) (Pearl, 2009, Theorem 3.2.2). The situation pbd X are removed while in GXY only those that lie on a proper changes drastically if the effect of multiple exposures is es- pbd timated. Theorem 3.2.5 in Pearl (2009) claims that the ex- causal path. However, to construct GXY still only elemen- tary operations are sufficient. Indeed, we remove all edges pression for P(y | do(x)) is obtained by adjusting for Pa(X) X → D in E such that X ∈ X and D is in the subset, which if Y is disjoint from X ∪ Pa(X), but, as the DAG in Fig. 2 we call PCP(X, Y), obtained as follows: shows, this is not true: the set Z = Pa(X1 , X2 ) = {Z2 } is not an adjustment set according to {X1 , X2 } and Y. In PCP(X, Y) = (DeX (X) \ X) ∩ AnX (Y) (1) this case one can identify the causal effect by adjusting for Z = {Z1 , Z2 } only. Indeed, for more than one exposure, no where DeX (W) denotes descendants of W in GX . AnX (W) adjustment set may exist at all even without latent covari- is defined analogously for GX . Hence, the proper back-door ates and even though Y ∩ (X ∪ Pa(X)) = ∅, e.g. in the DAG graph can be constructed from G in linear time O(m + n). X1 X2 Z Y. Now we propose the following adjustment criterion. For short, we will denote the set De(PCP(X, Y)) as Dpcp(X, Y). Using our criterion, we can construct a simple adjustment set explicitly if one exists. For a DAG G = (V, E) we define Definition 4.5 (Constructive back-door criterion (CBC)). the set Let G = (V, E) be a DAG, and let X, Y, Z ⊆ V be pair- wise disjoint subsets of variables. The set Z satisfies the Adj(X, Y) = An(X ∪ Y) \ (X ∪ Y ∪ Dpcp(X, Y)). constructive back-door criterion relative to (X, Y) in G if Theorem 4.7. Let G = (V, E) be a DAG and let X, Y ⊆ V (a) Z ⊆ V \ Dpcp(X, Y) and be distinct node sets. Then the following statements are equivalent: (b) Z d-separates X and Y in the proper back-door graph pbd GXY . 1. There exists an adjustment in G w.r.t. X and Y. pbd G GX GXY T EST M INA DJ can be solved with an algorithm that itera- X1 Y1 X1 Y1 X1 Y1 tively removes nodes from Z and tests if the resulting set remains an adjustment set w.r.t. X and Y. This can be done Z1 Z1 Z1 in time O(n(n + m)). Alternatively, one can construct the pbd proper back-door graph GXY from G and test if Z is a min- Z2 Z2 Z2 imal d-separator, with Z ⊆ V \ Dpcp(X, Y) between X and Y. This can be computed in time O(n2 ). The correctness of X2 Y2 X2 Y2 X2 Y2 these algorithms follows from the proposition below, which is a generalization of the result in Tian et al. (1998). Proposition 4.9. If no single node Z can be removed from Figure 2: A DAG where for X = {X1 , X2 } and Y = {Y1 , Y2 }, an adjustment set Z such that the resulting set Z0 = Z \ Z Z = {Z1 , Z2 } is a valid and minimal adjustment, but no is no longer an adjustment set, then Z is minimal. set fulfills the back-door criterion (Pearl, 2009), and the parents of X are not a valid adjustment set either. The remaining problems like F INDA DJ, F IND M INA DJ etc. can be solved using corresponding algorithms for finding, 2. Adj(X, Y) is an adjustment w.r.t. X and Y. resp. listing m-separations applied for proper back-door graphs. Since the proper back-door graph can be con- 3. Adj(X, Y) d-separates X and Y in the proper back- structed in linear time the time complexities to solve the pbd door graph GXY . problems above are as listed in Table 1. Proof. The implication (3) ⇒ (2) follows directly from 5 ADJUSTMENT IN MAGS the criterion Def. 4.5 and the definition of Adj(X, Y). Since the implication (2) ⇒ (1) is obvious, it remains to prove We now generalize the results from the previous section (1) ⇒ (3). to MAGs. Two examples may illustrate why this gener- alization is not trivial. First, take G = X → Y. If G is Assume there exists an adjustment set Z0 w.r.t. X and Y. interpreted as a DAG, then the empty set is valid for adjust- From Theorem 4.6 we know that Z0 ∩ Dpcp(X, Y) = ∅ and ment. If G is however taken as a MAG, then there exists pbd that Z0 d-separates X and Y in GXY . Our task is to show no adjustment set as G represents among others the DAG pbd that Adj(X, Y) d-separates X and Y in GXY . This follows U X Y where U is an unobserved confounder. Sec- from Lemma 3.4 used for the proper back-door graph GXY pbd ond, take G = A → X → Y. In that case, the empty set if we take I = ∅, R = V \ (X ∪ Y ∪ Dpcp(X, Y)).  is an adjustment set regardless of whether G is interpreted as a DAG or a MAG. The reasons will become clear as we move on. First, let us recall the semantics of a MAG. The From Equation 1 and the definition Dpcp(X, Y) = following definition can easily be given for AGs in general, De(PCP(X, Y)) we then obtain immediately: but we do not need this generality for our purpose. Corollary 4.8. Given two distinct sets X, Y ⊆ V, Adj(X, Y) Definition 5.1 (DAG representation by MAGs (Richardson can be found in O(n + m) time. and Spirtes, 2002)). Let G = (V, E) be a DAG, and let S, L ⊆ V. The MAG M = G[LS is a graph with nodes 4.3 TESTING, COMPUTING, AND V \ (S ∪ L) and defined as follows. (1) Two nodes U and V ENUMERATING ADJUSTMENT SETS are adjacent in G[LS if they cannot be m-separated by any Z with S ⊆ Z ⊆ V \ L in G. (2) The edge between U and Using our criterion, every algorithm for m-separating sets V is Z between X and Y can be used for adjustment sets with respect to X and Y, by requiring that Z not contain any node in Dpcp(X, Y). This allows solving all problems U − V if U ∈ An(S ∪ V) and V ∈ An(S ∪ U); listed in Table 1 for adjustment sets in DAGs instead of m- U → V if U ∈ An(S ∪ V) and V < An(S ∪ U); separators. Below, we name those problems analogously as for m-separation, e.g. the problem to decide whether Z is U ↔ V if U < An(S ∪ V) and V < An(S ∪ U). an adjustment set w.r.t. X, Y is named T ESTA DJ in analogy to T EST S EP. We call L latent variables and S selection variables. We T ESTA DJ can be solved by testing if Z ∩ Dpcp(X, Y) = ∅ say there is selection bias if S , ∅. pbd and Z is a d-separator in the proper back-door graph GXY . Hence, every MAG represents an infinite set of underlying pbd Since GXY can be constructed from G in linear time, the DAGs that all share the same ancestral relationships. For a total time complexity of this algorithm is O(n + m). given MAG M, we can construct a represented DAG G by replacing every edge X − Y by a path X → S ← Y, and there is a collider path A ↔ V1 ↔ . . . ↔ Vn ↔ X or every edge X ↔ Y by X ← L → Y, where S and L are new A → V1 ↔ . . . ↔ Vn ↔ X where all Vi are parents of D. nodes; then M = G[LS where S and L are all new nodes. G is called the canonical DAG of M (Richardson and Spirtes, Definition 5.6. We call a MAG M = (V, E) adjustment 2002), which we write as C(M). amenable w.r.t. X, Y ⊆ V if all proper causal paths from X Lemma 5.2 (Preservation of separating sets (Richardson to Y start with a visible directed edge. and Spirtes, 2002)). Z m-separates X, Y in G[LS if and only Lemma 5.7. If a MAG M = (V, E) is not adjustment if Z ∪ S m-separates X, Y in G. amenable w.r.t. X, Y ⊆ V then there exists no adjustment set W for X, Y in M. We now extend the concept of adjustment to MAGs in the usual way (Maathuis and Colombo, 2013). Proof. If the first edge X → D on some causal path to Definition 5.3 (Adjustment in MAGs). Given a MAG M = Y in M is not visible, then there exists a consistent DAG (V, E) and two variable sets X, Y ⊆ V, Z ⊆ V is an adjust- G where there is a non-causal path between X and Y via ment set for X, Y in M if for every probability distribu- V that could only be blocked in M by conditioning on D tion p(v0 ) consistent with a DAG G = (V0 , E0 ) for which or some of its descendants. But such conditioning would G[LS = M for some S, L ⊆ V0 \ V, we have violate the adjustment criterion in G.  X p(y | do(x)) = p(y | x, z, s)p(z | s) . (2) 5.2 ADJUSTMENT CRITERION FOR MAGS z We now show that DAG adjustment criterion generalizes to Selection bias (i.e., S , ∅) substantially complicates ad- adjustment amenable MAGs. The adjustment criterion and justment, and in fact nonparametric causal inference in gen- the constructive back-door criterion are defined like their eral (Zhang, 2008)2 . Due to these limitations, we restrict DAG counterparts (Definitions 4.2 and 4.4), replacing d- ourselves to the case S = ∅ in the rest of this section. with m-separation for the latter. Note however that recovery from selection bias is some- Theorem 5.8. Given an adjustment amenable MAG M = times possible with additional population data, and graphi- (V, E) and three disjoint node sets X, Y, Z ⊆ V, the follow- cal conditions exist to identify such cases (Barenboim et al., ing statements are equivalent: 2014). (i) Z is an adjustment relative to X, Y in M. 5.1 ADJUSTMENT AMENABILITY (ii) Z fulfills the adjustment criterion (AC) w.r.t. (X, Y) in In this section we first identify a class of MAGs in which M. adjustment is impossible because of causal ambiguities – e.g., the simple MAG X → Y falls into this class, but the (iii) Z fulfills the constructive backdoor criterion (CBC) larger MAG A → X → Y does not. w.r.t. (X, Y) in M. Definition 5.4 (Visible edge (Zhang, 2008)). Given a MAG M = (V, E), an edge X → Y ∈ E is called visible if in all Proof. The equivalence of (ii) and (iii) is established by DAGs G = (V0 , E0 ) with G[LS = M for some S, L ⊆ V0 , all observing that the proof of Theorem 4.6 generalizes to m- d-connected walks between X and Y in G that contain only separation. Below we establish equivalence of (i) and (ii). nodes of S ∪ L ∪ X ∪ Y are directed paths. ¬(ii) ⇒ ¬(i): If Z violates the adjustment criterion in M, Intuitively, an invisible directed edge X → Y means that it does so in the canonical DAG C(M), and thus is not an there may still hidden confounding factors between X and adjustment in M. Y, which is guaranteed not to be the case if the edge is ¬(i) ⇒ ¬(ii): Let G be a DAG with G[L∅ = M in which Z visible. violates the AC. We show that (a) if Z ∩ Dpcp(X, Y) , ∅ in Lemma 5.5 (Graphical conditions for edge visibility G then Z ∩ Dpcp(X, Y) , ∅ in M as well, or there exists (Zhang, 2008)). In a MAG M = (V, E), an edge X → D a proper non-causal path in M that cannot be m-separated; is visible if and only if there is a node A not adjacent and (b) if Z ∩ Dpcp(X, Y) = ∅ in G and Z d-connects a to D where (1) A → X ∈ E or A ↔ X ∈ E, or (2) proper non-causal path in G, then it m-connects a proper 2 A counterexample is the graph A ← X → Y, where we can non-causal path in M. safely assume that A is the ancestor of a selection variable. A (a) Suppose that in G, Z contains a node Z in Dpcp(X, Y), sufficient and necessary condition for adjustment under selection bias is Y y S | X (Barenboim et al., 2014), which is so restrictive and let W = PCP(X, Y)∩An(Z). If M still contains at least that most statisticians would probably not even speak of “selec- one node W1 ∈ W, then W1 lies on a proper causal path tion bias” anymore in such a case. in M and Z is a descendant of W1 in M. Otherwise, M 1 ,L2 } DAG G MAG M = G[W ∅ 1 DAG G MAG M = G[{L ∅ X Y X Y L1 Z L2 Z Z W1 W2 Z W2 X Y X Y A A Figure 3: Illustration of the case in the proof of Theorem 5.8 where Z descends from W1 which in a DAG G is on a Figure 4: Case (b) in the proof of Theorem 5.8: A proper proper causal path from X to Y, but is not a descendant of non-causal path wG = X ← L1 → Z ← Ls → Y in a a node on a proper causal path from X to Y in the MAG M DAG is d-connected by Z, but the corresponding proper after marginalizing W1 . In such cases, conditioning on Z non-casual path wM = X ← Z → Y is not m-connected in will m-connect X and Y in M via a proper non-causal path. the MAG, and its m-connected subpath w0M = X → Y is proper causal. However, this also renders the edge X → Y invisible, because otherwise A could be m-separated from must contain a node W2 ∈ PCPG (X, Y) \ An(Z) (possibly Y by U = {X, Z} in M but not in G. W2 ∈ Y) such that W2 ↔ A, X → W2 , and X → A are edges in M, where A ∈ An(Z) (possibly A = Z; see Fig. 3). Then M contains an m-connected proper non-causal path arrow out of X, and must contain a collider Z ∈ De(X) X → A ↔ W → W2 → . . . → Y. because wG is not causal. (a) Z ∈ De(D). This contra- dicts Z ∩ Dpcp(X, Y) = ∅. So (b) Z < De(D). Then (b) Suppose that in G, Z ∩ Dpcp(X, Y) = ∅, and there exists by construction of w0M (Lemma A.9), wM must start with an open proper non-causal path from X to Y. Then there an inducing Z-trail X → Z, Z1 , . . . , Zn , D, which is also must then also be a proper non-causal walk wG from some an inducing path from X to D in G w.r.t. ∅, L. Then X ∈ X to some Y ∈ Y (Lemma A.1), which is d-connected Z, Z1 , . . . , Zn , D must also be an inducing path in G w.r.t. by Z in G. Let wM denote the subsequence of wG formed ∅, L because An(X) ⊆ An(Z). Hence Z and D are adjacent. by nodes in M, which includes all colliders on wG . The We distinguish cases on the path X → Z, D in M. (i) If sequence wM is a path in M, but is not necessarily m- X → Z → D, then Z lies on a proper causal path, con- connected by Z; all colliders on wM are in Z because every tradicting Z ∩ Dpcp(X, Y) = ∅. (ii) If X → Z ↔ D, or non-Z must be a parent of at least one of its neighbours, but X → Z ← D, then we get an m-connected proper non- there can subsequences U, Z1 , . . . , Zk , V on wM where all causal walk along Z and D.  Zi ∈ Z but some of the Zi are not colliders on wM . How- ever, then we can form from wM an m-connected walk by bypassing some sequences of Z-nodes (Lemma A.9). Let w0M be the resulting walk. 5.3 ADJUSTMENT SET CONSTRUCTION If w0M is a proper non-causal walk, then there must also ex- In the previous section, we have already shown that the ist a proper non-causal path in M (Lemma A.1), violating CBC is equivalent to the AC for MAGs as well; hence, ad- the AC. It therefore remains to show that w0M is not a proper justment sets for a given MAG M can be found by forming causal path. This must be the case if wG does not contain pbd the proper back-door graph MXY and then applying the al- colliders, because then the first edge of wM = w0M cannot gorithms from the previous section. In principle, care must be a visible directed edge out of X. Otherwise, the only be taken when removing edges from MAGs as the result way for w0M to be proper causal is if all Z-nodes in wM might not be a MAG; however, this is not the case when have been bypassed in w0M by edges pointing away from removing only directed edges. X. In that case, one can show by several case distinctions that the first edge X → D of w0M , where D < Z, cannot be Lemma 5.9 (Closure of maximality under removal of di- visible (see Figure 4 for an example of such a case). rected edges). Given a MAG M, every graph M0 formed by removing only directed edges from M is also a MAG. For simplicity, assume that M contains a subpath A → X → D where A is not adjacent to D; the other cases of edge visibility like A ↔ X → D (Lemma 5.5). are Proof. Suppose the converse, i.e. M is no longer a MAG treated analogously. In G, there are inducing paths (pos- after removal of some edge X → D. Then X and D cannot sibly several) πAX from A to X and πXD from X to D be m-separated even after the edge is removed because X w.r.t ∅, L; πAX must have an arrowhead at X. We dis- and D are collider connected via a path whose nodes are all tinguish several cases on the shape of πXD . (1) A path ancestors of X or D (Richardson and Spirtes, 2002). The πXD has an arrowhead at X as well. Then A, D are adja- last edge on this path must be C ↔ D or C ← D, hence C < cent (Lemma A.13), a contradiction. (2) No inducing path An(D), and thus we must have C ∈ An(X). But then we get πXD has an arrowhead at X. Then wG must start with an C ∈ An(D) in M via the edge X → V, a contradiction.  Corollary 5.10. For every MAG M, the proper back-door References pbd graph MXY is also a MAG. Silvia Acid and Luis M. de Campos. An algorithm for find- ing minimum d-separating sets in belief networks. In For MAGs that are not adjustment amenable, the CBC Proceedings of UAI 1996, pages 3–10, 1996. might falsely indicate that an adjustment set exists even Silvia Acid and Luis M. de Campos. Searching for though that set may not be valid for some represented bayesian network structures in the space of restricted graph. Fortunately, adjustment amenability is easily tested acyclic partially directed graphs. Journal of Artificial using the graphical criteria of Lemma 5.5. For each child Intelligence Research (JAIR), 18:445–490, 2003. D of X in PCP(X, Y), we can test the visibility of all edges X → D simultaneously using depth first search. This Elias Barenboim, Jin Tian, and Judea Pearl. Recovering means that we can check all potentially problematic edges from selection bias in causal and statistical inference. In in time O(n + m). If all tests pass, we are licensed to apply Proceedings of AAAI-14, 2014. the CBC, as shown above. Hence, we can solve all algo- Thomas H. Cormen, Charles E. Leiserson, Ronald L. rithmic tasks in Table 1 for MAGs in the same way as for Rivest, and Clifford Stein. Introduction to Algorithms, DAGs after an O(k(n + m)) check of adjustment amenabil- Second Edition. The MIT Press, 2nd edition, September ity, where k ≤ |Ch(X)|. 2001. ISBN 0262032937. Felix Elwert. Graphical Causal Models, pages 245–273. Handbooks of Sociology and Social Research. Springer, 2013. 6 DISCUSSION Adam Glynn and Konstantin Kashin. Front-door ver- sus back-door adjustment with unmeasured confound- We have compiled efficient algorithms for solving several ing: Bias formulas for front-door and hybrid adjust- tasks related to m-separators in ancestral graphs, and ap- ments. Technical report, Harvard University, 2013. plied those together with a new, constructive adjustment Sander Greenland. Hierarchical regression for epidemi- criterion to provide a complete and informative answer to ologic analyses of multiple exposures. Environmental the question when, and how, a desired causal effect can be Health Perspectives, 102 Suppl 8:33–39, Nov 1994. estimated by covariate adjustment. Our results fully gener- alize to MAGs in the absence of selection bias. One may ar- Marloes H. Maathuis and Diego Colombo. A generalized gue that the MAG result is more useful for exploratory ap- backdoor criterion. arXiv:1307.5636, 2013. plications (inferring a graph from data) than confirmatory Judea Pearl. Causality. Cambridge University Press, 2009. ones (drawing a graph based on theory), as researchers will ISBN 0-521-77362-8. prefer drawing DAGs instead of MAGs due to the easier Thomas Richardson and Peter Spirtes. Ancestral graph causal interpretation of the former. Nevertheless, in such markov models. Annals of Statistics, 30:927–1223, settings the results can provide a means to construct more 2002. “robust” adjustment sets: If there are several options for co- variate adjustment in a DAG, then one can by interpreting Kenneth J. Rothman, Sander Greenland, and Timothy L. the same graph as a MAG possibly generate an adjustment Lash. Modern Epidemiology. Wolters Kluwer, 2008. set that is provably valid for a much larger class of DAGs. ISBN 0781755646. This might partially address the typical criticism that com- Ross D. Shachter. Bayes-ball: The rational pastime. In plete knowledge of the causal structure is unrealistic. Proceedings of UAI 1998, pages 480–487, 1998. Our adjustment criterion generalizes the work of Shpitser Ilya Shpitser. Appendum to on the validity of covariate et al. (2010) to MAGs and therefore now completely char- adjustment for estimating causal effects, 2012. unpub- acterizes when causal effects are estimable by covariate ad- lished manuscript. justment in the presence of unmeasured confounders with Ilya Shpitser, Tyler VanderWeele, and James Robins. On multivariate exposures and outcomes. This also general- the validity of covariate adjustment for estimating causal izes recent work by Maathuis and Colombo (2013) who effects. In Proceedings of UAI 2010, pages 527–536. provide a criterion which, for DAGs and MAGs without AUAI Press, 2010. selection bias, is stronger than the back-door criterion but weaker than ours. They moreover show their criterion to Ken Takata. Space-optimal, backtracking algorithms to list hold also for CPDAGs and PAGs, which represent equiva- the minimal vertex separators of a graph. Discrete Ap- lence classes of DAGs and MAGs as they are constructed plied Mathematics, 158:1660–1667, 2010. by causal discovery algorithms. It is possible that the con- Johannes Textor and Maciej Liśkiewicz. Adjustment crite- structive back-door criterion could be generalized further ria in causal diagrams: An algorithmic perspective. In to those cases, which we leave for future work. Proceedings of UAI, pages 681–688, 2011. Jin Tian, Azaria Paz, and Judea Pearl. Finding min- . . . → Zi ← . . . ← Ci for every collider Ci on π and a imal d-separators. Technical Report R-254, Uni- corresponding Zi ∈ Z. Since no edges are removed from versity of California, Los Angeles, 1998. URL π, w is non-causal, but not necessarily proper, since the ftp.cs.ucla.edu/pub/stat_ser/r254.pdf. inserted walks might contain nodes of X. However, in that Tyler J. VanderWeele. On the relative nature of overadjust- case, w can be truncated to a proper walk w0 starting at ment and unnecessary adjustment. Epidemiology, 20(4): the last node of X on w. Then w0 is non-causal, since it 496–499, Jul 2009. contains the subpath X ← . . . ← Ci .  Jiji Zhang. Causal reasoning with ancestral graphs. Journal In all of the below, G = (V, E) is a DAG, Z, L ⊆ V are of Machine Learning Research, 9:1437–1474, 2008. disjoint, and M = G[L∅ . Definition A.2 (Inducing path (Richardson and Spirtes, 2002)). A path π = V1 , . . . , Vn+1 is called inducing with A APPENDIX respect to Z, L if all non-colliders on π except V1 and Vn+1 are in L, and all colliders on π are in An({V1 , Vn+1 } ∪ Z). A.1 AUXILIARY LEMMAS AND PROOFS Every inducing path w.r.t. Z, L is m-connected by Z. In this section, we prove Lemma 3.4 and several auxiliary Lemmas that are necessary for the proof of Theorem 5.8. Lemma A.3 (Richardson and Spirtes (2002)). If there is an inducing path w from U ∈ V to V ∈ V with respect to Z, L, then there exists no set Z0 with Z ⊆ Z0 ⊆ (V \ L) such Proof of Lemma 3.4. Let us consider a proper walk w = that Z0 d-separates U and V in G or m-separates U and V X, V1 , . . . , Vn , Y with X ∈ X, Y ∈ Y. If w does not con- in G[L∅ . tain a collider, all nodes Vi are in Ant(X ∪ Y) and the walk is blocked by Z, unless {V1 , . . . , Vn } ∩ R = ∅ in which Proof. This is Theorem 4.2, cases (v) and (vi), in Richard- case the walk is not blocked by Z0 either. If the walk son and Spirtes (2002).  contains colliders C, it is blocked, unless C ⊆ Z ⊆ R. Then all nodes Vi are in Ant(X ∪ Y ∪ I) and the walk is Lemma A.4. Two nodes U, V are adjacent in G[L∅ if and blocked, unless {V1 , . . . , Vn } ∩ R = C. Since C ⊆ Z is a only if G contains an inducing path π between U and V set of anteriors, there exists a shortest (possible containing with respect to ∅, L. Moreover, the edge between U, V in 0 edges) path π j = V j → . . . → W j for each V j ∈ C with G[L∅ can only have an arrowhead at U (V) if all such π W j ∈ X ∪ Y ∪ I (it cannot contain an undirected edge, since have an arrowhead at U (V) in G. there is an arrow pointing to V j ). Let π0j = V j → . . . → W 0j be the shortest subpath of π j that is not blocked by Z0 . Proof. The first part on adjacency is proved in (Richardson Let w0 be the walk w after replacing each V j by the walk and Spirtes, 2002). For the second part on arrowheads, sup- V j → . . . → W 0j ← . . . ← V j . If any of the W j is in pose π does not have an arrowhead at U, then π starts with X ∪ Y we truncate the walk, such that we get the shortest an edge U → D. Hence D < An(U), so D ∈ An(V) be- walk between nodes of X and Y. Since π0j is not blocked, cause π is an inducing path and therefore also U ∈ An(V). w0 contains no colliders except w0j and all other nodes of w0 Hence, the edge between U and V in G[L∅ must be U → V. are not in R, w0 is not blocked and Z0 is not a separator.  The argument for V is identical.  Lemma A.1. Given a DAG G and sets X, Y, Z ⊆ V satisfy- Lemma A.5. Suppose Z0 , Z1 , Z2 is a path in G[L∅ on which ing Z∩Dpcp(X, Y) = ∅, Z m-connects a proper non-causal Z1 is a non-collider. Suppose an inducing path π01 from Z0 path between X and Y if and only if it m-connects a proper to Z1 w.r.t. ∅, L in G has an arrowhead at Z1 , and an in- non-causal walk between X and Y. ducing path π12 from Z1 to Z2 w.r.t. ∅, L has an arrowhead at Z1 . Then the walk w012 = π01 π12 can be truncated to an inducing path from Z0 to Z2 w.r.t. ∅, L in G. Proof. ⇐: Let w be the m-connected proper non-causal walk. It can be transformed to an m-connected path π by Proof. The walk w012 does not contain more non-colliders removing loops of nodes that are visited multiple times. than those on π01 or π12 , so they must all be in L. It re- Since no nodes have been added, π remains proper, and mains to show that the colliders on w012 are in An(Z0 ∪ Z2 ). the first edges of π and w are the same. So if w does not Because Z1 is not a collider on Z0 , Z1 , Z2 , at least one of start with a → edge, π is non-causal. If w starts with an the edges Z0 , Z1 and Z1 , Z2 must be a directed edge point- edge X → D, there exists a collider with a descendant in Z ing away from Z1 . Assume without loss of generality that which is in De(D). So π has to be non-causal, or it would Z0 ← Z1 is that edge. Then all colliders on π01 are in contradict Z ∩ Dpcp(X, Y) = ∅. An(Z0 ∪ Z1 ) = An(Z0 ) ⊆ An(Z0 ∪ Z2 ), and all colliders on ⇒: Let π be an m-connected proper non-causal path. It can π12 are in An(Z1 ∪ Z2 ) ⊆ An(Z0 ∪ Z2 ). Z1 itself is a col- be changed to an m-connected walk w by inserting Ci → lider on w012 and is also in An(Z0 ). Hence, the walk w012 is d-connected, and can be truncated to an inducing path Corollary A.10. Each edge on w0M as defined above cor- that starts with the first arrow of π01 and ends with the last responds to an inducing path w.r.t ∅, L in G along nodes on arrow of π12 .  wG . Lemma A.11. Suppose there exists an inducing path π01 Definition A.6 (Inducing Z-trail). Let π = V1 , . . . , Vn+1 from Z0 to Z1 w.r.t. S, L with an arrowhead at Z1 and an be a path in G[L∅ such that V2 , . . . , Vn ∈ Z, V1 , Vn+1 < Z, inducing path from Z1 to Z2 w.r.t. S0 , L with an arrowhead and for each i ∈ {1, . . . , n}, there is an inducing path w.r.t. at Z1 . Then the walk w012 = π01 π12 can be truncated to an ∅, L linking Vi , Vi+1 that has an arrowhead at Vi (Vi+1 ) if inducing path from Z0 to Z2 w.r.t. S ∪ S0 ∪ {Z1 }, L in G. Vi ∈ Z (Vi+1 ∈ Z). Then π is called an inducing Z-trail. Lemma A.7. Let π = V1 , . . . , Vn+1 be an inducing Z-trail, Proof. The walk w012 does not contain more non-colliders and let π0 be a subsequence of π formed by removing one than those on π01 or π12 , so they must all be in L. node Vi of π such that Vi ∈ Z is a non-collider on π. Then All colliders on π0,1 and π1,2 as well as Z1 are in π0 is an inducing Z-trail. An(Z0 , Z1 , Z2 , S, S0 ), and therefore also all colliders of w012 . Proof. According to Lemma A.5, if Vi is a non-collider on Hence, the walk w012 is d-connected, and can be truncated π, then Vi−1 and Vi+1 are linked by an inducing path π that to an inducing path that starts with the first arrow of π01 contains an arrowhead at Vi−1 (Vi+1 ) if Vi−1 ∈ Z (Vi+1 ∈ and ends with the last arrow of π12 .  Z). Therefore, Vi−1 and Vi+1 are themselves adjacent, π0 is a path, and is a Z-trail.  Lemma A.12. Suppose Z0 , Z1 , . . . , Zk+1 is a path in G[L∅ with an arrowhead at Zk+1 on which all Z1 , . . . , Zk are col- Corollary A.8. Every inducing Z-trail π = V1 , . . . , Vn+1 liders. Then there exists an inducing path from Z0 to Zk+1 has a subpath π0 that is m-connected by Z. w.r.t. {Z1 , . . . , Zk }, L with an arrowhead at Zk+1 . Proof. Transform π into π0 by replacing non-collider Proof. Because all Zi , Zi+1 are adjacent and all Z1 , . . . , Zk nodes in Z by the direct edge linking their neighbours un- are colliders there exist inducing paths πi,i+1 w.r.t. ∅, L from til no such node exists anymore. By inductively applying Zi to Zi+1 that have arrowheads at Z1 , . . . , Zk (Lemma A.4). Lemma A.7, we see that π0 is also an inducing Z-trail, and The claim follows by repeatedly applying Lemma A.11 to every node in Z is a collider because otherwise we would the πi,i+1 ’s.  have continued transforming. So π0 must be m-connected by Z.  Lemma A.13. Suppose A → V1 ↔ . . . ↔ Vk ↔ X → D or A ↔ V1 ↔ . . . ↔ Vk ↔ X → D is a path in Lemma A.9. Let wG be a walk from X to Y in G, X, Y < L, G[L∅ (possibly k = 0), each Vi is a parent of D and there that is d-connected by Z. Let wM = V1 , . . . , Vn+1 be the exists an inducing path πXD from X to D w.r.t ∅, L that has subsequence of wG consisting only of the nodes in M = arrowheads on both ends. Then A and D cannot be m- G[L∅ . Then Z m-connects X and Y in M via a path along a separated in G[L∅ . subsequence w0M formed from wM by removing some nodes in Z (possibly w0M = wM ). Proof. Assume the path is A → V1 ↔ . . . ↔ Vk ↔ X → D. The case where the path starts with A ↔ V1 can be Proof. First, truncate from wM all subwalks between handled identically, since the first arrowhead does not af- nodes in Z that occur more than once. Now consider fect m-separation. all subsequences V1 , . . . , Vn+1 , n > 1, of wM where Assume A and D can be m-separated in G[L∅ , and let Z V2 , . . . , Vn ∈ Z, V1 , Vn+1 < Z, which now are all paths be such a separator. If V1 is not in Z then the path A → in wM . On those subsequences, every Vi must be adjacent V1 → D is not blocked, so V1 ∈ Z. Inductively it follows, in G to Vi+1 via a path containing no colliders, and all non- if Vi is not in Z, but all ∀j < i : V j ∈ Z then the path endpoints on that path must be in L. So there are inducing A → V1 ↔ . . . ↔ Vi−1 ↔ Vi → D is not blocked, so paths w.r.t. ∅, L between all Vi , Vi+1 , which have arrow- Vi ∈ Z for all i. heads at Vi (Vi+1 ) if Vi ∈ Z (Vi+1 ∈ Z). So V1 , . . . , Vn+1 is an inducing Z-trail, and has a subpath which m-connects There exist an inducing path πAX from A to X with an ar- V1 , Vn+1 given Z. Transform wM to w0M by replacing all rowhead at X w.r.t. to {V1 , . . . , Vk }, L (Lemma A.12) which inducing Z-trails by their m-connected subpaths. Accord- can be combined with πXD to an inducing path from A to ing to Lemma A.4, non-colliders on wM cannot be collid- D w.r.t. to {V1 , . . . , Vk , X}, L (Lemma A.11). ers on w0M , as bypassing inducing paths can remove but Hence no m-separator of A, D can contain {X, V1 , . . . , Vk } not create arrowheads. Moreover, all nodes in Z on w0M are (Lemma A.3). Then there cannot exist an m-separator, be- colliders. Hence w0M is m-connected by Z.  cause every separator must include V1 , . . . , Vk and the path A → V1 ↔ V2 ↔ . . . ↔ Vk ↔ X → D is open without function F IND S EP(G, X, Y, I, R) X ∈ Z.  R0 ← R \ (X ∪ Y) Z ← Ant(X, Y, I) ∩ R0 if T EST S EP(G, X, Y, Z) then A.2 ALGORITHMS return Z else This section contains algorithm pseudocodes and parts of return ⊥ their correctness proofs that were omitted from the main text for space reasons. Figure 7: FindSep A.2.1 TESTING A.2.3 FINDING A MINIMAL M-SEPARATOR For a given ancestral graph G the problem T EST S EP can For a given AG G the problem F IND M IN S EP can be solved with a modified Bayes-Ball algorithm in time be solved with algorithm 8 F IND M IN S EP NAIVE in O(n+m). In the algorithm every bi-directed edge A ↔ B is O(|Ant(X ∪ Y)||EAn |) = O(n(n + m)) or algorithm 9 considered as a pair of edges A ← · → B and an undirected F IND M IN S EP M ORAL in O(|Em |) = O(n2 ) time. An edge A − B as a directed edge pointing to the currently vis- ited node. function F IND M IN S EP NAIVE(G, X, Y, I, R) G0 ← GAnt(X∪Y∪I) function T EST S EP(G, X, Y, Z) Z ← R ∩ Ant(X ∪ Y ∪ I) Run Bayes-Ball from X if not T EST S EP(G0 , X, Y, Z) then return (Y not reachable) return ⊥ Figure 5: TestSep for all U in Z \ I do if T EST S EP(G0 , X, Y, Z \ {U}) then The problem T EST M IN S EP can be solved using Algorithm Z ← Z \ {U} 6 T EST M IN S EP in O(|Em An |) = O(n2 ) time. Alternatively, return Z the problem can be solved with an algorithm that iteratively Figure 8: FindMinSepNaive removes from Z nodes and tests if the resulting set remains an m-separator. This can be done in time O(n(n + m)). Algorithm 8 F IND M IN S EP NAIVE depends on an implicit The correctness of the algorithms for T EST M IN S EP can moral graph and the fact that in an undirected graph every be shown by generalizing the results presented in (Tian node that cannot be removed from a separating set has to et al., 1998) for m-separation. 6 T EST M IN S EP, runs in be in separating subsets, and runs in O(|Ant(X ∪ Y)||EAn |). O(|EmAn |) because Rx and R y can be computed with an ordi- nary search that aborts when a node in Z is reached. function F IND M IN S EP M ORAL(G, X, Y, I, R) G0 ← GAnt(X∪Y∪I) function T EST M IN S EP(G, X, Y, Z) G0a ← GaAnt(X∪Y∪I) if Z \ Ant(X ∪ Y) , ∅ then return false Z0 ← R ∩ Ant(X ∪ Y) if not T EST S EP(G, X, Y, Z) then Remove from G0a all nodes of I return false if not T EST S EP(G0 , X, Y, Z) then G0a ← GaAnt(X∪Y) return ⊥ Rx ← {Z ∈ Z | ∃ path X − Z in G0a Run BFS from X. Whenever a node in Z0 is met, not intersecting Z \ {Z}} mark it, if it is not already marked and do not continue if Z * Rx then return false along the path. When BFS stops, let Z00 be the set of all R y ← {Z ∈ Z | ∃ path Y − Z in G0a marked nodes. Remove all markings not intersecting Z \ {Z}} Run BFS from Y. Whenever a node in Z00 is met, if Z * R y then return false mark it, if it is not already marked and do not continue return true along the path. When BFS stops, let Z be the set of all marked nodes. Figure 6: TestMinSep return Z ∪ I Figure 9: FindMinSepMoral A.2.2 FINDING AN M-SEPARATOR Algorithm 9 F IND M IN S EP M ORAL begins with the sep- The problem can be solved using Algorithm 7 F IND S EP arating set R ∩ Ant(X ∪ Y) and finds a subset satisfy- in O(n + m) time. The correctness follows directly from ing the conditions tested by algorithm 6 T EST M IN S EP, in Lemma 3.4. O(|EmAn |). A.2.4 FINDING A MINIMUM COST A.2.6 TESTING FOR ADJUSTMENT M-SEPARATOR AMENABILITY The problem M IN C OST S EP can be solved with algorithm Let N(V) denote all nodes adjacent to V, and Sp(V) denote 10 F IND M IN C OST S EP in O(n3 ). all spouses of V, i.e., nodes W such that W ↔ V ∈ E. The adjustment amenability of a graph G w.r.t sets X, Y can be tested with the following algorithm: function F IND M IN C OST S EP(G, X, Y, I, R, w) G0 ← GAnt(X∪Y∪I) function T ESTA DJUSTMENTA MENABILITY(G, X, Y) G0a ← GaAnt(X∪Y∪I) for all D in Ch(X) ∩ PCP(X, Y) do Add a node Xm connected to all nodes in C←∅ X, and a node Ym connected to all nodes A←∅ in Y. function CHECK(V) Assign infinite cost to all nodes in if C[V] then return A[V] X ∪ Y ∪ (V \ R) and cost w(Z) to every C[V] ← true other node Z. A[V] ← ((Pa(V) ∪ Sp(V)) \ N(D) , ∅) Remove all nodes of I from G0a . for all W ∈ Sp(V) ∩ Pa(D) do Change the graph to a flow network as if CHECK(W) then A[V] ← true described in Cormen et al. (2001) and return a return A[V] minimum cutset Z. for all X in X ∩ Pa(D) do Figure 10: FindMinCostSep if ¬CHECK(X) then return false The correctness without I follows from the fact that a min- Figure 12: TestAdjustmentAmenability imum set is a minimal set and the minimal cut found in the ancestor moral graph is therefore the minimal separating The algorithm checks for every edge X → D on a proper set. The handling of I is shown in Acid and de Campos causal path to Y whether it satisfies the amenability condi- (1996). tions of Lemma 5.5 by searching a collider path through the parents of D to a node Z not connected to D; note that con- dition (1) of Lemma 5.5 is identical to condition (2) with an A.2.5 ENUMERATING ALL MINIMAL empty collider path. Since CHECK performs a depth-first- M-SEPARATORS search by checking every node only once and then contin- uing to its neighbors, each iteration of the outer for-loop in The problem L IST M IN S EP can be solved with algorithm the algorithm runs in linear time O(n + m). Therefore, the 11 L IST M IN S EP with O(n3 ) delay between every out- entire algorithm runs in O(k(n + m)) where k ≤ |Ch(X)|. putted Z. function L IST M IN S EP(G, X, Y, I, R) G0 ← GAnt(X∪Y∪I) G0a ← GaAnt(X∪Y∪I) Add a node Xm connected to all X nodes. Add a node Ym connected to all Y nodes. Remove all nodes of I. Remove all nodes of V \ R, but insert additional edges connecting the neighbours. of all removed nodes. Use the algorithm in Takata (2010) to list all sets separating Xm and Ym . Figure 11: ListMinSep The correctness is shown by Textor and Liśkiewicz (2011) for adjustment sets and generalizes directly to m- separators, because after moralization, both problems are equivalent to enumerating vertex cuts of an undirected graph. The handling of I is shown by Acid and de Cam- pos (1996).