=Paper= {{Paper |id=Vol-1274/paper8 |storemode=property |title=Toward Learning Graphical and Causal Process Models |pdfUrl=https://ceur-ws.org/Vol-1274/uai2014ci_paper8.pdf |volume=Vol-1274 |dblpUrl=https://dblp.org/rec/conf/uai/Meek14 }} ==Toward Learning Graphical and Causal Process Models== https://ceur-ws.org/Vol-1274/uai2014ci_paper8.pdf
           Toward Learning Graphical and Causal Process Models



                                                Christopher Meek
                                                 Microsoft Research
                                                 One Microsoft Way
                                                Redmond, WA 98052
                                                meek@microsoft.com


                      Abstract                               graphs called δ*separation which plays an analogous
                                                             role as d-separation in the work of Verma and Pearl
    We describe an approach to learning causal               (1990) and Spirtes, Glymour, and Scheines (2001).
    models that leverages temporal information.              Another key ingredient is the process independence
    We posit the existence of a graphical de-                statement that plays an analogous role to the inde-
    scription of a causal process that generates             pendence statement. Conceptually, we assume that
    observations through time. We explore as-                we can test whether a process independence state-
    sumptions connecting the graphical descrip-              ments about observable quantities holds by observing
    tion with the statistical process and what one           the process and that these observation provide insight
    can infer about the causal structure of the              into the causal structure governing the process. In
    process under these assumptions.                         particular, we posit the existence of a graphical de-
                                                             scription of a causal process and make assumptions
                                                             that connect δ*separation with observable process in-
1    Introduction                                            dependence statements. We explore what can be in-
                                                             ferred about the causal structure of the process un-
Data that measure the temporal dynamics of systems           der various observability assumptions. While the ulti-
is pervasive. The goal of this paper is to describe an       mate goal is to create a sound and complete method
approach to the development of a sound approach to           for causal inference for observations from a stochas-
causal inference for dynamic systems. One of the pop-        tic dynamic system, this paper represents some initial
ular extant approaches is Granger causality (Granger         steps towards this ultimate goal. In particular, the re-
1969) which fails to be sound in the presence of la-         sults in Section 3.2 can be viewed sufficient conditions
tent variables. Granger causality is typically applied in    for Granger causality and, in Section 3.3, we present
discrete-time continuous valued time-series. Roughly         sufficient conditions under which we can make sound
speaking, in a multivariate time series X a set of vari-     inferences about causal relationships including the ex-
ables are the Granger-causes of Xj if the historical val-    istence of causal relationships and the existence and
ues of this set of variables (including Xj ) are necessary   non-existence of latent common causal relationships.
and sufficient for optimal prediction. Unfortunately a
                                                             As presented in Section 3, our causal discovery algo-
variable deemed a Granger-cause can arise due to ei-
                                                             rithm assumes the existence of an oracle for process in-
ther a latent common cause or as a result of a direct
                                                             dependence statements. Our approach of abstracting
causal relationship and thus the approach cannot be
                                                             away the details of how one connects process indepen-
used to determine causal relationships if one does not
                                                             dence statements with particular statistical processes
exclude the possibility of latent variables.
                                                             allows us to simultaneously make progress on the
In this paper, we explore how one can leverage the           causal discovery problem for multiple distinct statis-
assumption that causes must precede effects to in-           tical processes such as marked point processes, Gaus-
form causal conclusions drawn from observations of           sian processes and dynamic Bayesian networks. In Sec-
a temporal statistical process. The approach taken           tion 4, we discuss two particular statistical processes
here is similar to the approach developed by Verma           and their associated process independence statements.
and Pearl (1990) and Spirtes, Glymour, and Scheines          In Section 5, we discuss some related work and open
(2001) for atemporal causal discovery. One key in-           research questions.
gredient in our approach is a new asymmetric graph-
ical separation criterion for directed (possibly cyclic)
2    Graphical Separation                                          separation for directed acyclic graphs, we would like
                                                                   a graphical separation criterion for directed graphs to
                                                                   answer questions about how past observations influ-
We use G = hL, Ei to denote a directed graph where                 ence future observations. Due in part to the fact that
L is a set of vertices and E ⊆ hL × Li is a set of edges           a directed graph does not explicitly encode temporal
represented as ordered pairs. We write a → b if ha, bi ∈           information we cannot simply apply d*separation on
E and say that a is a parent of b and b is the child of            the directed graph. Instead, we define δ*separation
a. Note that, in addition to allowing cycles, we also              which extends the graphical δ-separation of Didelez
allow that a vertex can be its own parent and child                (2008) to handle self-edges. For sets A, B, C ⊂ L
(i.e., a self-edge a → a). We use the shorthand a ↔ b              where A ∩ C = ∅ we say that that B is δ*separated
to indicate that a → b and b → a.                                  from A given C (or simply δ(A, C, B)) in G if an
A path in G is a sequence hl1 , . . . , ln i where there is        only if B is d-separated from A given C in the B-
an edge between successive pairs of vertices in G. The             historical dependency graph G B where G B = L, E B
length of a path p = hl1 , . . . , ln i is |p| = n and a path      and E B = E\{hb, ai ∈ E|b ∈ B, a 6= b}. Note that
p is termed a trivial path if |p| = 1. A vertex li on              δ*separation is not symmetric in the first and third
path p = hl1 , . . . , ln i is a collider on p if li−1 → li and    arguments due to the use of the graph G B .
li ← li+1 and a non-collider otherwise. A directed path
in graph G = hL, Ei is a sequence of vertices hl1 , . . . , ln i
such that hli , li+1 i ∈ E. The source of a directed path
                                                                   3   Learning the Structure of a Causal
is the first vertex in the path. We denote the set of                  Process
ancestors for a set A by An(A). The ancestor relation
is reflexive and thus A ⊆ An(A).                                   Our aim is to connect statistical processes with causal
                                                                   graphs and to learn the causal graph governing a sys-
We define a graph separation criterion called
                                                                   tem of observed events. We assume that there is a
d*separation for directed graphs which is an exten-
                                                                   statistical process governing what and when events
sion of d-separation (Pearl 1988). An extension of d-
                                                                   happen. We denote a statistical process for a set
separation is required as a pure vertex separation cri-
                                                                   of observation types L by PL . We also assume that
terion like d-separation cannot separate a vertex from
                                                                   we can observe the process to determine the whether
itself which is required to appropriately handle self-
                                                                   process independence statements hold. We will write
edges in directed graphs. A path p d*connects ver-
                                                                   P I(A, C, B) to indicate that the process associated
tices a and b given the set of vertices C in graph G if
                                                                   with observations of type B does not depend on the
every collider on p is in An(C) and every non-collider
                                                                   history of observations of type A given the history of
on p is not in C. For sets of vertices A, B, C ⊆ L
                                                                   observations of type C in a given process PL (where
where A ∩ C = ∅ we say that B is d*separated from
                                                                   A ∩ C = ∅). We write ¬P I(A, C, B) if this is not the
A by C in graph G if and only if there does not exist
                                                                   case. We call such statements process independence
a non-trivial d*connecting path between some a ∈ A
                                                                   statements. We note that process independence state-
and some b ∈ B given C in G.
                                                                   ments need not correspond to statistical independence
There are two key differences from Pearl’s d-separation            statements and, as with δ*separation, there is no ex-
that allow us to appropriately handle cyclic directed              pectation that such process independence statements
graphs. First, we restrict d*separation statements to              ought to be symmetric. In this section, we assume the
sets in which A ∩ C is the empty set but allow the                 existence of a process independence oracle for the rel-
sets A and B to overlap. Second, d*connecting paths                evant statistical process. In Section 4, we discuss par-
must be non-trivial. These modifications enable us                 ticular statistical processes and the problem of testing
to use d*separation statements to distinguish between              process independence statements for those processes.
graphs in which there is a self-edge (a → a) and one
                                                                   A process PL satisfies the Causal Factorization As-
in which there is not.
                                                                   sumption with respect to a causal process graph G =
We use directed graphs to represent temporal statis-               hL, Ei if and only if for all A, B, C ⊂ L where A∩B = ∅
tical processes. We associate the vertices L with a                it is the case that δ(A, B, C) ⇒ P I(A, B, C)
set of possible observation types (i.e., things that can
                                                                   A process PL satisfies the Causal Dependence Assump-
happen). The edges denote potential dependencies
                                                                   tion with respect to a causal process graph G = hL, Ei
between observations and the absence of a directed
                                                                   if and only if for all A, B, C ⊂ L where A ∩ B = ∅ it
edge from observation type a to observation type b
                                                                   is the case that P I(A, B, C) ⇒ δ(A, B, C)
indicates that the process that generates observations
of type b does not directly depend on the history of               The Causal Analysis (CA) Algorithm (Algorithm 1)
observations of type a. Analogous to the use of d-                 uses a process independence oracle to construct a di-
rected graph. We use πlG to denote the parents of l            process graph G. Our causal factorization and depen-
in graph G and |B| to denote the cardinality of the            dence assumptions allow us to focus on δ*separation
set B. The basic idea is to use process independence           in G by assuming that the observed process indepen-
statements to remove edges from an initially complete          dence statements accurately reflect the δ*separation
graph. This algorithm is analogous to the PC Algo-             statements about G for the observed observation types.
rithm of Spirtes, Glymour and Scheines (2001) but              In order to understand and interpret the output of the
does not have an orientation phase.                            CA algorithm we need to understand the conditions
                                                               that lead to edges in the final output. We begin by
Note that the output of the CA algorithm is a directed
                                                               defining the concept of vertex blockability relative to
graph and that any edges presented do not necessarily
                                                               a set of observed event types.
indicate a causal relationship. In the remainder of this
section we explore the interpretation of the output of         We say that a vertex a is b-unblockable relative to O in
the CA algorithm under various assumptions. Recall             G if and only if for all C ⊆ O \ {a, b} ¬δ(a, C, b) is true
that a ↔ b simply indicates that a → b and b → a and           of G. Otherwise the vertex is said to be b-blockable
not the existence of a latent common cause.                    relative to O. Note that if b → b then if b ∈ O b is
                                                               b-unblockable relative to O.
Input: A set of events L and a process PL
Output: A directed graph G                                     We say that l is a direct cause of l0 relative to O for
Let G = hL, Ei be a complete directed graph.;                  causal process graph G if and only if there exists a
foreach l ∈ L do                                               directed path hl1 , . . . , ln i where l1 = l and ln = l0 and
   Let n = 0;                                                  li 6∈ O for (1 < i < n). We call the path in the
   foreach l0 ∈ πlG do                                         definition of direct cause a witnessing path that l is a
      foreach B ⊆ πlG \ {l0 } where |B| = n do                 direct cause of l0 . We let Db denote the set of observed
          if P I(l0 , B, l) holds in PL then                   direct causes of the variable b relative to O, that is,
              E = E \ hl0 , li                                 members of O that are direct causes of b relative to O.
          end                                                  Example 1. Let E = {a → c, c → b}, L = {a, b, c}
      end                                                      and O = {a, b}. The vertex a is b-unblockable relative
      Let n = n + 1;                                           to O for G = hL, Ei but the vertex b is a-blockable
   end                                                         relative to O. In this example, a is a direct cause of b
end                                                            relative to O in graph G and a → c → b is a witnessing
Return G = hL, Ei;                                             path for this fact.
Algorithm 1: The Causal Analysis (CA) Algorithm                Lemma 3. If l0 is a direct cause of l relative to O in
                                                               G then l0 is l-unblockable relative to O in G.
Theorem 1 (Complete Observations). If PL satisfies
both the causal dependence and factorization assump-           The following lemma allows us to make causal infer-
tions with respect to G then algorithm CA(L, PL ) re-          ences using the causal analysis algorithm about the
turns G 0 = G.                                                 absence of a direct causal relationship.
Lemma 1. If PL satisfies the causal dependence as-             Lemma 4. If PL satisfies the causal dependence as-
sumption for G = hL, Ei and algorithm CA(L, PL ) re-           sumption with respect to G then, in the graph G 0 output
turns G 0 = hL, E 0 i then if l0 → l ∈ E then l0 → l ∈ E 0 .   by CA(O,PL ), the set of parents for each event type
                                                               include all of its direct causes relative to O.
Lemma 2. If PL satisfies both the causal depen-                In particular, if the algorithm finds that an event type
dence and factorization assumptions for G = hL, Ei             a is not a parent of event type b then a is not a direct
and algorithm CA(L, PL ) returns G 0 = hL, E 0 i then if       cause of b.
l0 → l 6∈ E then l0 → l 6∈ E 0 .

Proof of Theorem 1: The theorem follows from                   3.2   Causal sufficiency
Lemmas 1 and 2.                                                In the section, we restrict the type of unobserved event
                                                               types which enables us to make strong inferences about
3.1   Absence of a direct causal relationship                  the causal structure of a process. In particular we as-
                                                               sume causal sufficiency which is essentially an assump-
Next we consider the case in which some of the event
                                                               tion that there are no latent confounding processes.
types in the system are not observed. We let O ⊆ L
be the set of observed event types. In this case we will       A set of event types O ⊂ L is causally sufficient with
assume that the causal factorization and dependence            respect to a graph G = hL, Ei if and only if every
assumptions hold for a process PL and some causal              common cause of l, l0 ∈ O is in the set of event types
O.                                                               We say that a is a cause of b in G and if there is a
                                                                 directed path from a to b in G.
A directed graph G 0 = hO, E 0 i is causally correct with
respect to a graph G = hL, Ei if for every edge ha, bi ∈         We aim to find common features of all graphs that are
E 0 a is a direct cause of b with respect to O in G.             consistent with the observed pattern of process inde-
Theorem 2 (Causal Sufficiency). If PL satisfies both             pendence statements. Latent processes, however, can
the causal dependence and factorization assumptions              mask the causal nature of the observed pattern of de-
for G = hL, Ei and O ⊆ L is causally sufficient with             pendencies.
respect to G then the graph G 0 returned by algorithm            For a pair of vertices a, b and graph G we say that there
CA(O, PL ) is causally correct with respect to G and             is a potential indirect inducing path into b relative to O
O.                                                               if and only if (1) there is a vertex c1 ∈ O \ {a, b} such
Lemma 5. If PL satisfies the causal dependence and               that a → b in G and (2) there is a sequence of vertices
factorization assumptions with respect to G and O is             c1 , . . . , cn ⊆ O \ {a, c} such that ci ↔ ci+1 and cn ↔ b
causally sufficient for G then the output of the CA al-          in G.
gorithm removes the edge a → b if a is not a direct              Lemma 7. For any set of observed variable O, if a
cause of b relative to O.                                        graph has an inducing path between observed variables
                                                                 a, b into b containing another observed variable then
3.3   Causal insufficiency                                       the output of the CA algorithm will contain a potential
                                                                 indirect inducing path into b.
We have shown that the CA algorithm can provide
                                                                 Theorem 3 (Sufficient Cause). If PL satisfies both
causally accurate information under the assumptions
                                                                 the causal dependence and factorization assumptions
of causal sufficiency, causal factorization and causal
                                                                 for G = hL, Ei then if CA produces G 0 with vertices
dependence. In this section we consider removing the
                                                                 O ⊆ L for which the subgraph over {a, b} is a → b and
assumption of causal sufficiency.
                                                                 G 0 contains no potential inducing path between a, b into
Example 2. Let E = {a ← c, c → b}, L = {a, b, c}                 b then a is a cause of b in G.
and O = {a, b}. The observed event types O are not               Lemma 8. If PL satisfies both the causal dependence
causally sufficient for the graph G = hL, Ei. In ad-             and factorization assumptions for G = hL, Ei and CA
dition, the CA algorithm fails to provide output that            produces G 0 with vertices O ⊆ L for which the subgraph
is causally correct. In particular, the CA algorithm             over {a, b, c} is a ↔ b ↔ c then
yields the graph in which a → b and b → a despite the
fact that neither is a a cause of b in G nor is b a cause
                                                                     • if P I(a, ∅, c) and P I(c, ∅, a) then there is a latent
of a.
                                                                       common causes of a, b and a (possibly distinct)
Our aim is to graphically characterize vertex separa-                  latent cause of b, c and b is not a direct cause of c
bility. We do so using the idea of an inducing path in a               and b is not a direct cause of a.
directed graph that was introduced for directed acyclic              • if P I(a, b, c) then there is no latent common
graphs by Verma and Pearl (1990). For a pair of ver-                   causes of b, c, b is a cause of c in G.
tices a, b, we define Aab = An({a}) ∪ An({b}) \ {a, b}.
A path p between ha, bi is an inducing path relative to
O if and only if (1) every vertex on p ∈ O is a collider
                                                                 4     Statistical Processes and Process
on p and (2) Every collider on p is in Aab . An induc-                 Independence
ing path p = hl1 = a, . . . , ln = bi from a to b is into b if
ln−1 → ln . An inducing path p = hl1 = a, . . . , ln = bi        Our approach to causal discovery through the obser-
from a to b is out of a if l1 → l2 .                             vation of a dynamic process is applicable to different
                                                                 temporal statistical processes. The key connection re-
Lemma 6. For a directed graph G the following three
                                                                 quired is a connection between process independence
statements are equivalent:
                                                                 statements and the observations from a particular sta-
                                                                 tistical process. In this section we consider two distinct
(a) A vertex a is b-unblockable relative to O in graph           statistical processes and discuss process independence
    G                                                            for these processes.
(b) There is an inducing path between a and b relative
    to O in graph G b . Note this inducing path must             4.1     Dynamic Bayesian Networks
    be into b.
                                                                 Dynamic Bayesian networks (DBNs) are a popular
(c) ¬δ(a, O ∩ Aab , b) in G.                                     discrete-time model that can capture temporal dy-
                                                                 namics of a statistical process. A DBN is a statis-
tical model of an infinite set of variables indexed by                      distribution for D in which the timestamps are con-
time. A variable Xit denotes the ith variable at time                       tinuous random variables can be written in this form.
t. We use X = X1 , . . . , Xn to denote the set of vari-                    For more details see [1, 2]. Despite the fact that the
able types in the DBN, that is, a variable with an un-                      modeling assumptions are weak, these models offer a
specified time component and X t to denote the set of                       powerful approach for decomposing the dependencies
variables at time t. The DBN specifies the evolution                        of different event types on the past. In particular, this
of X t as a stochastic function of the value of previous                    per label conditional specification allows one to model
variables X t−i (i > 0). In particular, the variable Xit                    detailed label-specific dependence on past events.
is a stochastic function of the value of its parents in
                                                                            Next we define a graphical conditional intensity model
a graph. The causal process graph associated with a
                                                                            that we call a graphical event model (GEM). A fil-
causal DBN is a graph over the variable types of the
                                                                            tered history for A ⊆ L as [h]A = {(ti , li )|(ti , li ) ∈
DBN X where there is an edge Xi → Xj if there ex-
                                                                            h ∧ li ∈ A}. A GEM is a pair < G, θ >, where
ists a t, i such that there is an edge Xit−i → Xj in the
                                                                            G =< L, E > is a directed graph over a set of event
DBN. Thus, the parent relationship of the causal pro-
                                                                            types and edges in E represent potential dependencies
cess graph captures the dependence of a variable type
                                                                            among event types. The parameters θ = {θl }l∈L pa-
on the history of other variable types. Furthermore,
                                                                            rameterize the intensity functions for each event type.
process independence statements P I(Xi , C, Xj ) corre-
                                                                            In particular, λl (t|ht , θl ) = λl (t|[ht ]πl , θl ) where πl is
spond to a set of independence statements of the form
                                                                            the set of parents for l in G. As in the case of the
I(Xi1 , . . . , Xit−1 , XC1 , . . . , XCt−1 , Xjt ). Without further
                                                                            DBN, a process independence statement correspond to
assumptions, testing process independence would be
                                                                            testing a dependence of an event type on set of event
unfeasible but if we focus on stationary processes with
                                                                            histories. One potential approach to testing a process
finite temporal dependency we can potentially test
                                                                            independence P I(a, C, b) is to estimate/learn an inten-
process independence statements.
                                                                            sity function for b using the event histories for {a} ∪ C
                                                                            and see if the intensity model depends on the event
4.2    Graphical Event Models                                               history for a. The work by Gunawardana et al (2011)
                                                                            on learning piecewise continuous intensity models is a
In this section, we define Conditional Intensity Mod-
                                                                            good starting point for this approach.
els and Graphical Event Models (GEMs) and con-
nect these models with previous work on the class of
Piecewise-Constant Conditional Intensity Models and                         5    Discussion
Poisson Networks. We assume that events of differ-
ent types are distinguished by labels l drawn from                          One of the goals for the research direction described
a finite alphabet L. An event is then composed of                           in this paper is the development a sound approach
a non-negative time-stamp t and a label l. A his-                           to causal inference for dynamic systems. One of the
                                                      n
tory is an event sequence h = {(ti , li )}i=1 where                         popular extant approaches is that of Granger causal-
0 < t1 < · · · < tn , and our data is a specific history                    ity which fails on this account. This approach is typi-
denoted by D. Given data D, we define the history                           cally applied in a discrete-time continuous valued time-
at time t as h(t, D) = {(ti , li ) | (ti , li ) ∈ D, ti ≤ t}. We            series and, thus, can be viewed as a dynamic Bayesian
suppress D from h(t, D) when clear from context and                         network. Roughly speaking, in a multivariate time se-
write hi = h(ti−1 ). By convention t0 = 0. We define                        ries X a set of variables are the Granger-causes of Xj if
the ending time t(h) of a history h as the time of the                      the historical values of this set of variables (including
last event in h: t(h) = max(t,l)∈h t so that t(hi ) = ti−1 .                Xj ) are necessary and sufficient for optimal predic-
A Conditional Intensity Model (CIM) is a set of non-                        tion. Unfortunately this approach does not appropri-
negative conditional intensity functions indexed by la-                     ately handle latent common causes. In particular, for
bel {λl (t|h; θ)}l∈L . The data likelihood for this model                   both of the scenarios described in Lemma 8 it is the
is                                                                          case that each of the variables is a Granger cause of its
                                                                            neighbors while this relationships need not be causal
                  n
                 YY                                                         as the lemma demonstrates. In fact, it is easy to con-
      p(D|θ) =             λl (ti |hi , θ)1l (li ) e−Λl (ti |hi ;θ)   (1)
                                                                            struct stochastic processes with latent factors which
                 l∈L i=1
                                                                            demonstrate that the inferential approach to Granger
                       Rt                                                   causality is not sound with respect to causal relations.
where Λl (t|h; θ) = −∞ λl (τ |h; θ)dτ and the function
1l (l0 ) is one if l0 = l and zero otherwise. The condi-                    There has been much work related to causal discov-
tional intensities are assumed to satisfy λl (t|h; θ) = 0                   ery and the estimation of causal effects in time-series.
for t ≤ t(h) to ensure that ti > ti−1 = t(hi ). These                       As discussed above, the work on Granger causality
modeling assumptions are quite weak. In fact, any                           (Granger 1969) is the most well known. The short-
comings of this approach are also well known (e.g.,         References
Eichler 2007) and there has been some work in trying
to address these known short comings. For instance,          [1] D. J. Daley and D. Vere-Jones. An Introduction to
Eichler (2007) proposes a similar approach to the ap-            the Theory of Point Processes: Elementary The-
proach described here but differs in that it allows for          ory and Methods, volume I. Springer, second edi-
the possibility of “simultaneous correlation” which re-          tion, 2002.
quires the use of an alternative definition of separa-
                                                             [2] Vanessa Didelez. Graphical models for marked
tion. In addition, while providing definitions of cause
                                                                 point processes based on local independence.
and spurious cause, sufficient conditions for the identi-
                                                                 JRSS-B, 70(1):245–264, 2008.
fication of causal relationships are not presented. The
work of Entner and Hoyer (2010) considers the prob-          [3] Michael Eichler. Granger causality and path di-
lem of causal discovery from time series data using              agrams for multivariate time series. Journal of
limited dependence vector autoregressive models and              Econometrics, 137:334–353, 2007.
the FCI algorithm that uses conditional independence
tests to identify the structure. Our approach of using       [4] Michael Eichler and Vanessa Didelez. Causal rea-
δ*separation is inspired by the work of Didelez (2008)           soning in graphical time series models. In Un-
who defined δ-separation and shows the connection be-            certainty in Artificial Intelligence, pages 109–116,
tween that graphical separation criterion and local in-          2007.
dependence of marked point processes. Our extension
                                                             [5] Doris Entner and Patrik O. Hoyer. On causal dis-
to δ*separation allows for the appropriate treatment of
                                                                 covery from time series data using FCI. In Prob-
self-edges which are essential in any self-excitatory or
                                                                 abilistic Graphical Models, pages 121–128, 2010.
self-inhibitory dynamic process. Another more loosely
connected work is that of Eichler and Didelez (2007)         [6] C.W.J. Granger. Investigating causal relations by
that considers the estimation of causal effects based            econometric models and cross-spectral methods.
on an intervention in a time-series.                             Econometrica, 37:424–438, 1969.
While the results described in this paper offer hope         [7] Asela Gunawardana, Christopher Meek, and
for developing a methodologically sound approach to              Puyang Xu. A model for temporal dependencies
causal inference for dynamic systems, there is much              in event streams. In Advances in Neural Informa-
work that needs to be done. Here are some of the                 tion Processing Systems, 2011.
open research questions.
                                                             [8] C. Meek. Strong completeness and faithfulness
  • Non-parametric tests for process independence for            in Bayesian networks. In Proceedings of Eleventh
    various type of temporal statistical processes               Conference on Uncertainty in Artificial Intelli-
                                                                 gence, Montreal, QU, pages 411–418. Morgan
                                                                 Kaufmann, August 1995.
  • Soundness and completeness results for
    δ*separation analogous to those provided by              [9] P. Spirtes, C. Glymour, and R. Scheines. Cau-
    Pearl (1988), Meek (1995) and Spirtes et al                  sation, Prediction, and Search, Second Edition.
    (2001) for d-separation. Note that Didelez (2008)            MIT Press, Cambridge, MA, second edition, 2001.
    has shown the soundness of δ-separation for
    a family of marked point processes related to           [10] T. Verma and J. Pearl. Equivalence and synthe-
    GEMs.                                                        sis of causal models. In Proceedings of Sixth Con-
                                                                 ference on Uncertainty in Artificial Intelligence,
  • A representation for equivalence classes of causal           Boston, MA, pages 220–227. Morgan Kaufmann,
    graphs with respect to δ*separation in the case of           July 1990.
    causal insufficiency (O ⊂ L) analogous to those
    developed by Verma and Pearl (1990) and Spirtes
    et al (2001) that captures the common casual as-
    pects of the set of graphs in the equivalence class.


Acknowledgments

Thanks to Asela Gunawardana and two anonymous
reviewers for their comments on an earlier draft of this
paper.