<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning the Structure of Causal Models with Relational and Temporal Dependence</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marc Maier</string-name>
          <email>maier@cs.umass.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Information and Computer Sciences, University of Massachusetts Amherst</institution>
          ,
          <addr-line>Amherst, MA 01003</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Many real-world domains are inherently relational and temporal-they consist of heterogeneous entities that interact with each other over time. Effective reasoning about causality in such domains requires representations that explicitly model relational and temporal dependence. In this work, we provide a formalization of temporal relational models. We define temporal extensions to abstract ground graphs-a lifted representation that abstracts paths of dependence over all possible ground graphs. Temporal abstract ground graphs enable a sound and complete method for answering d-separation queries on temporal relational models. These methods provide the foundation for a constraint-based algorithm, TRCD, that learns causal models from temporal relational data. We provide experimental evidence that demonstrates the need to explicitly represent time when inferring causal dependence. We also demonstrate the expressive gain of TRCD compared to earlier algorithms that do not explicitly represent time.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Recent work in artificial intelligence has devoted
increasing attention to learning and reasoning with causal
knowledge. Causality is central to understanding the behavior of
complex systems and to selecting actions that will achieve
particular outcomes. Thus, causality is implicitly or
explicitly central to mainstream AI areas such as planning,
cognitive modeling, computational sustainability, game playing,
multiagent systems, and robotics. Causal inference is also
central to many areas beyond AI, including medicine,
public policy, and nearly all areas of science.</p>
      <p>Substantial research advances have been made over the
past several decades that allow the structure and
parameters of causal graphical models to be learned from
observational data. As early as the 1990s, researchers
developed constraint-based algorithms, such as the IC [Pearl
and Verma, 1991] and PC [Spirtes et al., 2000] algorithms,
that leverage the connection between the graphical criterion
of d-separation and the conditional independencies that are
inferred to hold in the underlying distribution, in order to
learn the structure of causal graphical models from data.
In this work, we significantly extend the expressiveness
of the models learnable with constraint-based algorithms.
Specifically, we provide a formalization of temporal
relational models, an expressive class of models that can
capture probabilistic dependencies between variables on
different types of entities within and across time points.
We extend the notion of abstract ground graphs [Maier et
al., 2013b]—a lifted representation that allows reasoning
about the conditional independencies implied by a
relational model—for temporal relational models, and we show
that temporal abstract ground graphs are a sound and
complete abstraction for ground graphs of temporal relational
models. Temporal abstract ground graphs can be used to
answer d-separation queries for temporal relational
models. We also extend an existing constraint-based algorithm
for inferring causal dependence in relational data—the
relational causal discovery (RCD) algorithm—to incorporate
time, thus providing a constraint-based method that learns
causal models from temporal relational data.
2</p>
    </sec>
    <sec id="sec-2">
      <title>RELATIONAL MODELS</title>
      <p>Propositional representations, such as Bayesian networks,
describe domains containing a single entity class. Many
real world systems comprise instances from multiple
entity classes whose variables are interdependent. Such
domains are often referred to as relational. In this section, we
introduce basic concepts of relational representations that
exclude time.</p>
      <p>A relational schema S = (E ; R; A; card ) specifies the
types of entities, relationships, and attributes that exist in
a domain. It includes a cardinality function that constrains</p>
      <p>Course
[Course, Takes, Student ].gpa ! [Course].di culty
the number of times an entity instance can participate in a
relationship. A relational schema can be depicted with an
Entity-Relationship (ER) diagram. Figure 1 shows an
example ER diagram (in gray) that describes a simplified
university domain. The domain consists of two entity classes
(Student and Course), and one relationship class (Takes).
The entity class Student has one attribute, gpa, and the
entity class Course has one attribute, di culty . The
cardinality constraints are shown with crow’s feet notation—
students can enroll in multiple courses and a course can
be taken by multiple students. A relational skeleton is a
partial instantiation of a relational schema. It specifies the
entity and relationship instances that exist in the domain.
Figure 2 shows an example relational skeleton for the
relational schema of Figure 1. The skeleton consists of two
Student instances, Alice and Bob, and two Course
instances, CS101 and CS201. Alice is taking CS101 and Bob
is taking both courses.</p>
      <p>Given a relational schema, we can specify
relational paths, which intuitively correspond to
ways of traversing the schema. For the schema
shown in Figure 1, possible paths include
[Student ; Takes; Course] (the courses a student takes), as
well as [Student ; Takes; Course; Takes; Student ] (other
students that take the same courses). Relational variables
consist of a relational path and an attribute that can be
reached through that path. For example, the relational
variable [Student ; Takes; Course]:di culty corresponds
to the difficulty of the courses that a student takes.
Probabilistic dependencies can be defined between relational
variables. Dependencies are said to be in canonical form
when the path of the effect (or outcome) relational variable
is a single item. For canonical dependencies, the path
of the cause (or treatment) relational variable describes
how dependence is induced. As an example, consider the</p>
      <sec id="sec-2-1">
        <title>Alice.gpa</title>
        <p>Bob.gpa</p>
      </sec>
      <sec id="sec-2-2">
        <title>CS101.difficulty</title>
      </sec>
      <sec id="sec-2-3">
        <title>CS201.difficulty</title>
        <p>[Course; Takes; Student ]:gpa ! [Course]:di culty
which states that the difficulty of a course is affected by the
gpa of students taking that course. Presumably, instructors
adjust the difficulty of the course based on the grade-point
average of enrolled students.</p>
        <p>A relational model M = (S; D; ) is a collection of
relational dependencies defined over a single relational schema
along with their parameterizations (a conditional
probability distribution for each attribute given its parents). The
structure of a relational model can be depicted by
superimposing the dependencies on the ER diagram of the
relational schema, as shown in Figure 1, and labeling each
arrow with the corresponding relational dependency.
Given a relational model M and a relational skeleton ,
we can construct a ground graph GG M by applying the
relational dependencies as specified in the model to the
specific instances of the relational skeleton. Figure 3 shows the
ground graph for the model of Figure 1 applied on the
relational skeleton of Figure 2. In this work, we restrict our
attention to relational models that do not contain the kind
of relational autocorrelation that gives rise to cycles in the
ground graph.
3</p>
        <p>TEMPORAL RELATIONAL MODELS
Relational models can be extended with a temporal
dimension to model probabilistic dependencies over time.
Such an extension is similar to the way in which
dynamic Bayesian networks (DBNs) extend Bayesian
networks [Murphy, 2002]. A temporal relational model can
be thought of as a sequence of time points, each of which
is associated with a (non-temporal) relational model, and a
set of dependencies that cross time points. Because
dependencies in this model have a causal interpretation,
dependencies across time points are only directed from the past
to the future.</p>
        <p>In this section, we extend the relational notions presented
in Section 2 to include time. We assume that (1) time is
discrete; (2) the schema is static; (3) relational dependencies
do not change over time; (4) the temporal relational
skeleton is given a priori; (5) the first-order Markov assumption
[Courset, Takest, Studentt].gpa ! [Courset].di culty
[Studentt+1, Studentt, Takest, Courset].di culty ! [Studentt+1].gpa
[Courset+1, Takest+1, Studentt+1].gpa ! [Courset+1].di culty
holds (i.e., treatment and outcome can be at most one time
point apart); and (6) all entities participating in a
relationship are contemporaneous with the relationship.
Under these assumptions, the structure of a temporal
relational model can be represented by using only two time
points, as shown in Figure 4. Every time point has the same
relational schema, shown in gray. A temporal relational
skeleton provides a partial instantiation of a temporal
relational schema. It specifies the entity and relationship
instances that exist in each time point. Note that different
sets of entity and relationship instances may be present in
each time step. An example temporal relational skeleton is
shown in Figure 5. In this case, both time points have the
same set of entity instances (Alice and Bob are instances
of the Student entity, CS101 and CS201 are instances of
the Course entity). However, the instances of the
relationship Takes differ. In t = 0, Alice takes CS101 and Bob
takes CS201. In t = 1, Alice takes CS201 and Bob takes
no classes.</p>
        <p>Temporal relational paths capture possible ways to
traverse the temporal schema; therefore, they can cross
time points. For example, a temporal relational path
is [Student t+1; Student t; Takes t; Course t], which
describes the classes that a student took in the previous
semester. More formally, a temporal relational path is
a sequence of non-temporal relational paths (relational
paths within a time point) and “jumps” between
neighboring time points. These jumps can happen at both
entities and relationships because each choice encodes
a distinct semantics. For example, the relational path
[Student t+1; Student t; Takes t; Course t] describes the
courses that a student took in the previous semester, while
the path [Student t+1; Takes t+1; Course t+1; Course t]
gpa
gpa</p>
        <p>first finds the courses that a student is taking this semester,
and then finds those courses in the previous semester. In
the example skeleton of Figure 5, for Alice, the first path
would reach CS101 at t = 0, while the second path, will
reach CS201 at t = 0.</p>
        <p>Definition 1. A temporal relational path P is a sequence
of non-temporal relational paths P0t0 ; : : : ; Pktk (k 0) such
that for any two consecutive paths Piti ; Pjtj in P the
following hold:
2. The last item class of Piti is the same as the first item
class of Pjtj .
3. No subpath of P is of the form [Ikt ; : : : ; Ikt ], where all
relations in the subpath are one-to-one.</p>
        <p>We use the notation time (P ) to denote the set of all time
points that appear in path P .</p>
        <p>Temporal relational variables consist of a temporal
relational path and an attribute that can be reached through
that path. Temporal relational dependencies define
probabilistic dependencies between two temporal relational
variables. Temporal probabilistic dependencies are never
directed backwards in time. Therefore, at the model level,
there are no dependencies going back in time. However,
the temporal constraints associated with a dependency are
also implicitly encoded by the temporal relational path that
annotates the dependency. To account for this, we forbid
the temporal relational path of the treatment to go through
any time points later than the time point of the outcome.
Definition 2. A temporal relational dependency consists
of two temporal relational variables with a common base
item, [I1t; : : : ; Ikt0 ]:Vk ! [I1t]:V1 such that
max time [I1t; : : : ; Ikt0 ]
t
The first-order Markov assumption for temporal causal
models implies that for every probabilistic dependency, if
the treatment is in time point t, then the outcome is either
in t or in t + 1. In the case of relational domains, the
relational path of the treatment carries some temporal
information since it can contain multiple time points. For the
firstorder Markov condition to hold, we require the relational
path of the treatment to only go through the current and the
next time points. More formally, a temporal relational
dependency [I1t; : : : ; Ikt0 ]:Vk ! [I1t]:V1 follows the first-order
Markov assumption if the following two conditions hold:
t = t0 or t = t0 + 1
time ([I1t; : : : ; Ikt0 ])
ft; t
1g
(1)
(2)
The structure of a 2-slice temporal relational model is
defined as a set of temporal relational dependencies over a
relational schema.</p>
        <sec id="sec-2-3-1">
          <title>Tower</title>
        </sec>
        <sec id="sec-2-3-2">
          <title>Cellphone</title>
          <p>Definition 3. The structure of a 2-slice temporal relational
model is a pair M = hS; DT i, where S is a relational
schema and DT is a set of temporal relational
dependencies that adhere to the first-order Markov assumption.
The temporal relational model described so far
subsumes propositional directed networks (Bayesian
networks), propositional temporal directed networks (dynamic
Bayesian networks), and relational non-temporal models.
This added expressivity comes at the cost of increased
complexity. This raises an obvious and important question:
What is the value of this added expressivity?
As an example of the practical utility of this added
expressivity, we consider a real dataset and show how a path with
temporal jumps leads to different terminal sets.
Specifically, we used the Reality Mining dataset [Eagle and
Pentland, 2006]. The dataset contains two entities, Tower and
Cellphone, and one relationship, Connects. The relational
schema for this domain is shown in Figure 6. For this
dataset, entity instances (i.e., the set of towers and
cellphones) do not change over time. However, the set of
relationship instances (the connections) are different at every
time point. In total, there are 95 cellphones, 32,579
towers, and 3,308,709 connections. Every connection is
timestamped with precision of 1 second. For our work, the time
granularity was coarsened to a day.</p>
          <p>We computed the terminal sets for the following three
paths:
P 1 : [Tower t+1; Tower t; Connectst; Cellphonet]
P 2 : [Tower t+1; Connectst+1; Connectst; Cellphonet]
P 3 : [Tower t+1; Connectst+1; Cellphonet+1; Cellphonet]
The first path corresponds to the cellphones that were
connected to a tower in the previous timestep. The second path
corresponds to the cellphones that connected to a tower
both in the current and in the previous timestep. The
third path corresponds to the cellphones that connected to
a tower in the current time step, and gets the state of those
cellphones in the previous timestep.</p>
          <p>We randomly selected 100 dates from the dataset. For each
of these dates and for each tower that was used in these
dates, we computed the terminal sets for the above paths.
For a given tower and date, we computed the Jaccard
distance between the different terminal sets. The Jaccard
distance between two sets A and B is defined as J (A; B) =
1 jjAA[\BBjj . Intuitively, this quantifies the overlap of two
sets, while accounting for the size of both. Table 1 shows
the average Jaccard distance between the terminal sets,
averaged across the dates and the towers. The results indicate
that, on average, the terminal sets reached through the three
paths will be different; therefore, this more expressive
representation could be of use in real data.
5</p>
          <p>TEMPORAL ABSTRACT GROUND
GRAPHS
An abstract ground graph is a lifted representation that
abstracts paths of dependence over all possible ground graphs
for a given relational model [Maier et al., 2013b]. Abstract
ground graphs are shown to be sound and complete in the
sense that every edge in the abstract ground graph
corresponds to an edge in some ground graph, and every edge
in an arbitrary ground graph is represented by an edge in
an abstract ground graph. In this section we adapt the
definition of abstract ground graphs for the case of a 2-slice
temporal relational model.</p>
          <p>Definition 4. A temporal abstract ground graph
tAGG MBh = hV; Ei for a 2-slice temporal
relational model M = hS; DT i, perspective B 2 E [ R, and
hop threshold h 2 N0 is an abstraction of the dependencies
DT for all possible ground graphs GG M T of M on
arbitrary temporal skeletons T . The temporal abstract
ground graph is a directed graph with the following nodes
and edges:
1. V = RV [ IV , where
(a) RV is the set of temporal relational variables with a
path of length at most h + 1.</p>
          <p>RV = f[Bt; ::: ; Ijt0 ]:V j length ([Bt; ::: ; Ijt0 ])
h + 1g
(b) IV are intersection variables between pairs of
temporal relational variables that could intersect1.</p>
          <p>IV = fX \ Y j X; Y 2 RV
and X = [Bt; ::: ; Ikt0 ; ::: ; Ijt00 ]:V
and Y = [Bt; ::: ; Ilt000 ; ::: ; Ijt00 ]:V and Ikt0 6= Ilt000 g
2. E = RVE [ IVE , where
(a) RVE</p>
          <p>RV</p>
          <p>RV are the relational variable edges:
RVE = f[Bt; ::: ; Ikt0 ]:Vk ! [Bt; ::: ; Ijt00 ]:Vj j</p>
          <p>[Ijt00 ; ::: ; Ikt0 ]:Vk ! [Ijt00 ]:Vj 2 DT and
[Bt; ::: ; Ikt0 ] 2 extend ([Bt; ::: ; Ijt00 ]; [Ijt00 ; ::: ; Ikt0 ])g
(b) IVE IV RV [ RV IV are the intersection
variable edges. This is the set of edges that intersection
variables “inherit” from the relational variables that they
were created from.</p>
          <p>The extend method converts dependencies of the model,
specified in the canonical form, into dependencies from the
perspective of the abstract ground graph.</p>
          <p>Temporal abstract ground graphs can be shown to be
a correct abstraction over all possible temporal ground
graphs. The proof follows the one provided for the
nontemporal abstract ground graphs, as presented by Maier et
al. [2013b].</p>
          <p>The temporal abstract ground graph for a model on the
student-courses domain with the dependency
[Student t+1; Student t; Takes t; Course t]:di culty !
[Student t+1]:gpa
is shown in Figure 7. This abstract ground graph is from
the perspective of Student and for hop threshold h = 4.
Disconnected nodes are omitted.
6
d-SEPARATION IN TEMPORAL</p>
          <p>ABSTRACT GROUND GRAPHS
The rules of d-separation provide a graphical criterion that
specifies whether two sets of variables in a directed acyclic
graph are conditionally independent given a third set of
variables. Specifically, let X; Y; Z be disjoint sets of
variables in a directed acyclic graph. X is d-separated from Y
1Only relational variables in the same time point can intersect.
[Student(t+1), Takes(t+1), Takes(t), Course(t)].difficulty
[Student(t+1), Takes(t+1), Takes(t), Course(t)].difficulty INT
[Student(t+1),Takes(t+1), Course(t+1), Course(t)].difficulty
[Student(t+1), Student(t), Takes(t), Course(t)].difficulty INT
[Student(t+1), Takes(t+1), Takes(t), Course(t)].difficulty
[Student(t+1), Student(t), Takes(t), Course(t)].difficulty
[Student(t+1), Student(t), Takes(t), Course(t)].difficulty INT
[Student(t+1),Takes(t+1), Course(t+1), Course(t)].difficulty
[Student(t+1), Takes(t+1), Takes(t), Student(t), Student(t+1)].gpa INT
[Student(t+1), Student(t), Takes(t), Takes(t+1), Student(t+1)].gpa)
[Student(t+1), Takes(t+1), Takes(t), Student(t), Student(t+1)].gpa INT
[Student(t+1), Takes(t+1), Course(t+1), Takes(t+1), Student(t+1)].gpa)
[Student(t+1), Takes(t+1), Takes(t), Student(t), Student(t+1)].gpa)
[Student(t+1)].gpa)
given Z if every path between variables in X and Y is not
a d-connecting path given Z. A path is d-connecting given
Z if for every collider W on the path, either W 2 Z or a
descendant of W is in Z, and every non-collider on the path
is not in Z. Geiger and Pearl [1988] and Verma and Pearl
[1988] showed that d-separation is sound and complete.
Ground graphs (and temporal ground graphs) are directed
acyclic graphs; therefore, the rules of d-separation can be
applied to them. In the case of domains where the
firstorder Markov assumption holds, a d-separating set for
variables in the same time point can be found by examining
only one time point in the past.</p>
          <p>Proposition 5. Let G be a temporal directed acyclic graph
that follows the first-order Markov assumption. (i) If X t+1
and Y t+1 are conditionally independent given some set Z,
then there exists a separating set W such that time (W)
ft; t + 1g. (ii) Similarly, if X t and Y t+1 are conditionally
independent given some set Z, then there exists a
separating set W such that time (W) ft; t + 1g.</p>
          <p>Proof. For each case, we will construct an appropriate
separating set:
(i) Let W = parents (X t+1)[parents (Y t+1). Because of
the first-order Markov assumption, time (W) ft; t + 1g.
Moreover, conditioning of W renders X t+1; Y t+1
independent because of the local Markov property (either X t+1
is a descendant of Y t+1 or vice versa, or none of them is a
descendant of the either).
(ii) In this case, let W = parents (Y t+1). Because of
the temporal semantics, X t is a non-descendant of Y t+1;
therefore, Y t+1 is conditionally independent of its
nondescendants (that include X t) given W.</p>
          <p>The significance of the above proposition is that, given a
model where the first-order Markov assumption holds, we
could infer conditional independencies (and use them to
learn the structure of the model) by only considering
consecutive time points.</p>
          <p>Maier et al. [2013b] showed that d-separation cannot be
applied directly to a relational model. To correct for that,
they introduced relational d-separation, a graphical
criterion that can be applied to abstract ground graphs and used
to infer conditional independencies that hold across all
possible ground graphs of the model. Here, we show that
the notion of relational d-separation can be generalized for
temporal abstract ground graphs as well. In the following
definition, Xjb denotes the terminal set of X starting at b,
i.e., the set of X instances that can be reached if we start
from instance b and follow the relational path of X on the
relational skeleton.</p>
          <p>Definition 6 (Temporal relational d-separation). Let X, Y,
and Z be three disjoint sets of temporal relational variables
from perspective B for a 2-slice temporal relational model
M such that not both X and Y contain variables in t. Then
X and Y are d-separated by Z if and only if, for any
temporal skeleton T , Xjb and Yjb are d-separated by Zjb in
ground graph GG M T for all b 2 T (B).</p>
          <p>The following theorem shows that temporal relational
dseparation is sound and complete up to a specified hop
threshold. We use the notation X to denote the set of
relational variables X augmented with the set of intersection
variables they participate in.</p>
          <p>Theorem 7. Let X, Y, and Z be three disjoint sets of
temporal relational variables for perspective B such that not
both X and Y contain variables in t. Then, for any
temporal skeleton T and for all b 2 (B), Xjb and Yjb are
d-separated by Zjb up to h in ground graph GG M T if
and only if X and Y are d-separated by Z on the abstract
ground graph tAGG MBh .</p>
          <p>Proof. We prove this by defining a non-temporal relational
model that is equivalent to the given temporal model. Since
d-separation is sound and complete for non-temporal
relational models, the result extends to the equivalent model,
and therefore, the temporal relational model. Given a
2slice temporal relational model MT = hS; DT i, construct
a relational model M = hS0; Di as follows:</p>
          <p>For every item in S, add an item in S0 with superscript t
and one with superscript t + 1: S0 = fIt; It+1 j I 2 Sg.</p>
          <p>For every entity E 2 S, add in S0 a 1-1 relation between
Et and Et+1.</p>
          <p>The set of relational dependencies is the set of temporal
relational dependencies: D = DT .</p>
          <p>It can be shown that these two models are equivalent in
the sense that there is a one-to-one correspondence
between valid paths in MT and M. Leveraging the
temporal constraints of d-separation described by Proposition 5
and the soundness and completeness of d-separation for
abstract ground graphs, we conclude that d-separation is
sound and complete (up to a hop threshold) in temporal
abstract ground graphs.
7</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>TEMPORAL RCD</title>
      <p>The theory of temporal relational d-separation allows us
to derive all conditional independence facts that are
consistent with the structure of a temporal relational model,
the same way that d-separation connects the structure of
a Bayesian network and the conditional independencies of
the underlying distribution. This is precisely the
connection that constraint-based algorithms leverage in order to
learn the structure of models. Thus, temporal relational
dseparation (and temporal abstract ground graphs) enable a
constraint-based algorithm, TRCD2, that learns the
structure of temporal and relational causal models from data.
TRCD extends RCD [Maier et al., 2013a] to operate over
a 2-slice temporal relational model. More specifically, it
constructs a set of temporal abstract ground graphs, one
from the perspective of each entity, and uses the theory
of temporal relational d-separation on temporal abstract
ground graphs to decide conditional independence facts.
TRCD uses the temporal abstract ground graphs to
determine which conditional independence facts should be
checked in the data and which dependencies (edges) in the
temporal relational model are implied by those facts. As
in the case of RCD, for practical reasons, the space of
potential dependencies is limited by a domain-specific hop
threshold.</p>
      <p>TRCD operates in two phases. Phase I learns a set of
undirected dependencies and Phase II employs a set of
orientation rules to orient those dependencies. Phase II of TRCD
uses the same orientation rules as RCD—collider detection,
known non-colliders (KNC), cycle avoidance (CA), Meek
rule 3 (MR3), and relational bivariate orientation (RBO).
Additionally, TRCD orients dependencies that cross time
points from the past to the future.</p>
      <p>RCD was shown to be sound and complete in the
sample limit (i.e., with perfect tests of conditional
independence) and for infinite hop threshold, under the standard
assumptions of the causal Markov condition, faithfulness,
and causal sufficiency for relational domains. By
leveraging the soundness and completeness of temporal relational
d-separation, TRCD can be shown to be sound and
complete (under the same assumptions).</p>
      <p>2Code available at kdl.cs.umass.edu/trcd.
)
n
a
em 1.00
e
h
ft
o
r
rro 0.95
e
d
r
a
d
n
tsa 0.90
h
it(
w
llca 0.85
e
R
/
n
o
i
ics 0.80
e
r
P
8
8.1
1 entity
0.95
0.90</p>
    </sec>
    <sec id="sec-4">
      <title>EXPERIMENTS</title>
      <p>ORACLE EXPERIMENTS
The goal of this set of experiments is twofold. First, we
evaluate the theoretical performance of TRCD in the large
sample limit. Towards that end, we employ an oracle for
answering d-separation queries in the place of statistical
tests of independence. Second, these experiments provide
experimental evidence about the correctness of temporal
relational d-separation.</p>
      <p>We generated synthetic schemas with number of entities
varying from 1 to 3, number of relationships fixed to one
less than the number of entities, and number of attributes
for each item drawn from Poisson( = 1) + 1. The
cardinalities were uniformly selected at random. For each
number of entities specified, we generated 500 different
schemas. This generating process yielded 1,500 different
schemas. For a given schema, we generated 10 different
models with number of dependencies ranging from 1 to 10.
Specifically, we generated all possible dependencies, up to
hop-threshold h = 3. Then, we chose the desired
number of dependencies from that space at random, subject to
the following constraints: Each relational variable has at
most 3 parents and the generated model contains no cycles.
Moreover, every model must contain at least one
dependency with a temporal relational path that contains more
than one time point. This procedure resulted in a total of
15,000 models.</p>
      <p>We ran TRCD with a d-separation oracle for each model.
Figure 8 shows average precision and recall after Phase I
(unoriented dependencies) and after Phase II (partially
oriented dependencies). As expected, given that these
experiments use an oracle, the algorithm always learns the
correct set of unoriented dependencies (unoriented precision
and recall are always 1). The algorithm makes no mistakes
in the orientation (oriented precision is always 1); however,
it is not possible to orient all dependencies (oriented recall
is lower than 1). Note that comparing to an oracle version
of RCD is not straightforward. The oracle requires a true
model that is fully directed. Converting the true temporal
model to a non-temporal one would often result in cycles
or undirected edges, and the relational d-separation oracle
cannot be used.</p>
      <p>Table 2 shows how often each orientation rule was used in
Phase II. Collider detection, known non-colliders (KNC),
and relational bivariate orientation (RBO) are orienting the
majority of the edges. As expected, in the case of
propositional models (one entity), the relational bivariate
orientation rule, a rule for edge orientation that is unique to
relational data, is never used. We observe that the other
two rules—cycle avoidance and Meek’s rule 3—do not fire
often. That can be explained by the fact that those rules
would never fire in the presence of a temporal edge.
Consider cycle avoidance in the propositional case: If the
pattern X ! Y ! Z and X Y is encountered, then cycle
avoidance orients X ! Y . If any of the dependencies
X ! Y ! Z crosses time points, then X and Y would be
in different time points and the edge between them would
be oriented based on temporal precedence. A similar
argument can be made for the case of Meek’s rule 3.
Y1
Y2
Y1
Y2
Y1
Y2</p>
      <p>B</p>
      <p>B
[Bt+1].Y 2 ! [Bt+1].Y 1</p>
      <p>B
[B].Y 2 ! [B].Y 1
X1</p>
      <p>X2</p>
      <p>X3
X1</p>
      <p>X2
X3</p>
      <p>A
A</p>
      <p>XY2</p>
      <p>XY3
XY2</p>
      <p>XY3
XY1
AB
XY1
AB
XY1</p>
      <p>AB
This experiment showcases the use of TRCD on data, i.e.,
without the use of an oracle to decide conditional
independence. Towards that end, we generated synthetic models,
generated data from these models, and applied TRCD on
them. The use of synthetic data allows us to have access
to the ground truth, so we can measure the accuracy of the
learned models. The data-generating process is described
in detail below.</p>
      <p>Using the same process as described in 8.1, we generated 5
synthetic schemas with 2 entities and 5 synthetic schemas
with 3 entities. For each schema, we generated 10 models
with number of dependencies ranging from 1 to 10. This
resulted in 100 different models. For each model we created
3 different relational skeletons over 300 timepoints. The
number of entity instances at each time point was drawn
from Poisson( ) + 1, where U (5; 10). The degree
distribution for the relationship instances was drawn from
Poisson( ) + 1, where U (1; 5). Regarding the
parameters of the graphical model, the marginals were
parameterized as normal distributions N ( ; ), where U (0; 5)
and U (0; 1). The conditional distribution for a
relational variable X was PY 2parents(X) avg (Y ) + 0:1
N (0; 1). This resulted in 300 datasets, each over 300 time
points.</p>
      <p>In order to assess statistical independence, we fitted a
standard linear least-squares regression equation to the
outcome variable using the treatment and the variables in the
conditioning set as covariates. For relational variables in
the conditioning set, we used the average as the aggregation
function. Then, we directly used the t-test of the coefficient
of the treatment variable to assess independence (p &gt; 0:05
or effect size &lt; 0:01). Figure 9 shows average precision
and recall of TRCD after Phase I and Phase II, when
applied to the synthetic datasets. While precision after Phase
I is more than 0.75 in most cases, the recall after Phase
I is relatively low. That implies that we concluded
independence (and therefore we removed an edge) more often
than we should. This corresponds to Type II errors and can
be attributed to the lack of good conditional independence
tests for relational data.</p>
      <p>Finally, to demonstrate the difference in expressiveness
between TRCD and RCD (the only constraint-based
algorithm for relational data), we ran RCD on a “temporally
flattened”3 version of the synthetic data. The true model
and the model that TRCD learned are shown in Figure 10.
The model learned by RCD is shown in Figure 11. TRCD
correctly learns and orients two of the edges, with the
correct path specification. Those dependencies cannot even be
expressed in the space of dependencies for RCD.</p>
      <p>3RCD is ignoring temporal information.
uniquely identified by instance id and time point.</p>
      <p>An instance is
[At+1,At,ABt].XY 2 ! [At+1].X2
[ABt+1,At+1,ABt+1,ABt].XY 1 ! [ABt+1].XY 3
[At+1,At].X2 ! [At+1].X3
[Bt+1,ABt+1,At+1,ABt+1].XY 3 ! [Bt+1].Y 1
X1</p>
      <p>X2
X3</p>
      <p>A</p>
      <p>XY2</p>
      <p>XY3
For the propositional case, there are several approaches
for learning the structure of temporal probabilistic
models. Most of them are not constraint-based methods, but
follow the search-and-score paradigm. Friedman et al.
[1998] present an algorithm to learn the structure of DBNs
from complete data using a search-and-score algorithm,
and from incomplete data using structural EM. La¨hdesma¨ki
and Shmulevich [2008] use Bayesian methods to learn
the structure of a DBN from steady state measurements
or from time-series data and steady state measurements.
Robinson and Hartemink [2010] present an MCMC
algorithm to learn the structure of a DBN that changes over
time. A different approach that learns a causal temporal
model from time series data is the difference-based
causality learner [Voortman et al., 2010]. This framework is
based on dynamic Structural Equation Models.</p>
      <p>A constraint-based method for temporal propositional
domains is presented by Entner and Hoyer [2010] who
extend the FCI algorithm [Spirtes et al., 2000] to temporal
domains. FCI relaxes the causal sufficiency assumption,
i.e., it allows the presence of latent common causes. Our
approach is the first approach that uses a constraint-based
algorithm for data that is both temporal and relational
(although under the assumption of causal sufficiency).
Another widely used method for inferring causal
relationships from temporal data is Granger causality [Granger,
1969]. The main idea underlying Granger causality is that
)
n
a
em 1.00
e
h
tf
o
r
rro 0.75
e
d
r
a
d
n
tsa 0.50
h
it
w
ll(
a
ce 0.25
R
/
n
o
ii
s
c
e
r
P</p>
      <p>TRCD on temporal data
a cause should improve the predictive accuracy of its
effect, compared to predictions based solely on the effect’s
past values. There has been work in extending Granger
causality for multivariate settings, as well as in combining
Granger causality with graphical models [Eichler, 2012].
Liu et al. [2010] propose a regularized Hidden Markov
Random Field regression to learn the structure of a
temporal causal graph from multivariate time-series data.
However, the methods used for learning the structure of the
causal model are not constraint-based.</p>
      <p>Another line of work in learning temporal and relational
models stems from combining first-order logic with
probabilistic frameworks. Logical Hidden Markov Models
extend Hidden Markov Models to handle relational (non-flat
data) [Kersting et al., 2006]. Kersting and Raiko [2005]
provide an EM-based algorithm to learn the structure of
LHMMs.</p>
      <p>In terms of representation, Manfredotti [2009] introduces
relational dynamic Bayesian networks (RDBNs), a
firstorder logic-based extension of dynamic Bayesian
networks. RDBNs are similar to the relational model we
define; however, we provide an explicit characterization for
the space of relational paths, and subsequently, the space
of relational dependencies. To be more specific, temporal
relational paths in our framework are restricted to
conjunctions of predicates that correspond to possible traversals of
the relational schema. This restriction, together with the
domain specific hop threshold, allows us to enumerate the
space of potential dependencies to learn.
10</p>
      <p>CONCLUSIONS AND FUTURE WORK
In this paper we presented a formalization of temporal
relational models, and we extended the theory of relational
d-separation to the temporal domain. We presented a
constraint-based algorithm, TRCD, that leverages the
notion of temporal relational d-separation to learn the causal
structure of temporal relational models from data. We
showed that the algorithm is sound and complete, and we
provided experimental evidence that showcases the
correctness of TRCD. Finally, we showed the improvement that
TRCD achieves compared to RCD when applied to
domains with a temporal component.</p>
      <p>TRCD makes certain simplifying assumptions. Future
work could focus on relaxing some of those assumptions,
specifically, allowing the structure of the causal model to
change over time (change point detection for relational
data). Another avenue for future research is relaxing the
causal sufficiency assumption by employing techniques
such as blocking [Rattigan et al., 2011] for temporal
relational domains. Finally, an important issue that arises
when modelling time as a discrete quantity is how to choose
the appropriate granularity of time points. Ribeiro et al.
[2012] provide an analysis on how aggregating at a given
time granularity affects the characterization of the
underlying temporal process. Continuous time Bayesian
networks [Nodelman et al., 2002, 2003] provide a way around
this by allowing each variable to be modeled at a different
time granularity.</p>
      <p>Acknowledgements
Funding was provided by the U.S. Army Research Office
(ARO) and Defense Advanced Research Projects Agency
(DARPA) under Contract Number W911NF-11-C-0088.
The content of the information in this document does not
necessarily reflect the position or the policy of the
Government, and no official endorsement should be inferred.
The U.S. Government is authorized to reproduce and
distribute reprints for Government purposes notwithstanding
any copyright notation here on.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Nathan</given-names>
            <surname>Eagle</surname>
          </string-name>
          and
          <string-name>
            <surname>Alex (Sandy) Pentland</surname>
          </string-name>
          .
          <article-title>Reality mining: Sensing complex social systems</article-title>
          .
          <source>Personal and Ubiquitous Computing</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ):
          <fpage>255</fpage>
          -
          <lpage>268</lpage>
          ,
          <year>March 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>Eichler</surname>
          </string-name>
          .
          <article-title>Graphical modelling of multivariate time series</article-title>
          .
          <source>Probability Theory and Related Fields</source>
          ,
          <volume>153</volume>
          (
          <issue>1- 2</issue>
          ):
          <fpage>233</fpage>
          -
          <lpage>268</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Doris</given-names>
            <surname>Entner</surname>
          </string-name>
          and
          <string-name>
            <given-names>Patrik</given-names>
            <surname>Hoyer</surname>
          </string-name>
          .
          <article-title>On causal discovery from time series data using FCI</article-title>
          .
          <source>In Proceedings of the 5th European Workshop on Probabilistic Graphical Models (PGM-2010)</source>
          , pages
          <fpage>121</fpage>
          -
          <lpage>128</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Nir</given-names>
            <surname>Friedman</surname>
          </string-name>
          , Kevin Murphy, and
          <string-name>
            <given-names>Stuart</given-names>
            <surname>Russell</surname>
          </string-name>
          .
          <article-title>Learning the structure of dynamic probabilistic networks</article-title>
          .
          <source>In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence</source>
          , pages
          <fpage>139</fpage>
          -
          <lpage>147</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Dan</given-names>
            <surname>Geiger</surname>
          </string-name>
          and
          <string-name>
            <given-names>Judea</given-names>
            <surname>Pearl</surname>
          </string-name>
          .
          <article-title>On the logic of causal models</article-title>
          .
          <source>In Proceedings of the Fourth Annual Conference on Uncertainty in Artificial Intelligence</source>
          , pages
          <fpage>136</fpage>
          -
          <lpage>147</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Clive W. J.</given-names>
            <surname>Granger</surname>
          </string-name>
          .
          <article-title>Investigating causal relations by econometric models and cross-spectral methods</article-title>
          .
          <source>Econometrica</source>
          ,
          <volume>37</volume>
          (
          <issue>3</issue>
          ):
          <fpage>424</fpage>
          -
          <lpage>38</lpage>
          ,
          <year>July 1969</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Kristian</given-names>
            <surname>Kersting</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tapani</given-names>
            <surname>Raiko</surname>
          </string-name>
          . “
          <string-name>
            <surname>Say</surname>
            <given-names>EM</given-names>
          </string-name>
          ”
          <article-title>for selecting probabilistic models for logical sequences</article-title>
          .
          <source>In Proceedings of the Twenty-First Conference in Uncertainty in Artificial Intelligence</source>
          , pages
          <fpage>300</fpage>
          -
          <lpage>307</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Kristian</given-names>
            <surname>Kersting</surname>
          </string-name>
          , Luc De Raedt, and
          <string-name>
            <given-names>Tapani</given-names>
            <surname>Raiko</surname>
          </string-name>
          .
          <article-title>Logical hidden Markov models</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>25</volume>
          :
          <fpage>425</fpage>
          -
          <lpage>456</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Harri</given-names>
            <surname>La</surname>
          </string-name>
          <article-title>¨hdesma¨ki and Ilya Shmulevich. Learning the structure of dynamic Bayesian networks from time series and steady state measurements</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>71</volume>
          (
          <issue>2- 3</issue>
          ):
          <fpage>185</fpage>
          -
          <lpage>217</lpage>
          ,
          <year>June 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Yan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Alexandru Niculescu-Mizil,
          <string-name>
            <given-names>Aurelie C.</given-names>
            <surname>Lozano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yong</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <article-title>Learning temporal causal graphs for relational time-series analysis</article-title>
          .
          <source>In Proceedings of the TwentySeventh International Conference on Machine Learning</source>
          , pages
          <fpage>687</fpage>
          -
          <lpage>694</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Marc</given-names>
            <surname>Maier</surname>
          </string-name>
          , Katerina Marazopoulou, David Arbour,
          <string-name>
            <given-names>and David</given-names>
            <surname>Jensen</surname>
          </string-name>
          .
          <article-title>A sound and complete algorithm for learning causal models from relational data</article-title>
          .
          <source>In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence</source>
          , pages
          <fpage>371</fpage>
          -
          <lpage>380</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Marc</given-names>
            <surname>Maier</surname>
          </string-name>
          , Katerina Marazopoulou, and
          <string-name>
            <given-names>David</given-names>
            <surname>Jensen</surname>
          </string-name>
          .
          <article-title>Reasoning about Independence in Probabilistic Models of Relational Data</article-title>
          .
          <source>arXiv:1302.4381</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Cristina E.</given-names>
            <surname>Manfredotti</surname>
          </string-name>
          .
          <article-title>Modeling and Inference with Relational Dynamic Bayesian Networks</article-title>
          .
          <source>PhD thesis</source>
          , University of Milano,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Kevin P.</given-names>
            <surname>Murphy</surname>
          </string-name>
          .
          <article-title>Dynamic Bayesian Networks: Representation, Inference and Learning</article-title>
          .
          <source>PhD thesis</source>
          , University of California, Berkeley,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Uri</given-names>
            <surname>Nodelman</surname>
          </string-name>
          ,
          <string-name>
            <surname>Christian R. Shelton</surname>
            , and
            <given-names>Daphne</given-names>
          </string-name>
          <string-name>
            <surname>Koller</surname>
          </string-name>
          .
          <article-title>Continuous time Bayesian networks</article-title>
          .
          <source>In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence</source>
          , pages
          <fpage>378</fpage>
          -
          <lpage>387</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Uri</given-names>
            <surname>Nodelman</surname>
          </string-name>
          ,
          <string-name>
            <surname>Christian R. Shelton</surname>
            , and
            <given-names>Daphne</given-names>
          </string-name>
          <string-name>
            <surname>Koller</surname>
          </string-name>
          .
          <article-title>Learning continuous time Bayesian networks</article-title>
          .
          <source>In Proceedings of the Nineteenth International Conference on Uncertainty in Artificial Intelligence</source>
          , pages
          <fpage>451</fpage>
          -
          <lpage>458</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Judea</given-names>
            <surname>Pearl</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Verma</surname>
          </string-name>
          .
          <article-title>A theory of inferred causation</article-title>
          . In J. Allen,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fikes</surname>
          </string-name>
          , and E. Sandewall, editors,
          <source>Principles of Knowledge Representation and Reasoning: Proceeding of the Second International Conference</source>
          , pages
          <fpage>441</fpage>
          -
          <lpage>452</lpage>
          . Morgan Kaufmann,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Matthew J.H.</given-names>
            <surname>Rattigan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marc</given-names>
            <surname>Maier</surname>
          </string-name>
          , and David Jensen.
          <article-title>Relational blocking for causal discovery</article-title>
          .
          <source>In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence</source>
          , pages
          <fpage>145</fpage>
          -
          <lpage>151</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Bruno F. Ribeiro</surname>
            , Nicola Perra, and
            <given-names>Andrea</given-names>
          </string-name>
          <string-name>
            <surname>Baronchelli</surname>
          </string-name>
          .
          <article-title>Quantifying the effect of temporal resolution in timevarying network</article-title>
          .
          <source>CoRR, abs/1211.7052</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Joshua W.</given-names>
            <surname>Robinson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alexander J.</given-names>
            <surname>Hartemink</surname>
          </string-name>
          .
          <article-title>Learning non-stationary dynamic Bayesian networks</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>11</volume>
          :
          <fpage>3647</fpage>
          -
          <lpage>3680</lpage>
          ,
          <year>December 2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Peter</given-names>
            <surname>Spirtes</surname>
          </string-name>
          , Clark Glymour, and Richard Scheines. Causation,
          <article-title>Prediction and Search</article-title>
          . MIT Press, Cambridge, MA, 2nd edition,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Verma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Judea</given-names>
            <surname>Pearl</surname>
          </string-name>
          .
          <article-title>Causal networks: Semantics and expressiveness</article-title>
          .
          <source>In Proceedings of the Fourth Annual Conference on Uncertainty in Artificial Intelligence</source>
          , pages
          <fpage>352</fpage>
          -
          <lpage>359</lpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Mark</given-names>
            <surname>Voortman</surname>
          </string-name>
          , Denver Dash, and
          <string-name>
            <given-names>Marek J.</given-names>
            <surname>Druzdzel</surname>
          </string-name>
          .
          <article-title>Learning why things change: The difference-based causality learner</article-title>
          .
          <source>In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence</source>
          , pages
          <fpage>641</fpage>
          -
          <lpage>650</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>