=Paper= {{Paper |id=Vol-1504/uai2015aci_paper3 |storemode=property |title=Causal Reasoning for Events in Continuous Time: A Decision - Theoretic Approach |pdfUrl=https://ceur-ws.org/Vol-1504/uai2015aci_paper3.pdf |volume=Vol-1504 |dblpUrl=https://dblp.org/rec/conf/uai/Didelez15 }} ==Causal Reasoning for Events in Continuous Time: A Decision - Theoretic Approach == https://ceur-ws.org/Vol-1504/uai2015aci_paper3.pdf

Causal Reasoning for Events in Continuous Time:
A Decision–Theoretic Approach

Vanessa Didelez
School of Mathematics
University of Bristol
Vanessa.Didelez@bristol.ac.uk

Abstract

The dynamics of events occurring in continuous time can be modelled using marked point processes, or multi-state processes. Here, we review and extend the work of Røysland et al. (2015) on causal reasoning with local independence graphs for marked point processes in the context of survival analysis. We relate the results to the decision-theoretic approach of Dawid & Didelez (2010) using influence diagrams, and present additional identifying conditions.

1 INTRODUCTION

Dynamic dependence structures among the occurrence of different types of events in continuous time can be represented by local independence graphs as developed by Didelez (2006, 2007, 2008). In related work, Røysland (2011, 2012) showed how causal inference based on inverse probability weighting (IPW), well known for longitudinal data (Robins et al., 2000), can be extended to the continuous-time situation using a martingale approach. Røysland et al. (2015) combine these and give graphical rules for the identifiability of the effect of interventions, which in the context of events in time take the form of changes to the intensities of specific processes, e.g. a treatment process.

As we discuss here, the approach of Røysland et al. (2015) can be seen as the time-continuous version of Dawid & Didelez (2010), who develop a decision theoretic approach for sequential decisions in longitudinal settings and use a graphical representation with influence diagrams that include decision nodes. This provides an explicit representation of the target of inference as well as allowing us to use simple graphical rules to check identifiability.

2 LOCAL INDEPENDENCE GRAPHS

The notion of dynamic dependence on which we focus here can be stated as follows. For stochastic processes $X(t), Y(t), Z(t)$ we say informally that $X(t)$ is locally independent of $Y(t)$ given $Z(t)$ if the present of $X(t)$ is independent of the past of $Y(t)$ given the past of both $X(t), Z(t)$. Slightly more formally we can write this as

$$X(t) \perp\!\!\!\perp \mathcal{F}^{Y}_{t^-} \mid \mathcal{F}^{X,Z}_{t^-}$$

where $\mathcal{F}^{k}_{t}$ are filtrations generated by $X_k(t)$, i.e. the sets of information becoming available over time. Note that this is an asymmetric type of independence as discussed in detail in Didelez (2006).

Marked Point Processes

More formally we consider a marked point process (MPP) to describe the occurrence of different types of events $E$; this can be represented by a set of counting processes $\{N_j(t)\}$, one for each type of event $j \in E$. It may often be too detailed to model the dependence structure between all possible types of events; e.g. the event 'stop treatment' can necessarily only happen after the event 'start treatment' and the two events are therefore trivially dependent. Instead of an MPP one can therefore group certain events together to obtain a multi-state process with several components $Y_V(t) = \mathbf{Y}(t) = (Y_1(t), \ldots, Y_K(t))$, $V = \{1, \ldots, K\}$, where e.g. $Y_k(t)$ describes the treatment process with states 'on / off treatment'. Note that the components $Y_k(t)$ need to be such that none of them systematically change state at the same time, i.e. $\mathbf{Y}(t)$ is composable (see Didelez, 2007). Further, each $Y_k(t)$ can be described by a set of counting processes, one for each change of state, so that the whole $\mathbf{Y}(t)$ is itself an MPP. In the following we will not clearly distinguish between a component $Y_k(t)$ of a composable multi-state process, or a counting process $N_j(t)$ for an individual event.
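As a small illustration of describing a multi-state component by counting processes, the following sketch represents an 'on / off treatment' path by two counting processes, one per change of state. The event times and the helper names (`counting_process`, `on_treatment`) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from bisect import bisect_right

def counting_process(jump_times):
    """Return N(t), the number of jumps at or before time t."""
    times = sorted(jump_times)
    return lambda t: bisect_right(times, t)

# Hypothetical path: treatment starts at 1.0, stops at 2.5, restarts at 4.0.
start_times = [1.0, 4.0]
stop_times = [2.5]

n_start = counting_process(start_times)  # counts 'start treatment' events
n_stop = counting_process(stop_times)    # counts 'stop treatment' events

def on_treatment(t):
    """State of the two-state process: on treatment iff one more start than stop."""
    return n_start(t) - n_stop(t) == 1
```

Because 'stop' can only follow 'start', the two counting processes never jump at the same time, which is the property the text calls composability when applied across components.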
Under mild regularity conditions, the Doob–Meyer Theorem tells us that each counting process can be decomposed:

$$Y_k(t) = \underbrace{\Lambda_k(t)}_{\text{predictable}} + \underbrace{M_k(t)}_{\text{martingale}},$$

where $\Lambda_k(t)$ is predictable based on the history $\mathcal{F}^{V}_{t^-}$ of the whole $Y_V$ and $M_k(t)$ is an $\mathcal{F}^{V}_{t}$–martingale. We will assume that the $\mathcal{F}^{V}_{t}$–intensity processes $\lambda_k(t)$ exist and have the following interpretation:

$$\Lambda_k(t) = \int_0^t \lambda_k(s)\,ds, \qquad \lambda_k(t)\,dt = E\big(N_k(dt) \mid \mathcal{F}^{V}_{t^-}\big).$$

Local Independence

From the above we see that $\lambda_k(t)$ fully describes the dependence of a process' infinitesimal short-term expectation on the past. Any independencies must therefore be reflected in the structure of the intensity; if we find, for instance, that $\lambda_k(t)$ remains unchanged regardless of whether an event of type $j \neq k$ has occurred in the past, then we say there is a local independence.

Indeed, the formal definition is that $Y_k$ is locally independent of $Y_j$ given $Y_{V \setminus \{j,k\}}$ if $\lambda_k(t)$ is $\mathcal{F}^{V \setminus \{j\}}_{t}$–measurable, i.e. the intensity process remains the same when information on the past of $Y_j$ is omitted. We write this as $Y_j \nrightarrow Y_k \mid Y_{V \setminus \{j,k\}}$. Note that $\mathcal{F}^{V \setminus \{j\}}_{t}$ always contains the past of the component $Y_k$ itself. Meek's (2014) approach allows for cases where $\lambda_k(t)$ is $\mathcal{F}^{V \setminus \{k\}}_{t}$–measurable.

Graphs and δ–Separation

The local independence graph $G = (V, E)$ of a multi-state process $Y_V(t) = (Y_1(t), \ldots, Y_K(t))$ (or an MPP) is given such that the absence of a directed edge indicates a local independence, i.e.

$$(j, k) \notin E \;\Rightarrow\; Y_j \nrightarrow Y_k \mid Y_{V \setminus \{j,k\}}.$$

The resulting graphs are directed, can have two directed edges between any two vertices, and can have cycles. Note that $\mathrm{pa}(k) \cap \mathrm{ch}(k) \neq \emptyset$ is possible, and similarly for ancestors and descendants etc.

Under regularity conditions, the definition implies that the intensity process $\lambda_k$ for $Y_k$ is $\mathcal{F}^{\mathrm{cl}(k)}$–measurable (Didelez, 2008), where $\mathrm{cl}(k)$ is the closure (i.e. the set of parents and $k$ itself).

As for conditional independence graphs, certain separations on a local independence graph imply further local independencies. However, a different notion of separation is required, δ–separation: define $G^B$ as the graph obtained after deleting all edges emanating from nodes in set $B$; then we say that $C$ δ–separates $A$ from $B$ in the local independence graph $G$ if it separates $A$ and $B$ in the undirected graph $(G^B_{\mathrm{An}(A \cup B \cup C)})^m$ obtained by moralising the subgraph of $G^B$ on the ancestral set $\mathrm{An}(A \cup B \cup C)$. Note that δ–separation is asymmetric, i.e. δ–separating $A$ from $B$ is not the same as δ–separating $B$ from $A$. Meek (2014) introduces self-edges so as to be able to distinguish whether or not a process is locally independent of itself, and generalises the above to $\delta^*$–separation.

A key result of Didelez (2008) is that, under mild regularity conditions, we have for subsets $A, B, C \subset V$:

$$\text{if } C \text{ δ–separates } A \text{ from } B \text{ then } Y_A \nrightarrow Y_B \mid Y_C.$$

The above is not obvious as the $\mathcal{F}^{V}_{t}$–intensity and the $\mathcal{F}^{A \cup B \cup C}_{t}$–intensity of a process can be very different.

Figure 1: A local independence graph (nodes $Y_1, Y_2, Y_3, Y_4$).

Example I: The graph in Figure 1 encodes for instance that $Y_1 \nrightarrow Y_4 \mid (Y_2, Y_3)$. Using δ–separation we can verify that this is not preserved without $Y_3$, i.e. it is not the case that $Y_1 \nrightarrow Y_4 \mid Y_2$. This is because of the 'selection effect': knowing something about the past of $Y_2(t)$ makes the past of $Y_1(t)$ informative for the past of $Y_3(t)$ and therefore predictive of $Y_4(t)$.

3 CAUSAL VALIDITY

So far we described a notion, and graphical representation, of dynamic (in)dependence based on how the present of a subprocess depends or not on the past of other processes; in other words, a notion of time-lagged (in)dependence. As it is based on the intensity process it can be considered as characterised by infinitesimal short-term predictions, which is very much parallel to so-called 'Granger–causality' (Granger, 1969). However, much of the causal inference literature formalises causality in terms of (sometimes hypothetical) interventions. For instance a DAG is termed causal if the set of variables $X_V$ is sufficiently 'rich' so that an intervention that changes how a variable $X_k$ is generated corresponds to replacing $p(x_k \mid x_{\mathrm{pa}(k)})$ with a different $\tilde p(x_k)$ in the factorisation

$$p(x_V) = \prod_{i \in V} p(x_i \mid x_{\mathrm{pa}(i)}).$$
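The replacement of one factor in the DAG factorisation can be made concrete with a toy example. The sketch below is only an illustration under assumed inputs: the three binary variables Z, X, Y, the edges Z → X, Z → Y, X → Y, and all probabilities are hypothetical and do not come from the paper. It computes the joint distribution from the factorisation, then replaces $p(x \mid z)$ by a different $\tilde p(x)$ while leaving all other factors unchanged.

```python
from itertools import product

def joint(cpds, parents, order):
    """Joint table p(x_V) = prod_i p(x_i | x_pa(i)) for binary variables."""
    table = {}
    for values in product([0, 1], repeat=len(order)):
        x = dict(zip(order, values))
        prob = 1.0
        for v in order:
            pa_vals = tuple(x[p] for p in parents[v])
            prob *= cpds[v](x[v], pa_vals)
        table[values] = prob
    return table

def p_y(y, pa):
    """Hypothetical p(y | z, x)."""
    z, x = pa
    p1 = 0.1 + 0.3 * z + 0.4 * x
    return p1 if y == 1 else 1.0 - p1

order = ["Z", "X", "Y"]
parents = {"Z": (), "X": ("Z",), "Y": ("Z", "X")}
cpds = {
    "Z": lambda z, pa: 0.5,
    "X": lambda x, pa: 0.8 if x == pa[0] else 0.2,  # observational p(x | z)
    "Y": p_y,
}

# Intervention in X: replace p(x | z) by p~(x); every other factor is kept.
parents_int = {**parents, "X": ()}
cpds_int = {**cpds, "X": lambda x, pa: 0.9 if x == 1 else 0.1}

def prob_y1(table):
    """Marginal probability that Y = 1 (Y is the last coordinate)."""
    return sum(p for (z, x, y), p in table.items() if y == 1)

observational = joint(cpds, parents, order)
intervened = joint(cpds_int, parents_int, order)
```

With these assumed numbers, `prob_y1(observational)` is 0.45 and `prob_y1(intervened)` is 0.61; only the factor for X differs between the two tables.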
Røysland et al. (2015) extend this notion of intervention to local independence graphs by assuming that the intervention replaces the intensity process $\lambda_k$ of $Y_k$ by a different one $\tilde\lambda_k$, which will typically be measurable with respect to a smaller subset of processes, e.g. those relevant to and observable by the decision maker.

Remember that for a given local independence graph $G$, each intensity process $\lambda_k$ is $\mathcal{F}^{\mathrm{cl}(k)}$–measurable. Røysland et al. (2015) then define this graph to be causally valid for an intervention in $Y_k$ if this corresponds to replacing $\lambda_k$ by $\tilde\lambda_k$ while all other intensities $\lambda_j$, $j \neq k$, remain the same under the intervention.

Intervention Indicator

In analogy to the influence diagrams of Dawid (2002, 2012), it can be helpful to indicate graphically that an intervention modifying the intensity of $Y_k$ is being considered, by adding an intervention node $\sigma_k$. For the basic set-up chosen here, $\sigma_k$ would itself not be a process and simply take values in $\{o, e\}$ to indicate the original system with intensity $\lambda_k$ when $\sigma_k = o$, or the intervened system with intensity $\tilde\lambda_k$ when $\sigma_k = e$. The absence of any edges involving $\sigma_k$ other than $\sigma_k \longrightarrow Y_k$ then represents the causal validity assumption, in analogy to extended stability of Dawid & Didelez (2010).

Figure 2: An augmented local independence graph with intervention indicator $\sigma_1$ (nodes $Y_1, Y_2, Y_3, Y_4$ and $\sigma_1$).

Example I (ctd.): The graph in Figure 2 is augmented with the intervention node $\sigma_1$ to indicate that $Y_1$ is subject to possibly different intensities in the two different regimes. The absence of edges between $\sigma_1$ and other nodes indicates that their observational $\mathcal{F}^{\mathrm{cl}(k)}_{t}$–intensities remain the same under intervention.

Re-Weighting

Similar to the case of longitudinal data, it turns out that inference about the dynamics between events under the intervened system can be obtained by re-weighting. Specifically the weights are given as

$$W(t) := \prod_{s \le t} \left( \frac{\tilde\lambda_k(s)}{\lambda_k(s)} \right)^{\Delta N_k(s)} \exp\left( \int_0^t \lambda_k(s) - \tilde\lambda_k(s)\, ds \right).$$

For these to be well-defined, in particular for $\tilde P \ll P$, we need $W(t)$ to be uniformly integrable, which can be interpreted as $\lambda_k(t), \tilde\lambda_k(t)$ not being 'too different', e.g. $W(t)$ could be uniformly bounded. In fact, if $\Lambda_k(t)$ is assumed absolutely continuous such that $\lambda_k(t)$ exists, then it is e.g. not possible to re-weight with an intervention that has discrete jumps of $N_k(t)$ at fixed time points. Note that this can be regarded as the counterpart of the 'positivity' condition typically made in many causal inference contexts.

Censoring and Re-Weighting

In the context of survival or duration data it is almost inevitable to have censoring (e.g. due to the end of the study). Censoring in itself can be regarded as an event and modelled with a counting process that jumps when the observation is censored. This then allows us to express assumptions about the censoring in terms of its intensity process. A common assumption is independent censoring, which can be stated as the relevant process (e.g. survival) being locally independent of the censoring process, possibly conditional on other observed processes. The most obvious violation of this assumption occurs when there are unobserved common causes for censoring and survival.

Moreover, censoring can be linked to the above ideas of intervention and re-weighting in the following sense. The target of inference is typically a population where no censoring occurs (e.g. future patients) or where censoring is entirely random and stochastically independent of other processes. Hence we can say that the target is to replace the censoring intensity by a different intensity that does not depend on the past. Whether this is possible given the observed processes therefore depends, among other things, on whether the local independence graph on all events including censoring is causally valid wrt. the censoring process. Røysland et al. (2015) discuss this further and give an example where censoring is independent, but based on a local independence graph that is not causally valid, hence leading to incorrect inference. For the remainder of the paper we do not consider censoring further.

4 IDENTIFICATION

In the following we assume that the index set of processes is $V = V_0 \cup X \cup L \cup U$ where $V_0$ are observable processes of interest ('outcome' processes), $X$ (or counting process $N_X$) is the process in which we want to intervene by changing its intensity, $L$ is a set of observable processes in which we are not interested, and $U$ is a set of unobservable processes.
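Since identification is phrased in terms of the weights $W(t)$ of Section 3, it may help to see them evaluated numerically. The following is a minimal sketch under a strong simplifying assumption: both intensities are treated as deterministic functions of time, whereas in general they depend on the observed history. The function name `reweighting_weight`, the midpoint-rule integration, and the example numbers are illustrative choices, not part of the paper.

```python
import math

def reweighting_weight(t, jump_times, lam, lam_tilde, n_grid=10_000):
    """Evaluate W(t) = prod_{s<=t} (lam_tilde(s)/lam(s))^{dN(s)}
                       * exp( int_0^t (lam(s) - lam_tilde(s)) ds ).

    jump_times : times at which the counting process N_k jumps
    lam        : observational intensity, a function of time (assumption)
    lam_tilde  : intervened intensity, a function of time (assumption)
    """
    # Product over the observed jumps up to and including time t.
    ratio = 1.0
    for s in jump_times:
        if s <= t:
            ratio *= lam_tilde(s) / lam(s)
    # Midpoint rule for the integral of lam - lam_tilde over [0, t].
    h = t / n_grid
    integral = h * sum(
        lam(h * (i + 0.5)) - lam_tilde(h * (i + 0.5)) for i in range(n_grid)
    )
    return ratio * math.exp(integral)

# Constant intensities: W(t) reduces to (lam_tilde/lam)^{N(t)} * exp((lam - lam_tilde) * t).
w = reweighting_weight(
    1.5, [0.3, 1.0, 2.0], lam=lambda s: 2.0, lam_tilde=lambda s: 1.0
)
```

The ratio term makes visible why the two intensities must not be 'too different': wherever $\lambda_k(s)$ is close to zero at a jump while $\tilde\lambda_k(s)$ is not, the weight becomes very large.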
Definition 1:
Let $G$ be the local independence graph for processes $V = V_0 \cup X \cup L \cup U$; assume causal validity wrt. $X$. Consider an intervention in $X$ that changes its observational $\mathcal{F}^{V}$–intensity $\lambda_X$ to an $\mathcal{F}^{V_0}$–intensity $\tilde\lambda_X$. We say that the effect of such an intervention on $V_0$ is identified by $L$ if the $\mathcal{F}^{V_0}$–intensities for every counting process $N \in V_0$ under the intervention exist and are given by re-weighting with the above weights $W(t)$.

Røysland et al. (2015) show the following sufficient condition for identification:

Proposition 2:
In the situation of Definition 1, if $U \nrightarrow X \mid (V_0 \cup L)$, then the effect on $V_0$ of intervening in $X$ is identified by $L$.

Example I (ctd.): In Figure 2, assume we are interested in the effect of an intervention in $X = Y_1$ on $V_0 = Y_4$ and let $L = Y_2$ and $U = Y_3$. Then we see that Proposition 2 is satisfied, meaning that re-weighting will allow us to compute aspects of the possibly modified behaviour of $Y_4$ under an intervention that changes the intensity process of $Y_1$, where the weights require no observation of $Y_3$.

The condition of Proposition 2 is the point process analogue of sequential randomisation in Dawid & Didelez (2010); it is in fact satisfied iff $U \cap \mathrm{pa}(X) = \emptyset$. In other words, it formalises the notion that given the past of observed processes, $X(t)$ is at any time $t$ independent of the past of unobserved processes. Dawid & Didelez (2010) show that this implies 'simple stability' which in turn is a sufficient identifying condition for sequential interventions in their longitudinal (time-discrete) setting. Here, we define the time-continuous marked point process analogue as follows.

Definition 3:
With the preconditions of Definition 1, and the augmented local independence graph $G^\sigma$ with intervention node $\sigma_X$, we define that simple stability holds if

$$\sigma_X \nrightarrow (L \cup V_0) \mid X.$$

We conjecture that identification can in fact be obtained under the wider assumption of simple stability.

Conjecture 4:
Assume the preconditions of Definition 1, and the augmented local independence graph $G^\sigma$ (i.e. causal validity wrt. $X$). If simple stability holds, then the effect on $V_0$ of intervening in $X$ is identified by $L$.

Corollary 5:
The condition of Proposition 2 implies simple stability.

We can now formulate a result corresponding to Dawid & Didelez' (2010) notion of 'sequential irrelevance'; this condition allows unobserved processes in $U$ to affect the treatment process $X$ as long as they are 'irrelevant' to the other processes of interest.

Corollary 6:
Assume the preconditions of Definition 1, and the augmented local independence graph $G^\sigma$ (i.e. causal validity wrt. $X$). Then $U \nrightarrow (V_0 \cup L) \mid X$ implies simple stability.

Both Corollary 5 and Corollary 6 are sufficient but not necessary for simple stability, as the following example demonstrates.

Figure 3: An augmented local independence graph satisfying simple stability (nodes $\sigma_X, X, U_1, U_2, L, V_0$).

Example II: The graph in Figure 3 shows a situation where $U = (U_1, U_2)$ satisfies neither Proposition 2 nor Corollary 6. However, simple stability is satisfied. Note that $U_1$ alone fulfils Corollary 6 and $U_2$ alone Proposition 2. All these would be destroyed by an edge between $U_1$ and $U_2$.

5 DISCUSSION

More generality? In the time-discrete case, more general conditions for causal effect identification can be, and have been, given than those analogous to simple stability. Specific to sequential decisions in longitudinal data these are for example addressed in Pearl & Robins (1995), Robins (1997), Dawid & Didelez (2010; section 8). It does not appear straightforward to generalise these to the time-continuous situation with the local independence graphs considered here, as these graphs assume stationarity of the dependence structure, while such more general criteria are typically relevant when the structure changes over time. However, it is possible to generalise local independence graphs to some extent in order to take non-stationarity of (in)dependencies into account, e.g. some independencies might hold before a certain event has happened and others afterwards, leading to a sequence of graphs that are valid in intervals defined by stopping times (Didelez, 2008).

Why an intervention indicator? The decision theoretic approach to causality makes it formally and graphically explicit that an intervention in a particular node is being considered and what assumptions are involved
(Dawid, 2012). This allows greater clarity, e.g. regarding the target of inference; but in our case it also allows us to formulate conditions for identification that do not need to refer to or characterise unobservable processes $U$. The flip side is that one might miss an intuition for what kinds of $U$ violate the conditions, which may impede justifying the assumption of simple stability. Here, we have linked the results to the notions of sequential randomisation / irrelevance of $U$ which provide some intuition.

Causal Search? We assumed that the local independence graph is given and that subject matter knowledge justifies causal validity wrt. certain events or processes. Meek (2014) addresses learning the graph. Under a completeness assumption this is in principle (i.e. given an oracle test for local independence) straightforward as there are no issues of Markov-equivalence due to the asymmetry of local independence in time, i.e. all edges can easily be oriented. Meek (2014) further gives results for cases of unobserved processes, e.g. causal insufficiency. However, the main practical problem in any real application will be a suitable test for local independence. In low-dimensional settings with few events, this can be done almost non-parametrically, e.g. by testing equality of survival curves; but in higher dimensions this becomes prohibitive. One could make simplifying assumptions, such as assuming a Markov process; in this context it is important to be aware that if $Y_V(t)$ is Markov, then a subprocess $Y_A(t)$, $A \subset V$, is typically not.

APPENDIX

Proof of Conjecture 4: see Røysland & Didelez (2015).

Proof of Corollary 5:
Remember that in the augmented local independence graph $G^\sigma$, assuming causal validity wrt. $X$, there is only a single edge involving $\sigma_X$, pointing into $X$. Further, the condition of Proposition 2 is satisfied iff $U \cap \mathrm{pa}(X) = \emptyset$ in $G$. The graphical check of δ–separation for simple stability involves removing all outgoing edges from $V_0 \cup L$; in the resulting graph before moralisation, there are no edges into $X$ except the one from $\sigma_X$. Hence, in the moral graph, $\sigma_X$ only has an edge with $X$ and Definition 3 is satisfied.

Proof of Corollary 6:
As above, in the augmented local independence graph $G^\sigma$ there is only a single edge involving $\sigma_X$, pointing into $X$. The graphical check of δ–separation for simple stability, furthermore, involves removing all outgoing edges from $V_0 \cup L$, and with the condition of Corollary 6 this means that there are no edges between $U$ and $V_0 \cup L$ at all. Hence, even if there are moral edges between $\sigma_X$ and $U$, these do not lead to paths between $V_0 \cup L$ and $\sigma_X$ in the relevant moral graph and Definition 3 is satisfied.

Acknowledgement

Financial support from the Leverhulme Trust (RF–2011–320) is gratefully acknowledged.

References

Dawid, A.P. (2002). Influence diagrams for causal modelling and inference. International Statistical Review, 70:161-189.

Dawid, A.P. (2012). The Decision-Theoretic Approach to Causal Inference. Chapter 4 in: Causality – Statistical Perspectives and Applications (eds. C. Berzuini, A.P. Dawid, L. Bernardinelli), Wiley.

Dawid, A.P., Didelez, V. (2010). Identifying the consequences of dynamic treatment strategies: A decision theoretic overview. Statistics Surveys, 4:184-231.

Didelez, V. (2006). Asymmetric separation for local independence graphs. In Proc. of 22nd UAI Conference, 130-137. AUAI Press.

Didelez, V. (2007). Graphical models for composable finite Markov processes. Scandinavian Journal of Statistics, 34:169-185.

Didelez, V. (2008). Graphical models for marked point processes based on local independence. JRSSB, 70(1):245-264.

Granger, C.W.J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37:424-438.

Meek, C. (2014). Toward learning graphical and causal process models. In Proc. of 30th UAI Conference Causality Workshop, 43-48. AUAI Press.

Pearl, J., Robins, J. (1995). Probabilistic evaluation of sequential plans from causal models with hidden variables. In Proc. of 11th UAI Conference, 444-453. Morgan Kaufmann Publishers, San Francisco.

Robins, J.M. (1997). Causal inference from complex longitudinal data. In: Latent Variable Modeling and Applications to Causality (ed. M. Berkane). Lecture Notes in Statistics 120:69-117. Springer, New York.

Robins, J.M., Hernan, M.A., Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11:550-560.

Røysland, K. (2011). A martingale approach
to continuous time marginal structural models.
Bernoulli, 17(3):895-915.

Røysland, K. (2012). Counterfactual analyses with
graphical models based on local independence.
Annals of Statistics, 40(4):2162-2194.

Røysland, K., Didelez, V., Nygard, M., Lange, T.,
Aalen, O.O. (2015). Causal reasoning in sur-
vival analysis: re-weighting and local indepen-
dence graphs. Submitted.

Røysland, K., Didelez, V. (2015). General criteria
for identification of causal effects between events
in continuous time. In preparation.