=Paper= {{Paper |id=Vol-1504/uai2015aci_paper3 |storemode=property |title=Causal Reasoning for Events in Continuous Time: A Decision - Theoretic Approach |pdfUrl=https://ceur-ws.org/Vol-1504/uai2015aci_paper3.pdf |volume=Vol-1504 |dblpUrl=https://dblp.org/rec/conf/uai/Didelez15 }} ==Causal Reasoning for Events in Continuous Time: A Decision - Theoretic Approach == https://ceur-ws.org/Vol-1504/uai2015aci_paper3.pdf
                Causal Reasoning for Events in Continuous Time:
                               A Decision–Theoretic Approach



                                               Vanessa Didelez
                                             School of Mathematics
                                              University of Bristol
                                        Vanessa.Didelez@bristol.ac.uk


                      Abstract                              2    LOCAL INDEPENDENCE
                                                                 GRAPHS
    The dynamics of events occurring in continu-
    ous time can be modelled using marked point             The notion of dynamic dependence on which we focus
    processes, or multi-state processes. Here, we           here can be stated as follows. For stochastic processes
    review and extend the work of Røysland et               X(t), Y (t), Z(t) we say informally that X(t) is locally
    al. (2015) on causal reasoning with local inde-         independent of Y (t) given Z(t) if the present of X(t)
    pendence graphs for marked point processes              is independent of the past of Y (t) given the past of
    in the context of survival analysis. We relate          both X(t), Z(t). Slightly more formally we can write
    the results to the decision-theoretic approach          this as
    of Dawid & Didelez (2010) using influence                 $X(t) \perp\!\!\!\perp \mathcal{F}^{Y}_{t^-} \mid \mathcal{F}^{X,Z}_{t^-}$

    diagrams, and present additional identifying            where Ftk are filtrations generated by Xk (t), i.e. the
    conditions.                                             sets of information becoming available over time. Note
                                                            that this is an asymmetric type of independence as
                                                            discussed in detail in Didelez (2006).
1   INTRODUCTION
                                                            Marked Point Processes
Dynamic dependence structures among the occurrence
                                                            More formally we consider a marked point process
of different types of events in continuous time can
                                                            (MPP) to describe the occurrence of different types
be represented by local independence graphs as de-
                                                            of events E; this can be represented by a set of count-
veloped by Didelez (2006, 2007, 2008). In related
                                                            ing processes {Nj (t)} for each type of event j ∈ E. It
work, Røysland (2011, 2012) showed how causal in-
                                                            may often be too detailed to model the dependence
ference based on inverse probability weighting (IPW),
                                                            structure between all possible types of events; e.g. the
well known for longitudinal data (Robins et al., 2000),
                                                            event ‘stop treatment’ can necessarily only happen
can be extended to the continuous-time situation us-
                                                            after the event ‘start treatment’ and the two events
ing a martingale approach. Røysland et al. (2015)
                                                            are therefore trivially dependent. Instead of a MPP
combine these and give graphical rules for the iden-
                                                            one can therefore group certain events together to ob-
tifiability of the effect of interventions, which in the
                                                            tain a multi-state process with several components
context of events in time take the form of changes to
                                                            YV (t) = Y(t) = (Y1 (t), . . . , YK (t)), V = 1, . . . , K,
the intensities of specific processes, e.g. a treatment
                                                            where e.g. Yk (t) describes the treatment process with
process.
                                                            states ‘on / off treatment’. Note that the components
As we discuss here, the approach of Røysland et al.         Yk (t) need to be such that none of them systematically
(2015) can be seen as the time-continuous version of        change state at the same time, i.e. Y(t) is composable
Dawid & Didelez (2010), who develop a decision the-         (see Didelez, 2007). Further each Yk (t) can be de-
oretic approach for sequential decisions in longitudi-      scribed by a set of counting processes, one for each
nal settings and use a graphical representation with        change of state, so that the whole Y(t) is itself an
influence diagrams that include decision nodes. This        MPP. In the following we will not clearly distinguish
provides an explicit representation of the target of        between a component Yk (t) of a composable multi-
inference as well as allowing us to use simple graphical    state process, or a counting process Nj (t) for an
rules to check identifiability.                             individual event.
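The relation between a marked point process, its counting processes and a grouped multi-state component can be made concrete with a minimal sketch. It is an illustration only, not part of the original paper: the function names `counting_process` and `treatment_state`, the example history, and the convention that a subject starts off treatment are all assumptions.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (event time, mark / event type)

def counting_process(events: List[Event], mark: str, t: float) -> int:
    """N_j(t): number of events of type `mark` that occurred up to time t."""
    return sum(1 for (s, m) in events if m == mark and s <= t)

def treatment_state(events: List[Event], t: float) -> str:
    """State at time t of a hypothetical 'on / off treatment' component.

    Assumes the subject starts off treatment and that 'start' and 'stop'
    events alternate, so the state is 'on' iff more starts than stops
    have occurred by time t.
    """
    starts = counting_process(events, "start", t)
    stops = counting_process(events, "stop", t)
    return "on" if starts > stops else "off"

# Hypothetical event history for one subject.
history = [(0.5, "start"), (1.2, "stop"), (2.0, "start")]
```

Grouping the two trivially dependent events 'start' and 'stop' into one two-state component is what removes them as separate nodes from the dependence structure.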
Under mild regularity conditions, the Doob–Meyer
Theorem tells us that each counting process can be
decomposed:

  $Y_k(t) = \underbrace{\Lambda_k(t)}_{\text{predictable}} + \underbrace{M_k(t)}_{\text{martingale}},$

                                                                      [Figure 1: A local independence graph on the nodes Y1, Y2, Y3, Y4.]
where Λk (t) is predictable based on the history FtV−
of whole YV and Mk (t) is an FtV –martingale. We
will assume that the FtV –intensity processes λk (t) exist    nodes in set B; then we say that C δ–separates A from
and have the following interpretation:                        B in the local independence graph G if it separates A
  $\Lambda_k(t) = \int_0^t \lambda_k(s)\,ds, \qquad \lambda_k(t)\,dt = E(N_k(dt) \mid \mathcal{F}^V_{t^-}).$
                                                              and B in the undirected graph $(G^B_{An(A\cup B\cup C)})^m$
                                                              obtained by moralising the subgraph of $G^B$ on the
                                                              ancestral set An(A ∪ B ∪ C). Note that δ–separation
                                                              is asymmetric, i.e. δ–separating A from B is not the
Local Independence                                            same as B from A. Meek (2014) introduces self-edges
From the above we see that λk (t) fully describes the de-     so to be able to distinguish the case where a process
pendence of a process’ infinitesimal short-term expec-        is locally independent of itself or not, and generalises
tation on the past. Any independencies must therefore         the above to δ ∗ –separation.
be reflected in the structure of the intensity; if we find,   A key result of Didelez (2008) is that, under mild reg-
for instance, that λk (t) remains unchanged regardless        ularity conditions, we have for subsets A, B, C ⊂ V :
of whether an event of type j ≠ k has occurred in the
past, then we say there is a local independence.                  if C δ–separates A from B then $Y_A \nrightarrow Y_B \mid Y_C$.
Indeed, the formal definition is that Yk is locally           The above is not obvious as the $\mathcal{F}^V_t$–intensity and the
independent of Yj given $Y_{V\setminus\{j,k\}}$ if λk (t) is  $\mathcal{F}^{A\cup B\cup C}_t$–intensity of a process can be very different.
$\mathcal{F}_t^{V\setminus\{j\}}$–measurable, i.e. the
intensity process remains the same when information on        Example I: The graph in Figure 1 encodes for instance
the past of Yj is omitted. We write this as                   that $Y_1 \nrightarrow Y_4 \mid (Y_2, Y_3)$. Using δ–separation we
$Y_j \nrightarrow Y_k \mid Y_{V\setminus\{j,k\}}$. Note that  can verify that this is not preserved without Y3 , i.e. it
$\mathcal{F}_t^{V\setminus\{j\}}$ always contains the past    is not the case that $Y_1 \nrightarrow Y_4 \mid (Y_2)$. This is because
of the component Yk itself. Meek’s (2014) approach allows
for cases where λk (t) is
$\mathcal{F}_t^{V\setminus\{k\}}$–measurable.                 of the ‘selection effect’: knowing something about the
                                                              past of Y2 (t) makes the past of Y1 (t) informative for
Graphs and δ–Separation                                       past of Y3 (t) and therefore predictive of Y4 (t).
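The δ–separation check used in Example I can be sketched in a few lines. This is a minimal illustration, not the author's implementation. The edges of Figure 1 are not reproduced in the text, so the edge set below is an assumption chosen to be consistent with the statements of Example I, and the function names are hypothetical. The sets A, B and C are assumed to be disjoint.

```python
from itertools import combinations

def ancestral_set(edges, nodes):
    """Smallest set containing `nodes` and all of their ancestors."""
    result, frontier = set(nodes), list(nodes)
    while frontier:
        v = frontier.pop()
        for (a, b) in edges:
            if b == v and a not in result:
                result.add(a)
                frontier.append(a)
    return result

def delta_separates(edges, A, B, C):
    """True if C delta-separates A from B in the directed graph `edges`."""
    A, B, C = set(A), set(B), set(C)
    # G^B: delete all edges emanating from nodes in B.
    g_b = {(a, b) for (a, b) in edges if a not in B}
    # Restrict G^B to the ancestral set An(A u B u C).
    an = ancestral_set(g_b, A | B | C)
    g = {(a, b) for (a, b) in g_b if a in an and b in an}
    # Moralise: drop directions and marry parents of a common child.
    und = {frozenset(e) for e in g if e[0] != e[1]}
    for child in an:
        parents = [a for (a, b) in g if b == child and a != child]
        und |= {frozenset(p) for p in combinations(parents, 2)}
    # Search for an undirected path from A to B that avoids C.
    seen, stack = set(A - C), list(A - C)
    while stack:
        v = stack.pop()
        if v in B:
            return False
        for e in und:
            if v in e:
                (w,) = e - {v}
                if w not in C and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return True

# Assumed edge set, consistent with Example I (not taken from Figure 1).
edges = {("Y1", "Y2"), ("Y3", "Y2"), ("Y3", "Y4"), ("Y2", "Y4")}
```

With this assumed graph, `delta_separates(edges, {"Y1"}, {"Y4"}, {"Y2", "Y3"})` returns `True`, while dropping Y3 from the separating set gives `False` because moralisation joins Y1 and Y3 as parents of Y2. The asymmetry is also visible: `delta_separates(edges, {"Y4"}, {"Y1"}, {"Y2"})` returns `True`.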

The local independence graph G = (V, E) of a multi-
state process YV (t) = (Y1 (t), . . . , YK (t)) (or an MPP)
                                                              3    CAUSAL VALIDITY
is given such that the absence of a directed edge indi-
cates a local independence, i.e.                              So far we described a notion, and graphical represen-
                                                              tation, of dynamic (in)dependence based on how the
            $(j, k) \notin E \;\Rightarrow\; Y_j \nrightarrow Y_k \mid Y_{V\setminus\{j,k\}}.$                 present of a subprocess depends or not on the past of
                                                              other processes; in other words, a notion of time-lagged
The resulting graphs are directed, can have two di-           (in)dependence. As it is based on the intensity process
rected edges between any two vertices, and can have           it can be considered as characterised by infinitesimal
cycles. Note that pa(k) ∩ ch(k) ≠ ∅ is possible, and         short-term predictions, which is very much parallel to
similarly for ancestors and descendants etc.                  so-called ‘Granger–causality’ (Granger, 1969).
Under regularity conditions, the definition implies that      However, much of the causal inference literature formalises
the intensity process λk for Yk is $\mathcal{F}^{cl(k)}$–measurable       causality in terms of (sometimes hypothetical)
(Didelez, 2008), where cl(k) is the closure (i.e. the set     interventions. For instance a DAG is termed causal if the
of parents and k itself).                                     set of variables XV is sufficiently ‘rich’ so that an
                                                              intervention that changes how a variable Xk is generated
As for conditional independence graphs, certain sep-          corresponds to replacing p(xk |xpa(k) ) with a different
arations on a local independence graph imply further          p̃(xk ) in the factorisation
local independencies. However, a different notion of
separation is required, δ–separation: define $G^B$ as the                  $p(x_V) = \prod_{i \in V} p(x_i \mid x_{pa(i)}).$
graph obtained after deleting all edges emanating from
[Figure 2: An augmented local independence graph                  For these to be well-defined, in particular for P̃ ≪ P ,
with intervention indicator σ1 ; nodes σ1 , Y1 , Y2 , Y3 , Y4 .]  we need W (t) to be uniformly integrable which can
                                                                  be interpreted as λk (t), λ̃k (t) not being ‘too different’,
                                                                  e.g. W (t) could be uniformly bounded. In fact, if Λk (t)
                                                                  is assumed absolutely continuous such that λk (t)
                                                                  exists, then it is e.g. not possible to re-weight with an
                                                                  intervention that has discrete jumps of Nk (t) at fixed
                                                                  time points. Note that this can be regarded as the
                                                                  counterpart of the ‘positivity’ condition typically made
                                                                  in many causal inference contexts.
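As a worked special case (an illustration added here, not part of the original argument), suppose Nk is a homogeneous Poisson process with constant intensity λ > 0 under P, and the intervention replaces it by a constant λ̃ > 0. The weights W (t) of the Re-Weighting subsection then have a closed form with expectation one on any finite horizon:

```latex
\begin{align*}
W(t) &= \left(\frac{\tilde\lambda}{\lambda}\right)^{N_k(t)}
        \exp\bigl((\lambda - \tilde\lambda)\,t\bigr), \\
E_P\bigl[W(t)\bigr]
  &= e^{(\lambda - \tilde\lambda)t} \sum_{n \ge 0}
     \left(\frac{\tilde\lambda}{\lambda}\right)^{n}
     \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}
   = e^{(\lambda - \tilde\lambda)t}\, e^{-\lambda t}\, e^{\tilde\lambda t} = 1 .
\end{align*}
```

The ratio λ̃/λ is only defined where λ > 0, which is the positivity-type requirement in this simple setting.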
Røysland et al. (2015) extend this notion of interven-
tion to local independence graphs by assuming that
the intervention replaces the intensity process λk of             Censoring and Re-Weighting
Yk by a different one λ̃k , which will typically be mea-
surable with respect to a smaller subset of processes,            In the context of survival or duration data it is al-
e.g. those relevant to and observable by the decision             most inevitable to have censoring (e.g. due to the end
maker.                                                            of the study). Censoring in itself can be regarded as
                                                                  an event and modelled with a counting process that
Remember that for a given local independence graph                jumps when the observation is censored. This then al-
G, each intensity process λk is F cl(k) –measurable.              lows us to express assumptions about the censoring in
Røysland et al. (2015) then define this graph to be               terms of its intensity process. A common assumption
causally valid for an intervention in Yk if this                  is independent censoring which can be stated as the
corresponds to replacing λk by λ̃k while all other intensities   relevant process (e.g. survival) being locally
λj , j ≠ k, remain the same under the intervention.               independent of the censoring process, possibly conditional on
                                                                  other observed processes. The most obvious violation
Intervention Indicator                                            of this assumption occurs when there are unobserved
                                                                  common causes for censoring and survival.
In analogy to the influence diagrams of Dawid (2002,
2012), it can be helpful to indicate graphically that             Moreover, censoring can be linked to the above ideas
an intervention modifying the intensity of Yk is being            of intervention and re-weighting in the following sense.
considered, by adding an intervention node σk . For               The target of inference is typically a population where
the basic set-up chosen here, σk would itself not be a            no censoring occurs (e.g. future patients) or where cen-
process and simply take values in {o, e} to indicate the          soring is entirely random and stochastically indepen-
original system with intensity λk when σk = o, or the             dent of other processes. Hence we can say that the
intervened system with intensity λ̃k when σk = e. The             target is to replace the censoring intensity by a different
absence of any edges involving σk other than σk −→ Yk             intensity that does not depend on the past. Whether
then represents the causal validity assumption, in                this is possible given the observed processes therefore
analogy to extended stability of Dawid & Didelez (2010).          depends, among other things, on whether the local
                                                                  independence graph on all events including censoring is
Example I (ctd.): The graph in Figure 2 is aug-                   causally valid wrt. the censoring process. Røysland
mented with the intervention node σ1 to indicate that             et al. (2015) discuss this further and give an example
Y1 is subject to possibly different intensities in the            where censoring is independent, but based on a lo-
two different regimes. The absence of edges between               cal independence graph that is not causally valid and
σ1 and other nodes indicates that their observational             hence leading to incorrect inference. For the remainder
$\mathcal{F}_t^{cl(k)}$–intensities remain the same under intervention.          of the paper here we do not further consider censoring.

Re-Weighting
                                                                  4    IDENTIFICATION
Similar to the case of longitudinal data, it turns out
that inference about the dynamics between events
under the intervened system can be obtained by                    In the following we assume that the index set of
re-weighting. Specifically the weights are given as               processes is V = V0 ∪ X ∪ L ∪ U where V0 are
                                                                  observable processes of interest (‘outcome’ processes), X (or
                                                                  counting process NX ) is the process in which we want
                                                                  to intervene changing its intensity, L is a set of
                                                                  observable processes in which we are not interested, and U
                                                                  is a set of unobservable processes.

  $W(t) := \prod_{s \le t} \left( \frac{\tilde\lambda_k(s)}{\lambda_k(s)} \right)^{\Delta N_k(s)} \exp\left( \int_0^t \lambda_k(s) - \tilde\lambda_k(s)\,ds \right).$
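The weight formula for W (t) can be evaluated numerically along one observed path. The following is a minimal sketch under simplifying assumptions that are not in the paper: both intensities are passed as deterministic, strictly positive functions of time (in general they are predictable processes depending on the observed history), the integral is approximated by the trapezoidal rule, and the name `ipw_weight` and its parameters are hypothetical.

```python
import math
from typing import Callable, Sequence

def ipw_weight(event_times: Sequence[float],
               lam: Callable[[float], float],
               lam_tilde: Callable[[float], float],
               t: float,
               n_grid: int = 1000) -> float:
    """W(t) = prod_{s<=t} (lam_tilde(s)/lam(s))^{dN(s)} * exp(int_0^t lam - lam_tilde ds)."""
    # Product over the jump times of N_k up to t (each jump has size 1),
    # accumulated on the log scale.
    log_w = sum(math.log(lam_tilde(s) / lam(s)) for s in event_times if s <= t)
    # Trapezoidal approximation of the integral of lam(s) - lam_tilde(s) on [0, t].
    h = t / n_grid
    diff = [lam(i * h) - lam_tilde(i * h) for i in range(n_grid + 1)]
    integral = h * (sum(diff) - 0.5 * (diff[0] + diff[-1]))
    return math.exp(log_w + integral)

# Hypothetical example: observational intensity 2, intervened intensity 1,
# two observed events before t = 2.
w = ipw_weight([0.5, 1.2], lambda s: 2.0, lambda s: 1.0, t=2.0)
```

For constant intensities the trapezoidal rule is exact up to rounding, so here `w` equals (1/2)² · e², i.e. 0.25 · exp(2).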
Definition 1:
Let G be the local independence graph for processes
V = V0 ∪ X ∪ L ∪ U ; assume causal validity wrt. X.
Consider an intervention in X that changes its
observational $\mathcal{F}^V$–intensity λX to an $\mathcal{F}^{V_0}$–intensity λ̃X . We
say that the effect of such an intervention on V0 is
identified by L if the $\mathcal{F}^{V_0}$–intensities for every
counting process N ∈ V0 under the intervention exist and
are given by re-weighting with the above weights W (t).
                                                             [Figure 3: An augmented local independence graph satisfying
                                                             simple stability; nodes σX , X, U1 , U2 , L, V0 .]
Røysland et al. (2015) show the following sufficient
condition for identification:                                & Didelez’ (2010) notion of ‘sequential irrelevance’;
                                                             this condition allows unobserved processes in U to
Proposition 2:                                               affect the treatment process X as long as they are
In the situation of Definition 1, if $U \nrightarrow X \mid (V_0 \cup L)$,       ‘irrelevant’ to the other processes of interest.
then the effect on V0 of intervening in X is identified
by L.                                                        Corollary 6:
                                                             Assume the preconditions of Definition 1, and the
Example I (ctd.): In Figure 2, assume we are                 augmented local independence graph Gσ (i.e. causal
interested in the effect of an intervention in X = Y1 on     validity wrt. X). Then $U \nrightarrow (V_0 \cup L) \mid X$ implies simple
V0 = Y4 and let L = Y2 and U = Y3 . Then we see that         stability.
Proposition 2 is satisfied, meaning that re-weighting
will allow us to compute aspects of the possibly             Both Corollary 5 and Corollary 6 are sufficient but not necessary
modified behaviour of Y4 under an intervention that changes  for simple stability as the following example
the intensity process of Y1 , where the weights require      demonstrates.
no observation of Y3 .                                       Example II: The graph in Figure 3 shows a situa-
The condition of Proposition 2 is the point pro-             tion where U = (U1 , U2 ) satisfies neither Proposition
cess analogue of sequential randomisation in Dawid &         2 nor Corollary 6. However, simple stability is satis-
Didelez (2010); it is in fact satisfied iff U ∩pa(X) = ∅.    fied. Note that U1 alone fulfills Corollary 6 and U2
In other words, it formalises the notion that given the      alone Proposition 2. All these would be destroyed by
past of observed processes, X(t) is at any time t inde-      an edge between U1 and U2 .
pendent of the past of unobserved processes. Dawid
& Didelez (2010) show that this implies ‘simple stabil-      5   DISCUSSION
ity’ which in turn is a sufficient identifying condition
for sequential interventions in their longitudinal (time-    More generality? In the time-discrete case, more gen-
discrete) setting. Here, we define the time-continuous       eral conditions for causal effect identification can and
marked point process analogue as follows.                    have been given than those analogous to simple sta-
Definition 3:                                                bility. Specific to sequential decisions in longitudinal
With the preconditions of Definition 1, and the aug-         data these are for example addressed in Pearl & Robins
mented local independence graph Gσ with intervention         (1995), Robins (1997), Dawid & Didelez (2010; section
node σX , we define that simple stability holds if           8). It appears not straightforward to generalise these
                                                             to the time-continuous situation with local
                  $\sigma_X \nrightarrow (L \cup V_0) \mid X.$                    independence graphs considered here, as it assumes
                                                             stationarity of the dependence structure, while such more
We conjecture that identification can in fact be             general criteria are typically relevant when the structure
obtained under the wider assumption of simple stability.     changes over time. However, it is possible to generalise
                                                             local independence graphs to some extent in order to
Conjecture 4:
                                                             take non-stationarity of (in)dependencies into account,
Assume the preconditions of Definition 1, and the aug-
                                                             e.g. some independencies might hold before a certain
mented local independence graph Gσ (i.e. causal valid-
                                                             event has happened and others afterwards leading to
ity wrt. X).
                                                             a sequence of graphs that are valid in intervals defined
If simple stability holds, then the effect on V0 of inter-
                                                             by stopping times (Didelez, 2008).
vening in X is identified by L.
                                                             Why an intervention indicator? The decision theoretic
Corollary 5:
                                                             approach to causality makes it formally and graphi-
The condition of Proposition 2 implies simple stability.
                                                             cally explicit that an intervention in a particular node
We can now formulate a result corresponding to Dawid         is being considered and what assumptions are involved
(Dawid, 2012). This allows greater clarity, e.g.              between U and V0 ∪ L at all. Hence, even if there are
regarding the target of inference; but in our case it also    moral edges between σX and U these do not lead to
allows us to formulate conditions for identification that do not       paths between V0 ∪ L and σX in the relevant moral
need to refer to or characterise unobservable processes       graph and Definition 3 is satisfied.
U . The flip side is that one might miss an intuition
for what kinds of U violate the conditions, which may
impede justifying the assumption of simple stability.         Acknowledgement
Here, we have linked the results to the notions of se-
quential randomisation / irrelevance of U which pro-          Financial support from the Leverhulme Trust (RF–
vide some intuition.                                          2011–320) is gratefully acknowledged.

Causal Search? We assumed that the local indepen-             References
dence graph is given and that subject matter knowl-
edge justifies causal validity wrt. certain events or pro-     Dawid, A.P. (2002). Influence diagrams for causal
cesses. Meek (2014) addresses learning the graph. Un-            modelling and inference. International Statistical
der a completeness assumption this is in principle (i.e.         Review, 70:161-189.
given an oracle test for local independence) straight-
forward as there are no issues of Markov-equivalence           Dawid, A.P. (2012). The Decision-Theoretic Ap-
due to the asymmetry of local independence in time,              proach to Causal Inference. Chapter 4 in: Causal-
i.e. all edges can easily be oriented. Meek (2014) fur-          ity – Statistical Perspectives and Applications
ther gives results for cases of unobserved processes, e.g.       (eds. C.Berzuini, A.P.Dawid, L.Bernardinelli),
causal insufficiency. However, the main practical prob-          Wiley.
lem in any real application will be a suitable test for        Dawid, A.P., Didelez, V. (2010). Identifying the
local independence. In low-dimensional settings with             consequences of dynamic treatment strategies: A
few events, this can be done almost non-parametrically           decision theoretic overview. Statistics Surveys,
e.g. by testing equality of survival-curves; but in higher       4:184-231.
dimensions this becomes prohibitive. One could make
simplifying assumptions, such as assuming a Markov             Didelez, V. (2006). Asymmetric separation for local
process; in this context it is important to be aware that         independence graphs. In Proc. of 22nd UAI Con-
if YV (t) is Markov, then a subprocess YA (t), A ⊂ V              ference, 130-137. AUAI Press.
is typically not.
                                                               Didelez, V. (2007). Graphical models for composable
                                                                  finite Markov processes. Scandinavian Journal of
APPENDIX                                                          Statistics, 34:169-185.
Proof of Conjecture 4: see Røysland & Didelez (2015).          Didelez, V. (2008). Graphical models for marked
                                                                  point processes based on local independence.
Proof of Corollary 5:                                             JRSSB, 70(1):245-264.
Remember that in the augmented local independence              Granger, C.W.J. (1969). Investigating causal rela-
graph Gσ , assuming causal validity wrt. X, there                 tions by econometric models and cross-spectral
is only a single edge involving σX pointing into X.               methods. Econometrica 37:424-438.
Further, the condition of Proposition 2 is satisfied
iff U ∩pa(X) = ∅ in G. The graphical check of                  Meek, C. (2014). Toward learning graphical and
δ–separation for simple stability involves removing all          causal process models. In Proc. of 31th UAI Con-
outgoing edges from V0 ∪ L; in the resulting graph               ference Causality Workshop, 43-48. AUAI Press.
before moralisation, there are no edges into X except
the one from σX . Hence, in the moral graph, σX only           Pearl, J., Robins, J. (1995). Probabilistic evalua-
has an edge with X and Definition 3 is satisfied.                 tion of sequential plans from causal models with
                                                                  hidden variables. In Proc of 11th UAI Confer-
                                                                  ence, 444-453. Morgan Kaufmann Publishers, San
Proof of Corollary 6:                                             Francisco.
As above, in the augmented local independence graph
Gσ there is only a single edge involving σX pointing           Robins, J.M. (1997). Causal inference from complex
into X. The graphical check of δ–separation for                  longitudinal data. In: Latent Variable Modeling
simple stability, furthermore, involves removing all             and Applications to Causality, (ed. M. Berkane).
outgoing edges out of V0 ∪ L and with the condition              Lecture Notes in Statistics 120:69-117. Springer,
of Corollary 6 this means that there are no edges                New York.
Robins, J.M., Hernan, M.A., Brumback, B. (2000).
  Marginal structural models and causal inference
  in epidemiology. Epidemiology, 11:550-560.

Røysland, K. (2011).       A martingale approach
   to continuous time marginal structural models.
   Bernoulli, 17(3):895-915.

Røysland, K. (2012). Counterfactual analyses with
   graphical models based on local independence.
   Annals of Statistics, 40(4):2162-2194.

Røysland, K., Didelez, V., Nygard, M., Lange, T.,
   Aalen, O.O. (2015). Causal reasoning in sur-
   vival analysis: re-weighting and local indepen-
   dence graphs. Submitted.

Røysland, K., Didelez, V. (2015). General criteria
   for identification of causal effects between events
   in continuous time. In preparation.