Causal Reasoning for Events in Continuous Time: A Decision–Theoretic Approach

Vanessa Didelez
School of Mathematics
University of Bristol
Vanessa.Didelez@bristol.ac.uk

Abstract

The dynamics of events occurring in continuous time can be modelled using marked point processes, or multi-state processes. Here, we review and extend the work of Røysland et al. (2015) on causal reasoning with local independence graphs for marked point processes in the context of survival analysis. We relate the results to the decision-theoretic approach of Dawid & Didelez (2010) using influence diagrams, and present additional identifying conditions.

1 INTRODUCTION

Dynamic dependence structures among the occurrence of different types of events in continuous time can be represented by local independence graphs as developed by Didelez (2006, 2007, 2008). In related work, Røysland (2011, 2012) showed how causal inference based on inverse probability weighting (IPW), well known for longitudinal data (Robins et al., 2000), can be extended to the continuous-time situation using a martingale approach. Røysland et al. (2015) combine these and give graphical rules for the identifiability of the effect of interventions, which in the context of events in time take the form of changes to the intensities of specific processes, e.g. a treatment process.

As we discuss here, the approach of Røysland et al. (2015) can be seen as the time-continuous version of Dawid & Didelez (2010), who develop a decision theoretic approach for sequential decisions in longitudinal settings and use a graphical representation with influence diagrams that include decision nodes. This provides an explicit representation of the target of inference as well as allowing us to use simple graphical rules to check identifiability.

2 LOCAL INDEPENDENCE GRAPHS

The notion of dynamic dependence on which we focus here can be stated as follows. For stochastic processes $X(t), Y(t), Z(t)$ we say informally that $X(t)$ is locally independent of $Y(t)$ given $Z(t)$ if the present of $X(t)$ is independent of the past of $Y(t)$ given the past of both $X(t), Z(t)$. Slightly more formally we can write this as

$$X(t) \perp\!\!\!\perp \mathcal{F}^{Y}_{t-} \mid \mathcal{F}^{X,Z}_{t-}$$

where $\mathcal{F}^{k}_{t}$ are filtrations generated by $X_k(t)$, i.e. the sets of information becoming available over time. Note that this is an asymmetric type of independence as discussed in detail in Didelez (2006).

Marked Point Processes

More formally we consider a marked point process (MPP) to describe the occurrence of different types of events $E$; this can be represented by a set of counting processes $\{N_j(t)\}$ for each type of event $j \in E$. It may often be too detailed to model the dependence structure between all possible types of events; e.g. the event 'stop treatment' can necessarily only happen after the event 'start treatment' and the two events are therefore trivially dependent. Instead of a MPP one can therefore group certain events together to obtain a multi-state process with several components $Y_V(t) = Y(t) = (Y_1(t), \ldots, Y_K(t))$, $V = \{1, \ldots, K\}$, where e.g. $Y_k(t)$ describes the treatment process with states 'on / off treatment'. Note that the components $Y_k(t)$ need to be such that none of them systematically change state at the same time, i.e. $Y(t)$ is composable (see Didelez, 2007). Further each $Y_k(t)$ can be described by a set of counting processes, one for each change of state, so that the whole $Y(t)$ is itself an MPP. In the following we will not clearly distinguish between a component $Y_k(t)$ of a composable multi-state process, or a counting process $N_j(t)$ for an individual event.

Under mild regularity conditions, the Doob–Meyer Theorem tells us that each counting process can be decomposed:

$$Y_k(t) = \underbrace{\Lambda_k(t)}_{\text{predictable}} + \underbrace{M_k(t)}_{\text{martingale}},$$

where $\Lambda_k(t)$ is predictable based on the history $\mathcal{F}^{V}_{t-}$ of whole $Y_V$ and $M_k(t)$ is an $\mathcal{F}^{V}_{t}$–martingale. We will assume that the $\mathcal{F}^{V}_{t}$–intensity processes $\lambda_k(t)$ exist and have the following interpretation:

$$\Lambda_k(t) = \int_0^t \lambda_k(s)\,ds, \qquad \lambda_k(t)\,dt = E\bigl(N_k(dt) \mid \mathcal{F}^{V}_{t-}\bigr).$$

Local Independence

From the above we see that $\lambda_k(t)$ fully describes the dependence of a process' infinitesimal short-term expectation on the past. Any independencies must therefore be reflected in the structure of the intensity; if we find, for instance, that $\lambda_k(t)$ remains unchanged regardless of whether an event of type $j \neq k$ has occurred in the past, then we say there is a local independence.

Indeed, the formal definition is that $Y_k$ is locally independent of $Y_j$ given $Y_{V \setminus \{j,k\}}$ if $\lambda_k(t)$ is $\mathcal{F}^{V \setminus \{j\}}_{t}$–measurable, i.e. the intensity process remains the same when information on the past of $Y_j$ is omitted. We write this as $Y_j \nrightarrow Y_k \mid Y_{V \setminus \{j,k\}}$. Note that $\mathcal{F}^{V \setminus \{j\}}_{t}$ always contains the past of the component $Y_k$ itself. Meek's (2014) approach allows for cases where $\lambda_k(t)$ is $\mathcal{F}^{V \setminus \{k\}}_{t}$–measurable.

Graphs and δ–Separation

The local independence graph $G = (V, E)$ of a multi-state process $Y_V(t) = (Y_1(t), \ldots, Y_K(t))$ (or an MPP) is given such that the absence of a directed edge indicates a local independence, i.e.

$$(j, k) \notin E \;\Rightarrow\; Y_j \nrightarrow Y_k \mid Y_{V \setminus \{j,k\}}.$$

The resulting graphs are directed, can have two directed edges between any two vertices, and can have cycles. Note that $\mathrm{pa}(k) \cap \mathrm{ch}(k) \neq \emptyset$ is possible, and similar for ancestors and descendants etc.

Under regularity conditions, the definition implies that the intensity process $\lambda_k$ for $Y_k$ is $\mathcal{F}^{\mathrm{cl}(k)}$–measurable (Didelez, 2008), where $\mathrm{cl}(k)$ is the closure (i.e. the set of parents and $k$ itself).

As for conditional independence graphs, certain separations on a local independence graph imply further local independencies. However, a different notion of separation is required, δ–separation: define $G^B$ as the graph obtained after deleting all edges emanating from nodes in set $B$; then we say that $C$ δ–separates $A$ from $B$ in the local independence graph $G$ if it separates $A$ and $B$ in the undirected graph $(G^B_{\mathrm{An}(A \cup B \cup C)})^m$ obtained by moralising the subgraph of $G^B$ on the ancestral set $\mathrm{An}(A \cup B \cup C)$. Note that δ–separation is asymmetric, i.e. δ–separating $A$ from $B$ is not the same as $B$ from $A$. Meek (2014) introduces self-edges so as to be able to distinguish the case where a process is locally independent of itself or not, and generalises the above to δ*–separation.

A key result of Didelez (2008) is that, under mild regularity conditions, we have for subsets $A, B, C \subset V$:

$$\text{if } C \text{ δ–separates } A \text{ from } B \text{ then } Y_A \nrightarrow Y_B \mid Y_C.$$

The above is not obvious as the $\mathcal{F}^{V}_{t}$–intensity and the $\mathcal{F}^{A \cup B \cup C}_{t}$–intensity of a process can be very different.

[Figure 1: A local independence graph on nodes $Y_1, Y_2, Y_3, Y_4$.]

Example I: The graph in Figure 1 encodes for instance that $Y_1 \nrightarrow Y_4 \mid (Y_2, Y_3)$. Using δ–separation we can verify that this is not preserved without $Y_3$, i.e. it is not the case that $Y_1 \nrightarrow Y_4 \mid (Y_2)$. This is because of the 'selection effect': knowing something about the past of $Y_2(t)$ makes the past of $Y_1(t)$ informative for the past of $Y_3(t)$ and therefore predictive of $Y_4(t)$.

3 CAUSAL VALIDITY

So far we described a notion, and graphical representation, of dynamic (in)dependence based on how the present of a subprocess depends or not on the past of other processes; in other words, a notion of time-lagged (in)dependence. As it is based on the intensity process it can be considered as characterised by infinitesimal short-term predictions, which is very much parallel to so-called 'Granger–causality' (Granger, 1969). However, much of the causal inference literature formalises causality in terms of (sometimes hypothetical) interventions. For instance a DAG is termed causal if the set of variables $X_V$ is sufficiently 'rich' so that an intervention that changes how a variable $X_k$ is generated corresponds to replacing $p(x_k \mid x_{\mathrm{pa}(k)})$ with a different $\tilde{p}(x_k)$ in the factorisation

$$p(x_V) = \prod_{i \in V} p(x_i \mid x_{\mathrm{pa}(i)}).$$

Røysland et al. (2015) extend this notion of intervention to local independence graphs by assuming that the intervention replaces the intensity process $\lambda_k$ of $Y_k$ by a different one $\tilde{\lambda}_k$, which will typically be measurable with respect to a smaller subset of processes, e.g. those relevant to and observable by the decision maker.

Remember that for a given local independence graph $G$, each intensity process $\lambda_k$ is $\mathcal{F}^{\mathrm{cl}(k)}$–measurable. Røysland et al. (2015) then define this graph to be causally valid for an intervention in $Y_k$ if this corresponds to replacing $\lambda_k$ by $\tilde{\lambda}_k$ while all other intensities $\lambda_j$, $j \neq k$, remain the same under the intervention.

Intervention Indicator

In analogy to the influence diagrams of Dawid (2002, 2012), it can be helpful to indicate graphically that an intervention modifying the intensity of $Y_k$ is being considered, by adding an intervention node $\sigma_k$. For the basic set-up chosen here, $\sigma_k$ would itself not be a process and simply take values in $\{o, e\}$ to indicate the original system with intensity $\lambda_k$ when $\sigma_k = o$, or the intervened system with intensity $\tilde{\lambda}_k$ when $\sigma_k = e$. The absence of any edges involving $\sigma_k$ other than $\sigma_k \to Y_k$ then represents the causal validity assumption, in analogy to extended stability of Dawid & Didelez (2010).

[Figure 2: An augmented local independence graph with intervention indicator $\sigma_1$, on nodes $\sigma_1, Y_1, Y_2, Y_3, Y_4$.]

Example I (ctd.): The graph in Figure 2 is augmented with the intervention node $\sigma_1$ to indicate that $Y_1$ is subject to possibly different intensities in the two different regimes. The absence of edges between $\sigma_1$ and other nodes indicates that their observational $\mathcal{F}^{\mathrm{cl}(k)}_{t}$–intensities remain the same under intervention.

Re-Weighting

Similar to the case of longitudinal data, it turns out that inference about the dynamics between events under the intervened system can be obtained by re-weighting. Specifically the weights are given as

$$W(t) := \prod_{s \le t} \left( \frac{\tilde{\lambda}_k(s)}{\lambda_k(s)} \right)^{\Delta N_k(s)} \exp\left( \int_0^t \bigl( \lambda_k(s) - \tilde{\lambda}_k(s) \bigr)\,ds \right).$$

For these to be well-defined, in particular for $\tilde{P} \ll P$, we need $W(t)$ to be uniformly integrable which can be interpreted as $\lambda_k(t), \tilde{\lambda}_k(t)$ not being 'too different', e.g. $W(t)$ could be uniformly bounded. In fact, if $\Lambda_k(t)$ is assumed absolutely continuous such that $\lambda_k(t)$ exists, then it is e.g. not possible to re-weight with an intervention that has discrete jumps of $N_k(t)$ at fixed time points. Note that this can be regarded as the counterpart of the 'positivity' condition typically made in many causal inference contexts.

Censoring and Re-Weighting

In the context of survival or duration data it is almost inevitable to have censoring (e.g. due to the end of the study). Censoring in itself can be regarded as an event and modelled with a counting process that jumps when the observation is censored. This then allows us to express assumptions about the censoring in terms of its intensity process. A common assumption is independent censoring which can be stated as the relevant process (e.g. survival) being locally independent of the censoring process, possibly conditional on other observed processes. The most obvious violation of this assumption occurs when there are unobserved common causes for censoring and survival.

Moreover, censoring can be linked to the above ideas of intervention and re-weighting in the following sense. The target of inference is typically a population where no censoring occurs (e.g. future patients) or where censoring is entirely random and stochastically independent of other processes. Hence we can say that the target is to replace the censoring intensity by a different intensity that does not depend on the past. Whether this is possible given the observed processes therefore depends among others on whether the local independence graph on all events including censoring is causally valid wrt. the censoring process. Røysland et al. (2015) discuss this further and give an example where censoring is independent, but based on a local independence graph that is not causally valid and hence leading to incorrect inference. For the remainder of the paper here we do not further consider censoring.

4 IDENTIFICATION

In the following we assume that the index set of processes is $V = V_0 \cup X \cup L \cup U$ where $V_0$ are observable processes of interest ('outcome' processes), $X$ (or counting process $N_X$) is the process in which we want to intervene changing its intensity, $L$ is a set of observable processes in which we are not interested, and $U$ is a set of unobservable processes.

Definition 1:
Let $G$ be the local independence graph for processes $V = V_0 \cup X \cup L \cup U$; assume causal validity wrt. $X$. Consider an intervention in $X$ that changes its observational $\mathcal{F}^{V}$–intensity $\lambda_X$ to a $\mathcal{F}^{V_0}$–intensity $\tilde{\lambda}_X$. We say that the effect of such an intervention on $V_0$ is identified by $L$ if the $\mathcal{F}^{V_0}$–intensities for every counting process $N \in V_0$ under the intervention exist and are given by re-weighting with the above weights $W(t)$.

Røysland et al. (2015) show the following sufficient condition for identification:

Proposition 2:
In the situation of Definition 1, if $U \nrightarrow X \mid (V_0 \cup L)$, then the effect on $V_0$ of intervening in $X$ is identified by $L$.

Example I (ctd.): In Figure 2, assume we are interested in the effect of an intervention in $X = Y_1$ on $V_0 = Y_4$ and let $L = Y_2$ and $U = Y_3$. Then we see that Proposition 2 is satisfied, meaning that re-weighting will allow us to compute aspects of the possibly modified behaviour of $Y_4$ under an intervention that changes the intensity process of $Y_1$, where the weights require no observation of $Y_3$.

The condition of Proposition 2 is the point process analogue of sequential randomisation in Dawid & Didelez (2010); it is in fact satisfied iff $U \cap \mathrm{pa}(X) = \emptyset$. In other words, it formalises the notion that given the past of observed processes, $X(t)$ is at any time $t$ independent of the past of unobserved processes. Dawid & Didelez (2010) show that this implies 'simple stability' which in turn is a sufficient identifying condition for sequential interventions in their longitudinal (time-discrete) setting. Here, we define the time-continuous marked point process analogue as follows.

Definition 3:
With the preconditions of Definition 1, and the augmented local independence graph $G^{\sigma}$ with intervention node $\sigma_X$, we define that simple stability holds if

$$\sigma_X \nrightarrow (L \cup V_0) \mid X.$$

We conjecture that identification can in fact be obtained under the wider assumption of simple stability.

Conjecture 4:
Assume the preconditions of Definition 1, and the augmented local independence graph $G^{\sigma}$ (i.e. causal validity wrt. $X$). If simple stability holds, then the effect on $V_0$ of intervening in $X$ is identified by $L$.

Corollary 5:
The condition of Proposition 2 implies simple stability.

We can now formulate a result corresponding to Dawid & Didelez' (2010) notion of 'sequential irrelevance'; this condition allows unobserved processes in $U$ to affect the treatment process $X$ as long as they are 'irrelevant' to the other processes of interest.

Corollary 6:
Assume the preconditions of Definition 1, and the augmented local independence graph $G^{\sigma}$ (i.e. causal validity wrt. $X$). Then $U \nrightarrow (V_0 \cup L) \mid X$ implies simple stability.

Both Corollary 5 and Corollary 6 are sufficient but not necessary for simple stability as the following example demonstrates.

[Figure 3: An augmented local independence graph satisfying simple stability, on nodes $\sigma_X, X, U_1, U_2, L, V_0$.]

Example II: The graph in Figure 3 shows a situation where $U = (U_1, U_2)$ satisfies neither Proposition 2 nor Corollary 6. However, simple stability is satisfied. Note that $U_1$ alone fulfills Corollary 6 and $U_2$ alone Proposition 2. All these would be destroyed by an edge between $U_1$ and $U_2$.

5 DISCUSSION

More generality? In the time-discrete case, more general conditions for causal effect identification can and have been given than those analogous to simple stability. Specific to sequential decisions in longitudinal data these are for example addressed in Pearl & Robins (1995), Robins (1997), Dawid & Didelez (2010; section 8). It appears not straightforward to generalise these to the time-continuous situation with local independence graphs considered here, as it assumes stationarity of the dependence structure, while such more general criteria are typically relevant when the structure changes over time. However, it is possible to generalise local independence graphs to some extent in order to take non-stationarity of (in)dependencies into account, e.g. some independencies might hold before a certain event has happened and others afterwards leading to a sequence of graphs that are valid in intervals defined by stopping times (Didelez, 2008).

Why an intervention indicator? The decision theoretic approach to causality makes it formally and graphically explicit that an intervention in a particular node is being considered and what assumptions are involved (Dawid, 2012). This allows greater clarity, e.g. regarding the target of inference; but in our case it also allows us to formulate conditions for identification that do not need to refer to or characterise unobservable processes $U$. The flip side is that one might miss an intuition for what kinds of $U$ violate the conditions, which may impede justifying the assumption of simple stability. Here, we have linked the results to the notions of sequential randomisation / irrelevance of $U$ which provide some intuition.

Causal Search? We assumed that the local independence graph is given and that subject matter knowledge justifies causal validity wrt. certain events or processes. Meek (2014) addresses learning the graph. Under a completeness assumption this is in principle (i.e. given an oracle test for local independence) straightforward as there are no issues of Markov-equivalence due to the asymmetry of local independence in time, i.e. all edges can easily be oriented. Meek (2014) further gives results for cases of unobserved processes, e.g. causal insufficiency. However, the main practical problem in any real application will be a suitable test for local independence. In low-dimensional settings with few events, this can be done almost non-parametrically e.g. by testing equality of survival-curves; but in higher dimensions this becomes prohibitive. One could make simplifying assumptions, such as assuming a Markov process; in this context it is important to be aware that if $Y_V(t)$ is Markov, then a subprocess $Y_A(t)$, $A \subset V$ is typically not.

APPENDIX

Proof of Conjecture 4: see Røysland & Didelez (2015).

Proof of Corollary 5:
Remember that in the augmented local independence graph $G^{\sigma}$, assuming causal validity wrt. $X$, there is only a single edge involving $\sigma_X$ pointing into $X$. Further, the condition of Proposition 2 is satisfied iff $U \cap \mathrm{pa}(X) = \emptyset$ in $G$. The graphical check of δ–separation for simple stability involves removing all outgoing edges from $V_0 \cup L$; in the resulting graph before moralisation, there are no edges into $X$ except the one from $\sigma_X$. Hence, in the moral graph, $\sigma_X$ only has an edge with $X$ and Definition 3 is satisfied.

Proof of Corollary 6:
As above, in the augmented local independence graph $G^{\sigma}$ there is only a single edge involving $\sigma_X$ pointing into $X$. The graphical check of δ–separation for simple stability, furthermore, involves removing all outgoing edges out of $V_0 \cup L$ and with the condition of Corollary 6 this means that there are no edges between $U$ and $V_0 \cup L$ at all. Hence, even if there are moral edges between $\sigma_X$ and $U$ these do not lead to paths between $V_0 \cup L$ and $\sigma_X$ in the relevant moral graph and Definition 3 is satisfied.

Acknowledgement

Financial support from the Leverhulme Trust (RF–2011–320) is gratefully acknowledged.

References

Dawid, A.P. (2002). Influence diagrams for causal modelling and inference. International Statistical Review, 70:161-189.
Dawid, A.P. (2012). The Decision-Theoretic Approach to Causal Inference. Chapter 4 in: Causality – Statistical Perspectives and Applications (eds. C. Berzuini, A.P. Dawid, L. Bernardinelli), Wiley.
Dawid, A.P., Didelez, V. (2010). Identifying the consequences of dynamic treatment strategies: A decision theoretic overview. Statistics Surveys, 4:184-231.
Didelez, V. (2006). Asymmetric separation for local independence graphs. In Proc. of 22nd UAI Conference, 130-137. AUAI Press.
Didelez, V. (2007). Graphical models for composable finite Markov processes. Scandinavian Journal of Statistics, 34:169-185.
Didelez, V. (2008). Graphical models for marked point processes based on local independence. JRSSB, 70(1):245-264.
Granger, C.W.J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37:424-438.
Meek, C. (2014). Toward learning graphical and causal process models. In Proc. of 31th UAI Conference Causality Workshop, 43-48. AUAI Press.
Pearl, J., Robins, J. (1995). Probabilistic evaluation of sequential plans from causal models with hidden variables. In Proc of 11th UAI Conference, 444-453. Morgan Kaufmann Publishers, San Francisco.
Robins, J.M. (1997). Causal inference from complex longitudinal data. In: Latent Variable Modeling and Applications to Causality, (ed. M. Berkane). Lecture Notes in Statistics 120:69-117. Springer, New York.
Robins, J.M., Hernan, M.A., Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11:550-560.
Røysland, K. (2011). A martingale approach to continuous time marginal structural models. Bernoulli, 17(3):895-915.
Røysland, K. (2012). Counterfactual analyses with graphical models based on local independence. Annals of Statistics, 40(4):2162-2194.
Røysland, K., Didelez, V., Nygard, M., Lange, T., Aalen, O.O. (2015). Causal reasoning in survival analysis: re-weighting and local independence graphs. Submitted.
Røysland, K., Didelez, V. (2015). General criteria for identification of causal effects between events in continuous time. In preparation.
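
Illustrative sketch (δ–separation check). The graphical check described in Section 2 (delete the edges emanating from B, restrict to the ancestral set of A ∪ B ∪ C, moralise, then test ordinary separation) can be sketched in Python as below. This is only a minimal illustration under stated assumptions, not code from the paper: the function name `delta_separates` is hypothetical, the ancestral set is computed inside the edge-deleted graph, and, because the edges of Figure 1 are not recoverable from the text, the edge set `FIG1_ASSUMED` is a hypothetical choice made only so that it agrees with the two statements of Example I.

```python
from itertools import combinations


def delta_separates(edges, A, B, C):
    """Return True if C delta-separates A from B in the directed graph `edges`.

    `edges` is a set of (parent, child) pairs; A, B, C are disjoint node sets.
    """
    A, B, C = set(A), set(B), set(C)

    # Step 1: G^B, i.e. delete every directed edge that starts in B.
    g_b = {(j, k) for (j, k) in edges if j not in B}

    # Step 2: ancestral set of A, B and C together.
    # Assumption of this sketch: ancestors are taken within G^B.
    anc = A | B | C
    changed = True
    while changed:
        changed = False
        for j, k in g_b:
            if k in anc and j not in anc:
                anc.add(j)
                changed = True
    sub = {(j, k) for (j, k) in g_b if j in anc and k in anc and j != k}

    # Step 3: moralise: drop directions and join parents of a common child.
    undirected = {frozenset(e) for e in sub}
    for v in anc:
        parents = sorted(j for (j, k) in sub if k == v)
        undirected |= {frozenset(p) for p in combinations(parents, 2)}

    # Step 4: ordinary separation: search from A without entering C.
    reached, stack = set(A), list(A)
    while stack:
        v = stack.pop()
        for e in undirected:
            if v in e:
                (w,) = e - {v}
                if w not in C and w not in reached:
                    reached.add(w)
                    stack.append(w)
    return not (reached & B)


# Hypothetical edge set, chosen only to be consistent with Example I.
FIG1_ASSUMED = {("Y1", "Y2"), ("Y3", "Y2"), ("Y2", "Y4"), ("Y3", "Y4")}

sep_given_y2_y3 = delta_separates(FIG1_ASSUMED, {"Y1"}, {"Y4"}, {"Y2", "Y3"})
sep_given_y2 = delta_separates(FIG1_ASSUMED, {"Y1"}, {"Y4"}, {"Y2"})
```

With this assumed edge set, conditioning on (Y2, Y3) separates Y1 from Y4, whereas conditioning on Y2 alone does not, because moralisation joins the two parents Y1 and Y3 of Y2; this mirrors the 'selection effect' described in Example I. The asymmetry of δ–separation also shows up: the empty set δ–separates Y2 from Y1 but not Y1 from Y2.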