<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Inference of Cause and Effect with Unsupervised Inverse Regression</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eleni Sgouritsa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dominik Janzing</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philipp Hennig</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernhard Schölkopf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Max Planck Institute for Intelligent Systems</institution>
          ,
          <addr-line>Tübingen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We address the problem of causal discovery in the two-variable case, given a sample from their joint distribution. Since X → Y and Y → X are Markov equivalent, conditional-independence-based methods [Spirtes et al., 2000, Pearl, 2009] cannot recover the causal graph. Alternative methods introduce asymmetries between cause and effect by restricting the function class (e.g., [Hoyer et al., 2009]). The proposed causal discovery method, CURE (Causal discovery with Unsupervised inverse REgression), is based on the principle of independence of causal mechanisms [Janzing and Schölkopf, 2010]. For the case of only two variables, it states that the marginal distribution of the cause, say P(X), and the conditional of the effect given the cause, P(Y|X), are “independent”, in the sense that they do not contain information about each other. This independence can be violated in the backward direction: the distribution of the effect P(Y) and the conditional P(X|Y) may contain information about each other, because each of them inherits properties from both P(X) and P(Y|X), hence introducing an asymmetry between cause and effect. For deterministic causal relations (Y = f(X)), all the information about the conditional P(Y|X) is contained in the function f. In this case, previous work [Janzing et al., 2012] formalizes “independence” as uncorrelatedness between log f′ and the density of P(X), both viewed as random variables. For non-deterministic relations, we propose an implicit notion of independence, namely that p<sub>Y|X</sub> cannot be estimated based on p<sub>X</sub> (lower case denotes density). However, it may be possible to estimate p<sub>X|Y</sub> based on the density of the effect, p<sub>Y</sub>. In practice, we are given empirical data x ∈ ℝ<sup>N</sup>, y ∈ ℝ<sup>N</sup> from P(X, Y) and estimate p<sub>X|Y</sub> based on y (intentionally hiding x). The relationship between the observed y and the latent x is modeled by a Gaussian process (GP). The required conditional p<sub>X|Y</sub> is then estimated as p̂<sup>y</sup><sub>X|Y</sub>: (x, y) ↦ p(x|y, y), with p(x|y, y) obtained by marginalizing out the latent x and the GP hyperparameters. CURE infers the causal direction by applying the procedure above twice: once to estimate p<sub>X|Y</sub> based only on y, and once to estimate p<sub>Y|X</sub> based only on x. If the first estimation is better, X → Y is inferred; otherwise, Y → X. To evaluate each conditional estimate, we compare it to the estimate obtained using both x and y. CURE was evaluated on synthetic and real data and often outperformed existing methods. On the downside, its computational cost is comparatively high. This work was recently published at AISTATS 2015 [Sgouritsa et al., 2015].</p>
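        <p>The decision rule described above can be illustrated in code. The following is a simplified sketch, not the paper's estimator: CURE's marginalization over the GP latent inputs and hyperparameters is replaced here by a crude importance-sampling stand-in with fixed kernel hyperparameters, and all names (unsup_conditional_score, infer_direction) are our own.</p>

```python
# Illustrative sketch of CURE's decision rule, NOT the paper's estimator:
# the GP-latent-variable marginalization is approximated by importance
# sampling with fixed kernel hyperparameters; all names are our own.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def rbf_gram(x, ell=1.0, var=1.0):
    """Squared-exponential Gram matrix over inputs x."""
    d = x[:, None] - x[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

def gp_log_marginal(y, x, noise=0.1):
    """log N(y; 0, K(x) + noise^2 I): GP marginal likelihood of outputs y at inputs x."""
    K = rbf_gram(x) + noise ** 2 * np.eye(len(x))
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * (y @ alpha + logdet + len(y) * np.log(2 * np.pi))

def unsup_conditional_score(observed, hidden, n_samples=300):
    """Score how well the conditional of `hidden` given `observed` can be
    estimated from `observed` ALONE: sample candidate latent inputs from a
    standard-normal prior, weight each by the GP marginal likelihood of
    `observed`, and evaluate the true hidden values under the resulting
    weighted pointwise Gaussian estimate (mean log-density; higher = better)."""
    N = len(observed)
    candidates = rng.standard_normal((n_samples, N))
    logw = np.array([gp_log_marginal(observed, c) for c in candidates])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    mu = w @ candidates
    sd = np.sqrt(w @ (candidates - mu) ** 2 + 1e-6)
    return norm.logpdf(hidden, mu, sd).mean()

def infer_direction(x, y):
    """CURE-style comparison: infer X -> Y if p(X|Y) is estimated better from
    y alone than p(Y|X) is from x alone."""
    score_xy = unsup_conditional_score(observed=y, hidden=x)  # p(X|Y) from y only
    score_yx = unsup_conditional_score(observed=x, hidden=y)  # p(Y|X) from x only
    return ("X -> Y" if score_xy > score_yx else "Y -> X"), score_xy, score_yx

# toy data with a nonlinear, noisy causal relation X -> Y
x = rng.standard_normal(40)
y = np.tanh(2.0 * x) + 0.1 * rng.standard_normal(40)
x, y = (x - x.mean()) / x.std(), (y - y.mean()) / y.std()
direction, s_xy, s_yx = infer_direction(x, y)
print(direction)
```

        <p>Note that the paper additionally benchmarks each y-only (or x-only) estimate against the estimate obtained from both x and y; for brevity, the sketch compares the two directions' scores directly.</p>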
      </abstract>
    </article-meta>
  </front>
  <body />
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>D. Janzing, J. M. Mooij, K. Zhang, J. Lemeire, J. Zscheischler, P. Daniusis, B. Steudel, and B. Schölkopf. Information-geometric approach to inferring causal directions. Artificial Intelligence, 182-183:1-31, 2012.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>P. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21 (NIPS), 2009.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>