=Paper=
{{Paper
|id=Vol-1774/MIDAS2016_paper7
|storemode=property
|title=Towards a sharp estimation of transfer entropy for identifying causality in financial time series
|pdfUrl=https://ceur-ws.org/Vol-1774/MIDAS2016_paper7.pdf
|volume=Vol-1774
|authors=Àlex Serès,Alejandra Cabaña,Argimiro Arratia
|dblpUrl=https://dblp.org/rec/conf/pkdd/SeresCA16
}}
==Towards a sharp estimation of transfer entropy for identifying causality in financial time series==
Towards a sharp estimation of transfer entropy for identifying causality in financial time series

Àlex Serès¹, Alejandra Cabaña¹, and Argimiro Arratia²

¹ Universitat Autònoma de Barcelona, Mathematics, Barcelona, SPAIN, alejandro.seres@e-campus.uab.cat, acabana@mat.uab.cat
² Universitat Politècnica de Catalunya, Computer Science, Barcelona, SPAIN, argimiro@cs.upc.edu

Abstract. We present an improvement of an estimator of causality in financial time series via transfer entropy, which includes the side information that may affect the cause-effect relation in the system, i.e. a conditional information-transfer based causality. We show that for weakly stationary time series the conditional transfer entropy measure is non-negative and bounded below by Geweke's measure of Granger causality. We use k-nearest neighbor distances to estimate entropy and approximate the distribution of the estimator with bootstrap techniques. We give examples of the application of the estimator in detecting causal effects in a simulated autoregressive stationary system in three random variables with linear and non-linear couplings; in a system of non-stationary variables; and with real financial data.

1 Introduction

The determination of cause-effect relations among financial time series poses several research challenges: besides the proper detection of the causality, it is important to quantify its strength and the effects of side information that may be present in the system. Moreover, considering well-known observed stylized facts about time series, the measure of causality should be sensitive to the detection of possible non-linear dependence, and adaptable to non-stationary time series. In econometrics the standard tool for testing statistical causality is the one proposed by Granger [6, 5] for the bivariate case, extended by Geweke [3] to a conditional Granger causality test. These tests assume a linear relationship among the causes and effects, since they are implemented by fitting autoregressive models. More recently there have been several approaches to testing causality based on non-parametric methods, kernel methods and information theory, among others, in order to cope with non-linearity and non-stationarity, e.g., [2, 17, 11, 1, 16].

The purpose of this work is to contribute to the enhancement of an estimator of bivariate causality proposed in [16], and implemented as the software tool TRENTOOL, which has been shown to be robust in detecting the direction and measuring the strength of causality in complex biological systems. Our work and contributions are structured as follows. In Section 2 we present our modified version of Wibral et al.'s transfer entropy based causality test, extending the definition to a conditional causality test and thus accounting for side information. In Section 3 we show that this conditional transfer entropy is a measure of statistical causality in the same sense as Granger causality (i.e. that the causes precede the effects). Section 4 describes the general steps to estimate transfer entropy. We use the same technique of k-nearest neighbor distances of [8] for estimating mutual information, but we make use of a bootstrap technique suited for stationary time series [12, 10] for approximating the distribution of the estimator. As a proof of concept, in Section 5, we report three applications of the modified causality measure: two with simulated data, one to assess the sensitivity of the causality test to linear and non-linear couplings in a system of three variables, the other composed of two non-stationary variables. The third experiment is our reality check on real financial data: we try to determine the possible influence of one market index on another by testing for causality between the German DAX index and the Spanish IBEX. Due to space restrictions we omit several technical details in this report; these can be found in the extended document [14].
2 Granger Causality and Transfer Entropy

Let $X = \{X_t\}$, $Y = \{Y_t\}$, $Z = \{Z_t\}$ be three stochastic processes defined on a common probability space, from which we try to infer a causal interaction between $X$ and $Y$, and where $Z$ represents the side information that completes the system. We denote the realization of these random variables at time $t$ as $x_t, y_t, z_t$. Further, we use $\mathbf{X}_t$, $\mathbf{Y}_t$ and $\mathbf{Z}_t$ to denote the state-space vectors that characterize the processes at time $t$; in this case we choose the whole collection of random variables up to time $t$, to follow closely the definition given in [16].

Conditional Granger causality. It is said that $X$ does not (Granger) cause $Y$, relative to the side information $Z$, if for all $t \in \mathbb{Z}$,

$$P(Y_t \mid \mathbf{X}_{t-1}, \mathbf{Y}_{t-k}, \mathbf{Z}_{t-1}) = P(Y_t \mid \mathbf{Y}_{t-k}, \mathbf{Z}_{t-1}),$$

where $k \in \mathbb{N}$ is the lag, and $P(\cdot \mid \cdot)$ stands for the conditional probability distribution. In the bivariate case (unconditional causality) the side information is omitted. To determine conditional Granger causality the following vector regression models are considered:

$$Y_t = L_Y(\mathbf{Y}_{t-1}) + L_{XY}(\mathbf{X}_{t-1}) + L_{ZY}(\mathbf{Z}_{t-1}) + \epsilon_{Y,t}$$
$$Y_t = \tilde{L}_Y(\mathbf{Y}_{t-1}) + \tilde{L}_{ZY}(\mathbf{Z}_{t-1}) + \tilde{\epsilon}_{Y,t}$$

where $L_Y, L_{XY}, L_{ZY}, \tilde{L}_Y, \tilde{L}_{ZY}$ are linear functions, and $\epsilon_{Y,t}, \tilde{\epsilon}_{Y,t}$ are the residuals of the regression of $Y_t$ on the information $\mathbf{Z}_t$ with and without $\mathbf{X}_t$, respectively. One can then quantify the usefulness of including $X$ in explaining $Y$ using Geweke's measure based on the variances of the residuals [3]:

$$F_{X \to Y|Z} = \log \frac{\mathrm{var}(\tilde{\epsilon}_{Y,t})}{\mathrm{var}(\epsilon_{Y,t})}.$$

Note that the residual variance of the second regression will always be larger than or equal to that of the first, so $F_{X \to Y|Z} \geq 0$. For its statistical treatment, it is known that the corresponding maximum likelihood estimator has a $\chi^2$ distribution under the null hypothesis $F_{X \to Y|Z} = 0$, and a non-central $\chi^2$ distribution under the alternative hypothesis $F_{X \to Y|Z} > 0$ [5, 3].
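To make the construction concrete, the following minimal sketch computes $F_{X \to Y|Z}$ by fitting the two regressions above with ordinary least squares. It is an illustration under simplifying assumptions (scalar series, a common lag order p, function and variable names of our choosing), not the exact implementation used in our experiments.

```python
import numpy as np

def geweke_f(y, x, z, p=1):
    """Sketch of Geweke's conditional measure F_{X->Y|Z}: regress Y_t on p
    lags of (Y, Z) and on p lags of (Y, X, Z), then compare residual
    variances.  All series are 1-d arrays of equal length."""
    def design(*series):
        # column of ones plus lags 1..p of each series, aligned with y[p:]
        cols = [s[p - j : len(s) - j] for s in series for j in range(1, p + 1)]
        return np.column_stack([np.ones(len(y) - p)] + cols)

    def rss(A, target):
        beta = np.linalg.lstsq(A, target, rcond=None)[0]
        return np.sum((target - A @ beta) ** 2)

    target = y[p:]
    full, restricted = design(y, x, z), design(y, z)
    return np.log(rss(restricted, target) / rss(full, target))  # >= 0
```

Under the null hypothesis $F_{X \to Y|Z} = 0$ the suitably scaled statistic is asymptotically $\chi^2$-distributed, so a likelihood-ratio style test can be read off directly from this quantity.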
Transfer entropy. The transfer entropy of the process $X$ to $Y$, conditioned on the side information $Z$, can be obtained from mutual information as

$$TE_{X \to Y|Z} = I(Y^+; X^- \mid Y^- \oplus Z^-)$$

where $A^-$ denotes all information available from the past of $A$, $A^+$ its immediate future, and $A \oplus B$ represents the concatenation of the random vectors $A$ and $B$. Wibral et al. in [16] argue extensively on the inadequacy of this formula in the bivariate case for truly capturing cause-effect relations, and go on to propose a transfer entropy with self-prediction optimality (SPO) at lag $u$, which, extended to account for the side information $Z$, has the form:

$$TE_{SPO:X \to Y,u|Z} = I(Y_t; \mathbf{X}_{t-u} \mid \mathbf{Y}_{t-1}, \mathbf{Z}_{t-1})$$

Then they show:

Theorem 1. For two discrete-time random processes $X, Y$ with a state-space representation $\mathbf{X}, \mathbf{Y}$, coupled from $X$ to $Y$ via a non-zero delay $\delta$, $TE_{SPO:X \to Y,u|Z}$ is maximal for $u = \delta$. This also holds in the presence of additional coupling from $Y$ to $X$. □

This approach suits every requisite that we need to correctly estimate information transfer, but in practice the associated estimator turned out to be an unreliable predictor, due to the amount of information about $X_{t-\delta}$ present in $\mathbf{X}_{t-u}$ for $1 < u < \delta$, which causes it to assign a larger value to the information "rebounds" produced by the approximations assumed. (By rebound we mean the contribution of information due to indirect causality, e.g. $X \to Z$ and $Z \to Y$.) This can be corrected by using the state-space representation for the target $Y_t$:

$$TE_{SPO:X \to Y,u|Z} = I(\mathbf{Y}_t; \mathbf{X}_{t-u} \mid \mathbf{Y}_{t-1}, \mathbf{Z}_{t-1}) \qquad (1)$$

By the properties of the state-space representation we do not lose any condition imposed on the former estimator, and we gain an advantage: we compensate the rebound effect of $X_{t-\delta}$ in $\mathbf{X}_{t-u}$, because we swap the numerical value of $Y_t$ for a state vector that contains information of the past which, supposing that the interaction lag is constant for all $t$ considered, accounts for a more precise value for the unwanted rebounds, with a clear apex at the true delay. On the other hand we have a disadvantage, related to the method used to estimate the conditioned densities: as a result of using an element of larger dimension, we incur a bigger estimation error.
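In practice Theorem 1 is used as a delay scan: we evaluate (1) over a range of candidate lags and read the interaction delay off the position of the maximum. A hypothetical skeleton of that scan follows; the conditional mutual information estimator `cmi` is passed in (for instance the k-nearest-neighbor one sketched in Section 4), and all names here are illustrative rather than the paper's actual code.

```python
import numpy as np

def te_lag_profile(X, Y, cmi, max_lag=30):
    """Profile u -> TE_SPO(u) = I(Y_t ; X_{t-u} | Y_{t-1}) for u = 1..max_lag.
    X, Y are arrays of delay-embedded state vectors (one row per time t);
    by Theorem 1 the profile should peak at the true coupling delay."""
    n = min(len(X), len(Y))
    prof = np.full(max_lag + 1, np.nan)
    for u in range(1, max_lag + 1):
        target = Y[u + 1 : n]        # states Y_t
        source = X[1 : n - u]        # candidate cause states X_{t-u}
        past = Y[u : n - 1]          # own past Y_{t-1}
        prof[u] = cmi(target, source, past)
    return prof                      # np.nanargmax(prof) estimates delta
```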
3 Transfer Entropy is a Measure of Causality

Barnett et al. in [1] showed that for random variables with a Gaussian distribution, measuring transfer entropy is equivalent to measuring Granger causality. This result easily extends to the conditional causality concept: formally, it can be shown that $F_{X \to Y|Z} = 2\,TE_{X \to Y|Z}$. Nevertheless, we want to evaluate our capacity to infer causality on financial data, which is widely known to behave neither as stationary nor as Gaussian. For this reason it is very important to confirm that when measuring information transfer between processes we are, indeed, testing for statistical causality. We have not found any explicit proof of this proposition, but there are some signs in the literature. One is an argument asserting that non-zero Granger causality means non-zero transfer entropy [11], though it is somewhat vague. Another is an indication of a lower bound for mutual information [8], which proved helpful in showing that, for weakly stationary variables that are not necessarily Gaussian, the linear influences detected with Granger causality are no more than the ones detected for Gaussian variables, and therefore we will detect them at the very least. We state without proof the following proposition.

Proposition 1. Given two stochastic processes $\{X_t\}, \{Y_t\}$, and supposing that $X$ causes $Y$ in the Granger sense, then $TE_{X \to Y|Z} \geq 0$. □

Now, conditional transfer entropy relates to Granger causality as follows:

Theorem 2. Let $\{X_t\}, \{Y_t\}, \{Z_t\}$ be three jointly distributed, weakly stationary stochastic processes, defined on a common probability space. Then the standard measure of Granger causality and transfer entropy are related by

$$2\,TE_{X \to Y|Z} \geq F_{X \to Y|Z}.$$

Proof. It suffices to prove that conditional mutual information is minimized by Gaussian distributions; since $TE_{X \to Y|Z}$ is only a specific form of a conditional mutual information, the result will follow. As proposed in [8], we set up a minimization problem on the mutual information for continuous variables using Lagrange multipliers. If we write $\mu(x,y) = \mu'(x,y|z)$, then the formula that we need to minimize does not change:

$$I(X;Y|Z) = \iint \mu'(x,y|z)\,\log\frac{\mu'(x,y|z)}{\mu'_x(x|z)\,\mu'_y(y|z)}\,dx\,dy = \iint \mu(x,y)\,\log\frac{\mu(x,y)}{\mu_x(x)\,\mu_y(y)}\,dx\,dy.$$

Recall that $\iint \mu(x,y)\,dx\,dy = 1$, $\mu_x(x) = \int \mu(x,y)\,dy$ and $\mu_y(y) = \int \mu(x,y)\,dx$, and let $C$ be the covariance matrix of the variables $X, Y$ conditioned on $Z$, so that $\int x^2 \mu_x(x)\,dx = C_{xx} = \sigma_x$, $\int y^2 \mu_y(y)\,dy = C_{yy} = \sigma_y$, and $\iint xy\,\mu(x,y)\,dx\,dy = C_{xy} = \sigma_{xy}$. Then the Lagrangian can be written as:

$$L = -\iint \mu(x,y)\log\frac{\mu(x,y)}{\mu_x(x)\mu_y(y)}\,dx\,dy - \lambda_1\Big(\iint \mu(x,y)\,dx\,dy - 1\Big) - \lambda_2\Big(\iint x^2\mu(x,y)\,dx\,dy - \sigma_x\Big) - \lambda_3\Big(\iint y^2\mu(x,y)\,dx\,dy - \sigma_y\Big) - \lambda_4\Big(\iint xy\,\mu(x,y)\,dx\,dy - \sigma_{xy}\Big).$$

Now impose $\partial L / \partial \mu(x,y) = 0$. Since the first and second order moments are finite, we can swap the partial derivative with the integral sign, and the resulting integrand must vanish pointwise, giving

$$0 = -\log\frac{\mu(x,y)}{\mu_x(x)\mu_y(y)} - 1 - \lambda_1 - \lambda_2 x^2 - \lambda_3 y^2 - \lambda_4 xy,$$

and hence

$$\mu(x,y) = \mu_x(x)\,\mu_y(y)\,e^{-(1+\lambda_1)}\,e^{-\lambda_2 x^2 - \lambda_3 y^2 - \lambda_4 xy},$$

where $\lambda_1, \lambda_2\,(\geq 0), \lambda_3\,(\geq 0), \lambda_4$ are constants fixed by the constraints. Therefore a probability density that satisfies this equation is consistent with a Gaussian. To prove that it is the only one, integrate the expression of $\mu(x,y)$ over $y$ and make the change of variables $x = -i\zeta/c$:

$$\int \mu(x,y)\,dy = \int \mu_x(x)\,\mu_y(y)\,e^{-(1+\lambda_1)}\,e^{-\lambda_2 x^2 - \lambda_3 y^2 - \lambda_4 xy}\,dy$$
$$\Rightarrow\; \mu_x(x) = \mu_x(x)\,e^{-(1+\lambda_1)}\,e^{-\lambda_2 x^2} \int \mu_y(y)\,e^{-\lambda_3 y^2 - \lambda_4 xy}\,dy$$
$$\Rightarrow\; e^{(1+\lambda_1)}\,e^{\lambda_2 x^2} = \int \mu_y(y)\,e^{-\lambda_3 y^2 - \lambda_4 xy}\,dy$$
$$\Rightarrow\; e^{(1+\lambda_1)}\,e^{-\lambda_2 \zeta^2/c^2} = \int e^{i\lambda_4 y\zeta/c}\,\big[\mu_y(y)\,e^{-\lambda_3 y^2}\big]\,dy.$$

The left-hand side of this last equality is a Gaussian density function (up to a constant), and the right-hand side is the Fourier transform of $\mu_y(y)\,e^{-\lambda_3 y^2}$. Since the Fourier transform of a Gaussian is another Gaussian, and $\mu_y(y)$ cannot be a constant if it is a density function, $\mu_y(y)$ is also Gaussian. The same argument (integrating over $x$) shows that $\mu_x(x)$ is Gaussian; thus the joint distribution of the random variables $X, Y$ that minimizes their mutual information conditioned on $Z$, $\mu(x,y) = \mu'(x,y|z)$, is a Gaussian distribution. □

4 Practical estimation of transfer entropy

In this section we outline the methodology used to estimate transfer entropy. It follows the procedures of [16], but we introduce some key modifications: aside from our particular estimator, which deals with conditional mutual information, we test for null transfer entropy by obtaining a bootstrap approximation of the distribution of the estimator, a non-parametric procedure.

The state-space representation. First we need to reconstruct the corresponding state space of the interacting systems from a scalar time series. For this purpose we use Takens' delay embedding [15] with the parameters suggested by Ragwitz [13]. Delay-embedding states of the systems under consideration can be written as delay vectors of the form

$$\mathbf{X}_t^{(m)} = (X_t, X_{t-\tau}, X_{t-2\tau}, \ldots, X_{t-(m-1)\tau}),$$

where $m$ and $\tau$ denote the embedding dimension and Takens' embedding delay, respectively.
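A minimal sketch of this reconstruction step is given below; the helper for choosing $\tau$ anticipates the autocorrelation criterion described next, and the function names are ours, not the paper's.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Takens delay vectors X_t^{(m)} = (x_t, x_{t-tau}, ..., x_{t-(m-1)tau}).
    Returns an array with one m-dimensional state per row."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[(m - 1 - k) * tau + np.arange(n)] for k in range(m)])

def first_acf_zero(x, max_lag=100):
    """Candidate embedding delay: first lag at which the sample
    autocorrelation rho(k) is not significantly different from zero
    (a rough 95% band stands in for the formal test of H0: rho(k) = 0)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    band = 1.96 / np.sqrt(len(x))
    for k in range(1, max_lag + 1):
        if abs(np.dot(x[:-k], x[k:]) / np.dot(x, x)) <= band:
            return k
    return max_lag
```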
Embedding lag. To enhance the predictability between the state variables it is key to choose the correct variables. Ragwitz' criterion yields delay-embedding states that provide optimal self-prediction for a large class of systems. It is based on a close study of the properties of an assumed multivariate Langevin process from which the data proceed, assuming that the univariate series behaves locally like a Markov chain.

As for the delay parameter $\tau$, in the limit of infinite precision of the data and infinitely long time series, all values are equivalent. In a practical situation, however, a good choice of the delay is crucial. If $\tau$ is too large, successive elements of the embedding vector are almost independent and the vectors fill an amorphous cloud in the state space. If $\tau$ is too small, successive elements of the embedding vector are strongly correlated and all vectors are clustered around the "diagonal" of the state space. We will need to search for meaningful neighbors of these vectors in the next two steps, and in both extreme cases they are difficult to obtain. Hence, the simplest reasonable estimate of an optimal delay is the first zero of the autocorrelation function $\rho(k) = \gamma(k)/\gamma(0)$, where $\gamma(k) = \mathrm{Cov}(X_t, X_{t+k})$, $k \in \mathbb{Z}$. To discern the first zero, our program performs the test $H_0: \rho(k) = 0$ versus $H_1: \rho(k) \neq 0$.

Embedding dimension. To select the dimension of the vectors of the reconstructed state space we use the false nearest neighbors method [9, 7]. This method compares distances of $m$-dimensional state-space vectors to their k-nearest neighbors (knn) with the corresponding distances for $(m+1)$-dimensional state-space vectors built from the same data. These distances are easy to calculate, considering that all vectors have a similar form once we have the estimated delay; only the knn search may prove to be a problem if the dimension becomes too big. If the distances grow too much with the growth in dimension, the neighbors are deemed false, and we try again, increasing the dimension by one, until no false neighbor is found. In this way the data themselves dictate the embedding dimension, so that it approximates correctly the dimension of the manifold in which the states of the system reside.

Mutual information estimation. Once we have the parameters involved in the state-space representation of both uni-valued series of data, we apply formula (1) to find the transfer entropy from $X$ to $Y$:

$$TE_{SPO:X \to Y,u} = I(\mathbf{Y}_t; \mathbf{X}_{t-u} \mid \mathbf{Y}_{t-1}) = I(\mathbf{Y}_t; \mathbf{X}_{t-u}, \mathbf{Y}_{t-1}) - I(\mathbf{Y}_t; \mathbf{Y}_{t-1}).$$

(We are not considering a third variable $Z_t$ symbolizing side information, to simplify notation; its inclusion would mean four mutual information terms in the previous formula instead of two.)

To estimate the mutual information $I(A;B)$ between two multivariate time series $A$ and $B$ we use the k-nearest neighbor estimator suggested in [8]. The only assumption behind this method is a certain smoothness of the underlying probability functions, so it qualifies as a non-parametric technique. The estimator has the form

$$\hat{I}(A,B) = \Psi(k) - \langle \Psi(n_A + 1) + \Psi(n_B + 1) \rangle + \Psi(N),$$

where $N$ is the length of the series used, $n_A$ is the number of points of $A$ whose distance to a given point is below the threshold set by its $k$-th nearest neighbor in the joint space ($n_B$ is defined similarly), $\langle \cdot \rangle$ denotes the average over all points, and $\Psi(x)$ is the digamma function, $\Psi(x) = \Gamma(x)^{-1}\,d\Gamma(x)/dx$. The digamma function satisfies the recursion $\Psi(x+1) = \Psi(x) + 1/x$, with $\Psi(1) = -C$, $C$ being the Euler–Mascheroni constant.
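The following compact sketch implements this estimator, plus the conditional variant needed in (1) via the chain rule $I(A;B \mid C) = I(A; B \oplus C) - I(A;C)$. We assume SciPy's KD-tree and digamma, and approximate the strict inequality in the neighbor counts with a small tolerance; this is one reasonable realization, not the paper's exact code.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mutual_info(a, b, k=4):
    """Kraskov-Stoegbauer-Grassberger estimator (algorithm 1 of [8]):
    I(A;B) ~ psi(k) - <psi(n_A+1) + psi(n_B+1)> + psi(N),
    with all distances taken in the max-norm."""
    a = np.asarray(a, float).reshape(len(a), -1)
    b = np.asarray(b, float).reshape(len(b), -1)
    n = len(a)
    joint = np.hstack([a, b])
    # distance to the k-th nearest neighbor in the joint space (self excluded)
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    # marginal neighbor counts strictly inside eps (self subtracted)
    count = lambda pts: cKDTree(pts).query_ball_point(
        pts, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) - np.mean(digamma(count(a) + 1) + digamma(count(b) + 1)) + digamma(n)

def ksg_cond_mutual_info(a, b, c, k=4):
    """I(A;B|C) = I(A; B+C) - I(A;C), each term estimated as above."""
    b = np.asarray(b, float).reshape(len(b), -1)
    c = np.asarray(c, float).reshape(len(c), -1)
    return ksg_mutual_info(a, np.hstack([b, c]), k) - ksg_mutual_info(a, c, k)
```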
Bootstrap sampling. Once we have the estimators of the transfer entropy between our time series as a function of the lag, we approximate their distribution via a bootstrap experiment: take bootstrap replications of the series and recompute the estimators as usual. Bootstrapping stationary time series has been widely studied in [12] and [10], and we follow their procedures. This allows us to assess (for each lag) whether there is no causation (transfer entropy equal to zero) by looking at the bootstrap confidence intervals. We think that this is a more natural way of approximating the distribution of the transfer entropy estimator than the test conducted in [16], which consists of comparing the observed TE with that of surrogate data obtained by permutations of the source.
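A sketch of one replicate of the stationary bootstrap of [12], on which such confidence intervals can be built (block starts are uniform, block lengths geometric with a chosen mean; parameter names are ours):

```python
import numpy as np

def stationary_bootstrap(x, mean_block=100, rng=None):
    """One Politis-Romano stationary-bootstrap replicate: concatenate
    blocks with random start points and geometrically distributed
    lengths (mean mean_block), wrapping around the end of the series."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    out = np.empty(n, dtype=x.dtype)
    i = 0
    while i < n:
        start = rng.integers(n)                       # uniform block start
        length = min(int(rng.geometric(1.0 / mean_block)), n - i)
        out[i : i + length] = x[(start + np.arange(length)) % n]
        i += length
    return out
```

Repeating this B times (60 replicates per lag in the experiments below) and re-estimating the transfer entropy on each replicate yields the empirical intervals plotted as error bars.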
5 Simulations and applications in finance

We present several experiments that we have performed to assess the efficiency of our method to estimate transfer entropy, as well as an application to real data.

Example 1. Consider an autoregressive stationary system in three variables with one linear coupling ($Y \to Z$) and two non-linear ones ($X \to Y$ and $X \to Z$), proposed in [4]. We use different lags in each interaction in order to clearly differentiate them, as well as the "rebounds" produced by an indirect cause; specifically $\delta_{X\to Y} = 10$, $\delta_{X\to Z} = 5$, $\delta_{Y\to Z} = 15$. The system satisfies:

$$X_t = 3.4\,X_{t-1}\,(1 - X_{t-1}^2)\,e^{-X_{t-1}^2} + 0.4\,\epsilon_{1,t}$$
$$Y_t = 3.4\,Y_{t-1}\,(1 - Y_{t-1}^2)\,e^{-Y_{t-1}^2} + 0.5\,X_{t-10}^2 + 0.4\,\epsilon_{2,t}$$
$$Z_t = 3.4\,Z_{t-1}\,(1 - Z_{t-1}^2)\,e^{-Z_{t-1}^2} + 0.3\,Y_{t-15} + 0.5\,X_{t-5}^2\,Z_{t-1} + 0.4\,\epsilon_{3,t}$$

where $\epsilon_{1,t}, \epsilon_{2,t}, \epsilon_{3,t}$ are Gaussian white noise terms with identity covariance matrix. To simulate it, we have taken the 30 initial terms of each series as 0 and applied the formulas from there for a total of $10^4$ steps; the last $10^3$ were used to test the transfer entropy estimator, as the arbitrary initial conditions should have little effect by then. We have performed the geometric (stationary) time series bootstrap 60 times for each lag, with an average block length of 100, and drawn 90% confidence intervals. We have estimated up to lag 30 to save computation time, as this should be enough to recover the causal interactions. The results are presented in Figure 1.

[Fig. 1. Estimated transfer entropy as a function of the lag of the supposed cause, for each of the nine possible interactions of the system (panels X→X, X→Y, X→Z, Y→X, Y→Y, Y→Z, Z→X, Z→Y, Z→Z; TE versus Lag).]

Observe that we have recovered all the true delays as clearly distinct maxima of the transfer entropy versus lag plots. It is important to highlight that the scale on the y-axis, that is, the transfer entropy magnitude, is different in each graph, as our estimator depends on the embedding dimension; each test is therefore unique, and we must look at the error bars generated in the bootstrap process to put the values in perspective. That said, the variation in TE due to extremely different embedding dimensions reaches 30% of the larger value, and in this figure we can draw conclusions about the strength of the interactions, as we see differences of an order of magnitude. That is why, as expected, we can consider the $Z \to X$ and $Y \to X$ interactions null, as the measured TE for the lag values statistically different from zero is really small. There is a hint of a pattern, but it should be no more than a residual influence on the estimator, caused by the real influence in the opposite direction. The $Z \to Y$ interaction is also fairly small, but really interesting: $Z$ does not influence $Y$ directly, but $X$ influences both, first $Z$ at $u = 5$ and then $Y$ at $u = 10$; therefore $Z_{t-5}$ carries important information on $Y$ via $X$. Here we have a good example of when we would need to apply the whole TE formula, conditioning on a known overarching series.

On the other hand, the autoregressive interaction of all the variables, represented on the diagonal, has been recovered as a maximum at lag $u = 1$, with decreasing intensity as the influence of the past of the series declines, stabilizing at 0 afterwards. The $X \to Y$ interaction is prominently recovered in a similar fashion, at $u = 10$, also showing the decreasing remnants of information still present at values around the true delay. The linear interaction $Y \to Z$ is also displayed, with a maximum at $u = 15$. And the interaction $X \to Z$ shows two interesting properties of our estimator, aside from successfully detecting the influence at $u = 5$ with a distinct local maximum. We detect a maximum at $u = 25$, which corresponds to the chain $X \to Y \to Z$; this means that we successfully detect indirect interactions, as it displays a strength similar to $Y \to Z$. The other important detail is that the variances of the indirect interaction are bigger than those of the direct one.

We repeat the same analysis using a linear Granger causality estimator, provided in the R library MSBVAR. We used the regressions up to lag $u = 15$, as we did not want the indirect influences to have enough steps to affect the test over the direct interactions, which are completed at this lag. The results obtained are shown in Table 1.

Table 1. Granger test statistics and p-values for the different interactions in the system

Interaction    F-statistic    p-value
Y → X            1.0111734    0.44062932
Z → X            1.5610944    0.07815942
X → Y          278.3554360    0.00000000
Z → Y            1.2341100    0.23930984
X → Z            0.7986152    0.68012265
Y → Z           18.8199782    0.00000000

The test recovers swiftly the $Y \to Z$ linear interaction, and also the $X \to Y$ quadratic interaction, but fails at detecting any interaction in $X \to Z$, which is highly non-linear at true lag $u = 5$. To better check the responsiveness of the test we have repeated it considering various smaller lag values for $X$, and we never obtained a favorable p-value. Our TE test shows the various interactions in the system (direct or indirect), justifying its appropriateness when dealing with non-linear systems. □
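For reproducibility, a sketch of the simulation follows; we read the $X \to Z$ coupling as $0.5\,X_{t-5}^2\,Z_{t-1}$, following the system as printed (an assumption on our part), and the burn-in and sample sizes match the text.

```python
import numpy as np

def simulate_example1(n=10_000, keep=1_000, burn=30, seed=None):
    """Simulate the three-variable system of Example 1: couplings X->Y
    at lag 10 (quadratic), X->Z at lag 5, Y->Z at lag 15 (linear).
    The first `burn` values are set to 0; the last `keep` are returned."""
    rng = np.random.default_rng(seed)
    e = 0.4 * rng.standard_normal((3, n))
    x, y, z = np.zeros(n), np.zeros(n), np.zeros(n)
    f = lambda a: 3.4 * a * (1.0 - a**2) * np.exp(-a**2)   # common AR map
    for t in range(burn, n):
        x[t] = f(x[t - 1]) + e[0, t]
        y[t] = f(y[t - 1]) + 0.5 * x[t - 10] ** 2 + e[1, t]
        z[t] = f(z[t - 1]) + 0.3 * y[t - 15] + 0.5 * x[t - 5] ** 2 * z[t - 1] + e[2, t]
    return x[-keep:], y[-keep:], z[-keep:]
```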
Example 2. Next we propose a similar (but non-stationary) system. Consider two processes coupled $X \to Y$ with a true delay $\delta_{X\to Y} = 10$ and random variances that follow an IGARCH(1,1) model:

$$X_t = 0.7\,X_{t-1} + \sigma_{1,t}\,\epsilon_{1,t}$$
$$Y_t = 0.3\,Y_{t-1} + 0.5\,X_{t-10}\,Y_{t-2} + \sigma_{2,t}\,\epsilon_{2,t}$$
$$\sigma_{i,t}^2 = 0.2 + 0.9\,\epsilon_{i,t-1}^2 + 0.1\,\sigma_{i,t-1}^2$$

The results are displayed in Figure 2 and the analysis is similar to the previous one.

[Fig. 2. Estimated transfer entropy as a function of the lag, for the non-stationary system of Example 2 (panels X→X, X→Y, Y→X, Y→Y; TE versus Lag).]

We recover perfectly the autoregressive tendencies, and we obtain a fairly clear view of the $X \to Y$ interaction, similar to the $X \to Z$ interaction in Example 1, except that now it takes a longer time to fade to zero; we can thus sense the distortion generated by the variance, although we can still identify the true delay with precision. □
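A sketch of this simulation; we read $\epsilon_{i,t-1}^2$ in the variance recursion as the squared realized shock $(\sigma_{i,t-1}\epsilon_{i,t-1})^2$, which makes the recursion integrated ($0.9 + 0.1 = 1$) as the IGARCH label suggests — this reading is our assumption, not spelled out in the text.

```python
import numpy as np

def simulate_example2(n=10_000, seed=None):
    """Simulate the non-stationary pair of Example 2: X drives Y at
    delay 10; both innovations carry IGARCH(1,1)-type variances."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal((2, n))      # standardized innovations
    s2 = np.full((2, n), 0.2)              # conditional variances sigma_{i,t}^2
    x, y = np.zeros(n), np.zeros(n)
    for t in range(10, n):
        shock2 = s2[:, t - 1] * eta[:, t - 1] ** 2        # (sigma*eps)^2
        s2[:, t] = 0.2 + 0.9 * shock2 + 0.1 * s2[:, t - 1]
        x[t] = 0.7 * x[t - 1] + np.sqrt(s2[0, t]) * eta[0, t]
        y[t] = 0.3 * y[t - 1] + 0.5 * x[t - 10] * y[t - 2] + np.sqrt(s2[1, t]) * eta[1, t]
    return x, y
```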
Example 3. As a real-world application of our estimator we look at the relationship between two major European stock indices, the German DAX-30 and the Spanish IBEX-35, between January 1st, 2011 and June 24th, 2016, which constitutes approximately 1200 data points at a daily frequency. We have tested for transfer entropy from the log-returns ($\log(P_t/P_{t-1})$, where $P_t$ is the adjusted close price of the index at day $t$) of the DAX to the log-returns of the IBEX, as we suppose that the territorially more influential stock market affects the smaller one, especially in the current context where Germany holds a major portion of Spain's debt. The results are represented in Figure 3. We can see a strong influence detected at lag $u = 1$ that fades within 8 days. It is logical to think that real stock markets react to their sphere of influence quickly enough that daily data does not allow for a precise detection of the interaction lag, which we expect to be much smaller; but we can definitely conclude that there is a causal influence present. □

[Fig. 3. Transfer entropy from the DAX-30 index to the IBEX-35, as a function of the interaction lag (TE versus Lag).]

Acknowledgments. A. Cabaña acknowledges support of MINECO project MTM2015-69493-R. A. Arratia acknowledges support of MINECO project APCOM (TIN2014-57226-P), and Gen. Cat. SGR2014-890 (MACDA).

References

1. Barnett, L., Barrett, A.B., Seth, A.K.: Granger causality and transfer entropy are equivalent for Gaussian variables. Phys. Rev. Lett. 103, 238701 (2009)
2. Diks, C., Wolski, M.: Nonlinear Granger causality: Guidelines for multivariate analysis. Journal of Applied Econometrics (2015)
3. Geweke, J.: Measurement of linear dependence and feedback between multiple time series. J. Amer. Stat. Assoc. 77, 304–313 (1982)
4. Gourévitch, B., Bouquin-Jeannès, R.L., Faucon, G.: Linear and nonlinear causality between signals: methods, examples and neurophysiological applications. Biological Cybernetics 95, 349–369 (2006)
5. Granger, C.W.J.: Testing for causality: A personal viewpoint. J. Econ. Dyn. Control 2, 329–352 (1980)
6. Granger, C.W.J.: Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438 (1969)
7. Hegger, R., Kantz, H.: Improved false nearest neighbor method to detect determinism in time series data. Physical Review E 60(4), 4970 (1999)
8. Kraskov, A., Stögbauer, H., Grassberger, P.: Estimating mutual information. Phys. Rev. E 69, 066138 (2004)
9. Kennel, M., Brown, R., Abarbanel, H.: Determining embedding dimension for phase-space reconstruction using a geometrical construction. Physical Review A 45(6), 3403–3411 (1992)
10. Künsch, H.R.: The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17, 1217–1241 (1989)
11. Marinazzo, D., Pellicoro, M., Stramaglia, S.: Kernel method for nonlinear Granger causality. Phys. Rev. Lett. 100, 144103 (2008)
12. Politis, D.N., Romano, J.P.: The stationary bootstrap. Journal of the American Statistical Association 89, 1303–1313 (1994)
13. Ragwitz, M., Kantz, H.: Markov models from data by simple nonlinear time series predictors in delay embedding spaces. Phys. Rev. E 65, 056201 (2002)
14. Serès-Cabasés, A.: Causality via transfer entropy. Bachelor Sc. Thesis, Universitat Autònoma de Barcelona (2016)
15. Takens, F.: Detecting strange attractors in turbulence. In: Dynamical Systems and Turbulence, Warwick 1980, Lecture Notes in Mathematics, vol. 898, pp. 366–381. Springer (1981)
16. Wibral, M., Pampu, N., Priesemann, V., Siebenhühner, F., Seiwert, H., Lindner, M., Lizier, J., Vicente, R.: Measuring information-transfer delays. PLoS ONE 8(2), e55809 (2013)
17. Zaremba, A., Aste, T.: Measures of causality in complex datasets with application to financial data. Entropy 16, 2309–2349 (2014)