 Towards a sharp estimation of transfer entropy
 for identifying causality in financial time series

            Àlex Serès1 , Alejandra Cabaña1 , and Argimiro Arratia2
      1
        Universitat Autònoma de Barcelona, Mathematics, Barcelona, SPAIN,
           alejandro.seres@e-campus.uab.cat, acabana@mat.uab.cat
    2
      Universitat Politècnica de Catalunya, Computer Science, Barcelona, SPAIN
                                 argimiro@cs.upc.edu



       Abstract. We present an improvement of an estimator of causality in
       financial time series via transfer entropy, which includes the side infor-
       mation that may affect the cause-effect relation in the system, i.e. a con-
       ditional information-transfer based causality. We show that for weakly
       stationary time series the conditional transfer entropy measure is non-
       negative and bounded below by Geweke's measure of Granger causal-
       ity. We use k-nearest neighbor distances to estimate entropy and approx-
       imate the distribution of the estimator with bootstrap techniques. We
       give examples of the application of the estimator in detecting causal ef-
       fects in a simulated autoregressive stationary system in three random
       variables with linear and non-linear couplings; in a system of non sta-
       tionary variables; and with real financial data.


1    Introduction

The determination of cause-effect relations among financial time series poses
several research challenges: besides the proper detection of the causality, it is
important to quantify its strength and the effects of side information that may
be present in the system. Moreover, considering well-known stylized facts about
financial time series, the measure of causality should be sensitive to possible
non-linear dependence, and adaptable to non-stationary time series. In econo-
metrics the standard tool for testing statistical causality is the one proposed by
Granger [6, 5] for the bivariate case, extended by Geweke [3] to a conditional
Granger causality test. These tests assume a linear relationship between the
causes and the effects, since they are implemented by fitting autoregressive models.
More recently there have been several approaches to testing causality based on
non-parametric methods, kernel methods and information theory, among others,
in order to cope with non-linearity and non-stationarity, e.g., [2, 17, 11, 1, 16].
    The purpose of this work is to contribute to the enhancement of an esti-
mator of bivariate causality proposed in [16], and implemented in the software
tool TRENTOOL, which has been shown to be robust in detecting the direction
and measuring the strength of causality in complex biological systems. Our work
and contributions are structured as follows. In Section 2 we present our modified
version of Wibral et al.'s transfer-entropy-based causality test, extending the
definition to a conditional causality test and thus accounting for side informa-
tion. In Section 3 we show that this conditional transfer entropy is a measure of
statistical causality in the same sense as Granger causality (i.e. that the causes
precede the effects). Section 4 describes the general steps to estimate transfer
entropy. We use the same k-nearest neighbor distance technique of [8] for
estimating mutual information, but we approximate the distribution of the
estimator with a bootstrap technique suited for stationary time series [12, 10].
As a proof of concept, in Section 5, we report three applications of the
modified causality measure: two with simulated data, one to assess the sen-
sitivity of the causality test to linear and non-linear couplings in a system of
three variables, and the other involving two non-stationary variables. The third
experiment is our reality check, where we use real financial data: we try to deter-
mine the possible influence of one market index on another by testing for causality
between the German DAX index and the Spanish IBEX.
    Due to space restrictions we omit several technical details in this report.
These can be found in the extended document [14].


2    Granger Causality and Transfer Entropy
Let X = {Xt }, Y = {Yt }, Z = {Zt } be three stochastic processes defined on
a common probability space, from which we try to infer a causal interaction
between X and Y , and where Z represents the side information to complete
the system. We denote the realizations of these random variables at time t as
$x_t, y_t, z_t$. Further, we use boldface $\mathbf{X}_t$, $\mathbf{Y}_t$ and $\mathbf{Z}_t$ to denote the state-space
vectors that characterize the processes at time t; in this case we choose the whole
collection of random variables up to time t, following closely the definition given in [16].

Conditional Granger causality. It is said that X does not (Granger) cause
Y , relative to the side information Z, if for all $t \in \mathbb{Z}$,
$P(Y_t \mid \mathbf{X}_{t-1}, \mathbf{Y}_{t-k}, \mathbf{Z}_{t-1}) = P(Y_t \mid \mathbf{Y}_{t-k}, \mathbf{Z}_{t-1})$,
where $k \in \mathbb{N}$ is the lag and $P(\cdot\mid\cdot)$ stands for the conditional
probability distribution. In the bivariate case (unconditional causality) the side
information is omitted. To determine conditional Granger causality the following
vector regression models are considered

$$Y_t = L_Y(\mathbf{Y}_{t-1}) + L_{XY}(\mathbf{X}_{t-1}) + L_{ZY}(\mathbf{Z}_{t-1}) + \varepsilon_{Y,t}$$
$$Y_t = \tilde{L}_Y(\mathbf{Y}_{t-1}) + \tilde{L}_{ZY}(\mathbf{Z}_{t-1}) + \tilde{\varepsilon}_{Y,t}$$

where $L_Y, L_{XY}, L_{ZY}, \tilde{L}_Y, \tilde{L}_{ZY}$ are linear functions and
$\varepsilon_{Y,t}$, $\tilde{\varepsilon}_{Y,t}$ are the residuals of the regression of
$Y_t$ on $\mathbf{X}_t$ and the side information $\mathbf{Z}_t$, and of the regression
without $\mathbf{X}_t$, respectively. Then one can quantify the usefulness of including
X in explaining Y using Geweke's test based on the variances of the residuals [3]:
$$F_{X\to Y|Z} = \log\frac{\mathrm{var}(\tilde{\varepsilon}_{Y,t})}{\mathrm{var}(\varepsilon_{Y,t})}.$$
    Note that the residual variance of the second regression is always greater
than or equal to that of the first, so $F_{X\to Y|Z} \geq 0$. For its statistical treatment,
it is known that the corresponding maximum likelihood estimator has a
$\chi^2$ distribution under the null hypothesis $F_{X\to Y|Z} = 0$, and a non-central $\chi^2$
distribution under the alternative hypothesis $F_{X\to Y|Z} > 0$ [5, 3].
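For concreteness, here is a minimal numerical sketch of Geweke's measure: fit the
two regressions above by ordinary least squares and compare residual variances.
The function name, the simple lag-k design and the use of Python/numpy are our
own illustration, not the implementation used in the paper.

    import numpy as np

    def geweke_f(x, y, z, k=1):
        # F_{X->Y|Z} = log( var(residuals without X) / var(residuals with X) ),
        # with both models fitted by OLS on k lags of each series.
        x, y, z = (np.asarray(s, float) for s in (x, y, z))
        T = len(y)
        lag = lambda s: np.column_stack([s[k - j:T - j] for j in range(1, k + 1)])
        target = y[k:]                               # Y_t aligned with the lags
        full = np.column_stack([np.ones(T - k), lag(y), lag(x), lag(z)])
        restricted = np.column_stack([np.ones(T - k), lag(y), lag(z)])
        r_full = target - full @ np.linalg.lstsq(full, target, rcond=None)[0]
        r_restr = target - restricted @ np.linalg.lstsq(restricted, target, rcond=None)[0]
        return np.log(r_restr.var() / r_full.var())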

Transfer entropy. The transfer entropy from process X to Y , conditioned on the
side information Z, can be obtained from mutual information as
$$TE_{X\to Y|Z} = I(Y^+; X^- \mid Y^- \oplus Z^-)$$
where $A^-$ denotes all information available on the past of A, $A^+$ its immediate
future, and $A \oplus B$ represents the concatenation of the random vectors A and B.
Wibral et al. in [16] argue extensively on the inadequacy of this formula in the
bivariate case for truly capturing cause-effect relations, and go on to propose
a transfer entropy with self-prediction optimality at lag u, which, extended to
account for the side information Z, has the form:
$$TE_{SPO:X\to Y,u|Z} = I(Y_t; X_{t-u} \mid \mathbf{Y}_{t-1}, \mathbf{Z}_{t-1})$$
Then they show,

Theorem 1. For two discrete-time random processes X, Y with state-space
representations $\mathbf{X}$, $\mathbf{Y}$, coupled from X to Y via a non-zero delay $\delta$, $TE_{SPO:X\to Y,u|Z}$
is maximal for $u = \delta$. This also holds in the presence of additional coupling from Y
to X. ⊓⊔
    This approach meets every requirement needed to correctly estimate infor-
mation transfer, but in practice the associated estimator turned out to be an unreli-
able predictor, due to the amount of information about $X_{t-\delta}$ present in $X_{t-u}$,
for $1 < u < \delta$, which causes it to assign a larger value to the information "re-
bounds" produced by the approximations assumed. (By rebound we mean the
contribution of information due to indirect causality, e.g. X → Z and Z → Y .)
This can be corrected by using the state-space representation for the target $Y_t$:
$$TE_{SPO:X\to Y,u|Z} = I(\mathbf{Y}_t; X_{t-u} \mid \mathbf{Y}_{t-1}, \mathbf{Z}_{t-1}) \qquad (1)$$
By the properties of the state-space representation we do not lose any condition
imposed on the former estimator, and we gain an advantage: we compensate the
rebound effect of $X_{t-\delta}$ in $X_{t-u}$, because we swap the numerical value of $Y_t$ for
a state vector that contains information about the past which, supposing that the
interaction lag is constant for all t considered, yields a more precise value for the
unwanted rebounds, with a clear apex at the true delay. On the other hand, there
is a disadvantage related to the method used to estimate conditional densities:
using a higher-dimensional object yields a larger estimation error.

3   Transfer Entropy is a measure of Causality
Barnett et al. in [1] showed that for random variables with a Gaussian distribu-
tion, measuring transfer entropy is equivalent to measuring Granger causality.
This result easily extends to the conditional causality concept. Formally, it can
be shown that $F_{X\to Y|Z} = 2\,TE_{X\to Y|Z}$.
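In the Gaussian case the identity can be seen in one line: transfer entropy is a
difference of conditional differential entropies, a Gaussian conditional has entropy
$\tfrac12\log(2\pi e\,\sigma^2)$, and the two variances are exactly the residual
variances of the regressions of Section 2. A sketch of Barnett et al.'s argument [1],
written in our notation:

    \begin{aligned}
    TE_{X\to Y|Z} &= h(Y_t \mid Y^-, Z^-) - h(Y_t \mid X^-, Y^-, Z^-)\\
                  &= \tfrac12\log\bigl(2\pi e\,\mathrm{var}(\tilde\varepsilon_{Y,t})\bigr)
                     - \tfrac12\log\bigl(2\pi e\,\mathrm{var}(\varepsilon_{Y,t})\bigr)
                   = \tfrac12\log\frac{\mathrm{var}(\tilde\varepsilon_{Y,t})}{\mathrm{var}(\varepsilon_{Y,t})}
                   = \tfrac12\,F_{X\to Y|Z}.
    \end{aligned}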
    Nevertheless, we want to evaluate our capacity to infer causality on
financial data, which is well known to be neither stationary nor Gaussian. For
this reason it is very important to confirm that when measuring information
transfer between processes we are indeed testing for statistical causality. We
have not found any explicit proof of this proposition, but there are some signs in
the literature. One is an argument asserting that non-zero Granger causality
implies non-zero transfer entropy [11], although it is somewhat vague. Another is
an indication of a lower bound for mutual information [8], which proved very
helpful in showing that, for weakly stationary variables that are not necessarily
Gaussian, the linear influences detected with Granger causality are no more than
the ones detected for Gaussian variables, and therefore we will detect them at the
very least. We state without proof the following proposition,

Proposition 1. Given two stochastic processes $\{X_t\}$, $\{Y_t\}$, and supposing that
X causes Y in the Granger sense, then $TE_{X\to Y|Z} \geq 0$. ⊓⊔
    Now, conditional transfer entropy relates to Granger causality as follows:

Theorem 2. Let $\{X_t\}$, $\{Y_t\}$, $\{Z_t\}$ be three jointly distributed, weakly stationary
stochastic processes, defined on a common probability space. Then the standard
measure of Granger causality and transfer entropy are related by:
$$2\,TE_{X\to Y|Z} \geq F_{X\to Y|Z}$$
Proof. It suffices to prove that conditional mutual information is minimized by
Gaussian distributions. Since $TE_{X\to Y|Z}$ is only a specific form of a conditional
mutual information, the result will follow.
    As proposed in [8] we set up a minimization problem for the mutual infor-
mation of continuous variables using Lagrange multipliers. If we write $\mu(x, y) =
\mu_0(x, y|z)$, then the formula that we need to minimize does not change:
$$I(X; Y|Z) = \iint \mu_0(x,y|z)\,\log\frac{\mu_0(x,y|z)}{\mu^0_x(x|z)\,\mu^0_y(y|z)}\,dx\,dy
            = \iint \mu(x,y)\,\log\frac{\mu(x,y)}{\mu_x(x)\,\mu_y(y)}\,dx\,dy$$
Recall that $\iint \mu(x,y)\,dx\,dy = 1$, $\mu_x(x) = \int \mu(x,y)\,dy$ and
$\mu_y(y) = \int \mu(x,y)\,dx$, and that C is the covariance matrix of the variables
X, Y conditioned on Z, so that $\int x^2 \mu_X(x)\,dx = C_{xx} = \sigma_x$,
$\int y^2 \mu_Y(y)\,dy = C_{yy} = \sigma_y$, and $\iint xy\,\mu(x,y)\,dx\,dy = C_{xy}$.
Then the Lagrangian can be written as:
$$L = -\iint \mu(x,y)\,\log\frac{\mu(x,y)}{\mu_x(x)\mu_y(y)}\,dx\,dy
      - \lambda_1\Bigl(\iint \mu(x,y)\,dx\,dy - 1\Bigr)
      - \lambda_2\Bigl(\iint x^2\mu(x,y)\,dx\,dy - \sigma_x\Bigr)
      - \lambda_3\Bigl(\iint y^2\mu(x,y)\,dx\,dy - \sigma_y\Bigr)
      - \lambda_4\Bigl(\iint xy\,\mu(x,y)\,dx\,dy - \sigma_{xy}\Bigr)$$
Now impose $\frac{\partial L}{\partial\mu(x,y)} = 0$; since the first and second order
moments are finite, we can swap the partial derivative with the integral sign, thus
obtaining:
$$0 = -\iint \frac{\partial}{\partial\mu(x,y)}\Bigl[\mu(x,y)\,\log\frac{\mu(x,y)}{\mu_x(x)\mu_y(y)}
      + \lambda_1\,\mu(x,y) + \lambda_2\,x^2\mu(x,y) + \lambda_3\,y^2\mu(x,y)
      + \lambda_4\,xy\,\mu(x,y)\Bigr]\,dx\,dy
      + \frac{\partial}{\partial\mu(x,y)}\bigl(\lambda_1 + \lambda_2\sigma_x + \lambda_3\sigma_y + \lambda_4\sigma_{xy}\bigr),$$

which is zero if the derivative inside the integral vanishes identically. Then:
$$0 = -\log\frac{\mu(x,y)}{\mu_x(x)\mu_y(y)}
      - \mu(x,y)\,\frac{\mu_x(x)\mu_y(y)}{\mu(x,y)}\,\frac{1}{\mu_x(x)\mu_y(y)}
      - \lambda_1 - \lambda_2 x^2 - \lambda_3 y^2 - \lambda_4 xy
    = -\log\frac{\mu(x,y)}{\mu_x(x)\mu_y(y)} - 1 - \lambda_1 - \lambda_2 x^2 - \lambda_3 y^2 - \lambda_4 xy,$$
and hence $\mu(x,y) = \mu_x(x)\,\mu_y(y)\,e^{-(1+\lambda_1)}\,e^{-\lambda_2 x^2 - \lambda_3 y^2 - \lambda_4 xy}$,
where $\lambda_1$, $\lambda_2\,(\geq 0)$, $\lambda_3\,(\geq 0)$, $\lambda_4$ are constants fixed
by the constraints. Therefore a probability density that satisfies this equation is
consistent with a Gaussian. To prove that it is the only one, integrate the expression
for $\mu(x,y)$ over y and make the change of variables $x = -i\zeta/c$:
$$\int \mu(x,y)\,dy = \int \mu_x(x)\,\mu_y(y)\,e^{-(1+\lambda_1)}\,e^{-\lambda_2 x^2 - \lambda_3 y^2 - \lambda_4 xy}\,dy \;\Longrightarrow$$
$$\mu_X(x) = \mu_x(x)\,e^{-(1+\lambda_1)}\,e^{-\lambda_2 x^2}\int \mu_y(y)\,e^{-\lambda_3 y^2 - \lambda_4 xy}\,dy \;\Longrightarrow$$
$$e^{(1+\lambda_1)}\,e^{\lambda_2 x^2} = \int \mu_y(y)\,e^{-\lambda_3 y^2 - \lambda_4 xy}\,dy \;\Longrightarrow$$
$$e^{(1+\lambda_1)}\,e^{-\lambda_2 \zeta^2/c^2} = \int e^{i\lambda_4 y\zeta/c}\,\bigl[\mu_Y(y)\,e^{-\lambda_3 y^2}\bigr]\,dy$$

The left-hand side of this last equality is a Gaussian function of $\zeta$, and the
right-hand side is the Fourier transform of $\mu_Y(y)\,e^{-\lambda_3 y^2}$. Since the Fourier
transform of a Gaussian is another Gaussian, and $\mu_Y(y)$ cannot be a constant if
it is a density function, $\mu_Y(y)$ is also Gaussian. The same argument (integrating
over x) shows that $\mu_X(x)$ is Gaussian; thus the joint distribution of the random
variables X, Y that minimizes their mutual information conditioned on Z, $\mu(x,y) =
\mu_0(x,y|z)$, is a Gaussian distribution. ⊓⊔


4   Practical estimation of transfer entropy

In this section we outline the methodology we use to estimate transfer entropy. It
follows the procedures outlined in [16], but we introduce some key modifications:
aside from our particular estimator, which deals with conditional mutual
information, we test for null transfer entropy by obtaining a bootstrap
approximation of the distribution of the estimator, a non-parametric procedure.
The state-space representation. First we need to reconstruct the corre-
sponding state space of the interacting systems from a scalar time series. For this
purpose we use Takens' delay embedding [15] with the parameters suggested by
Ragwitz [13]. Delay-embedding states of the systems under consideration can be
written as delay vectors of the form $\mathbf{X}_t^{(m)} = (X_t, X_{t-\tau}, X_{t-2\tau}, \ldots, X_{t-(m-1)\tau})$,
where m and $\tau$ denote the embedding dimension and Takens' embedding delay,
respectively.
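A minimal sketch of the embedding in code (numpy; the helper name delay_embed
is ours and is reused in the sketches that follow):

    import numpy as np

    def delay_embed(x, m, tau):
        # Row i is X_t^{(m)} = (x_t, x_{t-tau}, ..., x_{t-(m-1)tau})
        # for t = (m-1)*tau + i, i = 0, ..., len(x) - (m-1)*tau - 1.
        x = np.asarray(x, float)
        n = len(x) - (m - 1) * tau
        return np.column_stack([x[(m - 1 - j) * tau:(m - 1 - j) * tau + n]
                                for j in range(m)])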

Embedding lag. To enhance predictability between the state variables it
is key to choose the right variables. Ragwitz' criterion yields delay-embedding
states that provide optimal self-prediction for a large class of systems. It is based
on a close study of the properties of an assumed multivariate Langevin process
from which the data proceed, assuming that the univariate series behaves locally
like a Markov chain.
    As for the delay parameter τ , in the limit of infinite precision of the data and
infinitely long time series, all values are equivalent. In a practical situation, how-
ever, a good choice of the delay is crucial. If τ is too large, successive elements of
the embedding vector are almost independent and the vectors fill an amorphous
cloud in the state-space. If τ is too small, successive elements of the embedding
vector are strongly correlated and all vectors are clustered around the “diago-
nal” of the state space. We will need to search for meaningful neighbors of these
vectors in the next two steps, and in both extreme cases they are difficult to ob-
tain. Then, the simplest reasonable estimate of an optimal delay is the first zero
of the autocorrelation function ρ(k) = γ(k)/γ(0), where γ(k) = Cov(Xt , Xt+k ),
k ∈ Z. To discern the first zero, our program performs the test
$$H_0: \rho(k) = 0 \quad\text{versus}\quad H_1: \rho(k) \neq 0$$
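As an illustration, a sketch of this first-zero search, using the asymptotic
N(0, 1/N) null distribution of the sample autocorrelation (the helper name and the
default 5% level are our assumptions):

    import numpy as np
    from scipy.stats import norm

    def first_acf_zero(x, alpha=0.05, max_lag=None):
        # Smallest lag k at which H0: rho(k) = 0 is NOT rejected, under the
        # asymptotic N(0, 1/N) null for the sample autocorrelation.
        x = np.asarray(x, float) - np.mean(x)
        N = len(x)
        max_lag = max_lag or N // 4
        gamma0 = x @ x / N
        crit = norm.ppf(1 - alpha / 2) / np.sqrt(N)
        for k in range(1, max_lag + 1):
            rho_k = (x[:-k] @ x[k:]) / (N * gamma0)
            if abs(rho_k) < crit:          # cannot reject rho(k) = 0
                return k
        return max_lag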

Embedding dimension. To select the dimension of the vectors of the re-
constructed state space we use the false nearest neighbors method [9, 7]. This
method compares distances of m-dimensional state-space vectors to their k-nearest
neighbors (knn) with the corresponding distances for (m + 1)-dimensional state-space
vectors built from the same data. These distances are straightforward to calculate,
considering that all vectors have a similar form once we have the estimated delay;
only the knn search may prove to be a problem if the dimension becomes too large.
If the distances grow too much with the increase in dimension, the neighbors are
deemed false, and we try again increasing the dimension by one, until no false
neighbor is found. In this way the data themselves dictate the embedding dimension,
so that it correctly approximates the dimension of the manifold in which the states
of the system reside. A sketch of this criterion is given below.
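A sketch of the criterion (scipy k-d trees plus the delay_embed helper above; the
distance-ratio threshold rtol and the stopping fraction are illustrative defaults,
not the paper's tuned values):

    import numpy as np
    from scipy.spatial import cKDTree

    def fnn_fraction(x, m, tau, rtol=10.0):
        # Fraction of m-dim nearest neighbours whose distance grows by more
        # than a factor rtol when the (m+1)-th coordinate is added.
        Em  = delay_embed(x, m, tau)[tau:]     # aligned with the (m+1)-embedding
        Em1 = delay_embed(x, m + 1, tau)
        d, idx = cKDTree(Em).query(Em, k=2)    # nearest neighbour besides self
        d_m, nb = d[:, 1], idx[:, 1]
        d_m1 = np.linalg.norm(Em1 - Em1[nb], axis=1)
        ok = d_m > 0
        return float(np.mean(d_m1[ok] / d_m[ok] > rtol))

    def embedding_dimension(x, tau, m_max=10, frac=0.01):
        for m in range(1, m_max + 1):
            if fnn_fraction(x, m, tau) < frac:  # (almost) no false neighbours left
                return m
        return m_max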

Mutual Information estimation. Once we have the parameters involved in
the state-space representation of both scalar series, we apply formula (1) to find
the transfer entropy from X to Y:
$$TE_{SPO:X\to Y,u} = I(\mathbf{Y}_t; X_{t-u} \mid \mathbf{Y}_{t-1})
                    = I(\mathbf{Y}_t; X_{t-u}, \mathbf{Y}_{t-1}) - I(\mathbf{Y}_t; \mathbf{Y}_{t-1})$$
(To simplify notation we are not considering a third variable $Z_t$ for the side
information; its inclusion would mean four mutual information terms in the
previous formula instead of two.)
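A sketch of this decomposition in code, built on the delay_embed helper above;
mi(A, B) stands for any mutual-information estimator on (N, d) samples, e.g. the
k-nearest-neighbor estimator sketched next. The alignment bookkeeping and the
names are ours:

    import numpy as np

    def te_spo(x, y, u, m, tau, mi):
        # TE_{SPO:X->Y,u} = mi(Y_t ; [X_{t-u}, Y_{t-1}]) - mi(Y_t ; Y_{t-1}),
        # with Y_t, Y_{t-1} the delay-embedding state vectors of y.
        x, y = np.asarray(x, float), np.asarray(y, float)
        Ys = delay_embed(y, m, tau)              # state vectors Y_t
        start = max((m - 1) * tau + 1, u)        # earliest t with all terms defined
        i0 = start - (m - 1) * tau
        Yt, Ym1 = Ys[i0:], Ys[i0 - 1:-1]         # Y_t and Y_{t-1}, aligned
        Xu = x[start - u:start - u + len(Yt)].reshape(-1, 1)   # scalar X_{t-u}
        return mi(Yt, np.hstack([Xu, Ym1])) - mi(Yt, Ym1)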
    To estimate the mutual information I(A; B) between two multivariate time
series, A and B, we use the k-nearest neighbor estimator suggested in [8]. The
only assumption behind this method is a certain smoothness of the underlying
probability densities, so it qualifies as a non-parametric technique. The estimator
has the form
$$\hat{I}(A, B) = \Psi(k) - \langle \Psi(n_A + 1) + \Psi(n_B + 1) \rangle + \Psi(N)$$
where N is the length of the series used, $n_A$ is the number of points of A whose
distance to a given point is below a threshold set by the k-th nearest neighbor in
the joint space (similarly for $n_B$, with $\langle\cdot\rangle$ denoting the average over
all points), and $\Psi(x)$ is the digamma function, $\Psi(x) = \Gamma(x)^{-1}\,d\Gamma(x)/dx$.
The digamma function satisfies the recursion $\Psi(x + 1) = \Psi(x) + 1/x$, with
$\Psi(1) = -C$, C being the Euler–Mascheroni constant.
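A compact sketch of the estimator (algorithm 1 of [8]) using scipy's k-d trees;
the max-norm and the strictly-inside counts follow [8], while the function name and
the default k = 4 are our choices:

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.special import digamma

    def ksg_mi(A, B, k=4):
        # A, B: (N, dA) and (N, dB) arrays of jointly sampled points.
        A, B = np.asarray(A, float), np.asarray(B, float)
        if A.ndim == 1: A = A[:, None]
        if B.ndim == 1: B = B[:, None]
        N = len(A)
        joint = np.hstack([A, B])
        # eps_i: max-norm distance to the k-th neighbour in the joint space
        eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, k]
        strict = np.nextafter(eps, 0)            # count strictly inside eps_i
        treeA, treeB = cKDTree(A), cKDTree(B)
        nA = np.array([len(treeA.query_ball_point(a, r, p=np.inf)) - 1
                       for a, r in zip(A, strict)])
        nB = np.array([len(treeB.query_ball_point(b, r, p=np.inf)) - 1
                       for b, r in zip(B, strict)])
        return digamma(k) - np.mean(digamma(nA + 1) + digamma(nB + 1)) + digamma(N)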


Bootstrap sampling. Once we have the estimators of the transfer entropy be-
tween our time series as a function of the lag, we approximate their distribution
via a bootstrap experiment: take bootstrap replications of the series and re-
compute the estimators as usual. Bootstrapping stationary time series has been
widely studied in [12] and [10], and we follow their procedures. This allows us
to assess (for each lag) whether there is no causation (transfer entropy equal to
zero) by looking at the bootstrap confidence intervals. We think this is a
more natural way of approximating the distribution of the transfer entropy
estimator than the test conducted in [16], which consists of comparing
the observed TE with that of surrogate data obtained by permutations of the
source.
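A sketch of one stationary-bootstrap replicate, generated as an index path
(Politis and Romano [12]); resampling indices lets the same blocks be applied
jointly to several series, preserving their cross-dependence, and the commented
usage mirrors the experiments of Section 5:

    import numpy as np

    def sb_indices(n, mean_block=100, rng=None):
        # Stationary bootstrap [12]: wrap-around blocks with uniform random
        # starts and Geometric(1/mean_block) lengths, concatenated to size n.
        rng = rng or np.random.default_rng()
        idx = np.empty(n, dtype=int)
        i = 0
        while i < n:
            start = int(rng.integers(n))
            length = min(int(rng.geometric(1.0 / mean_block)), n - i)
            idx[i:i + length] = np.arange(start, start + length) % n
            i += length
        return idx

    # e.g. 60 replicates of a TE estimate at lag u, then a 90% percentile CI:
    #   reps = []
    #   for _ in range(60):
    #       p = sb_indices(len(x))
    #       reps.append(te_spo(x[p], y[p], u, m, tau, mi=ksg_mi))
    #   lo, hi = np.percentile(reps, [5, 95])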


5   Simulations and applications in finance

We present several experiments performed to assess the efficiency of our method
to estimate transfer entropy, as well as an application to real data.


Example 1. Consider the autoregressive stationary system in three variables,
with one linear coupling (Y → Z) and two non-linear ones (X → Y and X → Z),
proposed in [4]. We use a different lag in each interaction to be able to clearly
differentiate them, as well as the "rebounds" produced by an indirect cause;
specifically, $\delta_{X\to Y} = 10$, $\delta_{X\to Z} = 5$, $\delta_{Y\to Z} = 15$. The system satisfies:
$$X_t = 3.4\,X_{t-1}(1 - X_{t-1})^2\,e^{-X_{t-1}^2} + 0.4\,\varepsilon_{1,t}$$
$$Y_t = 3.4\,Y_{t-1}(1 - Y_{t-1})^2\,e^{-Y_{t-1}^2} + 0.5\,X_{t-10}^2 + 0.4\,\varepsilon_{2,t}$$
$$Z_t = 3.4\,Z_{t-1}(1 - Z_{t-1})^2\,e^{-Z_{t-1}^2} + 0.3\,Y_{t-15} + 0.5\,X_{t-5}Z_{t-1} + 0.4\,\varepsilon_{3,t}$$
where $\varepsilon_{1,t}, \varepsilon_{2,t}, \varepsilon_{3,t}$ are Gaussian white noise terms with
identity covariance matrix. To simulate it, we set the 30 initial terms of each series
to 0 and apply the formulas from there for a total of $10^4$ steps; the last $10^3$
steps were used to test the transfer entropy estimator, as the arbitrary initial
conditions should have little effect by then. We performed the geometric (stationary)
time series bootstrap 60 times for each lag, with an average block size of 100, and
drew 90% confidence intervals. We estimated up to lag 30 to save computation time,
as this should be enough to recover the causal interactions. The results are
presented in Figure 1. A sketch of the simulation follows.
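A minimal simulation sketch (numpy; the seed is arbitrary and the map f encodes
the equations above):

    import numpy as np

    rng = np.random.default_rng(0)
    T, burn = 10_000, 30
    X, Y, Z = np.zeros(T), np.zeros(T), np.zeros(T)
    f = lambda s: 3.4 * s * (1 - s) ** 2 * np.exp(-s ** 2)  # common nonlinear map
    for t in range(burn, T):
        X[t] = f(X[t-1]) + 0.4 * rng.standard_normal()
        Y[t] = f(Y[t-1]) + 0.5 * X[t-10] ** 2 + 0.4 * rng.standard_normal()
        Z[t] = f(Z[t-1]) + 0.3 * Y[t-15] + 0.5 * X[t-5] * Z[t-1] \
               + 0.4 * rng.standard_normal()
    x, y, z = X[-1000:], Y[-1000:], Z[-1000:]   # last 10^3 points feed the TE tests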
    Observe that we recover all the true delays as clearly distinct maxima of the
transfer entropy versus lag plots. It is important to highlight that the scale of the
y-axis, i.e. the transfer entropy magnitude, is different in each graph, as our
estimator depends on the embedding dimension, so each test is unique and we
must look at the error bars generated in the bootstrap process to keep things in
perspective. That said, the variation of TE across extremely different embedding
dimensions reaches 30% of the larger value, and in this graph we can draw
conclusions about the strength of the interactions since we see differences of an
order of magnitude. That is why, as expected, we can consider the Z → X and
Y → X interactions null, as the measured TE for the lag values statistically
different from zero is really small. There is a hint of a pattern, but it should be no
more than a residual influence in the estimator, caused by the real influence in the
opposite direction. The Z → Y interaction is also fairly small, but really
interesting: Z does not influence Y directly, but X influences both, first Z at
u = 5 and then Y at u = 10, therefore $Z_{t-5}$ carries important information on Y
via X. Here we have a good example of when we would need to apply the whole
TE formula, conditioning on a known overarching series.
    On the other hand, the autoregressive interaction of each variable, rep-
resented on the diagonal, is recovered as a maximum at lag u = 1, with
decreasing intensity as the influence of the past of the series declines, stabilizing
at 0 afterwards. The X → Y interaction is prominently recovered in a similar
fashion, at u = 10, also showing the decreasing remnants of information still
present at values around the true delay. The linear interaction Y → Z is also
displayed, with a maximum at u = 15. And the interaction X → Z shows two
interesting properties of our estimator, aside from successfully detecting the
influence at u = 5 with a distinct local maximum. We detect a maximum at
u = 25, which corresponds to the indirect interaction X → Y → Z; that is, we
successfully detect indirect interactions, and it displays a strength similar to
Y → Z. The other important detail is that the variances of the indirect
interaction are larger than those of the direct one.
   We repeat the same analysis using a linear Granger causality estimator, pro-
vided in the R library MSBVAR. We used regressions up to lag u = 15, as we
did not want the indirect influences to have enough steps to affect the test on
the direct interactions, which are complete by this lag. The results obtained are
shown in Table 1.
Fig. 1. Graphs representing the estimated transfer entropy depending on the lag of the
supposed cause, for each of the nine possible interactions of the system (panels X→X,
X→Y, X→Z, Y→X, Y→Y, Y→Z, Z→X, Z→Y, Z→Z; axes: Lag vs. TE).




    The test readily recovers the Y → Z linear interaction, and also the X → Y
quadratic interaction, but fails to detect any interaction in X → Z, which is
highly non-linear at true lag u = 5. To better check the responsiveness of the
test we repeated it considering various smaller lag values for X, and we
Table 1. Granger test statistics and p-values for the different interactions in the system

    Interaction    F-statistic     p-value
    Y → X            1.0111734    0.44062932
    Z → X            1.5610944    0.07815942
    X → Y          278.3554360    0.00000000
    Z → Y            1.2341100    0.23930984
    X → Z            0.7986152    0.68012265
    Y → Z           18.8199782    0.00000000


never obtained a favorable p-value. Our TE test shows the various interactions
in the system (direct or indirect), justifying its appropriateness when dealing
with non-linear systems. ⊓⊔


Fig. 2. Graphs representing the estimated transfer entropy depending on the lag (panels
X→X, X→Y, Y→X, Y→Y; axes: Lag vs. TE). This time both variables are non-stationary.


Example 2. Next we propose a similar (but non-stationary) system. Consider
two processes (X → Y) with a true delay $\delta_{X\to Y} = 10$ and random variance that
follows an IGARCH(1,1) model:
$$X_t = 0.7\,X_{t-1} + \sigma_{1,t}\,\varepsilon_{1,t}$$
$$Y_t = 0.3\,Y_{t-1} + 0.5\,X_{t-10}Y_{t-2} + \sigma_{2,t}\,\varepsilon_{2,t}$$
$$\sigma_{i,t}^2 = 0.2 + 0.9\,\varepsilon_{i,t-1}^2 + 0.1\,\sigma_{i,t-1}^2$$
    The results are displayed in Figure 2 and the analysis is similar to the previous
one. We recover the autoregressive tendencies perfectly, and we obtain a fairly
clear view of the X → Y interaction, similar to the X → Z interaction in
Example 1, but now it takes longer to fade to zero, so we can sense the
distortion generated by the variance, although we can still identify the true delay
with precision. ⊓⊔


Example 3. As a real-world application of our estimator we look at the
relationship between two major European stock indices, the German DAX-30
and the Spanish IBEX-35, between January 1st, 2011 and June 24th, 2016, which
constitutes approximately 1200 data points at daily frequency. We have tested for
transfer entropy from the log-returns ($\log(P_t/P_{t-1})$, where $P_t$ is the adjusted
close price of the index at day t) of the DAX to the log-returns of the IBEX, as we
suppose that the territorially more influential stock market affects the smaller
one, especially in the current context where Germany holds a major portion of
Spain's debt. The results are represented in Figure 3. We can see a strong
influence detected at lag u = 1 that fades quickly, within 8 days. It is logical to
think that real stock markets react to their sphere of influence quickly enough
that daily data does not allow for precise detection of the interaction lag, as we
expect it to be much smaller, but we can definitely conclude that there is a causal
influence present. ⊓⊔


Fig. 3. Transfer entropy of the DAX-30 index to the IBEX-35, depending on the lag of
the interaction (axes: Lag vs. TE).




Acknowledgments

A. Cabaña acknowledges support of MINECO project MTM2015-69493-R.
A. Arratia acknowledges support of MINECO project APCOM (TIN2014-57226-
P), and Gen. Cat. SGR2014-890 (MACDA).
References
 1. Barnett, L., Barrett, A.B., Seth, A.K.: Granger causality and transfer entropy are
    equivalent for Gaussian variables. Phys. Rev. Lett. 103, 238701 (2009)
 2. Diks, C., Wolski, M.: Nonlinear Granger causality: Guidelines for multivariate anal-
    ysis. Journal of Applied Econometrics (2015)
 3. Geweke, J.: Measurement of linear dependence and feedback between multiple time
    series. J. Amer. Stat. Assoc. 77, 304–313 (1982)
 4. Gourévitch, B., Bouquin-Jeannès, R.L., Faucon, G.: Linear and nonlinear causality
    between signals: methods, examples and neurophysiological applications. Biological
    Cybernetics 95, 349–369 (2006)
 5. Granger, C.W.J.: Testing for causality: A personal viewpoint. J. Econ. Dyn. Con-
    trol 2, 329–352 (1980)
 6. Granger, C.: Investigating causal relations by econometric models and cross-
    spectral methods. Econometrica 37, 424–438 (1969)
 7. Hegger, R., Kantz, H.: Improved false nearest neighbor method to detect deter-
    minism in time series data. Physical Review E 60(4), 4970 (1999)
 8. Kraskov, A., Stögbauer, H., Grassberger, P.: Estimating mutual information. Phys.
    Rev. E 69, 066138 (2004)
 9. Kennel, M., Brown, R., Abarbanel, H.: Determining embedding dimension for
    phase-space reconstruction using a geometrical construction. Physical Review A
    45(6), 3403–3411 (1992)
10. Künsch, H.R.: The jackknife and the bootstrap for general stationary observations.
    Annals of Statistics 17, 1217–1241 (1989)
11. Marinazzo, D., Pellicoro, M., Stramaglia, S.: Kernel method for nonlinear Granger
    causality. Phys. Rev. Lett. 100, 144103 (2008)
12. Politis, D.N., Romano, J.P.: The stationary bootstrap. Journal of the American
    Statistical Association 89, 1303–1313 (1994)
13. Ragwitz, M., Kantz, H.: Markov models from data by simple nonlinear time series
    predictors in delay embedding spaces. Phys. Rev. E 65, 056201 (2002)
14. Serès-Cabasés, A.: Causality via transfer entropy. Bachelor Sc. Thesis, Universitat
    Autònoma de Barcelona (2016)
15. Takens, F.: Dynamical systems and turbulence. In: Lecture Notes in Mathematics,
    vol. 898, Warwick 1980 Symp., pp. 366–381. Springer (1981)
16. Wibral, M., Pampu, N., Priesemann, V., Siebenhühner, F., Seiwert, H., Linder,
    M., Lizier, J., Vicente, R.: Measuring information-transfer delays. PLoS ONE 8(2),
    e55809 (2013)
17. Zaremba, A., Aste, T.: Measures of causality in complex datasets with application
    to financial data. Entropy 16, 2309–2349 (2014)