<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Joint Conference (March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Structural Change Point Detection Using A Large Random Matrix and Sparse Modeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Katsuya Ito∗</string-name>
          <email>katsuya1ito@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akira Kinoshita</string-name>
          <email>kino@preferred.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masashi Yoshikawa</string-name>
          <email>yoshikawa@preferred.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Graduate School of Economics, The University of Tokyo</institution>
          ,
          <addr-line>Bunkyo, Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Preferred Networks, Inc.</institution>
          ,
          <addr-line>Chiyoda, Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>26</volume>
      <issue>2019</issue>
      <abstract>
        <p>This paper proposes a new structural change point detection method for time series using our new matrix decomposition method. We propose YMN decomposition, which decomposes one time-series data matrix into a large random matrix and a sparse matrix. YMN decomposition can obtain information about the latent structure from the sparse matrix. Thus, our structural change point detection method using YMN decomposition can detect changes in the higher-order moments of a mixing matrix as well as typical structural changes such as changes in the mean, variance, and autocorrelation of a time series. We also partly theorize our methods using existing theories of random matrices and statistics. Our experiment using artificial data demonstrates the effectiveness of our change point detection techniques, and our experiment using real data demonstrates that our methods can detect structural changes in economics and finance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Unveiling the model of time-series data is necessary to explain
the causal relationships and to forecast the future. In economic
and financial modeling, the models of time-series data are not
apparent in many cases; moreover, their models are time varying
[
        <xref ref-type="bibr" rid="ref2 ref28 ref32 ref34">2, 28, 32, 34</xref>
        ]. Under such obscurity and instability of the models,
explaining the causal relationships among data and forecasting
the future are difficult but central problems in these fields [
        <xref ref-type="bibr" rid="ref14 ref16">14, 16</xref>
        ].
One straightforward method to overcome the instability is to use
change-point detection. Change-point detection methods
divide piece-wise stable time-series data into several stable
time-series segments [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. Moreover, detecting the change point of the
structure is an essential task in time-series modeling in economics
and finance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Research with a similar purpose can be found in
many fields such as biology [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], neuroscience [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ], and computer
network analysis [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Structural change point detection is difficult because we do
not know the structure of the time-series before/after the change
point, or how the change occurs. However, factor modeling is
typically used to unveil the partial structures in economics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
and finance [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. Economic or financial data such as GDPs and
stock prices are observed as high-dimensional time series data.
Assume that we observe a vector yt at time t . A factor model
assumes that a few typical “factors” generate the observed data.
It formally introduces two latent variables A and zt and assumes
that the following linear equation holds:
      </p>
      <p>yt = Azt .</p>
      <p>∗This work was performed when the author was an intern at Preferred Networks, Inc.</p>
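      <p>To make the generative model concrete, the following is a minimal
Python sketch of yt = Azt (illustrative only; the sizes D, K, and T and
the use of NumPy are our assumptions, not values from the paper).</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)
D, K, T = 50, 5, 200          # observed dim, latent dim, series length (assumed)
A = rng.normal(size=(D, K))   # mixing matrix ("factor loadings")
Z = rng.normal(size=(K, T))   # latent factors z_t, one column per time step
Y = A @ Z                     # observed series y_t = A z_t
</preformat>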
      <sec id="sec-1-1">
        <title>Field</title>
      </sec>
      <sec id="sec-1-2">
        <title>Robotics</title>
      </sec>
      <sec id="sec-1-3">
        <title>Neuro</title>
        <p>science
yt (observed)</p>
      </sec>
      <sec id="sec-1-4">
        <title>Sensor data</title>
      </sec>
      <sec id="sec-1-5">
        <title>EEG/fMRI</title>
      </sec>
      <sec id="sec-1-6">
        <title>Network</title>
      </sec>
      <sec id="sec-1-7">
        <title>Trafic data</title>
      </sec>
      <sec id="sec-1-8">
        <title>Economics</title>
      </sec>
      <sec id="sec-1-9">
        <title>Macro data</title>
      </sec>
      <sec id="sec-1-10">
        <title>Finance</title>
      </sec>
      <sec id="sec-1-11">
        <title>Stock prices</title>
        <p>zt (latent)
Motion
of parts
Waves
from
parts</p>
      </sec>
      <sec id="sec-1-12">
        <title>Packets</title>
      </sec>
      <sec id="sec-1-13">
        <title>Factors</title>
      </sec>
      <sec id="sec-1-14">
        <title>Factors</title>
      </sec>
      <sec id="sec-1-15">
        <title>A (structure)</title>
        <p>Structure
of data</p>
      </sec>
      <sec id="sec-1-16">
        <title>Structure of brain</title>
      </sec>
      <sec id="sec-1-17">
        <title>Network</title>
        <p>
          Structure
Economic
Structure
Structure
of Market
          zt is a low-dimensional time-variant vector of factors, while A is
a time-invariant coefficient matrix. A is called “factor loadings”
in the context of econometrics and “mixing matrix” in the context
of computer science. A implies the latent “structure” behind the
observed data. However, the structure may change abruptly
in reality because of major economic or political events such
as Brexit or the Global Financial Crisis. Most of the work
performed by economists and quantitative financial analysts, such
as the prediction and explanation of time-series, relies strongly on
their models [
          <xref ref-type="bibr" rid="ref19 ref37">19, 37</xref>
          ]. It is, therefore, crucial for them to detect
changes in their structures. As summarized in Table 1, the
structures (i.e. the relationships between observed and latent data) are
important, and similar types of problems occur in many fields.
        </p>
        <p>
          The change exhibits several patterns. The simplest one is the
change in the mean and variance of A’s entries; it can be detected
by merely calculating the mean and variance of y. The change
in the number of factors and the change in the autocorrelation
of y are also essential but straightforward ones [
          <xref ref-type="bibr" rid="ref12 ref20 ref9">9, 12, 20</xref>
          ].
The third- and fourth-order moments are also significant in the
literature of finance and econometrics, which attaches special
importance to them [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. However, these changes are difficult to detect because, in
high-dimensional cases, the observations approach a normal
distribution by the central limit theorem (this interpretation of the
central limit theorem is often used in the ICA literature [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], where skewness and kurtosis are interpreted as non-Gaussianity,
which decreases as many samples are mixed), and the change
cannot be observed in y. To the best of our knowledge, nonparametric
methods that detect changes in such skewness and kurtosis do
not exist, whereas many parametric methods do [
          <xref ref-type="bibr" rid="ref30 ref31 ref4">4, 30, 31</xref>
          ].
        </p>
        <p>
          Many previous studies have shown the effectiveness of linear
models in detecting the structural changes of high-dimensional
data. Well-known methods such as principal component analysis
(PCA), non-negative matrix factorization (NMF), and
independent component analysis (ICA) can be interpreted as methods to
decompose one data matrix into two matrices that satisfy specific
properties. Moreover, variants of these methods that specialize
in change point detection have been developed, such as OMWRPCA [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ],
ILRMA [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], and TVICA [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>
          We herein propose a new non-parametric change point
detection method for the structures of multidimensional time series.
Our method decomposes the observed data matrix into a large
random matrix and a sparse coefficient matrix. First, we use a
randomly generated large matrix as candidate latent time series.
When a large matrix of random time-series row vectors is generated,
most of its row vectors are not related to the latent time series, but
some are close to the latent time series. Therefore, we select good
row vectors by applying a sparse matrix to them. Finally, we can
detect the structural changes by calculating the difference of the
sparse matrix over time. Some previous works reported similar frameworks
such as latent feature Lasso [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ] or extreme learning machine
[
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]; however, studies using both the random matrix and the
sparse matrix have not been reported.
        </p>
        <p>Moreover, we defined a new type of change point, i.e., “the
third or fourth moment of factor loadings changes” (this concept is
not new, but no research in econometrics has defined this type of
change yet). Our experiment showed that our method can detect
these change points. Our framework does not exert power in other
tasks such as prediction and factor analysis. Meanwhile, in the context
of non-parametric change point detection, it has a high expressive power.</p>
        <p>The main contributions of this paper are as follows.
• We propose YMN decomposition, which decomposes one
time-series data matrix into a product of a random matrix
and a sparse matrix.
• We propose a structural change point detection method
using YMN decomposition.
• We demonstrate the effectiveness of our structural change
point detection techniques using artificial data and
real data.</p>
        <p>The remainder of the paper is organized as follows. In the
next section, we present the related work. In Section 3, we clearly
define the problems to solve. In Section 4, we propose the method.
The method is evaluated in Section 5. We discuss the experimental
results, issues, and future work in Section 6. Finally, we conclude
the paper in Section 7.
</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        This paper proposes a new structural change point detection
method using nonparametric matrix decomposition. In this
section, we review the previous research that is related to ours in
terms of (1) method and (2) purpose. We do not review the works
that are not directly related to ours. Therefore, we refer to the
survey by Truong et al. [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] for a general review of change point
detection methods.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Non-parametric change point detection</title>
      <p>As mentioned in the introduction, change point detection can
be divided into two types: (1) parametric and (2) nonparametric
change point detection.</p>
      <p>Because parametric change point detection places strong
assumptions on the distribution of the time series, it exhibits strong
capability when the model users’ assumptions hold, but any
unexpected change point cannot be detected. Meanwhile,
nonparametric change point detection methods place only a few
assumptions on the time-series distribution of data. Therefore,
the model is free and can present a robust result; however, their
results often cannot be interpreted, and they can be less accurate
than parametric methods in some situations, e.g., when the observed
time-series data completely follows one stochastic model.</p>
      <p>
        Nonparametric change point detection methods are
primarily divided into two types. One type uses matrix decomposition
and the other uses kernel methods. The kernel methods [
        <xref ref-type="bibr" rid="ref21 ref26">21, 26</xref>
        ]
demonstrate a high ability when handling data with nonlinear
models; however, they are similar to parametric methods in that
the results strongly depend on the choice of the kernel functions.
On the other hand, methods that fit each purpose and each data’s
property have been developed in each field, such as Robust PCA [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ],
NMF [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], and ICA [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, because a unique method to
decompose one matrix into two matrices does not exist, it is
necessary to place some assumptions on both matrices. These
assumptions significantly restrict the degree of freedom of the
model: PCA assumes no covariance in the latent data, and ICA assumes
independence of the latent data. Hence, the method of matrix
decomposition used in each field differs significantly.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Structural change point detection</title>
      <p>
        Next, we will explain the structural change point detection
methods. In the parametric method, it is possible to assume a graph
structure or a time-series structure such as auto-regressive (AR)
and moving average [
        <xref ref-type="bibr" rid="ref25 ref40">25, 40</xref>
        ]. Meanwhile, in nonparametric
methods, such a strong assumption cannot be made originally, but
many empirical studies indicate the magnitude of the expressive
power of the linear model. Among them, research verifying the
magnitude of the changes in the factor loadings of linear models has
not attracted much attention; however, it is valuable in many
fields including economics, finance, biology, neuroscience, and
computer network analysis. In particular, economics and finance
contain rich multidimensional time-series data from the beginning,
and they aim to unveil both their economic structure and structural
changes. Changes in the number of factors are the most popular
research topics [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and the structural changes in AR models are
also popular [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]. Recently, a method using wavelet transform
and PCA has been developed, and this method can be applied
to both structural changes above [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Meanwhile, in the finance
literature, a method that uses ICA has been developed [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Finally, we clarify our position. First, our method is an
online change point detection method using nonparametric matrix
decomposition. Our method can be considered a matrix
decomposition method; meanwhile, it is close to the kernel method in that
it assumes that the latent time-series follows some model.
However, in the context of online change point detection, time-series
are generated within a small window width; therefore, the
randomly generated time-series variation is large, and our method
can be considered model-free. Meanwhile, our purpose is to
detect the change point of the structure, and our method can be
applied to all the structural changes defined above. It is also applicable to
new types of changes such as changes in higher moments of the
mixing matrix. Under such circumstances, our method can be
regarded as a new nonparametric method to detect structural
changes.
</p>
    </sec>
    <sec id="sec-5">
      <title>PROBLEM STATEMENT</title>
      <p>In this section, we mathematically and clearly explain the models
that we will investigate, define the structural changes that we
want to detect, and explain the changes in terms of real-world
applications. Table 2 summarizes the notations used herein.</p>
      <p>We begin from the generative model. We observe a vector yt ∈
RD at every time t . We assume that the vector yt is generated as
a mixture of latent time series</p>
      <p>yt = At zt ,
where At ∈ RD×K is a mixing matrix and zt ∈ RK is a latent
time series. We can observe only yt ; both At and zt are
latent.</p>
      <p>As is often the case in an econometric setting, zt is assumed to
follow one stable deterministic or stochastic model. At is assumed
to be piece-wise constant with respect to time and represents the
structure behind the data. However, At may change abruptly when a
major event occurs and the relation between the latent and
the observed variables changes. In this study, we denote t ∗ as the time of
a structural change point if the following holds:</p>
      <p>At = A(1) (t &lt; t ∗), A(2) (t ≥ t ∗),
where A(1), A(2) ∈ RD×K are constants and A(1) ≠ A(2).
Meanwhile, the latent random signal zt is assumed to follow a Gaussian
distribution at any time, and it is independent and identically
distributed (i.i.d.) with respect to time:</p>
      <p>zt ∼ N (0, I K ),
where I K is the identity matrix in RK ×K . Finally, the problem
is to obtain the time of the structural change point t ∗, and the
estimation of At is of no interest.</p>
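      <p>A hedged simulation of this setting follows: At switches from A(1) to
A(2) at t ∗ while zt ∼ N (0, I K ) stays i.i.d. over time (all sizes and the
random seed are illustrative assumptions, not values from the paper).</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(1)
D, K, T, t_star = 50, 5, 200, 100
A1 = rng.normal(size=(D, K))
A2 = rng.normal(size=(D, K))             # A1 != A2: the structural change
Y = np.empty((D, T))
for t in range(T):
    z_t = rng.normal(size=K)             # z_t ~ N(0, I_K), i.i.d. over time
    Y[:, t] = (A1 if t &lt; t_star else A2) @ z_t
# The task: recover t_star from Y alone; A1, A2, and z_t are never observed.
</preformat>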
      <p>
        As for the types of changes that we will detect, we follow
the literature of econometrics and finance [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>
        ]. The structural
changes defined in this literature can be divided into three types,
that is
(1) change in the n-th moment of At
(2) change in the dimension of At and zt
(3) change in the autocorrelation of yt
Finally, we will explain the real-world applications and
interpretation of these three changes.
      </p>
      <p>Change (1) is the simplest change. For example, a 1st-order
moment change in At means an abrupt increase or
decrease of its values. A 2nd-order moment change in At
means that some of the latent factors become more or less
correlated with the observed data. A 4th-order moment change in
At means an increase or decrease in the number of non-zero
entries of At .</p>
      <p>Change (2) is also under intense investigation in econometrics
because, if the number of factors increases, a new connection
between the latent and observed time series is recognized and the
explanatory capability of the old model decreases. Although
this is contrary to the setting, “the dimension of latent data K is
a constant,” we will address this type of change point similarly.</p>
      <p>Change (3) appears unrelated to our models, but by
substituting ⟨zt , yt −1, yt −2, · · · ⟩ for zt , we can also address
time-series models, as sketched below. This change is also important in
much of the literature because the time-series becomes less predictable
from the observed time-series if the autocorrelation of y decreases,
and vice versa.</p>
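      <p>The substitution can be made concrete as follows: a short sketch
(under our own conventions, not the paper's) that stacks past observations
with zt so that an AR structure fits the factor-model form yt = At z~t .</p>
      <preformat>
import numpy as np

def augmented_latent(z_t, y_hist, n_lags=2):
    # z~_t = &lt;z_t, y_{t-1}, y_{t-2}, ...&gt;; y_hist holds past observations,
    # newest first (an illustrative convention).
    return np.concatenate([z_t] + [y_hist[i] for i in range(n_lags)])
</preformat>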
    </sec>
    <sec id="sec-6">
      <title>METHOD</title>
      <p>
        In this section, we propose our structural change point
detection method. Our goal is to detect the change in the matrix At .
Although both coeficients At and factors zt are latent, the
theory of random matrix and statistics [
        <xref ref-type="bibr" rid="ref10 ref11 ref36">10, 11, 36</xref>
        ] suggests that we
can approximate zt by selecting some vectors in a large random
matrix, and that At can be determined by Lasso regression. We
ifrst decompose the matrix of observed vectors based on this
suggestion, and subsequently evaluate the change in the estimated
At .
4.1
      </p>
    </sec>
    <sec id="sec-7">
      <title>Theory of Random Matrix and Lasso</title>
      <p>In this section, we describe the methodology and the theory of
our method. First, we will explain the problem, and how and why
that problem can be solved.</p>
      <p>As described in the previous section, our goal is to detect
the change in At . Most of the existing methods, such as PCA,
NMF, and ICA, solve this problem by estimating At and
subsequently detecting the change in the estimated At . However, we
directly calculate the change d(At , At −1) for an ideal matrix distance
d, and we are not interested in estimating At .</p>
      <p>To calculate the change in structure d(At , At −1), our method
decomposes the observed data yt into a sparse matrix Mt and a
large random matrix Nt , and subsequently calculates the change
in Mt . How our method operates can be explained by the
following three basic ideas.</p>
      <p>
        (1) If we generate many random vectors {Nt(i) | i = 1, · · · , Q }, some
linear combination of them may be similar to the true latent
variables zt .
(2) If we perform a Lasso regression analysis, setting Nt
(random vectors) as explanatory variables and yt (observed
time-series) as explained variables, then the random vectors
in Nt that are similar to zt may be automatically chosen
and used to explain yt .
(3) If we perform such a Lasso regression analysis and
obtain Mt as a sparse coefficient matrix (i.e., yt = Mt Nt ),
the change in Mt may be related to the change in At
(structural change = true coefficient matrix’s change).
We explained our basic ideas briefly; they are mathematically
formulated and confirmed by existing random matrix theory
and statistical theory in this section. For idea (1), we can use
random matrix theory. Suppose we generate Q random
K -dimensional vectors denoted by {Nt(i) | i = 1, · · · , Q }. As above, we
are considering the case where Q is much larger than K . For
example, in our experiments, we consider the case with Q =
100 max{D, K }. If the matrix ⟨Nt(1), · · · , Nt(Q )⟩ is full rank (i.e.,
rank = K ), some of the Nt(i) are linearly independent and
we can obtain any vector in RK (including zt ) by a linear
combination of the Nt(i). The situation where the matrix is full
rank can be interpreted as any singular value of the matrix
being non-zero. Many studies regarding the singular value of the
random matrix have been conducted [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ], and we used the result
of Tao and Vu [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ].
      </p>
      <p>
        Proposition 4.1 (Theorem 1.3 [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]). Let ξ be a real random
variable, Mn (ξ ) be the random n × n matrix whose entries are i.i.d.
copies of ξ , and σi (M) be the i-th largest singular value of a matrix
M.
      </p>
      <p>Suppose that E[ξ ] = 0, E[ξ 2] = 1, and E[ξ C0 ] &lt; ∞ for some
sufficiently large absolute constant C0. Then, for all t &gt; 0,
we have</p>
      <p>P(nσn (Mn (ξ ))2 ≤ t ) = 1 − e−t /2−√t + O(n−c ),
where c &gt; 0 is an absolute constant; the implied constants in the
O(·) notation depend on E[ξ C0 ] but are uniform in t .</p>
      <p>
        This proposition can be used for any normalized square
random matrix (i.e., each entry is an i.i.d. random variable with
mean 0 and variance 1). A stronger result
is obtained in the Gaussian case (Theorem 1.1 in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). Moreover,
these results can be extended to rectangular matrices (Theorem
6.5 in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
      </p>
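      <p>As a sanity check (not part of the proposed method), the limiting
distribution in Proposition 4.1 can be compared against Monte Carlo
draws of Gaussian matrices; the sizes and trial count below are our
assumptions.</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(3)
n, trials = 200, 2000
stats = np.empty(trials)
for i in range(trials):
    M = rng.normal(size=(n, n))                  # entries i.i.d. N(0, 1)
    stats[i] = n * np.linalg.svd(M, compute_uv=False)[-1] ** 2

for t in (0.5, 1.0, 2.0):
    empirical = (stats &lt;= t).mean()
    limit = 1.0 - np.exp(-t / 2.0 - np.sqrt(t))
    print(f"t={t}: empirical={empirical:.3f}, limit={limit:.3f}")
</preformat>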
      <p>Idea (2) is mathematically formulated as follows.</p>
      <p>Proposition 4.2 (mathematical formulation of idea (2)).
Let
• zt 1, · · · , zt n ∈ R be i.i.d. random variables,
• a1, · · · , an ∈ R be constants,
• yt := a1zt 1 + · · · + anzt n for all t = 0, 1, · · · , T − 1,
• wt 1, · · · , wt m ∈ R be i.i.d. random variables,
• where each pair zt i , wt j is independent,
and we estimate the coefficients by Lasso, such that</p>
      <p>yt = b1zt 1 + · · · + bnzt n + bn+1wt 1 + · · · + bn+mwt m
holds. Then, as T → ∞,</p>
      <p>• bi → ai for i = 1, · · · , n
• bi → 0 for i = n + 1, · · · , n + m.</p>
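      <p>A toy numerical check of Proposition 4.2 is straightforward with
scikit-learn's Lasso (our choice of solver; the paper does not prescribe
one): the coefficients on the true regressors stay near ai , while those on
the unrelated random regressors shrink toward zero.</p>
      <preformat>
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
T, n, m = 5000, 3, 50
Z = rng.normal(size=(T, n))              # true latent regressors z_{t,i}
W = rng.normal(size=(T, m))              # unrelated random regressors w_{t,j}
a = np.array([1.5, -2.0, 0.7])           # true coefficients a_i
y = Z @ a                                # y_t = a_1 z_t1 + ... + a_n z_tn

X = np.hstack([Z, W])
b = Lasso(alpha=0.01).fit(X, y).coef_
print(b[:n])                             # close to [1.5, -2.0, 0.7]
print(np.abs(b[n:]).max())               # close to 0
</preformat>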
      <p>
        To prove this proposition, we use a seminal work by
Candes and Tao [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ]. They proved this proposition under the following
assumption (restricted isometry constants).
      </p>
      <p>
        Definition 4.3 (Definition 1.1 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] Restricted Isometry Constants).
Let F be the matrix with the finite collection of vectors (vj )j ∈J ∈
Rp as columns. For every integer 1 ≤ S ≤ | J |, we define the
S-restricted isometry constants δS to be the smallest quantity
such that FT obeys
      </p>
      <p>(1 − δS )∥c ∥2 ≤ ∥FT c ∥2 ≤ (1 + δS )∥c ∥2
for all subset T ⊂ J of cardinality |T | ≤ S, and all real coeficients
(cj )j ∈T . Similarly, we define the S, S ′-restricted orthogonality
constants θS,S′ for S + S ′ ≤ | J | to be the smallest quantity such
that</p>
      <p>|⟨FT c, FT ′c ′⟩| ≤ θS,S′ · ∥c ∥ ∥c ′∥
holds for all disjoint sets T , T ′ ⊂ J of cardinality |T | ≤ S and
|T ′| ≤ S ′.</p>
      <p>The numbers δS and θS,S′ measure how close the vectors are
to behaving as an orthonormal system. Subsequently, Proposition
4.2 can be proven as follows.</p>
      <p>
        Proposition 4.4 (Theorem 1.1 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). Consider the linear model
y = X β + z, where X ∈ Rn×p , y ∈ Rn , z ∼ N (0, σ 2In ).
Suppose β ∈ Rp is a vector of parameters that satisfies
• ∥β ∥L0 = |{i | βi ≠ 0}| = S
• δ2S + θS,2S &lt; 1
where δ2S , θS,2S are the constants defined in Definition 4.3.
Here, we estimate β ′ by setting λp = √(2 log p):
      </p>
      <p>β ′ = argminβ˜ ∈Rp ∥β˜∥L1 subject to ∥X ∗(y − X β˜)∥L∞ ≤ λp σ .
Then β ′ obeys</p>
      <p>∥β ′ − β ∥L22 ≤ C12 · (2 log p) · S · σ 2 ,
where C1 = 4/(1 − δS − θS,2S ), with large probability (a term used in
the original paper [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], meaning that the probability that the bound holds is above
1 − 1/√(π log p)).</p>
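      <p>The estimator in Proposition 4.4 (the Dantzig selector) is a linear
program; a hedged sketch using scipy.optimize.linprog follows (our own
formulation, splitting β into positive and negative parts).</p>
      <preformat>
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam_sigma):
    # minimize ||beta||_1 subject to ||X^T (y - X beta)||_inf &lt;= lam_sigma
    n, p = X.shape
    G, g = X.T @ X, X.T @ y
    c = np.ones(2 * p)                   # beta = u - v with u, v &gt;= 0
    A_ub = np.block([[G, -G], [-G, G]])  # G(u-v) &lt;= g + lam; -G(u-v) &lt;= -g + lam
    b_ub = np.concatenate([g + lam_sigma, -g + lam_sigma])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    return res.x[:p] - res.x[p:]
</preformat>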
    </sec>
    <sec id="sec-8">
      <title>4.2 YMN Matrix Decomposition and</title>
    </sec>
    <sec id="sec-9">
      <title>Structural Change Point Detection</title>
      <p>Idea (1) can be interpreted as, "we can generate the true latent
time series zt ;" idea (2) can be interpreted as "we can obtain the
true latent time series zt ". Therefore, by combining ideas (1) and
(2), we will obtain idea (3), "we can generate and obtain the true
latent time-series and the true latent mixing matrix."</p>
      <p>Conjecture 4.5 (mathematical formulation of idea (3)).
Let
• yt = At zt
• yt ∈ RD×T be an observed matrix,
• zt ∈ RK ×T be a latent matrix,
• At ∈ RD×K be a mixing matrix,
• Nt ∈ RQ ×T be a large random matrix.
• Mt ∈ RD×Q be the sparsest matrix such that
∥yt − Mt Nt ∥ ≤ (1 − θ )∥yt ∥ holds for a given
hyperparameter θ .</p>
      <p>Then, distances d1 and d2 exist in RD×K and RD×Q such that
d1(At1 , At2 ) ≃ d2(Mt1 , Mt2 )
holds for all 0 &lt; t1 &lt; t2 &lt; T .</p>
      <p>This conjecture suggests that we can approximate the change
of mixing matrix d1(At1 , At2 ) by the change of the sparse matrix
d2(Mt1 , Mt2 ) and that we don’t have to directly identify A in
change point detection problems. In the remainder of this section,
we show the matrix decomposition algorithm and the change
point detection algorithm based on this conjecture. We do not
know the properties of the distances d1 and d2. However, we used
the function d2(Mt1 , Mt2 ) := ∥Mt1 ∥L2 /∥Mt2 ∥L2 , which is not a
distance but is robust under the permutations of the row vectors
of M .</p>
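      <p>In code, the score we use is a one-liner; here ∥·∥L2 is read as the
Frobenius norm (one possible reading of the paper's notation).</p>
      <preformat>
import numpy as np

def d2(M1, M2):
    # d2(M1, M2) = ||M1|| / ||M2||: not a metric, but invariant to
    # permutations of the row vectors of M.
    return np.linalg.norm(M1) / np.linalg.norm(M2)
</preformat>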
      <p>
        Algorithm 1 shows how our matrix decomposition using a
large random matrix is conducted (we name this algorithm “YMN”
after the equation Y = M N ; M is the first initial of “mixing
matrix” and “mechanism,” and N is the first initial of “normalized
random matrix”). Our ultimate goal is to
decompose the observed data yt into a random matrix Nt and a
sparse matrix Mt such that the change in Mt is related to
the change in At . To obtain a sparse matrix Mt , we first
generate a large random matrix Nt ; subsequently, our method
maximizes the sparseness of the coefficient matrix within the constraint
∥Y − Mt Nt ∥ ≤ (1 − θ )∥Y ∥, where Y is the D × T matrix whose row
vectors are yt . Algorithm 2 shows our change point detection
algorithm.
      </p>
      <p>Algorithm 1 Matrix decomposition using a large random matrix
(YMN)</p>
      <sec id="sec-9-1">
        <title>Definition</title>
        <p>Y : D × T matrix that we want to decompose
M: D × K sparse matrix
N : K × T large random matrix
Ni : 1 × T i-th row vector of N
E[x ], V [x ]: the mean and variance of vector x ’s entries
θ : threshold value of the Lasso fitting score
α : hyperparameter of Lasso, i.e., the coefficient of the L1 term
δ : step value of α , used during hyperparameter optimization
Lasso(X , T ; α ): matrix W that minimizes ∥T − W X ∥2 + α ∥W ∥1</p>
      </sec>
      <sec id="sec-9-2">
        <title>End Definition</title>
        <p>N = ⟨N0, · · · , NK −1⟩ ∼ N (0, I )
for i = 0, · · · , K − 1 do</p>
        <p>Ni ⇐ (Ni − E[Ni ])/√V [Ni ]
end for
M ⇐ Lasso(Y , N ; α )
while ∥Y − M N ∥ ≤ (1 − θ )∥Y ∥ do
α ⇐ α + δ</p>
        <p>M ⇐ Lasso(Y , N ; α )
end while
First, we use a series of window frames Yt for each
time and perform the matrix decomposition of Yt . We then
have a series of matrices Mt . We calculate the change point
score using the distance between Mt and Mt +1.</p>
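      <p>A runnable sketch of Algorithm 1 follows, under our reading of the
pseudocode (scikit-learn's Lasso as the solver and the default
hyperparameters are assumptions): normalize the random rows, then raise
the Lasso penalty α while the fit constraint still holds.</p>
      <preformat>
import numpy as np
from sklearn.linear_model import Lasso

def ymn_decompose(Y, K, theta=0.1, alpha=0.01, delta=0.01, seed=0):
    rng = np.random.default_rng(seed)
    D, T = Y.shape
    N = rng.normal(size=(K, T))                  # large random matrix
    N = (N - N.mean(axis=1, keepdims=True)) / N.std(axis=1, keepdims=True)

    def fit(a):
        # Lasso(X, T; alpha): W minimizing ||T - W X||^2 + alpha ||W||_1;
        # with rows of N as features, M has shape D x K and Y ~ M N.
        return Lasso(alpha=a, fit_intercept=False).fit(N.T, Y.T).coef_

    M = fit(alpha)
    while np.linalg.norm(Y - M @ N) &lt;= (1 - theta) * np.linalg.norm(Y):
        alpha += delta                           # sparser M while fit is good
        M = fit(alpha)
    return M, N
</preformat>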
        <p>Algorithm 2 Change point detection algorithm using YMN
decomposition</p>
      </sec>
      <sec id="sec-9-3">
        <title>Definition</title>
        <p>w: window size, t : current time
yt : series vector, st : change point score
⟨, ⟩: concatenation of vectors into a matrix.</p>
        <p>MatrixDecomp(Y ): Decompose Y into two matrices by YMN
decomposition.</p>
      </sec>
      <sec id="sec-9-4">
        <title>End Definition</title>
        <p>for t = 0, · · · , T − w − 1 do
Yt ⇐ ⟨yt , yt +1, . . . , yt +w −1⟩
Mt , Nt ⇐ MatrixDecomp(Yt )
st ⇐ d2(Mt , Mt +1)
end for
</p>
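        <p>Algorithm 2 then amounts to a sliding-window loop over the series,
as in the following sketch (ymn_decompose and d2 are the sketches given
above; the window width and K are assumptions).</p>
        <preformat>
import numpy as np

def change_point_scores(Y, w, K, **ymn_kwargs):
    D, T = Y.shape
    Ms = [ymn_decompose(Y[:, t:t + w], K, **ymn_kwargs)[0]
          for t in range(T - w)]
    return np.array([d2(Ms[t], Ms[t + 1]) for t in range(len(Ms) - 1)])

# scores = change_point_scores(Y, w=20, K=100)  # peaks suggest change points
</preformat>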
      </sec>
    </sec>
    <sec id="sec-10">
      <title>EXPERIMENT</title>
      <p>To demonstrate that our methods can detect structural change
points, we conducted two experiments. First, we conducted
experiments in artificial settings where the structures and changes
are clear. These experiments clearly demonstrate which types of
structural change points can be detected and how these changes
are detected.</p>
      <p>
        Subsequently, we conducted experiments in real-world
settings where the structures and changes in structures are not clear.
We used economics [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and financial [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] data that are popular
in the literature of structural change point detection. We
demonstrate that our method can be used in the real world through
these experiments.
To verify that our method can detect the structural changes
described above, we conducted experiments with artificial data and
compared the AUC scores to those of the existing methods (PCA,
ICA, and NMF). All of the experiments were conducted in an
"online" setting. In other words, we fetched the time-series data yt
for w times and decomposed the matrix ⟨yt , yt +1, . . . , yt +w −1⟩
into two matrices Mt , Nt and detected the change in Mt . Change
point detection algorithms using PCA, ICA, and NMF are
executed in the same manner as Algorithm 2, that is, by defining
MatrixDecomp in Algorithm 2 as PCA, ICA, and NMF. For
example, if we perform a dimensional reduction from Yt to Zt such
that Zt = Mt Yt holds by PCA, then we calculate d2(Mt , Mt +1).
In addition, if we perform a matrix decomposition by ICA or NMF
such that Yt = Mt Nt holds, then we calculate d2(Mt , Mt +1). Note
that if we perform NMF, we consider Yt + C instead of Yt , where
C is a constant such that Yt + C &gt; 0 holds for all t . Table 3
summarizes the settings of these experiments.
      </p>
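      <p>For the baselines, MatrixDecomp in Algorithm 2 can be swapped for
scikit-learn's PCA, FastICA, and NMF; the mapping below is our reading of
this setup (the component count k and the NMF shift C are assumptions).</p>
      <preformat>
import numpy as np
from sklearn.decomposition import PCA, FastICA, NMF

def decomp_pca(Yt, k):
    return PCA(n_components=k).fit(Yt.T).components_   # M_t with Z_t = M_t Y_t

def decomp_ica(Yt, k):
    return FastICA(n_components=k).fit(Yt.T).mixing_   # M_t with Y_t ~ M_t N_t

def decomp_nmf(Yt, k):
    C = -Yt.min() + 1e-6                               # shift so Y_t + C &gt; 0
    return NMF(n_components=k, init="random").fit_transform(Yt + C)
</preformat>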
      <p>In this experiment, we detected the following types of changes.
(1) Mean of Factor Loadings’ Change</p>
      <p>( A (t &lt; t ∗),
At =</p>
      <p>
        A + W
yt +1 = αt yt + βt 1 + γt ϵt
To verify that our method can be used in the real world, we
conducted experiments with two famous datasets in economics [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
and finance [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. First, we calculated d2(Mt +1, Mt ) at each t . We
then listed the A points whose d2(Mt +1, Mt ) are in the top A. Finally,
we clustered these A points into B points using k-means
clustering. Note that the listing and clustering are performed in offline
settings, whereas our change-point detection method is an online
method. Because no ground truth of change points exists in
economic or financial data, we only consider true- and false-positive
cases. In other words, we listed change points detected by our
methods and checked whether a major event occurred in finance
or economics.
      </p>
      <p>5.2.1 US macroeconomic dataset. We analyzed the US
representative macroeconomic dataset of 101 time series, collected
monthly between 1959 and 2006 (T = 576), for the change points.</p>
      <p>
        We listed 10 remarkable changes resulting from the
aforementioned offline listing method with P = 30, Q = 10 in Figure 5.2.
Our change point detection methods clearly show the following
major change points in the US economy.
• the Great Moderation period that started in 1983
• major economic recession in early 1970s
• major economic recession in early 1980s
• Internet Bubble between 1999 and 2001
• the oil crisis of 1973
These change points are consistent with the existing research
on business cycles [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. Moreover, the Internet Bubble and the
oil crisis are not detected in previous research in econometrics
[
        <xref ref-type="bibr" rid="ref27 ref5 ref7 ref9">5, 7, 9, 27</xref>
        ].
      </p>
      <p>
        5.2.2 US 48 Industry Portfolios. We analyzed the US 48
industry portfolio collected daily between 1926 and 2018 (T = 24350)
for the change points [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. We listed the five most significant changes
by the same clustering methods as Figure 5.2 in Figure 5.3. Our
change point detection methods clearly show the following four
major change points in the US stock market history: Wall Street
Crash of 1929, Kennedy Slide of 1962, Black Monday of 1987, and
financial crisis of 2007. Moreover, we could observe the
magnitudes of these changes from the scores.
      </p>
    </sec>
    <sec id="sec-11">
      <title>DISCUSSION</title>
      <p>We demonstrated that our method can detect all three types of
structural changes defined in Section 3. This is because
Conjecture 4.5 is true to some degree. That is, by Proposition 4.1, we can
generate the same latent time-series by combining the random
time-series; further, by Proposition 4.2, we can select the true latent
time series by Lasso. Table 4 shows that this method is much
more accurate than the existing matrix decomposition methods.
This is because our method was initially created to detect the
structural changes that we defined, but the existing methods
cannot detect these changes.</p>
      <p>Figure 5.1 shows the L2 norm of Mt , and we can see that
the change of the original mixing matrix At and the change of
the sparse matrix Mt are highly correlated. Hence this figure
supports our Conjecture 4.5.</p>
      <p>However, we could not detect the change (6). This is because
the distance we used does not consider the permutation of the
row. In other words, we consider only ∥Mt ∥L2 at each time t .
Therefore, we cannot detect changes in swaps of elements of At .
We expect that our methods can detect these changes by choosing
some of the row vectors and by performing our decomposition
repeatedly. From the facts above, it is clear that our method can be
applied to the detection of structural changes in economics and
finance. As mentioned in the Introduction, econometrics is an
advanced research field in handling the change points; therefore,
we think that this method can be applied to the detection of
structural changes in many fields. For example, neuroscience and
network analysis have similar objectives to ours because they
cannot detect the exact model of the time-series, and structural
changes are important to them.
</p>
      <p>[Figure 5.1: L2 norm of Mt over time for (a) Experiment (1) Mean of At ’s Change, (b) Experiment (2) Variance of At ’s Change, and (c) Experiment (3) Skew and Kurtosis of At ’s Change.]</p>
    </sec>
    <sec id="sec-12">
      <title>7 CONCLUSION</title>
      <p>We herein proposed a new nonparametric change point detection
method of the structures of multidimensional time series. We
used a large random matrix to generate the latent time series,
and Lasso to obtain good row vectors and to select the related
row vectors. We also demonstrated that our method could detect
not only typical structural changes used in econometrics and
ifnance but also new types of structural changes such as the
changes in the higher moment of the mixing matrix. With the
random matrix theory and statistical theory of Lasso, we partly
unveiled the mechanism and theory of our methods. However, a
conjecture that fully supports our methods remains unproven.
We demonstrated the efectiveness of our change point detection
techniques by artificial data and real-world data in economics and
ifnance. Similar structural changes may occur in many fields, but
we have allocated comparison with the state-of-the-art methods
and specialization in other fields as future work.</p>
    </sec>
    <sec id="sec-13">
      <title>ACKNOWLEDGMENT</title>
      <p>We would like to thank Tomoki Komatsu, Takeru Miyato,
Tomohiro Hayase, Kosuke Nakago, and Hirono Okamoto for insightful
comments and discussion.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Mohiuddin</given-names>
            <surname>Ahmed</surname>
          </string-name>
          , Abdun Naser Mahmood, and
          <string-name>
            <given-names>Jiankun</given-names>
            <surname>Hu</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>A survey of network anomaly detection techniques</article-title>
          .
          <source>Journal of Network and Computer Applications</source>
          <volume>60</volume>
          (jan
          <year>2016</year>
          ),
          <fpage>19</fpage>
          -
          <lpage>31</lpage>
          . https://doi.org/10.1016/j.jnca.
          <year>2015</year>
          .
          <volume>11</volume>
          .016
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Donald W. K.</given-names>
            <surname>Andrews</surname>
          </string-name>
          .
          <year>1993</year>
          .
          <article-title>Tests for Parameter Instability and Structural Change With Unknown Change Point</article-title>
          .
          <source>Econometrica</source>
          <volume>61</volume>
          ,
          <issue>4</issue>
          (jul
          <year>1993</year>
          ),
          <volume>821</volume>
          . https://doi.org/10.2307/2951764
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Yasunori</given-names>
            <surname>Aoki</surname>
          </string-name>
          , Ryouhei Ishii,
          <string-name>
            <given-names>Roberto D.</given-names>
            <surname>Pascual-Marqui</surname>
          </string-name>
          , Leonides Canuet, Shunichiro Ikeda, Masahiro Hata, Kaoru Imajo, Haruyasu Matsuzaki, Toshimitsu Musha,
          <string-name>
            <given-names>Takashi</given-names>
            <surname>Asada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Masao</given-names>
            <surname>Iwase</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Masatoshi</given-names>
            <surname>Takeda</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Detection of EEG-resting state independent networks by eLORETA-ICA method</article-title>
          .
          <source>Frontiers in Human Neuroscience</source>
          <volume>9</volume>
          (feb
          <year>2015</year>
          ),
          <volume>31</volume>
          . https://doi.org/10.3389/ fnhum.
          <year>2015</year>
          .00031
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Reinaldo B.</given-names>
            <surname>Arellano-Valle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luis M.</given-names>
            <surname>Castro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Rosangela H.</given-names>
            <surname>Loschi</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Change Point Detection in The Skew-Normal Model Parameters</article-title>
          .
          <source>Communications in Statistics - Theory and Methods</source>
          <volume>42</volume>
          , 4 (feb
          <year>2013</year>
          ),
          <fpage>603</fpage>
          -
          <lpage>618</lpage>
          . https://doi.org/10.1080/03610926.
          <year>2011</year>
          .611321
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Jushan</given-names>
            <surname>Bai</surname>
          </string-name>
          and
          <string-name>
            <given-names>Serena</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Determining the Number of Factors in Approximate Factor Models</article-title>
          .
          <source>Econometrica</source>
          <volume>70</volume>
          ,
          <issue>1</issue>
          (jan
          <year>2002</year>
          ),
          <fpage>191</fpage>
          -
          <lpage>221</lpage>
          . https: //doi.org/10.1111/
          <fpage>1468</fpage>
          -
          <lpage>0262</lpage>
          .
          <fpage>00273</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Jushan</given-names>
            <surname>Bai</surname>
          </string-name>
          and
          <string-name>
            <given-names>Serena</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Large Dimensional Factor Analysis</article-title>
          .
          <source>Foundations and Trends® in Econometrics 3</source>
          ,
          <issue>2</issue>
          (
          <year>2008</year>
          ),
          <fpage>89</fpage>
          -
          <lpage>163</lpage>
          . https://doi.org/10.1561/ 0800000002
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Matteo</given-names>
            <surname>Barigozzi</surname>
          </string-name>
          , Haeran Cho, and
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Fryzlewicz</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Simultaneous multiple change-point and factor analysis for high-dimensional time series</article-title>
          .
          <source>Journal of Econometrics 206</source>
          ,
          <issue>1</issue>
          (sep
          <year>2018</year>
          ),
          <fpage>187</fpage>
          -
          <lpage>225</lpage>
          . https://doi.org/10.1016/j. jeconom.
          <year>2018</year>
          .
          <volume>05</volume>
          .003
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Vincent</given-names>
            <surname>Brault</surname>
          </string-name>
          , Sarah Ouadah, Laure Sansonnet, and
          <string-name>
            <surname>Céline</surname>
          </string-name>
          Lévy-Leduc.
          <year>2018</year>
          .
          <article-title>Nonparametric multiple change-point estimation for analyzing large Hi-C data matrices</article-title>
          .
          <source>Journal of Multivariate Analysis</source>
          <volume>165</volume>
          (may
          <year>2018</year>
          ),
          <fpage>143</fpage>
          -
          <lpage>165</lpage>
          . https://doi.org/10.1016/j.jmva.
          <year>2017</year>
          .
          <volume>12</volume>
          .005
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jörg</given-names>
            <surname>Breitung</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sandra</given-names>
            <surname>Eickmeier</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Testing for structural breaks in dynamic factor models</article-title>
          .
          <source>Journal of Econometrics 163</source>
          ,
          <issue>1</issue>
          (jul
          <year>2011</year>
          ),
          <fpage>71</fpage>
          -
          <lpage>84</lpage>
          . https://doi.org/10.1016/j.jeconom.
          <year>2010</year>
          .
          <volume>11</volume>
          .008
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.J.</given-names>
            <surname>Candes</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Tao</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Decoding by Linear Programming</article-title>
          .
          <source>IEEE Transactions on Information Theory</source>
          <volume>51</volume>
          ,
          <issue>12</issue>
          (dec
          <year>2005</year>
          ),
          <fpage>4203</fpage>
          -
          <lpage>4215</lpage>
          . https: //doi.org/10.1109/tit.
          <year>2005</year>
          .858979
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Emmanuel</given-names>
            <surname>Candes</surname>
          </string-name>
          and
          <string-name>
            <given-names>Terence</given-names>
            <surname>Tao</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>The Dantzig selector: Statistical estimation when p is much larger than n</article-title>
          .
          <source>The Annals of Statistics 35</source>
          ,
          <issue>6</issue>
          (dec
          <year>2007</year>
          ),
          <fpage>2313</fpage>
          -
          <lpage>2351</lpage>
          . https://doi.org/10.1214/009053606000001523
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Liang</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Juan J.</given-names>
            <surname>Dolado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jesús</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Detecting big structural breaks in large factor models</article-title>
          .
          <source>Journal of Econometrics 180</source>
          , 1 (may
          <year>2014</year>
          ),
          <fpage>30</fpage>
          -
          <lpage>48</lpage>
          . https://doi.org/10.1016/j.jeconom.
          <year>2014</year>
          .
          <volume>01</volume>
          .006
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Ray-Bing</given-names>
            <surname>Chen</surname>
          </string-name>
          , Ying Chen, and
          <string-name>
            <given-names>Wolfgang K.</given-names>
            <surname>Härdle</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>TVICA-Time varying independent component analysis and its application to financial data</article-title>
          .
          <source>Computational Statistics &amp; Data Analysis</source>
          <volume>74</volume>
          (jun
          <year>2014</year>
          ),
          <fpage>95</fpage>
          -
          <lpage>109</lpage>
          . https: //doi.org/10.1016/j.csda.
          <year>2014</year>
          .
          <volume>01</volume>
          .002
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Peter F.</given-names>
            <surname>Christoffersen</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Forecasting Non-Stationary Economic Time Series</article-title>
          .
          <source>J. Amer. Statist. Assoc</source>
          .
          <volume>96</volume>
          ,
          <issue>453</issue>
          (mar
          <year>2001</year>
          ),
          <fpage>339</fpage>
          -
          <lpage>355</lpage>
          . https://doi.org/10. 1198/jasa.
          <year>2001</year>
          .s378
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cont</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Empirical properties of asset returns: stylized facts and statistical issues</article-title>
          .
          <source>Quantitative Finance</source>
          <volume>1</volume>
          ,
          <issue>2</issue>
          (feb
          <year>2001</year>
          ),
          <fpage>223</fpage>
          -
          <lpage>236</lpage>
          . https://doi.org/10.1080/ 713665670
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Graham</given-names>
            <surname>Elliott</surname>
          </string-name>
          and
          <string-name>
            <given-names>Allan</given-names>
            <surname>Timmermann</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Economic Forecasting</article-title>
          .
          <source>Journal of Economic Literature</source>
          <volume>46</volume>
          ,
          <issue>1</issue>
          (feb
          <year>2008</year>
          ),
          <fpage>3</fpage>
          -
          <lpage>56</lpage>
          . https://doi.org/10.1257/jel.46.
          <issue>1</issue>
          .
          <fpage>3</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Kenneth R.</given-names>
            <surname>French</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Detail for 48 Industry Portfolios</article-title>
          . http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/det_ 48_ind_port.html
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Giampiero</given-names>
            <surname>Gallo</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The methodology and practice of econometrics by J</article-title>
          . L. Castle;
          <string-name>
            <given-names>N.</given-names>
            <surname>Shephard</surname>
          </string-name>
          .
          <source>Journal of Economics</source>
          <volume>101</volume>
          (01
          <year>2010</year>
          ),
          <fpage>99</fpage>
          -
          <lpage>101</lpage>
          . https: //doi.org/10.2307/41795703
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Allan</given-names>
            <surname>Gibbard</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hal R.</given-names>
            <surname>Varian</surname>
          </string-name>
          .
          <year>1978</year>
          .
          <article-title>Economic Models</article-title>
          .
          <source>Journal of Philosophy</source>
          <volume>75</volume>
          ,
          <issue>11</issue>
          (
          <year>1978</year>
          ),
          <fpage>664</fpage>
          -
          <lpage>677</lpage>
          . https://doi.org/10.5840/jphil1978751111
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Xu</given-names>
            <surname>Han</surname>
          </string-name>
          and
          <string-name>
            <given-names>Atsushi</given-names>
            <surname>Inoue</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>TESTS FOR PARAMETER INSTABILITY IN DYNAMIC FACTOR MODELS</article-title>
          .
          <source>Econometric Theory</source>
          <volume>31</volume>
          (sep
          <year>2014</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          . https://doi.org/10.1017/s0266466614000486
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Zaïd</given-names>
            <surname>Harchaoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Francis</given-names>
            <surname>Bach</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Éric</given-names>
            <surname>Moulines</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Kernel Changepoint Analysis</article-title>
          .
          <source>In Proceedings of the 21st International Conference on Neural Information Processing Systems (NIPS'08)</source>
          . Curran Associates Inc., USA,
          <fpage>609</fpage>
          -
          <lpage>616</lpage>
          . http://dl.acm.org/citation.cfm?id=2981780.2981856
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Guang-Bin</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qin-Yu</given-names>
            <surname>Zhu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Chee-Kheong</given-names>
            <surname>Siew</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Extreme learning machine: a new learning scheme of feedforward neural networks</article-title>
          .
          <source>In 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541)</source>
          , Vol.
          <volume>2</volume>
          ,
          <fpage>985</fpage>
          -
          <lpage>990</lpage>
          . https://doi.org/10.1109/ijcnn.2004.1380068
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hyvärinen</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Oja</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Independent component analysis: algorithms and applications</article-title>
          .
          <source>Neural Networks</source>
          <volume>13</volume>
          ,
          <issue>4-5</issue>
          (jun
          <year>2000</year>
          ),
          <fpage>411</fpage>
          -
          <lpage>430</lpage>
          . https://doi.org/10.1016/s0893-6080(00)00026-5
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Daichi</given-names>
            <surname>Kitamura</surname>
          </string-name>
          , Shinichi Mogami, Yoshiki Mitsui, Norihiro Takamune, Hiroshi Saruwatari, Nobutaka Ono,
          <string-name>
            <given-names>Yu</given-names>
            <surname>Takahashi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kazunobu</given-names>
            <surname>Kondo</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Generalized independent low-rank matrix analysis using heavy-tailed distributions for blind source separation</article-title>
          .
          <source>EURASIP Journal on Advances in Signal Processing</source>
          <volume>2018</volume>
          ,
          <issue>1</issue>
          (may
          <year>2018</year>
          ),
          <fpage>28</fpage>
          . https://doi.org/10.1186/s13634-018-0549-5
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Jeremias</given-names>
            <surname>Knoblauch</surname>
          </string-name>
          and
          <string-name>
            <given-names>Theodoros</given-names>
            <surname>Damoulas</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Spatio-temporal Bayesian On-line Changepoint Detection with Model Selection</article-title>
          .
          <source>In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research)</source>
          ,
          <source>Jennifer Dy and Andreas Krause (Eds.)</source>
          , Vol.
          <volume>80</volume>
          . PMLR, Stockholmsmässan, Stockholm, Sweden,
          <fpage>2718</fpage>
          -
          <lpage>2727</lpage>
          . http://proceedings.mlr.press/v80/knoblauch18a.html
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Shuang</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yao</given-names>
            <surname>Xie</surname>
          </string-name>
          , Hanjun Dai, and
          <string-name>
            <given-names>Le</given-names>
            <surname>Song</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>M-statistic for Kernel Change-point Detection</article-title>
          .
          <source>In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'15)</source>
          . MIT Press, Cambridge, MA, USA,
          <fpage>3366</fpage>
          -
          <lpage>3374</lpage>
          . http://dl.acm.org/citation.cfm?id=2969442.2969615
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Shujie</given-names>
            <surname>Ma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Liangjun</given-names>
            <surname>Su</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Estimation of large dimensional factor models with an unknown number of breaks</article-title>
          .
          <source>Journal of Econometrics</source>
          <volume>207</volume>
          ,
          <issue>1</issue>
          (nov
          <year>2018</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          . https://doi.org/10.1016/j.jeconom.2018.06.019
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>R. David</given-names>
            <surname>McLean</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jeffrey E.</given-names>
            <surname>Pontiff</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Does Academic Research Destroy Stock Return Predictability?</article-title>
          <source>The Journal of Finance</source>
          <volume>71</volume>
          ,
          <issue>1</issue>
          (jan
          <year>2016</year>
          ),
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          . https://doi.org/10.1111/jofi.12365
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Inc</surname>
          </string-name>
          .
          <source>National Bureau of Economic Research</source>
          .
          <year>2012</year>
          .
          <article-title>US Business Cycle Expansions and Contractions</article-title>
          . https://www.nber.org/cycles.html
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Grace</given-names>
            <surname>Ngunkeng</surname>
          </string-name>
          and
          <string-name>
            <given-names>Wei</given-names>
            <surname>Ning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Information Approach for the Change-Point Detection in the Skew Normal Distribution and Its Applications</article-title>
          .
          <source>Sequential Analysis</source>
          <volume>33</volume>
          ,
          <issue>4</issue>
          (oct
          <year>2014</year>
          ),
          <fpage>475</fpage>
          -
          <lpage>490</lpage>
          . https://doi.org/10.1080/07474946.2014.961845
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Wei</given-names>
            <surname>Ning</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Gupta</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Change Point Analysis for Generalized Lambda Distribution</article-title>
          .
          <source>Communications in Statistics - Simulation and Computation</source>
          <volume>38</volume>
          ,
          <issue>9</issue>
          (oct
          <year>2009</year>
          ),
          <fpage>1789</fpage>
          -
          <lpage>1802</lpage>
          . https://doi.org/10.1080/03610910903125314
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Bradley S.</given-names>
            <surname>Paye</surname>
          </string-name>
          and
          <string-name>
            <given-names>Allan</given-names>
            <surname>Timmermann</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Instability of return prediction models</article-title>
          .
          <source>Journal of Empirical Finance</source>
          <volume>13</volume>
          ,
          <issue>3</issue>
          (jun
          <year>2006</year>
          ),
          <fpage>274</fpage>
          -
          <lpage>315</lpage>
          . https://doi.org/10.1016/j.jempfin.2005.11.001
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Mario</given-names>
            <surname>Pitsillis</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>1 - Review of literature on multifactor asset pricing models</article-title>
          .
          <source>In Linear Factor Models in Finance, John Knight and Stephen Satchell (Eds.)</source>
          . Butterworth-Heinemann, Oxford,
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . https://doi.org/10.1016/B978-075066006-8.50002-1
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>James</given-names>
            <surname>Stock</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Watson</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Evidence on Structural Instability in Macroeconomic Time Series Relations</article-title>
          .
          <source>Technical Report. National Bureau of Economic Research</source>
          . https://doi.org/10.3386/t0164
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Terence</given-names>
            <surname>Tao</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Random matrices: A general approach for the least singular value problem</article-title>
          . https://terrytao.wordpress.com/2008/05/22/random-matrices-a-general-approach-for-the-least-singular-value-problem/
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Terence</given-names>
            <surname>Tao</surname>
          </string-name>
          and
          <string-name>
            <given-names>Van</given-names>
            <surname>Vu</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Random Matrices: the Distribution of the Smallest Singular Values</article-title>
          .
          <source>Geometric and Functional Analysis</source>
          <volume>20</volume>
          ,
          <issue>1</issue>
          (mar
          <year>2010</year>
          ),
          <fpage>260</fpage>
          -
          <lpage>297</lpage>
          . https://doi.org/10.1007/s00039-010-0057-8
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Thompson</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Smart beta funds pass $1tn in assets</article-title>
          . https://www.ft.com/content/bb0d1830-e56b-11e7-8b99-0191e45377ec, 2017-12-27.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Charles</given-names>
            <surname>Truong</surname>
          </string-name>
          , Laurent Oudre, and
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Vayatis</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Selective review of offline change point detection methods</article-title>
          . arXiv:1801.00718
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>De Wachter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Elias</given-names>
            <surname>Tzavalis</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Detection of structural breaks in linear dynamic panel data models</article-title>
          .
          <source>Computational Statistics &amp; Data Analysis</source>
          <volume>56</volume>
          ,
          <issue>11</issue>
          (nov
          <year>2012</year>
          ),
          <fpage>3020</fpage>
          -
          <lpage>3034</lpage>
          . https://doi.org/10.1016/j.csda.2012.02.025
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Beilun</given-names>
            <surname>Wang</surname>
          </string-name>
          , Arshdeep Sekhon, and
          <string-name>
            <given-names>Yanjun</given-names>
            <surname>Qi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Fast and Scalable Learning of Sparse Changes in High-Dimensional Gaussian Graphical Model Structure</article-title>
          .
          <source>In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research)</source>
          ,
          <source>Amos Storkey and Fernando Perez-Cruz (Eds.)</source>
          , Vol.
          <volume>84</volume>
          . PMLR, Playa Blanca, Lanzarote, Canary Islands,
          <fpage>1691</fpage>
          -
          <lpage>1700</lpage>
          . http://proceedings.mlr.press/v84/wang18f.html
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Wei</given-names>
            <surname>Xiao</surname>
          </string-name>
          , Xiaolin Huang, Jorge Silva, Saba Emrani, and
          <string-name>
            <given-names>Arin</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Online Robust Principal Component Analysis with Change Point Detection</article-title>
          .
          <source>arXiv:1702.05698</source>
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Ian En-Hsu</given-names>
            <surname>Yen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wei-Cheng</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sung-En</given-names>
            <surname>Chang</surname>
          </string-name>
          , Arun Sai Suggala,
          <string-name>
            <given-names>Shou-De</given-names>
            <surname>Lin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pradeep</given-names>
            <surname>Ravikumar</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Latent Feature Lasso</article-title>
          .
          <source>In Proceedings of the 34th International Conference on Machine Learning (Proceedings of Machine Learning Research)</source>
          ,
          <source>Doina Precup and Yee Whye Teh (Eds.)</source>
          , Vol.
          <volume>70</volume>
          . PMLR, International Convention Centre, Sydney, Australia,
          <fpage>3949</fpage>
          -
          <lpage>3957</lpage>
          . http://proceedings.mlr.press/v70/yen17a.html
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>