Ratio Test for Changes Heavy Index Based on M-Estimation under the Background of Big Data

Meiting Liu*, Hao Jin, Yangyi Cheng, Zini Wang

Department of Sciences, Xi'an University of Science and Technology, Xi'an, Shaanxi, China

Abstract

As the big data age develops, modern emerging disciplines increasingly demand fine quantitative analysis of the tail characteristics of financial data, so methods for detecting change points in the tail index of heavy-tailed sequences are particularly important. In this paper, we extend the ratio statistic proposed by Kim to testing for a change point in the tail index of infinite-variance observations. We derive the null distribution of the statistic and establish its consistency under the alternative hypothesis. In contrast to the least squares estimation used in previous articles, we use robust M-estimation to estimate the unknown parameter, which makes the parameter estimation in the model more accurate. Monte Carlo simulations show that our test works well.

Keywords

Heavy-Tailed, Tail Index, Ratio Test, M-Estimation, Big Data

1. Introduction

In the age of big data, detecting structural changes early allows us to interpret data better, predict more accurately and avoid risk. The study of structural change points has therefore attracted wide attention from many scholars. Tail index change points are one of the core topics of change-point research and arise widely in practice, for example in finance, hydrology and communication engineering. Several tail index change-point tests have been proposed in recent decades. Quintos, Fan and Phillips [1] employed three tests to detect tail index change points in independent samples and ARCH models, and Kim and Lee [4] later studied a tail index change-point test based on autoregressive residuals. Reasonable estimation of the model parameters is also important.
Least squares estimation is the most commonly used point estimation method, but data containing outliers or highly influential points are often encountered in practical problems, and the least squares estimator is sensitive to such outliers. To eliminate or reduce the effect of outliers, the estimator must be robust. M-estimation is a widely used class of robust estimators and has been studied extensively. Davis, Knight and Liu [10] used M-estimation to estimate autoregressive parameters in heavy-tailed sequences, and Knight [11] developed the limit theory of M-estimates in integrated infinite-variance processes. In this paper, we employ robust M-estimation to estimate the unknown parameter in the model and obtain the asymptotic behaviour of the estimator. To define the test statistic, we adopt the statistic that Kim [8] proposed for detecting a change in persistence and use it to test for a change point in the tail index.

This paper is organized as follows. Section 2 presents the model and assumptions. Section 3 presents the test statistic and its asymptotic distribution. Section 4 reports a Monte Carlo simulation. Section 5 concludes.

ICBASE2022@3rd International Conference on Big Data & Artificial Intelligence & Software Engineering, October 21-23, 2022, Guangzhou, China
* Corresponding author: Lmt137341@126.com (Meiting Liu)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

2. Model and Assumptions

Our model is

$$y_t = \mu + \varepsilon_t, \quad t = 1, 2, \dots, T, \tag{1}$$

where $\mu$ is an unknown parameter, which we estimate by M-estimation. This paper is based on the following assumptions:

(A1) $\{\varepsilon_t\}$ is an i.i.d.
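sequence of random variables with $E(\varepsilon_t^2) = \infty$ that lie in the domain of attraction of a stable law of order $\kappa \in (0, 2]$; the normal distribution corresponds to $\kappa = 2$.

(A2) $\rho'(\cdot) = \psi(\cdot)$, $E\psi(\varepsilon_t) = 0$ and $E|\psi(\varepsilon_t)|^{\gamma} < \infty$ for some $\gamma > 1$.

(A3) $\psi'(\cdot)$ is Lipschitz continuous and $0 < E\psi'(\varepsilon_t) < \infty$.

Lemma 1. If assumption (A1) holds, then

$$\Big(a_T^{-1}\sum_{t=1}^{[Tr]}\varepsilon_t,\; a_T^{-2}\sum_{t=1}^{[Tr]}\varepsilon_t^2\Big) \Rightarrow \Big(U_\kappa(r),\; \int_0^r (dU_\kappa)^2\Big), \tag{2}$$

where $a_T = T^{1/\kappa}L(T)$ for a slowly varying function $L$, and $U_\kappa(r)$ is a standard stable process with index $\kappa$. The characteristic function of $U_\kappa(1)$ has the form $e^{-c|s|^{\kappa}}$, where

$$c = \begin{cases} \Gamma(1-\kappa)\cos(\pi\kappa/2), & \kappa \neq 1,\\ \pi/2, & \kappa = 1. \end{cases} \tag{3}$$

Moreover, $U_\kappa(r) \overset{d}{=} r^{1/\kappa}U_\kappa(1)$.

Remark 1. For details, see Phillips [13].

Lemma 2. If assumptions (A1)–(A3) hold, then

$$T^{1/2}(\hat\mu_1 - \mu) \Rightarrow \frac{B(\tau)}{\tau\, E\psi'(\varepsilon_t)}, \tag{4}$$

$$T^{1/2}(\hat\mu_2 - \mu) \Rightarrow \frac{B(1) - B(\tau)}{(1-\tau)\, E\psi'(\varepsilon_t)}, \tag{5}$$

where $\hat\mu_1$ and $\hat\mu_2$ denote the M-estimates based on $y_1, \dots, y_{[T\tau]}$ and $y_{[T\tau]+1}, \dots, y_T$, respectively, and $B(\cdot)$ is a Brownian motion.

Proof of Lemma 2. We consider the estimate $\hat\mu_1$ defined by minimizing $\sum_{t=1}^{[T\tau]}\rho(y_t - \mu)$. Define the process

$$Z_T(v) = \sum_{t=1}^{[T\tau]}\big[\rho(\varepsilon_t + vT^{-1/2}) - \rho(\varepsilon_t)\big],$$

and note that the $v$ that minimizes $Z_T$ is simply $T^{1/2}(\mu - \hat\mu_1)$. Expanding each summand of $Z_T$ in a Taylor series around $v = 0$, we get

$$Z_T(v) = T^{-1/2}v\sum_{t=1}^{[T\tau]}\psi(\varepsilon_t) + \frac{1}{2}T^{-1}v^2\sum_{t=1}^{[T\tau]}\psi'(\varepsilon_t^*),$$

where $\varepsilon_t^*$ lies between $\varepsilon_t$ and $\varepsilon_t + T^{-1/2}v$. Since $\psi'$ is Lipschitz continuous, $|\psi'(\varepsilon_t) - \psi'(\varepsilon_t^*)| \le K T^{-1/2}|v|$, so

$$T^{-1}v^2\sum_{t=1}^{[T\tau]}|\psi'(\varepsilon_t) - \psi'(\varepsilon_t^*)| \le v^2 T^{-1}\cdot[T\tau]\cdot KT^{-1/2}|v| \to 0.$$

Moreover, $\sup_{0\le\tau\le1}\big|T^{-1}\sum_{t=1}^{[T\tau]}\big(\psi'(\varepsilon_t) - E\psi'(\varepsilon_t)\big)\big| \to 0$, so together these give $\frac{1}{2}T^{-1}v^2\sum_{t=1}^{[T\tau]}\psi'(\varepsilon_t^*) \to \frac{1}{2}v^2\tau E\psi'(\varepsilon_t)$. By Knight [11],

$$T^{-1/2}\sum_{t=1}^{[T\tau]}\psi(\varepsilon_t) \Rightarrow B(\tau), \qquad T^{-1/2}\sum_{t=[T\tau]+1}^{T}\psi(\varepsilon_t) \Rightarrow B(1) - B(\tau).$$

Thus the finite-dimensional distributions of $Z_T(\cdot)$ converge weakly to those of $Z(\cdot)$, where

$$Z(v) = vB(\tau) + \frac{1}{2}v^2\tau E\psi'(\varepsilon_t).$$

Setting $Z'(v) = 0$ gives

$$T^{1/2}(\mu - \hat\mu_1) \Rightarrow -\frac{B(\tau)}{\tau E\psi'(\varepsilon_t)},$$

which is (4). In the same way, using $y_{[T\tau]+1}, \dots, y_T$, we obtain $T^{1/2}(\mu - \hat\mu_2) \Rightarrow -\big(B(1) - B(\tau)\big)/\big((1-\tau)E\psi'(\varepsilon_t)\big)$, which is (5). See [10] for details.

We now consider the following null and alternative hypotheses:

$$H_0: \kappa \text{ is constant}; \qquad H_1: \kappa_t = \kappa_1 I\{t \le [T\tau^*]\} + \kappa_2 I\{t > [T\tau^*]\},\quad \kappa_1 \neq \kappa_2.$$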
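3. Test Statistic and Asymptotic Distribution

We consider the ratio statistic proposed by Kim:

$$\Xi_T(\tau) = \frac{(T-[T\tau])^{-2}\sum_{t=[T\tau]+1}^{T}\Big(\sum_{i=[T\tau]+1}^{t}\hat\varepsilon_{2,i}\Big)^2}{[T\tau]^{-2}\sum_{t=1}^{[T\tau]}\Big(\sum_{i=1}^{t}\hat\varepsilon_{1,i}\Big)^2}, \tag{6}$$

where $\hat\varepsilon_{1,i} = y_i - \hat\mu_1$ and $\hat\varepsilon_{2,i} = y_i - \hat\mu_2$ denote the M-estimation residuals based on $y_1, \dots, y_{[T\tau]}$ and $y_{[T\tau]+1}, \dots, y_T$, respectively. Based on the ratio statistic, we propose the maximum Chow-type test

$$\Xi_T = \max_{\tau\in\Lambda}\Xi_T(\tau), \qquad \Lambda \subset (0,1). \tag{7}$$

Theorem 3.1. Suppose assumptions (A1)–(A3) hold. Then under the null hypothesis,

$$\Xi_T(\tau) \Rightarrow \Xi(\tau) := \frac{(1-\tau)^{-2}\int_\tau^1 G_2^2(r-\tau)\,dr}{\tau^{-2}\int_0^\tau G_1^2(r)\,dr}, \tag{8}$$

and $\Xi_T \Rightarrow \sup_{\tau\in\Lambda}\Xi(\tau)$, where $G_1(r) = U_\kappa(r)$ for $r \in (0,\tau)$ and $G_2(r-\tau) = U_\kappa(r) - U_\kappa(\tau)$ for $r \in (\tau,1)$.

Proof of Theorem 3.1. By Lemma 1 and Lemma 2, letting $t = [Tr]$, we have

$$a_T^{-1}\sum_{i=1}^{[Tr]}\hat\varepsilon_{1,i} = a_T^{-1}\sum_{i=1}^{[Tr]}\big(\varepsilon_i - (\hat\mu_1 - \mu)\big) \Rightarrow U_\kappa(r) = G_1(r),$$

because Lemma 2 gives $\hat\mu_1 - \mu = O_p(T^{-1/2})$, so the estimation error contributes only $O_p(T^{1/2}/a_T) = o_p(1)$ under (A1). Hence

$$T a_T^{-2}[T\tau]^{-2}\sum_{t=1}^{[T\tau]}\Big(\sum_{i=1}^{t}\hat\varepsilon_{1,i}\Big)^2 \Rightarrow \tau^{-2}\int_0^\tau G_1^2(r)\,dr.$$

Similarly,

$$a_T^{-1}\sum_{i=[T\tau]+1}^{[Tr]}\hat\varepsilon_{2,i} = a_T^{-1}\sum_{i=[T\tau]+1}^{[Tr]}\big(\varepsilon_i - (\hat\mu_2 - \mu)\big) \Rightarrow U_\kappa(r) - U_\kappa(\tau) = G_2(r-\tau),$$

so that

$$T a_T^{-2}(T-[T\tau])^{-2}\sum_{t=[T\tau]+1}^{T}\Big(\sum_{i=[T\tau]+1}^{t}\hat\varepsilon_{2,i}\Big)^2 \Rightarrow (1-\tau)^{-2}\int_\tau^1 G_2^2(r-\tau)\,dr.$$

Taking the ratio, we obtain the limit distribution of the statistic under the null hypothesis,

$$\Xi_T(\tau) \Rightarrow \frac{(1-\tau)^{-2}\int_\tau^1 G_2^2(r-\tau)\,dr}{\tau^{-2}\int_0^\tau G_1^2(r)\,dr},$$

and the result for $\Xi_T$ follows from the continuous mapping theorem.

Theorem 3.2. If assumptions (A1)–(A3) hold and $\tau^*$ is the break fraction, then under the alternative hypothesis:

(1) If $\kappa_1 > \kappa_2$, then

$$\max_{\tau\in\Lambda,\,\tau\le\tau^*}\Xi_T(\tau) = O_p\big(a_{2T}^2/a_{1T}^2\big), \quad \Xi_T \to \infty \text{ as } T \to \infty, \tag{9}$$

while $\max_{\tau\in\Lambda,\,\tau>\tau^*}\Xi_T(\tau) = O_p(1)$.

(2) If $\kappa_1 < \kappa_2$, then for the inverted statistic $\Xi_T^{-1}(\tau) = 1/\Xi_T(\tau)$ we have $\max_{\tau\in\Lambda,\,\tau\le\tau^*}\Xi_T^{-1}(\tau) = O_p(1)$, while

$$\max_{\tau\in\Lambda,\,\tau>\tau^*}\Xi_T^{-1}(\tau) = O_p\big(a_{1T}^2/a_{2T}^2\big) \to \infty \text{ as } T \to \infty. \tag{10}$$

Here $a_{1T} = T^{1/\kappa_1}L(T)$, $a_{2T} = T^{1/\kappa_2}L(T)$, and $L$ is a slowly varying function.

Proof of Theorem 3.2. We first consider case (1), $\kappa_1 > \kappa_2$. For $0 < \tau \le \tau^*$, similarly to the proof of Theorem 3.1, the denominator is

$$[T\tau]^{-2}\sum_{t=1}^{[T\tau]}\Big(\sum_{i=1}^{t}\hat\varepsilon_{1,i}\Big)^2 = O_p\big(T^{-1}a_{1T}^2\big),$$

and the numerator is

$$(T-[T\tau])^{-2}\Bigg[\sum_{t=[T\tau]+1}^{[T\tau^*]}\Big(\sum_{i=[T\tau]+1}^{t}\hat\varepsilon_{2,i}\Big)^2 + \sum_{t=[T\tau^*]+1}^{T}\Big(\sum_{i=[T\tau]+1}^{t}\hat\varepsilon_{2,i}\Big)^2\Bigg] = O_p\big(T^{-1}a_{1T}^2\big) + O_p\big(T^{-1}a_{2T}^2\big).$$

Since $\kappa_1 > \kappa_2$ implies $a_{2T}/a_{1T} \to \infty$, we get $\Xi_T(\tau) = O_p(a_{2T}^2/a_{1T}^2) \to \infty$. For $\tau^* < \tau < 1$, the denominator is

$$[T\tau]^{-2}\Bigg[\sum_{t=1}^{[T\tau^*]}\Big(\sum_{i=1}^{t}\hat\varepsilon_{1,i}\Big)^2 + \sum_{t=[T\tau^*]+1}^{[T\tau]}\Big(\sum_{i=1}^{t}\hat\varepsilon_{1,i}\Big)^2\Bigg] = O_p\big(T^{-1}a_{1T}^2\big) + O_p\big(T^{-1}a_{2T}^2\big),$$

and the numerator is

$$(T-[T\tau])^{-2}\sum_{t=[T\tau]+1}^{T}\Big(\sum_{i=[T\tau]+1}^{t}\hat\varepsilon_{2,i}\Big)^2 = O_p\big(T^{-1}a_{2T}^2\big),$$

so $\Xi_T(\tau) = O_p(1)$. Altogether, $\Xi_T = \max_{\tau\in\Lambda}\Xi_T(\tau) = O_p(a_{2T}^2/a_{1T}^2)$, and $\Xi_T \to \infty$ as $T \to \infty$. Case (2), $\kappa_1 < \kappa_2$, is similar to case (1), and its proof is omitted.

Theorem 3.3. If assumptions (A1)–(A3) hold, then under the alternative hypothesis with $\kappa_1 \neq \kappa_2$,

$$\Xi_T^* = \max\Big\{\Xi_T,\; \max_{\tau\in\Lambda}\Xi_T^{-1}(\tau)\Big\} \to \infty \quad \text{as } T \to \infty. \tag{11}$$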
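The computations behind (6), (7) and the critical values in Section 4 can be sketched in Python with NumPy. This is a minimal sketch under assumptions the paper does not state, not the authors' code: we assume $\rho(x) = \log\cosh x$ (so $\psi = \tanh$ and $\psi' = 1 - \tanh^2$ is Lipschitz, as (A3) requires), symmetric $\kappa$-stable errors generated by the Chambers–Mallows–Stuck method, $\Lambda = [0.2, 0.8]$, and the inverted statistic taken as the maximum of $1/\Xi_T(\tau)$ over $\Lambda$. The helper names (symmetric_stable, m_estimate, ratio_statistic, max_chow, critical_value) are hypothetical.

```python
import numpy as np

# Assumptions (not from the paper): rho(x) = log(cosh(x)), so psi = tanh and
# psi' = 1 - tanh^2 is Lipschitz as (A3) requires; symmetric kappa-stable errors;
# Lambda = [0.2, 0.8]. All function names below are hypothetical helpers.


def symmetric_stable(kappa, size, rng):
    """Symmetric kappa-stable draws via the Chambers-Mallows-Stuck method."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    if kappa == 1.0:
        return np.tan(v)  # Cauchy special case
    return (np.sin(kappa * v) / np.cos(v) ** (1.0 / kappa)
            * (np.cos((1.0 - kappa) * v) / w) ** ((1.0 - kappa) / kappa))


def m_estimate(y, tol=1e-10, max_iter=200):
    """Location M-estimate solving sum(tanh(y - m)) = 0 by bisection.

    The left-hand side is strictly decreasing in m, so the root lies in
    [min(y), max(y)] and no starting value is needed.
    """
    lo, hi = float(np.min(y)), float(np.max(y))
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(np.tanh(y - mid)) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def ratio_statistic(y, k):
    """Xi_T(tau) of (6) for the split point k = [T * tau]."""
    T = len(y)
    e1 = y[:k] - m_estimate(y[:k])   # residuals hat eps_{1,i}, i = 1..k
    e2 = y[k:] - m_estimate(y[k:])   # residuals hat eps_{2,i}, i = k+1..T
    s1 = np.cumsum(e1)               # partial sums over the first segment
    s2 = np.cumsum(e2)               # partial sums restarted at k + 1
    num = np.sum(s2 ** 2) / (T - k) ** 2
    den = np.sum(s1 ** 2) / k ** 2
    return num / den


def max_chow(y, lam=(0.2, 0.8)):
    """Return (Xi_T, inverted statistic): max of Xi_T(tau) and of 1/Xi_T(tau) over Lambda."""
    T = len(y)
    ks = range(int(lam[0] * T), int(lam[1] * T) + 1)
    vals = np.array([ratio_statistic(y, k) for k in ks])
    return float(vals.max()), float((1.0 / vals).max())


def critical_value(kappa, T, reps=2000, level=0.05, mu=0.0, seed=0):
    """Monte Carlo (1 - level) quantile of Xi_T under H0 (constant kappa).

    mu does not affect the statistic because the M-estimate is location equivariant.
    """
    rng = np.random.default_rng(seed)
    draws = [max_chow(mu + symmetric_stable(kappa, T, rng))[0] for _ in range(reps)]
    return float(np.quantile(draws, 1.0 - level))
```

For instance, critical_value(1.8, 200) follows the design of one cell of Table 1 (T = 200, κ = 1.8, 2000 replications); because ρ, the error law and Λ are assumptions here, its output need not reproduce the tabulated value. Bisection is used for the M-estimate because $\sum_t \psi(y_t - m)$ is monotone in $m$, which keeps the solver stable even for very heavy tails such as κ = 0.6.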
4. Monte Carlo simulation

In this section we use Monte Carlo simulation to verify the effectiveness of our ratio test, reporting the empirical size and the empirical power. Consider the following data-generating process:

$$y_t = \mu + \varepsilon_t, \quad t = 1, 2, \dots, T.$$

The sample sizes are $T = 200, 500, 1000$ and the tail indices are $\kappa \in \{0.6, 0.8, 0.9, 1.6, 1.8, 1.9\}$. Each experiment is repeated 2000 times at the significance level $\alpha = 0.05$. The original statistic and the inverted statistic are identically distributed, so we only report the critical values of the original statistic.

Table 1 Critical values of the maximum Chow-type statistic Ξ_T

κ          0.6       0.8       0.9       1.6       1.8       1.9
T = 200    70.6102   64.2800   55.7735   18.3333   16.1118   14.4291
T = 500    70.3438   60.0030   59.5982   14.8721   18.5255   13.9622
T = 1000   71.2121   68.7985   52.4784   17.0133   18.4640   13.9873

Table 2 Empirical size of the maximum Chow-type test

κ          0.6      0.8      0.9      1.6      1.8      1.9
T = 200    0.0470   0.0500   0.0510   0.0470   0.0500   0.0500
T = 500    0.0490   0.0510   0.0500   0.0460   0.0510   0.0520
T = 1000   0.0498   0.0499   0.0510   0.0510   0.0490   0.0486

Table 3 Empirical power of the maximum Chow-type test (κ₁ → κ₂: change in the tail index; τ*: break fraction)

T      κ₁ → κ₂     τ*=0.3   τ*=0.5   τ*=0.7   κ₁ → κ₂     τ*=0.3   τ*=0.5   τ*=0.7
200    1.9 → 1.8   0.2260   0.2050   0.2720   0.9 → 0.8   0.2230   0.2040   0.2300
       1.9 → 1.6   0.2460   0.2640   0.3660   0.9 → 0.6   0.2240   0.2620   0.2940
       1.8 → 1.6   0.2090   0.2130   0.2330   0.8 → 0.6   0.2050   0.2110   0.2290
       1.8 → 1.5   0.2150   0.2220   0.2450   0.8 → 0.5   0.2120   0.2260   0.2330
500    1.9 → 1.8   0.2320   0.2480   0.2820   0.9 → 0.8   0.2130   0.2210   0.2200
       1.9 → 1.6   0.2560   0.2670   0.3690   0.9 → 0.6   0.2330   0.2580   0.2850
       1.8 → 1.6   0.2360   0.2540   0.2940   0.8 → 0.6   0.2540   0.2590   0.2860
       1.8 → 1.5   0.2680   0.2720   0.2950   0.8 → 0.5   0.2580   0.2620   0.2880
1000   1.9 → 1.8   0.2350   0.2890   0.2990   0.9 → 0.8   0.2230   0.2520   0.2610
       1.9 → 1.6   0.2590   0.2690   0.3740   0.9 → 0.6   0.2600   0.2640   0.2680
       1.8 → 1.6   0.2660   0.2670   0.2950   0.8 → 0.6   0.2490   0.2540   0.2640
       1.8 → 1.5   0.2690   0.2900   0.2990   0.8 → 0.5   0.2500   0.2620   0.2980

Table 1 reports the critical values of the statistic. We can see that with the
increase in the sample size, the critical values gradually stabilize. Table 2 reports the rejection rates of the statistic under the null hypothesis; the empirical size is close to the nominal 5% level. Table 3 reports the empirical power under the alternative hypothesis for break fractions $\tau^* = 0.3, 0.5, 0.7$. Power increases with the magnitude of the change in the tail index and, for a fixed change, generally increases with $\tau^*$ and with the sample size $T$. Overall, the test performs better for larger samples, larger changes and later breaks. For $\kappa_1 < \kappa_2$ we obtain similar results, which are omitted for reasons of space.

5. Conclusions

In this paper, we investigate the change point of the tail index in heavy-tailed sequences, a topic closely related to big data settings and financial markets. We adopt M-estimation to estimate the unknown parameter, adapt the statistic proposed by Kim, and obtain the limit distribution of the statistic under the null hypothesis and its consistency under the alternative hypothesis. Numerical simulations show that our statistics perform well.

6. References

[1] Quintos, C., Fan, Z., & Phillips, P. C. B. (2001). Structural change tests in tail behaviour and the Asian crisis. Review of Economic Studies, 68, 633–663.
[2] Relevant parameter changes in structural break models. (2020). Journal of Econometrics, 217(1), 46–78.
[3] Peluso, S., Chib, S., & Mira, A. (2019). Semiparametric multivariate and multiple change-point modelling. Bayesian Analysis, 14(3), 727–751.
[4] Kim, M., & Lee, S. (2012). Change point test of tail index for autoregressive processes. Journal of the Korean Statistical Society, 41, 305–312.
[5] Page, E. S. (1954). Continuous inspection schemes. Biometrika, 41(1/2), 100–115.
[6] Chen, J., & Gupta, A. K. (2000). Parametric statistical change point analysis. Boston: Birkhäuser.
[7] Kim, J.-Y. (2000). Detection of change in persistence of a linear time series. Journal of Econometrics, 95, 97–116.
[8] Kim, J.-Y. (2002). Corrigendum to "Detection of change in persistence of a linear time series". Journal of Econometrics, 109, 389–392.
[9] Gamage, R. D. P., & Ning, W. (2021). Empirical likelihood for change point detection in autoregressive models. 2021(1).
[10] Davis, R. A., Knight, K., & Liu, J. (1992). M-estimation for autoregressions with infinite variance. Stochastic Processes and their Applications, 40(1).
[11] Knight, K. (1991). Limit theory for M-estimates in an integrated infinite variance process. Econometric Theory, 7(2).
[12] Sohrabi, M., & Zarepour, M. (2019). Asymptotic theory for M-estimates in unstable AR(p) processes with infinite variance innovations. Journal of Statistical Planning and Inference, 198.
[13] Phillips, P. C. B. (1990). Econometric Theory, 6, 44–62.