Time Series Forecasting with Metric Analysis Approach

Victor V. Ivanov†∗, Alexander V. Kryanev∗†, Leonid A. Sevastianov‡†, David K. Udumyan∗‡§

∗ National Research Nuclear University “MEPhI”
† Joint Institute for Nuclear Research
‡ Peoples’ Friendship University of Russia (RUDN University)
§ University of Miami

Email: ivanov@jinr.ru, a_v_kryanev@mtu-net.ru, sevastianov_la@rudn.university, mathudum@gmail.com

A time series forecasting scheme based on the metric analysis approach is presented. The scheme first filters out noisy components by means of singular spectrum analysis (SSA). It then uses an autoregressive model in which the predicted value is treated as a function of the m previous filtered values of the series. The autoregressive model thus reduces the prediction problem to nonlinear interpolation of a function of several variables, namely the smoothed (filtered) values of the original chaotic time series. To interpolate such functions, the article uses metric analysis, which makes it possible to reveal the patterns of autoregressive dependence in the deterministic component of the chaotic time process under investigation. In classical interpolation schemes, the interpolating function is recovered at once over the entire region of the independent variables under consideration. In metric analysis, the interpolation values are restored separately at each point, taking into account the location of the point at which the function is restored relative to the points at which its values are given. Metric analysis therefore accounts, directly in the interpolation formula, for the individual features of the location of the points at which the function is restored, which makes it possible to obtain a more accurate result when no additional information about the function is available.
The article considers time series from various areas: stock prices, sales volumes, passenger traffic volumes in the metro, and electricity consumption in the Moscow region. The examples demonstrate the effectiveness of the presented scheme. In particular, they show that the scheme can predict the trend dynamics of the chaotic time series under investigation several tens of time steps ahead with acceptable accuracy. The accuracy of the forecast largely depends on the choice of the dimension of the forecast model, that is, the number m of previous values used in the autoregression. The scheme allows one to select an optimal value of this dimension, which, on average, provides the best prediction accuracy.

The publication was partially supported by the Ministry of Education and Science of the Russian Federation (Agreement No. 02.A03.21.0008).

Key words and phrases: time series, autoregressive model, interpolation of functions of many variables, metric analysis, prediction scheme, forecasting examples.

1. Introduction

One of the main problems of data processing in many areas is predicting the values of time processes. To date, many different methods and schemes have been developed that solve various particular problems of forecasting time processes [1, 2].

Below we briefly describe the metric analysis approach to predicting time series values [3–5] and its application to the prediction of specific time series [6].

Copyright © 2017 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: K. E. Samouilov, L. A. Sevastianov, D. S.
Kulyabov (eds.): Selected Papers of the VII Conference “Information and Telecommunication Technologies and Mathematical Modeling of High-Tech Systems”, Moscow, Russia, 24-Apr-2017, published at http://ceur-ws.org

ITTMM-2017

2. Forecasting Scheme

The prediction scheme uses an interpolation method for the functional dependence

$$Y = F(X_1, \dots, X_m) = F(\vec{X}),$$

where the function $F(\vec{X})$ is unknown and is to be recovered, either at a single point $\vec{X}^*$ or at a set of given points, from the known values $Y_k$, $k = 1, \dots, n$, of the function at fixed points $\vec{X}_k = (X_{k1}, \dots, X_{km})^T$ [3].

In the interpolation method based on metric analysis, the interpolation values are found by minimizing, with respect to $\vec{z} = (z_1, \dots, z_n)^T$, the measure of metric uncertainty [3–5]

$$\sigma^2_{ND}(Y^*; \vec{z}) = \bigl(W(\vec{X}^*; \vec{X}_1; \dots; \vec{X}_n)\,\vec{z}, \vec{z}\bigr).$$

The interpolation value is determined by the linear combination

$$Y^* = \sum_{i=1}^{n} z_i Y_i$$

and is given by

$$Y^* = \frac{(W^{-1}\vec{Y}, \vec{1})}{(W^{-1}\vec{1}, \vec{1})}.$$

The matrix of metric uncertainty $W$ is defined by

$$W = \begin{pmatrix}
\rho^2_{\vec{w}}(\vec{X}_1, \vec{X}^*) & (\vec{X}_1, \vec{X}_2)_{\vec{w}} & \dots & (\vec{X}_1, \vec{X}_n)_{\vec{w}} \\
(\vec{X}_2, \vec{X}_1)_{\vec{w}} & \rho^2_{\vec{w}}(\vec{X}_2, \vec{X}^*) & \dots & (\vec{X}_2, \vec{X}_n)_{\vec{w}} \\
\dots & \dots & \dots & \dots \\
(\vec{X}_n, \vec{X}_1)_{\vec{w}} & (\vec{X}_n, \vec{X}_2)_{\vec{w}} & \dots & \rho^2_{\vec{w}}(\vec{X}_n, \vec{X}^*)
\end{pmatrix},$$

where

$$\rho^2_{\vec{w}}(\vec{X}_i, \vec{X}^*) = \sum_{k=1}^{m} w_k (X_{ik} - X^*_k)^2,$$

$$(\vec{X}_i, \vec{X}_j)_{\vec{w}} = \sum_{k=1}^{m} w_k (X_{ik} - X^*_k)(X_{jk} - X^*_k), \quad i, j = 1, \dots, n.$$

Consider a function of time $y = f(t)$ with values $Y_1 = f(t_1), \dots, Y_n = f(t_n)$ at times $t_1 < \dots < t_n$. It is required to find the predicted value $y_{n+1}$ at $t_{n+1}$.

The problem of finding the predicted value $y_{n+1}$ is reduced to the interpolation of a function of several variables by means of a nonlinear autoregressive model [3–5]

$$\begin{aligned}
y(t_{m+1}) = y_{m+1} &= F(y_1, \dots, y_m), \\
y(t_{m+2}) = y_{m+2} &= F(y_2, \dots, y_{m+1}), \\
&\;\;\vdots \\
y(t_N) = y_N &= F(y_{N-m}, \dots, y_{N-1}).
\end{aligned}$$
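The closed-form expression for $Y^*$ and the autoregressive system above can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation. The function names `solve_linear`, `metric_interpolate` and `forecast_next` are hypothetical. The weights $w_k$ default to 1 because the text does not say how they are chosen. The weights $z_i$ are normalized to sum to 1, which is consistent with the closed-form expression. A small ridge term is added to the diagonal of $W$ (an assumption of this sketch, not part of the source) because $W$ as defined is a Gram matrix of rank at most $m$ and is therefore singular when there are more than $m$ points. The query point for forecasting is taken to be the last $m$ observed values.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def metric_interpolate(X, Y, x_star, w=None, ridge=1e-8):
    """Return (Y*, z) with Y* = (W^{-1} 1, Y) / (W^{-1} 1, 1) = sum_i z_i Y_i."""
    n, m = len(X), len(x_star)
    w = [1.0] * m if w is None else list(w)  # assumption: unit weights w_k
    # D[i][k] = X_ik - X*_k
    D = [[X[i][k] - x_star[k] for k in range(m)] for i in range(n)]
    # W_ij = sum_k w_k (X_ik - X*_k)(X_jk - X*_k); i == j gives rho_w^2(X_i, X*)
    W = [[sum(w[k] * D[i][k] * D[j][k] for k in range(m)) for j in range(n)]
         for i in range(n)]
    # Assumption (not in the source): a small ridge keeps W invertible.
    scale = sum(W[i][i] for i in range(n)) / n or 1.0
    for i in range(n):
        W[i][i] += ridge * scale
    u = solve_linear(W, [1.0] * n)  # u = W^{-1} 1
    total = sum(u)                  # (W^{-1} 1, 1)
    z = [ui / total for ui in u]    # weights z_i, normalized to sum to 1
    return sum(zi * yi for zi, yi in zip(z, Y)), z


def forecast_next(y, m, w=None):
    """One-step forecast y_{n+1} from the autoregressive windows of length m."""
    n = len(y)
    if not 1 <= m < n:
        raise ValueError("m must satisfy 1 <= m < len(y)")
    X = [list(y[k:k + m]) for k in range(n - m)]  # windows (Y_k, ..., Y_{k+m-1})
    targets = list(y[m:])                         # value following each window
    x_star = list(y[n - m:])                      # the most recent m values
    value, _ = metric_interpolate(X, targets, x_star, w)
    return value


series = [2.0 * t + 1.0 for t in range(10)]  # toy linear series y_t = 2t + 1
print(forecast_next(series, m=2))            # close to 21 for this toy series
```

The hand-written solver only keeps the sketch dependency-free; with NumPy available, `numpy.linalg.solve` could take its place. The ridge value is a tuning choice of this sketch and would need to be checked against the regularization actually used in [3–5].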
Then the prediction of the function $y = f(t)$ reduces to interpolating the function of $m$ variables $Y = F(y_1, y_2, \dots, y_m)$, whose values are known at the $n - m$ points

$$\begin{aligned}
\vec{X}_1 &= (Y_1, \dots, Y_m)^T, \\
\vec{X}_2 &= (Y_2, \dots, Y_{m+1})^T, \\
&\;\;\vdots \\
\vec{X}_{n-m} &= (Y_{n-m}, \dots, Y_{n-1})^T.
\end{aligned}$$

The predicted value $y_{\mathrm{for}} = y_{n+1}$ is defined as the interpolation value of the function $Y = F(y_1, y_2, \dots, y_m)$ at the point $\vec{X}^*$:

$$y_{n+1} = F(\vec{X}^*) = \frac{(W^{-1}\vec{1}, \vec{Y})}{(W^{-1}\vec{1}, \vec{1})},$$

where $\vec{X}^* = (Y_{n-m+1}, \dots, Y_n)^T$, $W^{-1}$ is the inverse of the $(n - m) \times (n - m)$ matrix of metric uncertainty, and $\vec{Y} = (Y_{m+1}, \dots, Y_n)^T$ is the $(n - m)$-dimensional vector of values of the predicted time process.

The natural number $m$ determines the dimension of the space of vectors $\vec{X}$; its value is found as the solution of the extremal problem [3–5]

$$m = \arg\min \|\vec{Y} - \vec{Y}_{\mathrm{for}}\|.$$

In the scheme proposed in this paper, the original time series is first filtered (its trend is isolated) by means of singular spectrum analysis [1].

3. Numerical Results

Fig. 1 shows an example of forecasting the daily closing prices of a company's shares using the metric analysis scheme.

Fig. 2 shows the forecast of the sums of one-time shoe store sales (data provided by A. Khokhlov).

Figs. 3 and 4 show the forecasting of the daily passenger traffic (in thousands of passengers) on the Moscow metro during various periods of 2014 (data source: the Moscow Metro; data provided by E. Osetrov, see also [6]).

Figs. 5 and 6 show the forecasting of the daily electricity consumption (in billion kilowatt-hours) in the Moscow region (Moscow and Moscow region), on working days only, during different periods of 2014 (data source: System Operator of the Unified Energy Systems / JSC “SO UES”, branch of the Moscow Regional Dispatch Office; data provided by E. Osetrov, see also [6]).

Figure 1. Forecasting 20 steps ahead. The continuous line is the original series, the dashed line is the filtered component, the solid line is the forecast (the optimal value of m is 20)

Figure 2. Forecasting 10 steps ahead. The continuous line is the original series, the dashed line is the filtered component, the solid line is the forecast (the optimal value of m is 10)

Figure 3. Forecasting 30 steps ahead. The continuous line is the original series, the dashed line is the filtered component, the solid line is the forecast (the optimal value of m is 20)

Figure 4. Forecasting 40 steps ahead. The continuous line is the original series, the dashed line is the filtered component, the solid line is the forecast (the optimal value of m is 30)

Figure 5. Forecasting 20 steps ahead. The continuous line is the original series, the dashed line is the filtered component, the solid line is the forecast (the optimal value of m is 10)

Figure 6. Forecasting 50 steps ahead. The continuous line is the original series, the dashed line is the filtered component, the solid line is the forecast (the optimal value of m is 12)

4. Conclusion

The forecasting scheme presented in this article allows one to predict the trend dynamics of the chaotic time series under analysis. The numerical results obtained for time processes taken from various fields show that the presented scheme yields forecasts of acceptable accuracy.

References

1. N. Golyandina, V. Nekrutkin, A. Zhigljavsky, Analysis of Time Series Structure: SSA and Related Techniques. Chapman & Hall/CRC, 2001.
2. A. V. Kryanev, G. V. Lukin, Mathematical methods for processing indeterminate data. Leningrad: Nauka, 2006 [in Russian].
3. A. V. Kryanev, D. K. Udumyan, Metric Analysis, Properties and Applications as a Tool for Forecasting. International Journal of Mathematical Analysis (2014), Vol. 8, no. 60, pp. 2971–2978.
4. V. V. Ivanov, A. V. Kryanev, D. K. Udumyan, G. V. Lukin, Metric Analysis Approach for Interpolation and Forecasting of Time Processes. Applied Mathematical Sciences (2014), Vol. 8, no. 22, pp. 1053–1060.
5. A. V. Kryanev, G. V. Lukin, D. K. Udumyan, Metric Analysis and data processing. Leningrad: Nauka, 2012 [in Russian].
6. V. V. Ivanov, A. V. Kryanev, E. S. Osetrov, Forecasting daily electricity consumption in the Moscow region using artificial neural networks. Physics of Particles and Nuclei Letters (2017), Issue 2.