-

Data analysis of sunspot time series with SSA and HHT information adaptive methods

Nikolai T. Sa ullin

n.t.safiullin@urfu.ru 1

Sergey V. Porshnev

porshnev@mail.ru 1

Nathan I. Kleeorin

nat@bgu.ac.il 0 0 Ben-Gurion University of Negev , Beer-Sheva , Israel 1 Ural Federal University , Yekaterinburg , Russia

110 119

In the paper, characteristics of the monthly Wolf numbers time series data, containing information about individual solar cycles, are investigated by decomposition into components with two time series methods, namely, the Singular Spectrum Analysis (SSA) and HuangHilbert Transform (HHT). These methods do not require any a priori knowledge about analyzed data, making them information adaptive. As a result, some of the known cycles such as Schwabe-Wolf Cycle, Hale Cycle, Gleisberg Cycle, and Suess Cycle have been identi ed. These components and their properties are compared with each other, as well as with known characteristics of sunspot cycles.

data analysis information handling time series empirical mode decomposition singular spectrum analysis sunspot numbers

Solar activity de nes strength of solar ares, coronal mass ejections, amount of energetic accelerated particles, and power of solar electromagnetic eld. Moreover, it a ects valuable low orbit satellites and other sensitive instruments in space [ 1 ], as well as terrestrial climate [ 2 ] and people's health to some extent [ 3 ]. To measure value of the solar activity, the Wolf sunspot numbers R (relative Zurich sunspot numbers) are often used

R = k (10g + s) ; (1) where s is the number of isolated sunspots, g is the number of sunspot group, k is the observatory factor taking into account various observation conditions. At present, all the works of estimation and spreading the Wolf numbers from 1749 year are stored by the Royal Observatory of Belgium in Brussels [ 4 ]. Represented time series with averaged monthly Wolf sunspot numbers from January 1749 till January 2017 are shown in Fig. 1.

It is important to note that the main di culty with analysis of these time series is caused by its non-stationary characteristic [ 3 ]. This fact signi cantly reduces number of methods for analysis and forecasting the mentioned time series. (2) Majority of time series analysis methods (smoothing and regression methods, autoregressive models, etc.) are based on the a priori assumption of weak stationarity [ 6 ]. So, it would be more appropriate to apply to Wolf numbers the time series analysis and forecasting methods with no a priori assumptions (they are called the adaptive methods), such as Singular Spectrum Analysis [ 7 ] and novel Huang-Hilbert Transform [ 8, 9 ]. These methods decompose the initial data into components u (tj ) = n 1 X ci (tj ) + rn (tj ) ; i=1 where u (tj ) is the initial time series, ci (tj ) is the i-th component, rn (tj ) is the residual time series, at the j-th time point tj . Some of these components contain meaningful (from a physical viewpoint) time-and-frequency characteristics.

In this paper, we present the results of the comparative analysis of the Wolf sunspot time series examination by two methods: Singular Spectrum Analysis (SSA) and Huang-Hilbert Transform (HHT) that are also compared with contemporary astrophysical facts of the solar activity cycles. 2

Basic SSA algorithm

Following [ 7 ], we present a brief description of the basic SSA algorithm. Consider a real-valued time series F = (f0; :::; fN 1) of length N > 2. Assume that F is a non-zero series. So, it is usually assumed that fi = f (i t) for a certain function of time f (t) with some time interval t. Moreover, the numbers 0; :::; N 1 can be interpreted not only as discrete time instants but, also, as labels of any other sequentially ordered structure. The Basic SSA consists of two complementary stages: decomposition and reconstruction.

First stage: decomposition The 1st step. Embedding

The embedding procedure maps the original time series to a sequence of multidimensional lagged vectors. Let L be an integer (the window length), 1 < L < N . The embedding procedure forms K = N L + 1 lagged vectors Xi = (fi 1; :::; fi+L 2)T ; 1 i

K; which have dimension L. If we need to emphasize the dimension of the Xi, then we shall call them the L-lagged vectors. The L-trajectory matrix (or simply trajectory matrix) of the series F is 0 f0 f1 f2 ::: fK 1 1

B f1 f2 f3 ::: fK CC X = (xij )iL;j;K=1 = BB@BB f...2 f...3 f...4 .::.:. fK...+1 CCAC : fL 1 fL fL+1 ::: fN 1 (3) Obviously, xij = fi+j 2 and the matrix X has equal elements on the diagonals i+j = const. Thus, the trajectory matrix is a Hankel matrix. Certainly, if N and L are xed, then there is a one-to-one correspondence between the trajectory matrices and the time series.

The 2nd step. Singular value decomposition

The result of this step is the Singular Value Decomposition (SVD) of the trajectory matrix (3). Let S = XXT . Denote by 1; :::; L the eigen-values of S taken in the decreasing order of their magnitudes ( 1 ::: L 0) and by U1; :::; UL the orthonormal system of the eigen-vectors of the matrix S corresponding to these eigen-values.

Let d = max fi : i > 0g. If we denote Vi = XT Ui=p i; i = 1; :::; d, then the SVD of the trajectory matrix X can be written as

X = X1 + ::: + Xd; (4) where Xi = p iUiViT . The matrices Xi have rank 1; therefore, they are elementary matrices. The collection p i; Ui; Vi will be called the i-th eigen-triple of the SVD (4). 2.2

Second stage: reconstruction The 3rd step. Grouping

Once the expansion (4) has been obtained, the grouping procedure breaks down the set of indices f1; :::; dg into m disjoint subsets I1; :::; Im.

Let I = fi1; :::; ipg. Then the resultant matrix XI corresponding to the group I is de ned as XI = Xi1 + ::: + Xip . These matrices are computed for I = I1; :::; Im, and expansion (4) leads to a new decomposition. The procedure of choosing the sets I1; :::; Im is called the eigen-triple grouping.

The 4th step. Diagonal averaging

The last step in the Basic SSA transforms each matrix of the grouped decomposition XI = XI1 + ::: + XIm into a new series of length N .

Let Y be an L K matrix with elements yij , 1 i L, 1 j K. We set L = min (L; K) ; K = max (L; K) and N = L + K 1. Let yij = yij if L K and yij = yji otherwise. The diagonal averaging transfers the matrix Y to the series (g0; :::; gN 1) by the formula gk = 8 > > > > > > > < > > >> 1 >:>> N k k +11 mkP+=11 ym;k m+2; L1 mPL=1 ym;k m+2;

N K +1;

P m=k K +2 for 0 k < L

1; for L 1 k < K ; (5) ym;k m+2 for K k < N:

Expression (5) corresponds to averaging of matrix elements over the diagonals i + j = k + 2: the choice k = 0 gives g0 = y11, for k = 1 we have g1 = (y12 + y21) =2, and so on. Note that if the matrix Y is the trajectory matrix of some series (h0; :::; hN 1) (in other words, if Y is the Hankel matrix), then gi = hi for all i.

The diagonal averaging (5) applied to a resultant matrix XIk produces the series F~(k) = f~0(k); :::; f~N(k) 1 , and, therefore, the initial time series (f0; :::; fN 1) is decomposed into the sum of m series like in the additive model (2). 3 3.1

Huang-Hilbert transform and its modi cations

The

rst stage: empirical mode decomposition Using the Huang-Hilbert transform, components ci(t) in (2) are obtained by the ensemble empirical mode decomposition (EEMD) [ 9 ]. This procedure is a modi cation of the basic empirical mode decomposition (EMD) [ 8 ], which diagram is illustrated in Fig. 2.

The EMD main idea is based on the assumption that any data consist of di erent simple intrinsic modes of oscillation. These modes can be empirically estimated with enveloping curves (upper curve U and lower curve L) constructed by cubic splines passing through the maximum and minimum points of the time series. The arithmetical mean m of those curves U and L is removed from the analysed data. So, the received residual r is subjected to this enveloping procedure again until calculated residual r ts with a certain criteria and then it is noted as the found intrinsic component ci. By removing the found component ci from the initial time series u0(t), we obtain a new time series u(t) to decompose iteratively based on de nition (2). This decomposition is repeated until the residual rn(t) becomes a monotonic function.

The EEMD procedure [ 9 ] is a more complex version of EMD; a diagram for EEMD is presented in Fig. 3. In contrast to EMD, overall decomposition (2) is applied to the initial time series u0(t) not only once, but several NE (Ensemble Number) times. Also, in order to achieve the initial discrepancy of time series for the subsequent ensemble averaging procedure, the white Gaussian noise (wGn) with known signal-to-noise ratio SN E is added to the initial data u0(t).

The second stage: time-and-frequency characteristics

The decomposition is only the rst step in the HHT method (and to some extent in SSA as well). Time-and-frequency characteristics of received components that were found meaningful from a physical viewpoint are extremely signi cant in investigation of sunspot time series; for example, for comparison of the results with known aspects of the sunspot cycles (especially, periods) and in further analysis of non-stationary time series and forecasting of the Wolf numbers.

On the second stage of the HHT method, a concept of analytical signal is used to calculate time-and-frequency characteristics of data. This idea was introduced in 1946 by Gabor [ 5 ]. Analytical signal allows one to introduce simple and unambiguous de nitions for amplitude a(t), phase ' (t), and instantaneous frequency ! (t) of signal u(t).

Based on [ 5 ], the analytical signal w(t) is de ned as

w (t) = u (t) + jv (t) = a (t) ej'(t); where v(t) is the signal conjugated to the initial data u(t) with Hilbert transform v (t) =

+1 1 Z 1 u ( ) t d : De nition (6) allows one to estimate naturally the amplitude a(t) for the signal u(t)

a (t) = jw (t)j = pu2 (t) + v2 (t); phase ' (t) ' (t) = arctan v (t) u (t) = arccos

= arcsin u (t) a (t) v (t) a (t) ; and instantaneous frequency ! (t) ! (t) = '_ (t) = u (t) v_ (t) u_ (t) v (t) a2 (t) :

The same method (6 { 10) can be used to calculate time-and-frequency characteristics of components received by the SSA decomposition and reconstruction. Note that in the case of the monthly sunspot numbers (1), we are more interested not in the instantaneous frequency ! (t), but in the function of time for the instantaneous period T (t) = ! 1 (t). (6) (7) (8) (9) (10)

Analysis of results for the monthly numbers Wolf sunspot

With decomposition of the initial Wolf sunspot numbers data (Fig. 1) by two methods (SSA and EEMD), we received ve signi cant (from physical point of view) components (Fig. 4).

In the SSA method, we used the dimension window L = 1592 in accordance with recommendations from [ 7 ]. Grouping was made with eigen-triples 1, 2{5, 6{ 7, 16{17, 18{19. Reconstructed data justify 84.09% of dispersion from the initial time series. In the HHT method, we used NE = 100, SN E = 11dB in accordance with technique for the Huang-Hilbert transform [ 9 ].

From Figure 4 it is seen that 1. it is actually possible to decompose non-stationary Wolf sunspot numbers' time series into physically meaningful components by the SSA and HHT methods without any a priori information; 2. corresponding components found by two di erent adaptive methods are visually close to each other; 3. with increasing order number of components, their frequency domain is decreasing; 4. the fth component c5 represents trend for the monthly sunspot numbers; 5. all found components are amplitude-modulated and frequency-modulated signals; so, we need to calculate their time-and-frequency characteristics for their detailed analysis;

Thus, for the received components, we calculated time-and-frequency characteristics, such as instantaneous period T (t) = ! 1 (t) by equations (6 { 10). Obtained results of instantaneous period were subjected to a smoothing at ve points to reduce noise e ects; nal results are summarized in Table 1. The wellknown cycles are also shown in this table. These corresponding cycles were taken from contemporary astrophysical facts of solar activity cycles.

SSA

HHT

E Fig. 4. Components received by decomposition with the HHT (left) and SSA (right) methods

From Fig. 4 and Table 1 some facts can be noted. 1. Instantaneous periods are more distinguished between two methods, because equation (10) uses di erentiation procedure. 2. Nevertheless, the mean value and variance (to some extent) of periods for the rst four components are quite close to each other. Last component c5 is a trend; thus, its time-and-frequency characteristics are unreliable due to insu cient time interval of time series analysis. 3. The rst component actually represents the 11-year Schwabe-Wolf Cycle and shows a tendency to oat between 7 and 15 years. This fact is well-known in the solar activity astrophysics [1{3]. 4. The second component corresponds to the 22-year Hale Cycle. This is the most inaccurate component received by those two methods. This fact can be explained by multiplicity of this cycle with 11-year cycle, which leads to inadequate decomposition by adaptive methods that are evidently weak in the case of multiple frequencies. 5. The third component represents the known Half-Century Cycle, which is noted not to be valuable [ 3 ], but often found by other methods of time series analysis [ 6 ]. 6. The fourth component corresponds to the century Gleisberg Cycle and shows a tendency to oat between 80 and 120 years (in the HHT method). 7. The latter component represents a trend that can be compared with the known Suess Cycle (period is more than 200 years). Facts about this cycle are not estimated from the solar activity, but from total irradiance research of the terrestrial radio-isotopes [ 3 ], hence, it is di cult to compare the results. 8. Distortion in instantaneous period of components c1; c3; c5 next to 1790{1820 years corresponds to the known Dalton Minimum [ 3 ]. 5

Conclusion

In the paper, comparative analysis was performed for components received by decomposition of Wolf sunspot numbers with the SSA and HHT information adaptive methods. It is possible to make the following conclusions. 1. Both methods (SSA and HHT) can be successfully used in non-stationary time series analysis to decompose the initial data into components with meaningful (from physical viewpoint) properties. 2. Instantaneous periods received by the Hilbert transform after decomposition can be compared with the well-known facts about time-and-frequency characteristics of the solar activity. 3. Distinctions between components calculated by the SSA and HHT methods are connected with di erent approaches in their basis: the SSA has a more theoretical approach with orthogonal dimensions in its core, while the HHT is clearly the empirical method based on extrema points.

We expect that this information can be helpful in further time series data analysis techniques and, especially, in the forecast of solar activity by the contemporary informational technologies.

1. Pulkkinen , T. : Space Weather . Living Rev. Solar Phys . 4 ( 1 ), ( 2007 )

2. Haigh , J.: The sun and the earths climate . Living Rev. Solar Phys . 4 ( 2 ), ( 2007 )

3. Hathaway , D. : The solar cycle . Living Rev. Solar Phys . 7 ( 1 ), ( 2010 )

SILSO

World Data Center: The international sunspot number . http://sidc.be/silso/data les

5. Gabor , D. : Theory of communication . Jour. Inst. Elect. Eng . 93 , 429 { 457 ( 1946 )

6. Box , G. & Jenkins , G. : Time Series Analysis: Forecasting and Control . Prentice Hall, 3rd edition ( 1994 )

7. Golyandina , N. : Analysis of Time Series Structure: SSA and Related Techniques . Boca Raton: Chapman & Hall/CRC ( 2001 )

8. Huang , N. & Shen , Z. , et al.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis . Proc. R. SOC. London, Ser. A . 454 , 903 { 995 ( 1998 )

9. Huang , N.: Ensemble Empirical Mode Decomposition: a noise assisted data analysis method . Adv. in Adaptive Data Analysis . 1 ( 1 ), 1 { 41 ( 2008 )