<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data analysis of sunspot time series with SSA and HHT information adaptive methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikolai T. Sa ullin</string-name>
          <email>n.t.safiullin@urfu.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergey V. Porshnev</string-name>
          <email>porshnev@mail.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nathan I. Kleeorin</string-name>
          <email>nat@bgu.ac.il</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ben-Gurion University of Negev</institution>
          ,
          <addr-line>Beer-Sheva</addr-line>
          ,
          <country country="IL">Israel</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ural Federal University</institution>
          ,
          <addr-line>Yekaterinburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>110</fpage>
      <lpage>119</lpage>
      <abstract>
        <p>In the paper, characteristics of the monthly Wolf numbers time series data, containing information about individual solar cycles, are investigated by decomposition into components with two time series methods, namely, the Singular Spectrum Analysis (SSA) and HuangHilbert Transform (HHT). These methods do not require any a priori knowledge about analyzed data, making them information adaptive. As a result, some of the known cycles such as Schwabe-Wolf Cycle, Hale Cycle, Gleisberg Cycle, and Suess Cycle have been identi ed. These components and their properties are compared with each other, as well as with known characteristics of sunspot cycles.</p>
      </abstract>
      <kwd-group>
        <kwd>data analysis</kwd>
        <kwd>information handling</kwd>
        <kwd>time series</kwd>
        <kwd>empirical mode decomposition</kwd>
        <kwd>singular spectrum analysis</kwd>
        <kwd>sunspot numbers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Solar activity de nes strength of solar ares, coronal mass ejections, amount of
energetic accelerated particles, and power of solar electromagnetic eld.
Moreover, it a ects valuable low orbit satellites and other sensitive instruments in
space [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], as well as terrestrial climate [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and people's health to some extent [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
To measure value of the solar activity, the Wolf sunspot numbers R (relative
Zurich sunspot numbers) are often used
      </p>
      <p>
        R = k (10g + s) ;
(1)
where s is the number of isolated sunspots, g is the number of sunspot group, k
is the observatory factor taking into account various observation conditions. At
present, all the works of estimation and spreading the Wolf numbers from 1749
year are stored by the Royal Observatory of Belgium in Brussels [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Represented
time series with averaged monthly Wolf sunspot numbers from January 1749 till
January 2017 are shown in Fig. 1.
      </p>
      <p>
        It is important to note that the main di culty with analysis of these time
series is caused by its non-stationary characteristic [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This fact signi cantly
reduces number of methods for analysis and forecasting the mentioned time series.
(2)
Majority of time series analysis methods (smoothing and regression methods,
autoregressive models, etc.) are based on the a priori assumption of weak
stationarity [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. So, it would be more appropriate to apply to Wolf numbers the
time series analysis and forecasting methods with no a priori assumptions (they
are called the adaptive methods), such as Singular Spectrum Analysis [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
novel Huang-Hilbert Transform [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. These methods decompose the initial data
into components
u (tj ) =
n 1
X ci (tj ) + rn (tj ) ;
i=1
where u (tj ) is the initial time series, ci (tj ) is the i-th component, rn (tj ) is the
residual time series, at the j-th time point tj . Some of these components contain
meaningful (from a physical viewpoint) time-and-frequency characteristics.
      </p>
      <p>In this paper, we present the results of the comparative analysis of the Wolf
sunspot time series examination by two methods: Singular Spectrum Analysis
(SSA) and Huang-Hilbert Transform (HHT) that are also compared with
contemporary astrophysical facts of the solar activity cycles.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Basic SSA algorithm</title>
      <p>
        Following [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we present a brief description of the basic SSA algorithm. Consider
a real-valued time series F = (f0; :::; fN 1) of length N &gt; 2. Assume that F is a
non-zero series. So, it is usually assumed that fi = f (i t) for a certain function
of time f (t) with some time interval t. Moreover, the numbers 0; :::; N 1 can
be interpreted not only as discrete time instants but, also, as labels of any other
sequentially ordered structure. The Basic SSA consists of two complementary
stages: decomposition and reconstruction.
      </p>
      <sec id="sec-2-1">
        <title>First stage: decomposition</title>
      </sec>
      <sec id="sec-2-2">
        <title>The 1st step. Embedding</title>
        <p>The embedding procedure maps the original time series to a sequence of
multidimensional lagged vectors. Let L be an integer (the window length), 1 &lt; L &lt; N .
The embedding procedure forms K = N L + 1 lagged vectors
Xi = (fi 1; :::; fi+L 2)T ; 1
i</p>
        <p>K;
which have dimension L. If we need to emphasize the dimension of the Xi, then
we shall call them the L-lagged vectors. The L-trajectory matrix (or simply
trajectory matrix) of the series F is
0 f0 f1 f2 ::: fK 1 1</p>
        <p>B f1 f2 f3 ::: fK CC
X = (xij )iL;j;K=1 = BB@BB f...2 f...3 f...4 .::.:. fK...+1 CCAC :
fL 1 fL fL+1 ::: fN 1
(3)
Obviously, xij = fi+j 2 and the matrix X has equal elements on the diagonals
i+j = const. Thus, the trajectory matrix is a Hankel matrix. Certainly, if N and
L are xed, then there is a one-to-one correspondence between the trajectory
matrices and the time series.</p>
      </sec>
      <sec id="sec-2-3">
        <title>The 2nd step. Singular value decomposition</title>
        <p>The result of this step is the Singular Value Decomposition (SVD) of the
trajectory matrix (3). Let S = XXT . Denote by 1; :::; L the eigen-values of S taken
in the decreasing order of their magnitudes ( 1 ::: L 0) and by U1; :::; UL
the orthonormal system of the eigen-vectors of the matrix S corresponding to
these eigen-values.</p>
        <p>Let d = max fi : i &gt; 0g. If we denote Vi = XT Ui=p i; i = 1; :::; d, then the
SVD of the trajectory matrix X can be written as</p>
        <p>X = X1 + ::: + Xd;
(4)
where Xi = p iUiViT . The matrices Xi have rank 1; therefore, they are
elementary matrices. The collection p i; Ui; Vi will be called the i-th eigen-triple of
the SVD (4).
2.2</p>
      </sec>
      <sec id="sec-2-4">
        <title>Second stage: reconstruction</title>
      </sec>
      <sec id="sec-2-5">
        <title>The 3rd step. Grouping</title>
        <p>Once the expansion (4) has been obtained, the grouping procedure breaks down
the set of indices f1; :::; dg into m disjoint subsets I1; :::; Im.</p>
        <p>Let I = fi1; :::; ipg. Then the resultant matrix XI corresponding to the group
I is de ned as XI = Xi1 + ::: + Xip . These matrices are computed for I =
I1; :::; Im, and expansion (4) leads to a new decomposition. The procedure of
choosing the sets I1; :::; Im is called the eigen-triple grouping.</p>
      </sec>
      <sec id="sec-2-6">
        <title>The 4th step. Diagonal averaging</title>
        <p>The last step in the Basic SSA transforms each matrix of the grouped
decomposition XI = XI1 + ::: + XIm into a new series of length N .</p>
        <p>Let Y be an L K matrix with elements yij , 1 i L, 1 j K. We set
L = min (L; K) ; K = max (L; K) and N = L + K 1. Let yij = yij if L K
and yij = yji otherwise. The diagonal averaging transfers the matrix Y to the
series (g0; :::; gN 1) by the formula
gk =
8
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&lt;
&gt;
&gt;
&gt;&gt; 1
&gt;:&gt;&gt; N k
k +11 mkP+=11 ym;k m+2;
L1 mPL=1 ym;k m+2;</p>
        <p>N K +1;</p>
        <p>P
m=k K +2
for 0
k &lt; L</p>
        <p>1;
for L
1
k &lt; K ;
(5)
ym;k m+2
for K
k &lt; N:</p>
        <p>Expression (5) corresponds to averaging of matrix elements over the diagonals
i + j = k + 2: the choice k = 0 gives g0 = y11, for k = 1 we have g1 =
(y12 + y21) =2, and so on. Note that if the matrix Y is the trajectory matrix
of some series (h0; :::; hN 1) (in other words, if Y is the Hankel matrix), then
gi = hi for all i.</p>
        <p>The diagonal averaging (5) applied to a resultant matrix XIk produces the
series F~(k) = f~0(k); :::; f~N(k) 1 , and, therefore, the initial time series (f0; :::; fN 1)
is decomposed into the sum of m series like in the additive model (2).
3
3.1</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Huang-Hilbert transform and its modi cations</title>
      <p>The</p>
      <p>
        rst stage: empirical mode decomposition
Using the Huang-Hilbert transform, components ci(t) in (2) are obtained by the
ensemble empirical mode decomposition (EEMD) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This procedure is a
modi cation of the basic empirical mode decomposition (EMD) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which diagram
is illustrated in Fig. 2.
      </p>
      <p>The EMD main idea is based on the assumption that any data consist of
di erent simple intrinsic modes of oscillation. These modes can be empirically
estimated with enveloping curves (upper curve U and lower curve L) constructed
by cubic splines passing through the maximum and minimum points of the time
series. The arithmetical mean m of those curves U and L is removed from the
analysed data. So, the received residual r is subjected to this enveloping
procedure again until calculated residual r ts with a certain criteria and then it is
noted as the found intrinsic component ci. By removing the found component
ci from the initial time series u0(t), we obtain a new time series u(t) to
decompose iteratively based on de nition (2). This decomposition is repeated until the
residual rn(t) becomes a monotonic function.</p>
      <p>
        The EEMD procedure [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is a more complex version of EMD; a diagram for
EEMD is presented in Fig. 3. In contrast to EMD, overall decomposition (2) is
applied to the initial time series u0(t) not only once, but several NE (Ensemble
Number) times. Also, in order to achieve the initial discrepancy of time series for
the subsequent ensemble averaging procedure, the white Gaussian noise (wGn)
with known signal-to-noise ratio SN E is added to the initial data u0(t).
      </p>
      <sec id="sec-3-1">
        <title>The second stage: time-and-frequency characteristics</title>
        <p>The decomposition is only the rst step in the HHT method (and to some extent
in SSA as well). Time-and-frequency characteristics of received components that
were found meaningful from a physical viewpoint are extremely signi cant in
investigation of sunspot time series; for example, for comparison of the results
with known aspects of the sunspot cycles (especially, periods) and in further
analysis of non-stationary time series and forecasting of the Wolf numbers.</p>
        <p>
          On the second stage of the HHT method, a concept of analytical signal
is used to calculate time-and-frequency characteristics of data. This idea was
introduced in 1946 by Gabor [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Analytical signal allows one to introduce simple
and unambiguous de nitions for amplitude a(t), phase ' (t), and instantaneous
frequency ! (t) of signal u(t).
        </p>
        <p>
          Based on [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], the analytical signal w(t) is de ned as
        </p>
        <p>w (t) = u (t) + jv (t) = a (t) ej'(t);
where v(t) is the signal conjugated to the initial data u(t) with Hilbert transform
v (t) =</p>
        <p>+1
1 Z
1
u ( )
t
d :
De nition (6) allows one to estimate naturally the amplitude a(t) for the signal
u(t)</p>
        <p>a (t) = jw (t)j = pu2 (t) + v2 (t);
phase ' (t)
' (t) = arctan
v (t)
u (t)
= arccos</p>
        <p>= arcsin
u (t)
a (t)
v (t)
a (t)
;
and instantaneous frequency ! (t)
! (t) = '_ (t) =
u (t) v_ (t) u_ (t) v (t)
a2 (t)
:</p>
        <p>The same method (6 { 10) can be used to calculate time-and-frequency
characteristics of components received by the SSA decomposition and reconstruction.
Note that in the case of the monthly sunspot numbers (1), we are more
interested not in the instantaneous frequency ! (t), but in the function of time for
the instantaneous period T (t) = ! 1 (t).
(6)
(7)
(8)
(9)
(10)</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Analysis of results for the monthly numbers</title>
    </sec>
    <sec id="sec-5">
      <title>Wolf sunspot</title>
      <p>With decomposition of the initial Wolf sunspot numbers data (Fig. 1) by two
methods (SSA and EEMD), we received ve signi cant (from physical point of
view) components (Fig. 4).</p>
      <p>
        In the SSA method, we used the dimension window L = 1592 in accordance
with recommendations from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Grouping was made with eigen-triples 1, 2{5, 6{
7, 16{17, 18{19. Reconstructed data justify 84.09% of dispersion from the initial
time series. In the HHT method, we used NE = 100, SN E = 11dB in accordance
with technique for the Huang-Hilbert transform [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>From Figure 4 it is seen that
1. it is actually possible to decompose non-stationary Wolf sunspot numbers'
time series into physically meaningful components by the SSA and HHT
methods without any a priori information;
2. corresponding components found by two di erent adaptive methods are
visually close to each other;
3. with increasing order number of components, their frequency domain is
decreasing;
4. the fth component c5 represents trend for the monthly sunspot numbers;
5. all found components are amplitude-modulated and frequency-modulated
signals; so, we need to calculate their time-and-frequency characteristics for
their detailed analysis;</p>
      <p>Thus, for the received components, we calculated time-and-frequency
characteristics, such as instantaneous period T (t) = ! 1 (t) by equations (6 { 10).
Obtained results of instantaneous period were subjected to a smoothing at ve
points to reduce noise e ects; nal results are summarized in Table 1. The
wellknown cycles are also shown in this table. These corresponding cycles were taken
from contemporary astrophysical facts of solar activity cycles.</p>
      <p>SSA</p>
      <p>HHT</p>
      <p>E
Fig. 4. Components received by decomposition with the HHT (left) and SSA (right)
methods</p>
      <p>
        From Fig. 4 and Table 1 some facts can be noted.
1. Instantaneous periods are more distinguished between two methods, because
equation (10) uses di erentiation procedure.
2. Nevertheless, the mean value and variance (to some extent) of periods for
the rst four components are quite close to each other. Last component c5
is a trend; thus, its time-and-frequency characteristics are unreliable due to
insu cient time interval of time series analysis.
3. The rst component actually represents the 11-year Schwabe-Wolf Cycle and
shows a tendency to oat between 7 and 15 years. This fact is well-known
in the solar activity astrophysics [1{3].
4. The second component corresponds to the 22-year Hale Cycle. This is the
most inaccurate component received by those two methods. This fact can
be explained by multiplicity of this cycle with 11-year cycle, which leads to
inadequate decomposition by adaptive methods that are evidently weak in
the case of multiple frequencies.
5. The third component represents the known Half-Century Cycle, which is
noted not to be valuable [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], but often found by other methods of time series
analysis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
6. The fourth component corresponds to the century Gleisberg Cycle and shows
a tendency to oat between 80 and 120 years (in the HHT method).
7. The latter component represents a trend that can be compared with the
known Suess Cycle (period is more than 200 years). Facts about this cycle
are not estimated from the solar activity, but from total irradiance research of
the terrestrial radio-isotopes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], hence, it is di cult to compare the results.
8. Distortion in instantaneous period of components c1; c3; c5 next to 1790{1820
years corresponds to the known Dalton Minimum [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
5
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In the paper, comparative analysis was performed for components received by
decomposition of Wolf sunspot numbers with the SSA and HHT information
adaptive methods. It is possible to make the following conclusions.
1. Both methods (SSA and HHT) can be successfully used in non-stationary
time series analysis to decompose the initial data into components with
meaningful (from physical viewpoint) properties.
2. Instantaneous periods received by the Hilbert transform after
decomposition can be compared with the well-known facts about time-and-frequency
characteristics of the solar activity.
3. Distinctions between components calculated by the SSA and HHT methods
are connected with di erent approaches in their basis: the SSA has a more
theoretical approach with orthogonal dimensions in its core, while the HHT
is clearly the empirical method based on extrema points.</p>
      <p>We expect that this information can be helpful in further time series data
analysis techniques and, especially, in the forecast of solar activity by the
contemporary informational technologies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Pulkkinen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Space Weather</article-title>
          .
          <source>Living Rev. Solar Phys</source>
          .
          <volume>4</volume>
          (
          <issue>1</issue>
          ), (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Haigh</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The sun and the earths climate</article-title>
          .
          <source>Living Rev. Solar Phys</source>
          .
          <volume>4</volume>
          (
          <issue>2</issue>
          ), (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hathaway</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The solar cycle</article-title>
          .
          <source>Living Rev. Solar Phys</source>
          .
          <volume>7</volume>
          (
          <issue>1</issue>
          ), (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>SILSO</given-names>
            <surname>World Data</surname>
          </string-name>
          <article-title>Center: The international sunspot number</article-title>
          . http://sidc.be/silso/data les
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gabor</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Theory of communication</article-title>
          .
          <source>Jour. Inst. Elect. Eng</source>
          .
          <volume>93</volume>
          ,
          <issue>429</issue>
          {
          <fpage>457</fpage>
          (
          <year>1946</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Box</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Jenkins</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Time Series Analysis: Forecasting and Control</article-title>
          .
          <source>Prentice Hall, 3rd edition</source>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Golyandina</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Analysis of Time Series Structure: SSA and Related Techniques</article-title>
          . Boca Raton: Chapman &amp; Hall/CRC (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , et al.:
          <article-title>The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis</article-title>
          .
          <source>Proc. R. SOC. London, Ser. A</source>
          .
          <volume>454</volume>
          ,
          <issue>903</issue>
          {
          <fpage>995</fpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Huang</surname>
          </string-name>
          , N.:
          <article-title>Ensemble Empirical Mode Decomposition: a noise assisted data analysis method</article-title>
          .
          <source>Adv. in Adaptive Data Analysis</source>
          .
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <volume>1</volume>
          {
          <fpage>41</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>