<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Synchrosqueezed Wavelet Transform assisted machine learning framework for time series forecasting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Stocchi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilaria Lunesu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simona Ibba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gavina Baralla</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Marchesi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Electric and Electronics Engineering, University of Cagliari</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The attention of researchers in the field of signal analysis has recently been captured by several kinds of reassignment techniques. Among the reallocation methods, the Synchrosqueezing approach maps a continuous wavelet transform from the time-scale plane to the time-frequency plane, allowing a much more definite and consistent representation of the frequency content of a signal. Its mathematical foundations show that the Synchrosqueezed Wavelet Transform is directly related to the Empirical Mode Decomposition (EMD), since both methods decompose a signal having finite energy into its intrinsic mode functions, each having time-varying frequency and amplitude. We develop a fast C++ implementation of the Synchrosqueezed Wavelet Transform (SST) suitable for the simultaneous extraction of instantaneous frequency information from univariate time series. This module, fully configurable and adaptable to input datasets having different statistical properties, can be coupled with a predictor system for the purpose of real-valued forecasting. We are designing such a composite application and plan to study the forecasting accuracy achievable using different types of nonparametric statistical estimators or neural regressors.</p>
      </abstract>
      <kwd-group>
        <kwd>query trends</kwd>
        <kwd>machine learning</kwd>
        <kwd>wavelets</kwd>
        <kwd>synchrosqueezing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In the last few years signal decomposition has often been employed to improve
the forecasting capabilities of predictor systems. Among all the time-frequency
analysis models suitable for univariate data series, one of the simplest yet most
effective methods is the Empirical Mode Decomposition (EMD), introduced in
the late 1990s by Huang et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
        <fn id="fn1">
          <label>1</label>
          <p>The fast C++ implementation of the SST mentioned in this paper is ISO/IEC 14882:2014 conformant and is compiled with speed-maximizing optimization, inline function expansion and intrinsic functions enabled, targeting x64 machines.</p>
        </fn>
      </p>
      <p>
        EMD is an empirical method based on a sifting procedure aimed at identifying
the components of a signal, each of them characterized by both amplitude and phase
modulation. It is sometimes compared to other analysis methods, such as Fourier
transforms or wavelet decomposition. This approach is suitable for studying signals
that exhibit nonlinear and non-stationary behavior. The procedure decomposes the
signal over a complete and nearly orthogonal basis, so that it can be expressed as
the sum of components of the form <inline-formula><tex-math>A(t)\cos(\phi(t))</tex-math></inline-formula>, plus a
residual series. Such a decomposition can be regarded as a generalized Fourier
series, in which the phases and amplitudes of the components are not constrained to
be constant. Interest in this approach motivated Wu and Huang [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to
pursue further research, which led them to formulate the Ensemble EMD
(EEMD), in which a fixed number of intrinsic mode functions (IMFs) of a series is
found by decomposing many copies of the signal perturbed with Gaussian white noise
and averaging the results; the outcome is an improved IMF identification at a higher
computational cost. The broad usefulness of EMD and EEMD led researchers to
explore the mathematical foundations of these empirical approaches.
      </p>
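      <p>As a purely illustrative aid, not taken from the authors' implementation, the following C++ sketch synthesizes a test signal according to the component model above, i.e. a sum of terms <inline-formula><tex-math>A_k(t)\cos(\phi_k(t))</tex-math></inline-formula> plus a residual. The names <monospace>AmFmComponent</monospace> and <monospace>synthesize</monospace> are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// One AM-FM component A(t) cos(phi(t)) of the EMD signal model.
struct AmFmComponent {
    std::function<double(double)> amplitude;  // A(t), assumed slowly varying and positive
    std::function<double(double)> phase;      // phi(t), assumed monotonically increasing
};

// Samples f(t) = sum_k A_k(t) cos(phi_k(t)) + r(t) at t = n * dt, n = 0..n_samples-1.
std::vector<double> synthesize(const std::vector<AmFmComponent>& components,
                               const std::function<double(double)>& residual,
                               std::size_t n_samples, double dt) {
    assert(dt > 0.0);
    std::vector<double> f(n_samples, 0.0);
    for (std::size_t n = 0; n < n_samples; ++n) {
        const double t = static_cast<double>(n) * dt;
        double value = residual(t);  // slowly varying trend r(t)
        for (const auto& c : components) {
            value += c.amplitude(t) * std::cos(c.phase(t));
        }
        f[n] = value;
    }
    return f;
}
```
]]></preformat>
      <p>Signals built this way, for instance a chirp with phase <inline-formula><tex-math>2\pi(f_0 t + k t^2/2)</tex-math></inline-formula> plus a slow trend, have known components and can serve as ground truth when checking a decomposition.</p>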
      <p>
        In recent years, Daubechies et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] developed a new method called the
Synchrosqueezed Wavelet Transform (SST), which has the same objective as EMD but
rests on a solid mathematical basis. They started from the theory of the
Continuous Wavelet Transform (CWT) and added a procedure that reassigns the signal
energy in the frequency domain. We believe that the SST, used as a signal
pre-processing module, could greatly enhance the forecasting accuracy
of a predictor, because signal decomposition extracts extra information
that can be exploited by machine learning methods. This is the main technical
argument of the present paper, and it is described in the following sections.
      </p>
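      <p>For reference, the relations below restate the SST following the presentation of Daubechies et al. [<xref ref-type="bibr" rid="ref1">1</xref>], with angular frequencies. The threshold <inline-formula><tex-math>\gamma</tex-math></inline-formula> and the discretization symbols <inline-formula><tex-math>a_k</tex-math></inline-formula>, <inline-formula><tex-math>(\Delta a)_k</tex-math></inline-formula>, <inline-formula><tex-math>\omega_l</tex-math></inline-formula>, <inline-formula><tex-math>\Delta\omega</tex-math></inline-formula> and the band <inline-formula><tex-math>L_i</tex-math></inline-formula> are notational choices made here for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```latex
% CWT of f with an analytic mother wavelet \psi (\hat\psi(\xi) = 0 for \xi < 0)
W_f(a,b) = \int_{-\infty}^{+\infty} f(t)\, a^{-1/2}\,
           \overline{\psi\!\left(\frac{t-b}{a}\right)}\, dt

% Candidate instantaneous (angular) frequency, wherever |W_f(a,b)| > \gamma
\omega_f(a,b) = -i\, \bigl(W_f(a,b)\bigr)^{-1}\, \frac{\partial W_f(a,b)}{\partial b}

% Synchrosqueezing on discrete scales a_k and frequency bins \omega_l of width \Delta\omega
T_f(\omega_l, b) = (\Delta\omega)^{-1}
    \sum_{a_k:\ |\omega_f(a_k,b) - \omega_l| \le \Delta\omega/2}
    W_f(a_k,b)\, a_k^{-3/2}\, (\Delta a)_k

% Mode m_i recovered from the band of bins L_i
m_i(b) \approx \operatorname{Re}\Bigl[ C_\psi^{-1} \sum_{l \in L_i} T_f(\omega_l, b)\, \Delta\omega \Bigr],
\qquad C_\psi = \frac{1}{2} \int_0^{\infty} \overline{\hat\psi(\xi)}\, \frac{d\xi}{\xi}
```
]]></preformat>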
      <p>The authors are planning to design and develop a prediction system suitable
for the analysis of streaming datasets such as search engine query trends.
We develop a fast C++ implementation of the Synchrosqueezed Wavelet
Transform (see footnote 1), configurable according to the specific
properties of different types of input datasets.</p>
      <p>This paper is organized as follows: Section 2 outlines the theoretical
foundations of the SST, describes several aspects of our implementation with reference
to the features of the SST algorithm, and presents our plans for the most important
machine learning aspects of the forecasting system. Section 3 presents the authors'
conclusions and outlines further research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>Description and implementation of SST</title>
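      <p>A Continuous Wavelet Transform (CWT) allows the time-varying analysis of the
frequency content of a signal <inline-formula><tex-math>f(t)</tex-math></inline-formula> having finite energy. Choosing an appropriate
mother wavelet <inline-formula><tex-math>\psi(t)</tex-math></inline-formula> whose Fourier transform satisfies <inline-formula><tex-math>\hat{\psi}(\xi) = 0</tex-math></inline-formula> for <inline-formula><tex-math>\xi &lt; 0</tex-math></inline-formula>
(hence a complex wavelet), the CWT of <inline-formula><tex-math>f(t)</tex-math></inline-formula> is defined as a family of convolutions,
parametrized by the scale, of the signal with the complex conjugate of scaled versions of the
wavelet function. After the CWT coefficients have been reassigned, the vector of frequencies
<inline-formula><tex-math>\omega</tex-math></inline-formula> can be partitioned into equivalence classes of a chosen bandwidth, and
the reconstruction is performed piecewise, one class at a time, yielding the set of
intrinsic mode functions <inline-formula><tex-math>m_i(t)</tex-math></inline-formula> that are taken as the components of the original series. From
each <inline-formula><tex-math>m_i(t)</tex-math></inline-formula>, instantaneous amplitudes and phases can be recovered. Our current
aim is to explore machine learning approaches based on preprocessing the input
series with the SST.</p>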
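      <p>A minimal sketch, under stated assumptions, of the reassignment and band-reconstruction steps just described. It assumes the CWT has already been computed on a grid of scales <inline-formula><tex-math>a_k</tex-math></inline-formula> (rows) and times <inline-formula><tex-math>b</tex-math></inline-formula> (columns), uses a linear grid of angular frequency bins, and estimates <inline-formula><tex-math>\omega_f(a_k,b)</tex-math></inline-formula> from the phase increment of the coefficients along <inline-formula><tex-math>b</tex-math></inline-formula>, a common discrete substitute for the derivative in the definition and not necessarily the authors' choice. The names <monospace>synchrosqueeze</monospace> and <monospace>band_mode</monospace>, the threshold <monospace>gamma</monospace> and the constant <monospace>c_psi</monospace> (the wavelet-dependent normalization <inline-formula><tex-math>C_\psi</tex-math></inline-formula>) are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using Matrix = std::vector<std::vector<cplx>>;  // row = scale or frequency bin, column = time b

// Hypothetical helper: synchrosqueezes a precomputed CWT W[k][b] (scale a_k, time b).
// omega_f is estimated from the phase increment of W along b, a discrete stand-in
// for Re[-i (dW/db) / W]. Frequencies are angular: bin l is centred at omega_min + l*d_omega.
Matrix synchrosqueeze(const Matrix& W, const std::vector<double>& scales, double dt,
                      double omega_min, double d_omega, std::size_t n_freqs, double gamma) {
    assert(W.size() >= 2 && W.size() == scales.size() && dt > 0.0 && d_omega > 0.0);
    const std::size_t n_scales = W.size();
    const std::size_t n = W[0].size();
    Matrix T(n_freqs, std::vector<cplx>(n, cplx(0.0, 0.0)));
    for (std::size_t k = 0; k < n_scales; ++k) {
        // (Delta a)_k from the neighbouring scales, one-sided at the ends of the grid.
        const std::size_t lo = (k == 0) ? 0 : k - 1;
        const std::size_t hi = (k + 1 == n_scales) ? k : k + 1;
        const double da = (scales[hi] - scales[lo]) / static_cast<double>(hi - lo);
        const double weight = std::pow(scales[k], -1.5) * da / d_omega;
        for (std::size_t b = 0; b < n; ++b) {
            if (std::abs(W[k][b]) <= gamma) continue;  // skip unreliable coefficients
            const std::size_t prev = (b == 0) ? 0 : b - 1;
            const std::size_t next = (b + 1 == n) ? b : b + 1;
            if (next == prev) continue;  // a single sample carries no phase rate
            const double omega = std::arg(W[k][next] * std::conj(W[k][prev])) /
                                 (static_cast<double>(next - prev) * dt);
            const long l = std::lround((omega - omega_min) / d_omega);
            if (l < 0 || static_cast<std::size_t>(l) >= n_freqs) continue;
            T[static_cast<std::size_t>(l)][b] += W[k][b] * weight;  // move energy to bin l
        }
    }
    return T;
}

// Complex mode z_i(b) for the band of bins [l_lo, l_hi]: Re z is m_i(b),
// |z| its instantaneous amplitude and arg z its instantaneous phase.
std::vector<cplx> band_mode(const Matrix& T, std::size_t l_lo, std::size_t l_hi,
                            double d_omega, cplx c_psi) {
    assert(l_lo <= l_hi && l_hi < T.size());
    std::vector<cplx> z(T[0].size(), cplx(0.0, 0.0));
    for (std::size_t l = l_lo; l <= l_hi; ++l)
        for (std::size_t b = 0; b < z.size(); ++b)
            z[b] += T[l][b] * d_omega / c_psi;
    return z;
}
```
]]></preformat>
      <p>Taking the real part of <inline-formula><tex-math>z_i(b)</tex-math></inline-formula> gives the mode <inline-formula><tex-math>m_i(b)</tex-math></inline-formula>, while <monospace>std::abs</monospace> and <monospace>std::arg</monospace> applied to <inline-formula><tex-math>z_i(b)</tex-math></inline-formula> give its instantaneous amplitude and phase.</p>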
      <p>For the purpose of the present demonstration we develop a fast C++
implementation of the SST algorithm, in which the complex wavelet function <inline-formula><tex-math>\psi(t)</tex-math></inline-formula> can
be selected via parametric polymorphism. Since the step-by-step calculations are
computationally heavy, efficiency is considered paramount if the system is to be used
effectively for prediction. Moreover, since the CWT processes each scale independently,
the computation can be multithreaded and, depending on the target machine features,
the compiled code could run even more efficiently. Note that a concurrent implementation
of the CWT would also be feasible with alternative algorithms; accordingly, a parallel
implementation of the CWT portion of the SST is already under development.</p>
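      <p>The following sketch illustrates, under our own assumptions, how a wavelet can be chosen at compile time through a C++ template parameter and how the independent scale rows of a CWT can be spread over threads. The <monospace>Morlet</monospace> policy, the function <monospace>cwt</monospace> and the direct summation are illustrative choices, not the authors' code; in particular, the Morlet wavelet is only approximately analytic, and no boundary padding is applied.</p>
      <preformat preformat-type="code"><![CDATA[
```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <thread>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical wavelet policy: Morlet wavelet with centre frequency w0. For w0 around 6
// its Fourier transform is negligible (not exactly zero) at negative frequencies.
struct Morlet {
    double w0 = 6.0;
    cplx operator()(double t) const {
        constexpr double pi = 3.14159265358979323846;
        return std::pow(pi, -0.25) * std::exp(cplx(-0.5 * t * t, w0 * t));
    }
};

// Direct O(N^2)-per-scale discretization of W_f(a,b). Wavelet is any callable
// double -> std::complex<double>, fixed at compile time (parametric polymorphism).
template <typename Wavelet>
std::vector<std::vector<cplx>> cwt(const std::vector<double>& f,
                                   const std::vector<double>& scales, double dt,
                                   const Wavelet& psi, unsigned n_threads) {
    assert(dt > 0.0 && n_threads > 0);
    const std::size_t n = f.size();
    std::vector<std::vector<cplx>> W(scales.size(), std::vector<cplx>(n));
    // Scale rows are independent: thread i handles rows i, i + n_threads, ...
    auto worker = [&](std::size_t first_row, std::size_t step) {
        for (std::size_t k = first_row; k < scales.size(); k += step) {
            const double a = scales[k];
            for (std::size_t b = 0; b < n; ++b) {
                cplx acc(0.0, 0.0);
                for (std::size_t m = 0; m < n; ++m) {
                    const double x =
                        (static_cast<double>(m) - static_cast<double>(b)) * dt / a;
                    acc += f[m] * std::conj(psi(x));
                }
                W[k][b] = acc * dt / std::sqrt(a);  // a^{-1/2} normalization
            }
        }
    };
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n_threads; ++i) pool.emplace_back(worker, i, n_threads);
    for (auto& th : pool) th.join();
    return W;
}
```
]]></preformat>
      <p>A faster variant would compute each row with FFT-based convolution; the row-level parallelism is unaffected by that choice, since each thread writes only to its own rows.</p>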
      <p>Several search engine query trend series have been downloaded for automated
testing and debugging purposes. We perform extensive analysis and synthesis tests
on the gathered datasets in order to compare reconstruction accuracies, and
we will then choose the wavelet function that yields the lowest
reconstruction error. The system must be able to read streaming datasets, such
as the above-mentioned search trends, chunked by a fixed-length window
sliding from the beginning of the series to the last known data sample.</p>
      <p>On the basis of the empirical results obtained, we consider it feasible to
build a prediction system that learns the features extracted by an SST
preprocessing module.</p>
      <p>The IMFs can be used as a training set for a prediction system such as a group
of cooperating neural networks: the one-step-ahead value of each IMF is forecast
by neural regression, and all the different contributions are summed up to
synthesize the final real-valued forecast. Alternatively, a set of <inline-formula><tex-math>N_f</tex-math></inline-formula> independent
perceptrons could be trained to forecast the one-step-ahead
instantaneous frequencies of the series.</p>
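      <p>The per-IMF scheme can be sketched as follows. To keep the example short and self-contained, each neural regressor is replaced by a linear two-lag predictor fitted by least squares; this substitution, as well as the names <monospace>TwoLagPredictor</monospace> and <monospace>forecast_from_imfs</monospace>, is ours and only shows the forecast-then-sum structure.</p>
      <preformat preformat-type="code"><![CDATA[
```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for one neural regressor: a linear predictor x[t+1] ~ w1*x[t] + w2*x[t-1]
// whose two weights are fitted by ordinary least squares (2x2 normal equations).
struct TwoLagPredictor {
    double w1 = 0.0, w2 = 0.0;

    void fit(const std::vector<double>& x) {
        assert(x.size() >= 4);
        double s11 = 0, s12 = 0, s22 = 0, r1 = 0, r2 = 0;
        for (std::size_t t = 1; t + 1 < x.size(); ++t) {
            s11 += x[t] * x[t];
            s12 += x[t] * x[t - 1];
            s22 += x[t - 1] * x[t - 1];
            r1 += x[t] * x[t + 1];
            r2 += x[t - 1] * x[t + 1];
        }
        const double det = s11 * s22 - s12 * s12;
        if (std::fabs(det) < 1e-12) return;  // degenerate IMF: keep zero weights
        w1 = (r1 * s22 - r2 * s12) / det;
        w2 = (r2 * s11 - r1 * s12) / det;
    }

    double predict_next(const std::vector<double>& x) const {
        assert(x.size() >= 2);
        return w1 * x[x.size() - 1] + w2 * x[x.size() - 2];
    }
};

// One-step-ahead forecast of the series: forecast every IMF separately, then sum.
double forecast_from_imfs(const std::vector<std::vector<double>>& imfs) {
    double total = 0.0;
    for (const auto& imf : imfs) {
        TwoLagPredictor p;
        p.fit(imf);
        total += p.predict_next(imf);
    }
    return total;
}
```
]]></preformat>
      <p>A pure sinusoid satisfies <inline-formula><tex-math>x_{t+1} = 2\cos(\omega)\,x_t - x_{t-1}</tex-math></inline-formula> exactly, so such a predictor forecasts an ideal narrowband IMF without error; real IMFs, with time-varying amplitude and frequency, call for the nonlinear regressors discussed above.</p>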
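      <p>Let us recall that a Synchrosqueezed Wavelet Transform outputs a matrix
<inline-formula><tex-math>T_f(\omega, b)</tex-math></inline-formula> of size <inline-formula><tex-math>N_f \times N</tex-math></inline-formula>, where <inline-formula><tex-math>N_f</tex-math></inline-formula> is the size of the frequency vector <inline-formula><tex-math>F</tex-math></inline-formula> and
<inline-formula><tex-math>N</tex-math></inline-formula> is the length of the unpadded (i.e. raw) input series. Hence the number of
independent neural regressors would be <inline-formula><tex-math>N_f</tex-math></inline-formula>. Each perceptron
would be responsible for learning the rate of change of <inline-formula><tex-math>\omega_f(a, b)</tex-math></inline-formula> in the time direction, i.e.
<inline-formula><tex-math>\frac{\partial}{\partial t}\omega_f(a, b)</tex-math></inline-formula>, so that the one-step-ahead value <inline-formula><tex-math>\omega_f(a, b+1)</tex-math></inline-formula>
can be computed analytically from the previously observed <inline-formula><tex-math>\omega_f(a, b)</tex-math></inline-formula>.</p>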
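      <p>As the simplest possible stand-in for the perceptrons, the sketch below estimates the rate of change of each row of an instantaneous-frequency matrix by a backward difference and extrapolates one step, i.e. <inline-formula><tex-math>\omega_f(a,b+1) \approx \omega_f(a,b) + [\omega_f(a,b) - \omega_f(a,b-1)]</tex-math></inline-formula>. A trained regressor would replace this fixed rule; the function name <monospace>one_step_frequencies</monospace> and the row layout are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-rule "learner": for each row of an instantaneous-frequency matrix
// omega[row][b] (b = 0..N-1), estimate the rate of change along b with a backward
// difference and extrapolate one step: omega(b+1) ~ omega(b) + (d omega / db) * delta_b.
std::vector<double> one_step_frequencies(const std::vector<std::vector<double>>& omega) {
    std::vector<double> next;
    next.reserve(omega.size());
    for (const auto& row : omega) {
        assert(row.size() >= 2);
        const double last = row[row.size() - 1];
        const double rate = last - row[row.size() - 2];  // (d omega / db) * delta_b
        next.push_back(last + rate);
    }
    return next;
}
```
]]></preformat>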
      <p>Having presented several possible development directions, we believe that
using the SST as an assisting module of a time series prediction system could
bring several innovations to the machine learning field.
We are going to explore all the above-mentioned implementation paths and test
them using the Bitcoin search engine query datasets. Moreover, since the literature
offers several benchmark methods for Bitcoin price prediction, in order to focus on
estimating the forecasting accuracy of the system we plan to test it on the
Bitcoin price series (the hourly close data of the Bitcoin/US Dollar exchange
rate, BTCUSD).</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>The Synchrosqueezed Wavelet Transform allows an adaptive signal
decomposition on the time-frequency plane and is useful for analyzing multicomponent
signals. After the decomposition, a signal can be written as a sum of intrinsic modes,
each of which can be modeled and forecast separately; this makes the SST a natural
preprocessing step for prediction. It is our opinion that this transform paves the way
to novel machine learning approaches that focus on learning the rates of change of
time series instead of fitting their values; such approaches could become a very
active area of research in the next few years. A fully configurable and flexible
system of this type has several advantages. It is a completely dedicated implementation,
customizable and reliable enough to be adapted to different kinds of input dataset.
Furthermore, it allows the algorithm code to be changed or modified according to
specific requirements. Independent graphics modules help us detect errors and
misconceptions during the development phase as well as when testing the selected
datasets for research purposes. The combination of the SST and a prediction system
might open the path to different and novel research directions. We are currently
investigating the applicability of the system in several interesting domains:
inflation forecasting, smart-city energy consumption, influenza prediction and
forecasting of economic fluctuations.</p>
      <p>Acknowledgments. We would like to express our gratitude to Eugene Brevdo for
his help in identifying several theoretical misconceptions related to the
SST, which allowed us to find both severe and subtle bugs in the first
implementation of the system's preprocessing module.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>I.</given-names>
            <surname>Daubechies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.-T.</given-names>
            <surname>Wu</surname>
          </string-name>
          <article-title>Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool</article-title>
          .
          <source>Applied and Computational Harmonic Analysis</source>
          ,
          <volume>30</volume>
          :
          <fpage>243</fpage>
          –
          <lpage>261</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>N.E.</given-names>
            <surname>Huang</surname>
          </string-name>
          et al.
          <article-title>The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis</article-title>
          .
          <source>Proc. R. Soc. Lond. A</source>
          ,
          <volume>454</volume>
          :
          <fpage>903</fpage>
          –
          <lpage>995</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.E.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Ensemble empirical mode decomposition: a noise-assisted data analysis method</article-title>
          .
          <source>Advances in Adaptive Data Analysis</source>
          ,
          <volume>1</volume>
          :
          <fpage>1</fpage>
          –
          <lpage>41</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>