<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classification of Multifractal Time Series by Decision Tree Methods</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          ,
          <addr-line>Lyudmyla Kirichenko</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Bulakh Vitalii</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Kharkiv National University of Radioelectronics</institution>
          ,
          <addr-line>Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The article considers classification task of model fractal time series by the methods of machine learning. To classify the series, it is proposed to use the meta algorithms based on decision trees. To modeling the fractal time series, binomial stochastic cascade processes are used. Classification of time series by the ensembles of decision trees models is carried out. The analysis indicates that the best results are obtained by the methods of bagging and random forest which use regression trees.</p>
      </abstract>
      <kwd-group>
        <kwd>multifractal time series</kwd>
        <kwd>binomial stochastic cascade</kwd>
        <kwd>classification of time series</kwd>
        <kwd>Random Forest</kwd>
        <kwd>Bagging</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Many complex systems have a fractal structure and their dynamics is represented by
time series that have fractal (self-similar) properties. Fractal time series take place in
technical, physical, biological and information systems. The results of time series
fractal analysis are widely used in various fields, in particular, in recognition and
classification.</p>
      <p>
        The tasks of classification of fractal realizations arise when medical diagnosis
clarification by ECG and EEG [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], when DDoS-attacks are detected by the incoming
traffic [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], when critical situations in financial markets are forecasted by economic
indicators series [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], etc. Most often, such tasks are solved by estimating and
analyzing fractal characteristics. However, in recent years, there has been a growing interest
in machine learning methods to analyze and classify fractal series [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4-6</xref>
        ].
      </p>
      <p>Key to correctly solving the classification problem is the choice of the
classification method. To answer the question which method is best for analyzing multifractal
properties, we present the results of study in which the classification of model
realizations possessing fractal properties was carried out. The aim of the work is a
comparative analysis of the classification of multifractal stochastic time series performed by
methods based on decision trees.</p>
    </sec>
    <sec id="sec-2">
      <title>Multifractal Time Series</title>
      <p>
        The self-similarity of random process is to preserve distribution law when changing
the time scale. A stochastic process X (t) is self-similar with a parameter H if the
process aH X at  is described by the same finite-dimensional distribution laws as
X (t) . The parameter H , 0  H  1, called the Hurst exponent, represents the
measure of self-similarity and the long-term dependence. Multifractal stochastic processes
are inhomogeneous self-similar ones and have more flexible scaling relations. The
multifractal properties of process are defined the scaling exponent (q) . In the
general case (q) is a nonlinear function for which the value ( 1) / 2 coincides with
the value of Hurst exponent H . For monofractal process (q) is linear. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>One of the frequently used models of the multifractal process is the conservative
stochastic binomial multiplicative cascade. To its construction, an iterative procedure
based on two main parts is used. The first represents geometric detailing by iterative
partitioning of intervals, and the second guarantees randomness of the weighting
coefficients. For each iteration n , n 1 , we have a time series (cascade) with
multifractal properties.</p>
      <p>
        The fractal characteristics of stochastic multiplicative cascade obtained using beta
distribution random variable Beta( ,  ) are completely determined by the
parameters  ,   0 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The change of value  of the symmetric distribution Beta( ,  ) ,
allows to generate of multiplicative cascades with specified multifractal properties
and Hurst parameter in the range 0.5  H  1 .
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Classification Methods</title>
      <p>The decision tree method is effective method of classification. It is applicable to
solving classification problems arising in various fields. It consists in the process of
partition the original data set into groups, until homogeneous subsets are obtained. The set
of rules that give such a partition allows then make a conclusion for new data.
However the decision tree models are unstable: a slightest change in the training set can
bring to the essential changes in the tree structure. In this case, it is expedient to use
ensembles of elementary classifiers. The components of the ensemble can be the same
type or different.</p>
      <p>
        The bagging method [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is based on the statistical method of bootstrap aggregating.
Bagging is a classification technique where all the elementary classifiers are trained
and operate independently of each other. The basis of the bagging method is the
classification technology, called "perturbation and combination". Perturbation is
understood as the introducing of some random changes in training data and the construction
of several alternative models on the modified data. From a single training set several
samples containing the same number of objects are extracted by sampling. To obtain
the result of the work of the ensemble of models, the voting or averaging are usually
used. The effectiveness of bagging is achieved due to the fact that the basic
algorithms, trained in different subsamples, are obtained quite different and their mistakes
are mutually compensated in the voting process.
      </p>
      <p>
        Random forest is also a method of bagging, but it has several features [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]: it uses
an ensemble of only regression or classifying decision trees; in the sampling
algorithm the random selection of features is also carried; the decision trees are built up to
the full completion of the training objects and are not subjected to post pruning.
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>Research results</title>
      <p>
        To build decision tree models, Python with libraries that implement machine learning
methods was used [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Classification of time series obtained by generating stochastic
binomial cascades with different multifractal properties was carried out. Each class
was a set of model time series with the same Hurst exponent. Hurst exponent values
were varied in the range of 0.5 to 1 in increments of 0.05. Thus, the training of models
was carried out in 11 classes.
      </p>
      <p>In the work, to determine the time series belonging to one of the 11 classes, the
methods of bagging and random forest were used. In this case the objects were the
cascade time series, and the features were the values of this series. In each of the
methods, the ensembles of decision trees, both classifications and regressions, were
involved. In the case of regression decision trees, the result of the classification is the
probability of matching of multifractal cascade to given class. The models for each
class were trained on five hundred examples of time series and were tested on fifty
test ones.</p>
      <p>The probabilities are calculated by the formula: Pi  1  mi  C , where mi is the
regression result for the i-th example, С is theoretically known class number. If the
condition Pi  0.5;1 is met, the classification is considered as correct. If Pi  0.5
and mi  C then the cluster number is overvalued, otherwise it is understated.
In this paper, the comparative analysis of the classification of model multifractal
stochastic time series using meta-algorithms based on decision trees has been performed.
Binomial multiplicative stochastic cascades were used as input time series.</p>
      <p>Time series were divided into classes depending on their fractal properties.
Random forest and bagging methods were used to classify the series. In each method,
ensembles of decision trees, both of classification and regression, were involved.</p>
      <p>From the research that has been carried it is possible to conclude that the
classification of series by fractal properties using decision trees methods gives good results.
The best results were obtained with the use of regression trees. In the classification of
the series with a small length random forest method showed greater accuracy.</p>
      <p>The obtained results can be used for practical applications related to the
classification or clustering of real time series with fractal properties. In our future researches
we intend to concentrate on the classification of real series using additional features
such as fractal characteristics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alghawli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirichenko</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Multifractal Properties of Bioelectric Signals under Various Physiological States</article-title>
          .
          <source>Information Content &amp; Processing International Journal</source>
          <volume>2</volume>
          (
          <issue>2</issue>
          ),
          <fpage>138</fpage>
          -
          <lpage>163</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kaur</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saxena</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
          </string-name>
          , J.:
          <article-title>Detection of TCP targeted high bandwidth attacks using self-similarity</article-title>
          .
          <source>Journal of King</source>
          Saud University - Computer and Information Sciences,
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kristoufek</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Fractal Markets Hypothesis and the Global Financial Crisis: Scaling, Investment Horizons and Liquidity</article-title>
          . https://arxiv.org/pdf/1203.4979.
          <string-name>
            <surname>pdf</surname>
          </string-name>
          (
          <year>2012</year>
          <article-title>) last accessed</article-title>
          <year>2018</year>
          /03/26.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>André</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coelho</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clodoaldo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lima</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Assessing fractal dimension methods as feature extractors for EMG signal classification</article-title>
          .
          <source>Engineering Applications of Artificial Intelligence</source>
          <volume>36</volume>
          ,
          <fpage>81</fpage>
          -
          <lpage>98</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Symeonidis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Sentiment analysis via fractal dimension</article-title>
          .
          <source>In: Proceedings of the 6th Symposium on Future Directions in Information Access</source>
          ,
          <fpage>48</fpage>
          -
          <lpage>50</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Arjunan</surname>
            ,
            <given-names>S. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>D. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naik</surname>
            ,
            <given-names>G. R.:</given-names>
          </string-name>
          <article-title>A machine learning based method for classification of fractal features of forearm sEMG using Twin Support vector machines</article-title>
          .
          <source>In: Annual International Conference of the IEEE Engineering in Medicine and Biology</source>
          ,
          <volume>4821</volume>
          -
          <fpage>4824</fpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Riedi</surname>
            ,
            <given-names>R.H.</given-names>
          </string-name>
          :
          <article-title>Multifractal processes</article-title>
          , in Doukhan P.,
          <string-name>
            <surname>Oppenheim</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taqqu</surname>
            <given-names>M.S.</given-names>
          </string-name>
          (Eds.),
          <source>Long Range Dependence: Theory and Applications: Birkhuser</source>
          .
          <fpage>625</fpage>
          -
          <lpage>715</lpage>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kirichenko</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radivilova</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kayali</surname>
          </string-name>
          , E.:
          <article-title>Modeling telecommunications traffic using the stochastic multifractal cascade process</article-title>
          . Problems of Computer Intellectualization,
          <fpage>55</fpage>
          -
          <lpage>63</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Bagging predictors</article-title>
          .
          <source>Machine Learning</source>
          .
          <volume>24</volume>
          (
          <issue>2</issue>
          ),
          <fpage>123</fpage>
          -
          <lpage>140</lpage>
          (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.: Random</given-names>
          </string-name>
          <string-name>
            <surname>Forests</surname>
          </string-name>
          .
          <source>Machine Learning 45 (1)</source>
          ,
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Cielen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meysman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Introducing Data Science: Big Data, Machine Learning, and more, using Python tools</article-title>
          .
          <source>Manning Publications</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>