=Paper= {{Paper |id=Vol-2258/paper61 |storemode=property |title=Using a Neural Network to Select Methods for Predicting Time Series in a Hybrid Combined Model |pdfUrl=https://ceur-ws.org/Vol-2258/paper61.pdf |volume=Vol-2258 |authors=Dmitry Yashin }} ==Using a Neural Network to Select Methods for Predicting Time Series in a Hybrid Combined Model== https://ceur-ws.org/Vol-2258/paper61.pdf
Using a neural network to select methods for predicting time
series in a hybrid combined model

                D V Yashin¹
                ¹ Ulyanovsk State Technical University, 32, Severny Venets street, Ulyanovsk, Russia


                Abstract. This paper describes a technique for selecting individual methods in a combined
                time series forecasting model. A neural network is proposed whose input is a vector of time
                series metrics. The metrics correspond to significant characteristics of the time series, and
                their values are easy to compute. The neural network calculates the estimated prediction
                error for each model from the base set of the combined model. The proposed selection method
                is most effective for short time series and when the base set contains many complex
                prediction methods. The developed system was tested on the time series from the CIF
                2015-2016 competitions. In the experiment, the developed system reduced the average forecast
                error from 13 to 9 percent.



1. Introduction
The constant need to improve the accuracy of time series forecasting leads to a growing number of
publications on this topic and to the emergence of new forecasting methods. As the number of methods
grows, the problem of choosing the most appropriate one becomes increasingly important.
    One possible solution to this problem is the use of combined forecasting models, which produce an
aggregated forecast based on the results of several methods. However, in this case the choice of
methods from a base set is also a problem, since individual inexact forecasts can significantly reduce
the overall accuracy.
    This paper proposes one approach to selecting individual models from a base set. A specially
developed neural network selects methods according to the values of metrics that correspond to
characteristics of the time series. Characteristics relevant to the subject area have been selected for
this purpose.
    The paper also determines the set of metrics that is optimal for this problem and estimates the
efficiency of the developed system.

2. Combined time series forecasting model. Selecting methods from the base set

2.1. Hybrid systems in artificial intelligence
The concept of hybridization arises as a consequence of the development of the principles of the
system approach to the study of complex artificial objects in computer science [1]. Professor N.G.
Yarushkina highlights the development of hybrid integrated and synergetic systems as one of the
leading trends in modern computer science. She defines hybridization as the integration of methods
and technologies at a deep level, when the system includes interacting modules that implement various
methods for solving problems of artificial intelligence [2]. A.V. Gavrilov, noting the hybridization
process as the main trend in the development of artificial intelligence, considers hybridization as a
variant of combining the methods of representation and processing of knowledge in hybrid intelligent
systems [3].
   Hybrid intelligent information systems are an interdisciplinary scientific direction that
investigates the applicability of several methods from different classes of formalized system
representation (analytical, statistical, logical-linguistic, fuzzy, and others) to solving
decision-making problems, since none of these classes can be considered universal [4]. The interaction
of several methods makes it possible to compensate for their individual drawbacks through the
appearance of an integrative property.

2.2. Combined forecasting model
There are many methods for forecasting time series, each of which has its advantages and
disadvantages. To take advantage of several methods, combined models are used. According to [5],
the combined forecasting model is a prediction model consisting of several individual models, called
the base set.
    A number of factors that explain the effectiveness of the combined model are listed in [6], among
them: the inability to choose a single model from experimental data; the presence of information
important for modeling in each rejected forecast; the need to choose from a group of models that have
similar statistical characteristics when trying to select a single best model.
    There are two types of combined models: selective and hybrid. In the selective model, the forecast
is calculated by one method selected from the base set at each time point. In the hybrid model, the
predicted value is obtained by aggregating the prediction results for several models from the base set
(usually, the final forecast is a weighted sum of individual forecasts) [5].
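
    The difference between the two types can be illustrated with a minimal Python sketch. The function
names and the use of a per-model error estimate to pick the single model are assumptions made for this
example; they are not taken from [5].

```python
def selective_forecast(forecasts, errors):
    """Selective model: return the forecast of the one model with the lowest error estimate."""
    best = min(range(len(forecasts)), key=lambda i: errors[i])
    return forecasts[best]


def hybrid_forecast(forecasts, weights):
    """Hybrid model: return the weighted sum of the individual forecasts.

    The weights are rescaled so that they sum to 1.
    """
    total = sum(weights)
    return sum(w / total * f for w, f in zip(weights, forecasts))
```

    With equal weights the hybrid forecast reduces to the plain average of the individual forecasts.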

2.3. Selection of models from the base set
According to [7], the main problem of constructing hybrid combined forecasting models is finding the
optimal weights of the individual model predictions, which ensure the minimum forecast error of the
combined model. However, the problem of selecting models from the base set is no less important.
    In this paper, a technique for selecting individual methods in a hybrid combined forecasting model
is proposed. This approach can also be used for a selective combined model.
    One possible way to solve the selection problem is to divide the time series into a training part
and a control part and then predict the control values by each method from the base set; the models
can then be selected according to the obtained prediction errors. This method is ineffective for
combined models with a large number of methods in the base set, as well as for forecasting short time
series. Another way is an expert evaluation, in which a human expert analyzes the characteristics and
the chart of the time series. A combination of the two methods is also used: in this case, the control
values of the time series are forecast only by the models that the expert selected from the base set.
    The method proposed in this work is most similar to the expert approach and involves the selection
of models from the base set by a neural network that uses a set of simply computed time series
metrics.

3. Time series metrics for solving the prediction problem
There are many characteristics of a time series, such as the presence of a trend and the seasonal
component, stationarity, length, anomaly of values, variance, mean. To predict the time series, the
most important are the indicators of its dynamics. According to [8], the two main statistical elements
of dynamics are the trend and volatility.

3.1. Trend metrics
The trend is the characteristic of the process of changing the phenomenon over a long time, freed from
random fluctuations [9]. There are many criteria for determining the presence of a trend and for
assessing its strength. The most popular criteria are considered below.



3.1.1. Wallis-Moore criterion
This criterion is based on the differences between consecutive values of the time series,
$y_{t+1} - y_t$. Null hypothesis: the signs of these differences form a random sequence. A sequence of
identical difference signs is called a phase. The number of phases h (without the first and last
phase) is calculated. The actual value of the criterion is then calculated by the formula (1) [10]:

$$ t_\varphi = \frac{\left| h - \frac{2n - 7}{3} \right|}{\sqrt{\frac{16n - 29}{90}}}, \tag{1} $$

where n is the length of time series, h is the number of phases.
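
    The phase count can be sketched in a few lines of Python. The sketch assumes the commonly published
form of the phase-frequency statistic, |h - (2n - 7)/3| / sqrt((16n - 29)/90); skipping zero
differences and the function name `wallis_moore` are choices made for this example only.

```python
import math


def wallis_moore(y):
    """Return (h, t): the number of inner phases and the phase-frequency statistic.

    Requires at least two observations. Zero differences are skipped (an assumption).
    """
    # Signs of the first differences y[t+1] - y[t].
    signs = [1 if b > a else -1 for a, b in zip(y, y[1:]) if b != a]
    # A phase is a run of identical signs; count the runs.
    phases = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    # The first and the last phase are not counted.
    h = max(phases - 2, 0)
    n = len(y)
    t = abs(h - (2 * n - 7) / 3) / math.sqrt((16 * n - 29) / 90)
    return h, t
```

    For a strictly monotone series there is a single phase, so h = 0 and the statistic is far from the
value expected for a random sequence of signs.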

3.1.2. Cumulative T-criterion
Null hypothesis: there is no trend in the original time series. The calculated value of the criterion
is defined as the ratio of the accumulated sum of squares of the cumulative deviations of the empirical
values of the time series from their mean to the total sum of squared deviations, according to the
formula (2) [10]:

$$ T_p = \frac{\sum Z_n^2}{\sigma_y^2}, \tag{2} $$

where $Z_n$ is the accumulated result of deviations of empirical values from the mean level of the
original time series; $\sigma_y^2$ is the total sum of squared deviations, defined by formula (3):

$$ \sigma_y^2 = \sum_{t=1}^{n} (y_t - \bar{y})^2 . \tag{3} $$
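   If a sufficiently long time series is analyzed, then a standardized deviation (4) can be used to
calculate the criterion values:

$$ t_p = \frac{T_p - \frac{n + 1}{6}}{\sqrt{\frac{(2n - 1)(n + 1)}{90}}}, \tag{4} $$

where $T_p$ is the calculated value of the cumulative T-criterion and n is the length of the time
series.
   The calculated values of the cumulative T-criterion and $t_p$ are compared with the critical values
for a given significance level α.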
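
    A minimal Python sketch of the cumulative T-criterion follows. It computes the ratio described in
the text (the sum of squared accumulated deviations divided by the total sum of squared deviations);
the function name `cumulative_t` is an assumption, and a constant series is not handled because its
total sum of squared deviations is zero.

```python
def cumulative_t(y):
    """Return the calculated value of the cumulative T-criterion for the series y."""
    n = len(y)
    mean = sum(y) / n
    deviations = [v - mean for v in y]
    total_ss = sum(d * d for d in deviations)  # total sum of squared deviations
    z = 0.0    # accumulated deviation Z after each observation
    acc = 0.0  # running sum of Z squared
    for d in deviations:
        z += d
        acc += z * z
    return acc / total_ss
```

    A series with a steady trend keeps its accumulated deviation far from zero for a long time, so it
yields a much larger value than a series that oscillates around its mean.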

3.1.3. Foster-Stewart criterion
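This criterion is used to test for a trend in both the mean and the variance [8]. The statistics of
the criterion have the form (5):

$$ S = \sum_{i=2}^{n} S_i, \qquad d = \sum_{i=2}^{n} d_i, \tag{5} $$

where $S_i = u_i + l_i$ and $d_i = u_i - l_i$; $u_i = 1$ if $y_i$ is greater than all previous
values $y_{i-1}, \ldots, y_1$, and $u_i = 0$ otherwise; $l_i = 1$ if $y_i$ is less than all previous
values, and $l_i = 0$ otherwise.
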
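   Statistic S is used to check for a trend in the variance, statistic d to detect a trend in the
mean. In the absence of a trend, the values of $t$ and $\tilde{t}$ (6) have a Student's distribution
with ν = n degrees of freedom:

$$ t = \frac{d}{f}, \qquad \tilde{t} = \frac{S - f^2}{l}, \tag{6} $$

where $f = \sqrt{2 \ln n - 0.8456}$ and $l = \sqrt{2 \ln n - 3.4253}$.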
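
    The counting part of the Foster-Stewart criterion can be sketched as follows. The sketch assumes
the usual definition of the record indicators (an observation sets an upper record if it exceeds all
previous values and a lower record if it is below all of them); it returns only the raw sums S and d
and leaves out their conversion to t-values. The function name is hypothetical.

```python
def foster_stewart_counts(y):
    """Return (S, d): the sum and the difference of upper and lower record counts."""
    s = d = 0
    highest = lowest = y[0]
    for value in y[1:]:
        u = 1 if value > highest else 0  # new upper record
        l = 1 if value < lowest else 0   # new lower record
        s += u + l
        d += u - l
        highest = max(highest, value)
        lowest = min(lowest, value)
    return s, d
```

    For a strictly increasing series every observation after the first sets an upper record, so both
sums equal n - 1.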

3.1.4. Cox-Stewart criterion
To test the trend hypothesis, the normalized statistics (7) are applied [8]:

                                                                  ,                                      (7)



where                                   .
    Statistics Sl is calculated by the formula (8):

                                                                        ,                               (8)

where                            .
   Under (9) the hypothesis of the average trend is rejected.
                                                                ,                                       (9)

3.1.5. The criterion based on the mean comparison method
To check for the presence of a trend, the time series is divided into two parts. This criterion is
based on the following statement: if the time series has a tendency, the means calculated for each
part separately should differ significantly from each other [10]. The null hypothesis of the absence
of a tendency reduces to testing the hypothesis of the equality of the means of two normally
distributed populations. The hypothesis is checked using Student's t-test, the calculated value of
which is obtained by the formula (10):

$$ t_r = \frac{\left| \bar{y}_1 - \bar{y}_2 \right|}{\sqrt{\frac{(n_1 - 1)\sigma_1^2 + (n_2 - 1)\sigma_2^2}{n_1 + n_2 - 2}} \, \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \tag{10} $$

where $\bar{y}_1$ and $\bar{y}_2$ are the mean values, $\sigma_1^2$ and $\sigma_2^2$ are the
dispersions and $n_1$ and $n_2$ are the lengths of the first and second part of the time series.
   The calculated value ($t_r$) of the criterion is compared with its critical (tabular) value
($t_{cr}$) at a significance level α and the number of degrees of freedom ν = n - 2.
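
    A minimal Python sketch of the mean comparison criterion is given below. It assumes that the series
is split into halves and that the pooled-variance form of the two-sample Student statistic is used,
which agrees with the n - 2 degrees of freedom stated above; the function name is hypothetical.

```python
import math
from statistics import mean, variance


def mean_comparison_t(y):
    """Return the calculated t value for the two halves of the series y."""
    half = len(y) // 2
    first, second = y[:half], y[half:]
    n1, n2 = len(first), len(second)
    # Pooled variance of the two parts (sample variances, n - 1 in the denominator).
    pooled = ((n1 - 1) * variance(first) + (n2 - 1) * variance(second)) / (n1 + n2 - 2)
    return abs(mean(first) - mean(second)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
```

    Each part needs at least two observations, and the two parts must not both be constant, otherwise
the pooled variance is zero.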

3.2. Seasonal metrics
There are many methods for estimating the degree of seasonal deviations of the time series. One of
these methods is based on the calculation of seasonality indices. The seasonality index reflects the
degree of seasonal variability of the phenomenon relative to the mean of the time series. This index
is calculated by the formula (11):

$$ I_s = \frac{\bar{y}_i}{\bar{y}} \cdot 100\%, \tag{11} $$

where $\bar{y}_i$ is the average value of the indicator for the month (quarter); $\bar{y}$ is the
overall average value of the indicator.
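
    The seasonality indices can be computed as in the following Python sketch. It assumes that the
series starts at the first position of a seasonal cycle and returns the indices as plain ratios rather
than percentages; the function name and the `period` parameter are chosen for this example.

```python
def seasonality_indices(y, period):
    """Return one index per position in the seasonal cycle: seasonal mean / overall mean."""
    overall = sum(y) / len(y)
    indices = []
    for position in range(period):
        values = y[position::period]  # all observations at this position of the cycle
        indices.append(sum(values) / len(values) / overall)
    return indices
```

    An index above 1 marks a position in the cycle whose values are, on average, above the overall
level of the series.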

3.3. Dispersion and dispersion trend metrics
Both the value of the variance and the degree of the variance trend can be considered as metrics. To
test the hypothesis of the presence of a trend in dispersion, the Foster-Stewart criterion is used. A
criterion based on partitioning the time series and comparing the variances of the two parts is also
used. Null hypothesis: the variances of two normally distributed sets are equal
($\sigma_1^2 = \sigma_2^2$).
    The hypothesis is tested using the Fisher-Snedecor F-criterion [10], the calculated value of which
is obtained by the formula (12):

$$ F = \frac{\sigma_1^2}{\sigma_2^2}, \tag{12} $$

where the larger of the two variances is placed in the numerator.
   The calculated value is compared with the critical value of the F-test obtained for a given level
of significance α and the numbers of degrees of freedom ν1 and ν2 (13):

$$ \nu_1 = n_1 - 1, \qquad \nu_2 = n_2 - 1 . \tag{13} $$
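
    A minimal Python sketch of the variance comparison follows. It assumes a split into halves and
places the larger sample variance in the numerator, as is usual for the F-test; the function name is
hypothetical, and the critical value still has to be taken from a table of the F-distribution.

```python
from statistics import variance


def variance_ratio_f(y):
    """Return (F, nu1, nu2) for the two halves of the series y."""
    half = len(y) // 2
    first, second = y[:half], y[half:]
    v1, v2 = variance(first), variance(second)
    # The larger variance goes to the numerator; the degrees of freedom follow it.
    if v1 >= v2:
        return v1 / v2, len(first) - 1, len(second) - 1
    return v2 / v1, len(second) - 1, len(first) - 1
```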


3.4. Anomaly degree metrics
According to [8], anomalous observations are considered to be separate observations of the time
series, the values of which differ significantly from the remaining observations. To assess the degree
of anomaly of individual observations of the time series, special statistical criteria are used, for
example, the Irwin criterion and the Chauvenet criterion.

3.4.1. Irwin criterion
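For all observations, or only for those suspected of being anomalous, $\lambda_t$ is calculated by
formula (14). If the calculated value exceeds the table level, then the observation $y_t$ is
considered anomalous [8].

$$ \lambda_t = \frac{\left| y_t - y_{t-1} \right|}{\sigma_y}, \tag{14} $$

where $\sigma_y = \sqrt{\frac{1}{n - 1} \sum_{t=1}^{n} (y_t - \bar{y})^2}$.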
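
    The Irwin statistic can be sketched in Python as below. The sketch assumes the time series form of
the criterion, in which the absolute difference between neighboring observations is divided by the
sample standard deviation; the function name is hypothetical, and the table level is left to the
caller.

```python
from statistics import stdev


def irwin_lambdas(y):
    """Return the Irwin statistic for every observation except the first."""
    sigma = stdev(y)  # sample standard deviation (n - 1 in the denominator)
    return [abs(current - previous) / sigma for previous, current in zip(y, y[1:])]
```

    An observation is then treated as anomalous when its value in the returned list exceeds the table
level for the given series length.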

3.4.2. Chauvenet criterion
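The sample element $y_t$ is an outlier if the probability of its deviation from the mean is not more
than 1/(2n) (where n is the sample size). For checked observations, the value of K (15) is calculated,
which is compared with the tabulated value [8].

$$ K = \frac{\left| y_t - \bar{y} \right|}{\sigma_y}, \tag{15} $$

where $\bar{y}$ is the mean and $\sigma_y$ is the standard deviation of the time series.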
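
    The Chauvenet check can be sketched as follows. Instead of a table lookup, the sketch assumes a
normal distribution and computes the two-sided tail probability of the standardized deviation
directly, comparing it with the usual Chauvenet threshold 1/(2n); this substitution and the function
name are assumptions of the example.

```python
from statistics import NormalDist, mean, stdev


def chauvenet_outliers(y):
    """Return a list of booleans, True where the observation is flagged as an outlier."""
    n = len(y)
    centre, spread = mean(y), stdev(y)
    standard_normal = NormalDist()
    flags = []
    for value in y:
        k = abs(value - centre) / spread
        # Probability of a deviation at least this large under normality.
        tail = 2 * (1 - standard_normal.cdf(k))
        flags.append(tail < 1 / (2 * n))
    return flags
```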

3.5. Calculating the values of the metrics
To create and train a neural network for selecting forecasting methods, it was decided to establish the
requirement that the value of each metric belong to the interval from 0 to 1. Before calculating the
metrics, the time series are normalized. It is obvious that the variance of the normalized time series
will be within the range from 0 to 1.
    To obtain the final values of the metrics that involve computing the calculated value and comparing
it with the critical value, we use formula (16).

                                                                          ,                           (16)

where        calculated value of the criterion,     critical value of the criterion.
   To calculate the metric of the degree of seasonality on the basis of seasonality indexes, the formula
(17) is applied.


                                                                      ,                               (17)

where T is the period of seasonality of the time series, is the mean of the time series, is the value
of the average seasonality index for the period t.
    To calculate the values of the metrics corresponding to the degree of anomaly of the time series, the
ratio of the number of anomalous (according to the criterion) values to the length of the time series is
used.
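
    Two of the steps above can be illustrated with a short Python sketch: the normalization of the time
series and the anomaly degree metric. Min-max scaling is an assumption (the text does not name the
normalization method), and both function names are hypothetical.

```python
def min_max_normalize(y):
    """Scale the series linearly to the interval [0, 1] (assumed normalization)."""
    lowest, highest = min(y), max(y)
    return [(value - lowest) / (highest - lowest) for value in y]


def anomaly_metric(flags):
    """Share of observations flagged as anomalous by a criterion; lies in [0, 1]."""
    return sum(flags) / len(flags)
```

    The list of flags can come from any anomaly criterion that marks individual observations, so the
same ratio serves for both anomaly metrics.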

4. Neural network for selecting prediction methods
A specially designed neural network is used to select the prediction methods from the base set of the
hybrid combined model (see Figure 1). This neural network is created using the R language and the
“neuralnet” package.



   Figure 1. Neural network circuit (time series → metrics → hidden layers → models → selected
   models).
   The input values of the neural network are the time series metrics. Each method (m1, m2, ..., mk)
from the base set corresponds to one of the output layer's neurons (M1, M2, ..., Mt).
   The training set consists of time series, each represented as a set of metrics, and the
corresponding values of the prediction error (SMAPE) for each forecasting model. In this system, time
series are taken from the competition "Computational Intelligence in Forecasting" (CIF) [11] for
2015-2016. The SMAPE values for the training set are obtained by forecasting the competition time
series using the methods of the combined hybrid forecasting system “Combination of fuzzy and
exponential models” (CFEM) [12], developed at the Department of Information Systems of the Ulyanovsk
State Technical University. This system was the best in the CIF competition in 2015.
   The metrics (and the methods for calculating them) for the input layer of the neural network are
described in Section 3 of this work. The prediction models (corresponding to the output layer of the
neural network) are taken from the combined CFEM model. The neural network learns from the
metric values to calculate the estimated error values for each model from the base set. Then it is
possible to determine which models from the base set are more efficient for predicting the values of
the current time series. If the aggregating method involves adjusting the weights of the models, the
weights can be set inversely proportional to the predicted error values of each model.
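
    The use of the network outputs can be sketched in Python as follows. The SMAPE variant shown is one
common definition and is an assumption, since the text does not give the formula; the selection rule
(keep a fixed number of models with the lowest predicted error) and all function names are also
chosen only for this example.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (one common variant)."""
    terms = [2 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)]
    return 100 * sum(terms) / len(terms)


def select_models(predicted_errors, keep):
    """Return the names of the `keep` models with the lowest predicted error."""
    ranked = sorted(predicted_errors, key=predicted_errors.get)
    return ranked[:keep]


def inverse_error_weights(errors):
    """Weights inversely proportional to the predicted errors, rescaled to sum to 1."""
    inverse = [1 / e for e in errors]
    total = sum(inverse)
    return [value / total for value in inverse]
```

    For example, two models with predicted errors of 2 and 4 percent receive weights of 2/3 and 1/3.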

5. The best combination of metrics. Results of experiments

5.1. The base set of metrics
Initially, 11 metrics were used in the developed neural network for selecting prediction methods: 5
metrics for the mean trend, 2 for the dispersion trend, 1 for the variance, 1 for the degree of
seasonal variation, and 2 for the degree of anomaly in the time series.
    The evaluations of one characteristic according to various criteria differ because the criteria
are based on different regularities. For example, the Wallis-Moore phase-frequency criterion for
determining the trend of the mean is based on the comparison of neighboring observations of the time
series; the Foster-Stewart criterion, on the comparison of the current value with all the previous
ones; the Cox-Stewart criterion, on dividing the time series into three parts and then comparing
observations between the third and the first group; the cumulative T-criterion, on calculating the
accumulated total of deviations of the empirical values from the time series mean [8], [10]. It can
therefore be argued that metrics that correspond to the same characteristic of a time series but are
calculated by different criteria may not be highly correlated with each other.
    In determining the optimal combination of time series metrics, the greatest difficulty is keeping
a balance between the completeness of the coverage of the regularities of the time series and a low
degree of correlation between the characteristics. It was decided to perform an experiment in which
the metrics were sequentially excluded from the general set and the effect of each exclusion on the
accuracy of the neural network was estimated.
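
    One way to organize such a sequential exclusion is greedy backward elimination, sketched below in
Python. The greedy stopping rule is an assumption and may differ from the procedure actually used in
the experiment; `score` stands for any hypothetical function that trains the network on a given list
of metrics and returns its error.

```python
def backward_elimination(metrics, score):
    """Drop one metric at a time while this lowers the error returned by score()."""
    current = list(metrics)
    best = score(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        # Error obtained after removing each metric in turn.
        candidates = [(score([m for m in current if m != x]), x) for x in current]
        candidate_error, to_drop = min(candidates)
        if candidate_error < best:
            best, improved = candidate_error, True
            current.remove(to_drop)
    return current, best
```

    The procedure stops as soon as no single exclusion improves the error, so it may miss a better set
that requires removing two metrics at once.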

5.2. The experiment on finding the optimal set of metrics
Computational experiments were performed for several configurations of the neural network that differ
in the number of neurons in the hidden layers. In some configurations, the neural network learning
algorithm (based on the backpropagation algorithm) did not converge; such configurations were not
taken into account. The experiments were carried out for neural networks with one hidden layer
containing from 1 to 11 neurons and for networks with two hidden layers containing from 1 to 8 neurons
each.
   The training set of the neural network was divided 5 times into training and control parts. After
that, for each configuration (a set of metrics and the number of neurons in the hidden layers), the
neural network was trained and the deviation of the output values from the control ones was calculated.
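
    The experimental grid can be enumerated with a short Python sketch. The hidden layer sizes follow
the ranges stated above; the random splitting, the 20% share of the control part, the seed and the
function names are assumptions, since the text does not specify how the five divisions were made.

```python
import random


def hidden_layer_configs():
    """One hidden layer with 1..11 neurons, or two hidden layers with 1..8 neurons each."""
    one_layer = [(k,) for k in range(1, 12)]
    two_layers = [(a, b) for a in range(1, 9) for b in range(1, 9)]
    return one_layer + two_layers


def repeated_splits(n_items, n_splits=5, control_share=0.2, seed=0):
    """Return n_splits pairs (training indices, control indices) drawn at random."""
    rng = random.Random(seed)
    cut = int(round(n_items * control_share))
    splits = []
    for _ in range(n_splits):
        indices = list(range(n_items))
        rng.shuffle(indices)
        splits.append((sorted(indices[cut:]), sorted(indices[:cut])))
    return splits
```

    The grid contains 11 single-layer and 64 two-layer configurations, 75 in total, before the
configurations that fail to converge are discarded.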
   The results of the experiment are presented in Table 1. The columns of the table correspond to sets
of metrics and show the sequence of excluding metrics from the base set, which allowed increasing the
accuracy of the neural network. Each column is divided into two sub-columns containing information
about the configuration of the neural network and the corresponding value of the network error. The
rows of the table correspond to the best 7 configurations for each set of metrics and the average error
for all configurations.
                               Table 1. The best combination of metrics.
  Set A: all metrics.
  Set B: all metrics except the dispersion trend by the Foster-Stewart criterion.
  Set C: all metrics except the dispersion trend by the Foster-Stewart criterion and the cumulative
         T-criterion.
  Set D: all metrics except the cumulative T-criterion and both variance trend metrics.

  Set A                Set B                Set C                Set D
  hidden    SMAPE, %   hidden    SMAPE, %   hidden    SMAPE, %   hidden    SMAPE, %
  neurons              neurons              neurons              neurons
  2         9.764      7         8.994      7         9.476      10        9.016
  (8,1)     9.794      (4,5)     9.314      (1,3)     9.667      8         9.103
  9         9.870      (2,6)     9.314      10        9.720      7         9.392
  (4,1)     9.968      (6,2)     9.788      (3,4)     9.777      1         9.553
  (2,1)     10.075     (7,3)     9.869      2         9.782      2         9.592
  (3,3)     10.099     (1,6)     9.924      (1,4)     9.795      (6,3)     9.777
  (8,2)     10.194     4         10.050     6         9.957      (4,3)     9.788
  …         …          …         …          …         …          …         …
  average   10.967     average   10.851     average   10.656     average   10.769
   According to Table 1, the greatest average accuracy over all configurations was achieved for the
set containing all metrics except the dispersion trend metric by the Foster-Stewart criterion and the
metric based on the cumulative T-criterion. However, the subsequent exclusion of the other dispersion
trend metric produced the most accurate results for individual configurations.
   Another experiment was performed to evaluate the effectiveness of using the neural network to
improve the accuracy of the final forecast. To obtain the aggregated forecast, only methods with a low
expected forecast error were taken from the base set. The preliminary selection of methods by a neural
network using all eleven metrics reduced the error of the final forecast from 13.131% to 9.764%. The
best neural network configuration from the previous experiment (without the cumulative T-criterion and
both variance trend metrics, with 10 neurons in the hidden layer) reduced the average final forecast
error to 9.016%.
   The experiment also established that the logistic activation function gives significantly more
accurate results for the considered neural network than the hyperbolic tangent.



6. Conclusions
In this paper, a technique for selecting individual methods in a combined time series forecasting model
is described. This approach is based on the use of a specially developed neural network that selects
methods according to time series metrics.
    We proposed a neural network structure and algorithms for calculating metric values based on the
characteristics of the time series. Based on the experimental data, the optimal set of metrics was
determined and the efficiency of the neural network was estimated: the average forecast error
decreased from 13.131% to 9.016%.
    The application of the developed neural network is especially effective for short time series and for
combined models with a large number of prediction methods (since it uses easily calculated metrics).
In these cases, the approach that involves splitting the time series into the training and control parts
and then calculating control values by each method is inefficient.

7. References
[1] Tarasov V B: From multi-agent systems to intellectual organizations: philosophy, psychology,
informatics. Editorial, Moscow (2002).
[2] Yarushkina N G: Fuzzy hybrid systems. Theory and practice. Fizmatlit, Moscow (2007).
[3] Gavrilov A V: Hybrid Intelligent Systems. NSTU, Novosibirsk (2003).
[4] Kolesnikov A V, Kirikov I A, Listopad S V, Rumovskaya S B, Domanicky A A: Solution of
complex traveling salesman tasks using the methods of functional hybrid intelligent systems. Institute
of Informatics Problems of the Russian Academy of Sciences, Moscow (2011).
[5] Lukashin Y P: Adaptive methods of short-term forecasting of time series. Finansy i statistika,
Moscow (2003).
[6] Vasiliev A A: The Genesis of Hybrid Forecasting Models on the Basis of Combining Forecasts.
Vestnik TvGU, vol. 1(23), pp 316-331 (2014).
[7] Gorelik N A, Frenkel A A: Statistical problems of economic forecasting. Statistical methods of
analysis of economic dynamics, vol. 46, pp 9-48, Nauka, Moscow (1983).
[8] Kobzar A I: Applied Mathematical Statistics. Fizmatlit, Moscow (2006).
[9] Afanasyev V N, Yuzbashev M M: Time series analysis and forecasting. Finansy i statistika,
Moscow (2001).
[10] Sadovnikova N A, Shmoylova R A: Time series analysis and forecasting. MGUESI, Moscow
(2001).
[11] CIF Homepage, http://irafm.osu.cz/cif/main.php, last accessed 2018/04/09.
[12] Afanasieva T, Yarushkina N, Zavarzin D, Guskov G, Romanov A: Time series forecasting
using combination of exponential models and fuzzy techniques. Advances in intelligent systems and
computing, vol. 450, pp 41-50 (2016).
[13] Yarushkina N G, Afanasyeva T V, Perfilieva I G: Intelligent analysis of time series. ULSTU,
Ulyanovsk (2010).
[14] Yarushkina N G: Principles of the theory of fuzzy and hybrid systems. Finansy i statistika,
Moscow (2004).
[15] Ribina G V: Modern expert systems: trends towards integration and hybridization. Devices and
systems. Handling. Control. Diagnostics, vol. 8, pp 18-21 (2001).
[16] Averkin A N, Yarushev S A, Povidailo I S: Hybrid neural networks in time series forecasting
problems. National Research Nuclear University MEPhI (Moscow Engineering Physics Institute)
(2015).
[17] Yarushkina N G, Afanasyeva T V, Romanov A A, Timina I A: Extraction of knowledge about
the dependencies of time series for forecasting problems. Radiotekhnika, vol. 7, pp 141-146 (2014).

Acknowledgments
The author expresses his gratitude to his scientific adviser, Professor N.G. Yarushkina, for valuable
advice in planning the research.



