=Paper=
{{Paper
|id=Vol-3091/paper01
|storemode=property
|title=Modeling of forecasts variance reduction at multiple time series prediction averaging with ARMA (1, q) functions
|pdfUrl=https://ceur-ws.org/Vol-3091/paper01.pdf
|volume=Vol-3091
|authors=Danila Musatov,Denis Petrusevich
}}
==Modeling of forecasts variance reduction at multiple time series prediction averaging with ARMA (1, q) functions==
Danila Musatov¹ and Denis Petrusevich¹
¹ MIREA – Russian Technological University, Prospekt Vernadskogo, 78, Moscow, 119454, Russia
Abstract
Combination of time series forecasts is usually considered good practice, but it has a weak theoretical explanation. In this research the variance of time series forecasts and the variance of combined models are considered. We are interested in the form of the forecast variance as a function of the model coefficients over one, two and three periods. Conditions that can lead to an improvement of averaged time series predictions are in the scope of this research. A few of the most popular time series models are examined: the moving average models MA(q), the autoregression models AR(p) and their combination in the form of ARIMA(p, d, q) or ARMA(p, q) models. In particular, AR(1) and ARMA(1, q) are investigated. Nowadays there is active research on time series averaging. Approaches based on bagging and boosting are implemented very often in classification and regression, and it is appealing to use such strategies in time series modeling. At the same time, while it is easy to construct a learning set and a test set in classification tasks, this is a complex task in time series processing: two sets are needed, one to train the time series models and one to construct their combinations. Thus the combination of time series models, of their forecasts or of their prediction intervals is currently the subject of only a few in-depth studies. In this paper we investigate the behaviour of the variance of time series predictions in order to obtain another useful approach to time series prediction averaging. Russian macroeconomic statistics are used as experimental time series.
Keywords
Time series forecasting, prediction averaging, ARIMA, forecast variance, information criteria
1. Introduction
Mathematical models used to predict a certain value are often applied in combination. The simplest combination function in time series processing is averaging of all models' predictions or selection of the best one [1]. It is widely believed that if the averaged models are "good" enough and each reflects some part of the described process' behaviour, their average is also a "good" model. However, the mathematical statement of this problem has not been studied thoroughly. There are attempts to choose the best model [1], to construct a mean model [2], and to implement the bagging strategy for models of a certain time series [3, 4]. Selection of the best model is the traditional way, but in practice there are always many models and no mathematically strict way to choose the best one. Researchers have therefore argued that multiple models can describe various kinds of behaviour of the processed time series from various points of view, and thus their combination is better than selection of the best one [5-7]. Bagging [8] of time series is an appealing but complex task. Usually time series models are built in two stages: a training set is used to construct them and a test set to evaluate their quality. In the case of bagging, however, one needs three parts, or the training set should be subdivided so that some parts serve the construction of the models and others the evaluation of their combinations [9-11]. In this research the combination of forecasts is
investigated from the prediction intervals point of view. The behaviour of prediction intervals is often out of the scope of research on ensemble models, but this question is especially important in time series forecasting [11, 12]. In this paper ARIMA models are considered. Though some steps towards combinations of GARCH models (for heteroscedastic time series) have already been made in [7], the combination of heterogeneous models remains future work. ARIMA models are the most popular ones, but frequency-domain analysis is sometimes also a good tool [13, 14], so further analysis should consider combinations of such models. Here we derive the variance of time series forecasts over 1, 2 and 3 periods of time and investigate its behaviour under model averaging.
2. Time series prediction intervals
In order to evaluate a prediction interval one has to transform the time series into MA($\infty$) form. According to Wold's theorem [15] this transformation (also called the psi-representation) can be applied to any stationary time series. If one treats ARIMA models, a stationary process can be achieved by time series differencing [16]. Each process in this view is expressed via an (in general infinite) sum of moving average terms. The coefficients $\psi_j$ in this series are usually called psi-weights. Thus, the simplest case is, of course, handling moving average MA(q) models:
$$X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}.$$
Here the current value of the time series $X$ is expressed via white noise terms $\varepsilon_{t-j}$ of lag less than or equal to $q$ [16]. These models already have the form in which all coefficients are equal to psi-weights.
Thus, the variance of these models' predictions [16] is (1):
$$\operatorname{Var}(\hat{x}_n - x_n) = \sigma^2 \sum_{j=0}^{n-1} \psi_j^2, \qquad (1)$$
where $\hat{x}_n$ is the predicted value at time $n$, $x_n$ is the real value of the time series, $\sigma$ is the standard error value (obtained during the learning procedure) and $\psi_j = \theta_j$ for an MA(q) series. If the assumption of normally distributed errors holds, a 95% prediction interval for $\hat{x}_n$ is $[\hat{x}_n - 1.96\sqrt{\operatorname{Var}(\hat{x}_n - x_n)},\; \hat{x}_n + 1.96\sqrt{\operatorname{Var}(\hat{x}_n - x_n)}]$, where the variance is found via the psi-weights by means of (1).
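To make (1) concrete, here is a minimal Python sketch (our illustration, not code from the paper) that computes the psi-weights of an ARMA model with the arma2ma helper from statsmodels and builds the 95% prediction interval; the coefficient values, the horizon and the point forecast x_hat are assumptions chosen for the example.

```python
import numpy as np
from statsmodels.tsa.arima_process import arma2ma

phi1, theta1 = 0.6, 0.3   # assumed ARMA(1, 1) coefficients
sigma = 1.0               # assumed standard error of residuals
n = 3                     # forecast horizon (timesteps ahead)

# arma2ma takes lag-polynomial coefficients: [1, -phi_1] for the AR part
# and [1, theta_1] for the MA part; it returns psi_0, ..., psi_{n-1}.
psi = arma2ma(np.r_[1.0, -phi1], np.r_[1.0, theta1], lags=n)

var_n = sigma**2 * np.sum(psi**2)   # forecast variance, expression (1)
x_hat = 100.0                       # hypothetical point forecast
half_width = 1.96 * np.sqrt(var_n)
print("psi-weights:", psi)
print(f"95% PI: [{x_hat - half_width:.2f}, {x_hat + half_width:.2f}]")
```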
Another significant class of time series models consists of the autoregression AR(p) models. In this case the current value of the time series $X$ depends on its own past values of lag less than or equal to $p$:
$$X_t = c + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \varepsilon_t.$$
Applying Wold's theorem to autoregression processes AR(p) leads to (2):
$$\psi_0 = 1,\quad \psi_1 = \phi_1,\quad \psi_2 = \phi_1 \psi_1 + \phi_2,\quad \psi_3 = \phi_1 \psi_2 + \phi_2 \psi_1 + \phi_3. \qquad (2)$$
These expressions are recurrent and can be rewritten as a summation (3):
$$\psi_k = \sum_{i=1}^{k} \phi_i \psi_{k-i}, \qquad (3)$$
where $\phi_i = 0$ for $i > p$.
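The recurrence (3) translates directly into a few lines of Python; the sketch below is ours, assuming the coefficients are passed as a list [phi_1, ..., phi_p].

```python
def ar_psi_weights(phi, n):
    """First n psi-weights of an AR(p) model via the recurrence (3)."""
    psi = [1.0]  # psi_0 = 1
    for k in range(1, n):
        # psi_k = sum_{i=1}^{min(k, p)} phi_i * psi_{k-i}  (phi_i = 0 for i > p)
        psi.append(sum(phi[i - 1] * psi[k - i]
                       for i in range(1, min(k, len(phi)) + 1)))
    return psi

print(ar_psi_weights([0.5, 0.2], 5))  # AR(2) example: [1.0, 0.5, 0.45, 0.325, 0.2525]
```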
In the general case, handling ARIMA(p, d, q) (or ARMA(p, q)) models, consideration of the psi-weights leads to (4):
$$\psi_0 = 1,\quad \psi_1 = \phi_1 + \theta_1,\quad \psi_2 = \phi_1 \psi_1 + \phi_2 + \theta_2,\quad \psi_3 = \phi_1 \psi_2 + \phi_2 \psi_1 + \phi_3 + \theta_3. \qquad (4)$$
Thorough analysis of these expressions can be found in [16, 17].
In this research we consider AR(1) and ARMA(1, q) models. The main goal is to express the forecast variance via the terms of the models and to identify the conditions under which the averaging technique improves the forecast. Here only averaging of the models is taken into account, but the same approach can be used for bagging and non-linear combinations of models. In the calculations below and at the averaging stage of the experiments, all models are supposed to share the same model of seasonality, because this part is non-linear and its summation would lead to models with another, more complex seasonality.
2.1. Variance of the prediction made with AR(1) model
First of all, AR(p) models are analyzed here. In the simplest case of the AR(1) model the psi-weights (2) are elements of a geometric progression:
$$\psi_0 = 1,\quad \psi_1 = \phi_1,\quad \psi_2 = \phi_1 \psi_1 = \phi_1^2,\quad \ldots,\quad \psi_k = \phi_1 \psi_{k-1} = \phi_1^k. \qquad (5)$$
The variance of its prediction at time $n$ is proportional to the sum of a geometric progression with ratio $\phi_1^2$:
$$\operatorname{Var}_{AR(1)}(\hat{x}_n - x_n) = \sigma^2 \sum_{j=0}^{n-1} \psi_j^2 = \sigma^2 \sum_{j=0}^{n-1} \phi_1^{2j} = \sigma^2 \, \frac{1 - \phi_1^{2n}}{1 - \phi_1^2}. \qquad (6)$$
The sum (6) stays finite if $|\phi_1| < 1$ even when $n \to \infty$. Thus, in the case of an AR(1) process the variance of the forecast over an infinite horizon can be a finite number. Intuitively this is close to the case of fluctuations with descending amplitude (7):
$$\lim_{n \to \infty} \operatorname{Var}_{AR(1)}(\hat{x}_n - x_n) = \frac{\sigma^2}{1 - \phi_1^2}. \qquad (7)$$
The variances of predictions over 1, 2 and 3 timesteps are presented in expressions (8):
$$\operatorname{Var}_{AR(1)}(\hat{x}_1 - x_1) = \sigma^2,\quad
\operatorname{Var}_{AR(1)}(\hat{x}_2 - x_2) = \sigma^2 (1 + \phi_1^2),\quad
\operatorname{Var}_{AR(1)}(\hat{x}_3 - x_3) = \sigma^2 (1 + \phi_1^2 + \phi_1^4). \qquad (8)$$
The variance of the prediction one timestep ahead depends only on the standard error of the model under investigation. Plots of these functions for predictions over 2 and 3 timesteps are presented in Figure 1 and Figure 2. The functions are convex (bulging downward): all degrees of $\phi_1$ are even and there are only plus signs in expression (8). There is a single (global) minimum at $\phi_1 = 0$, where the prediction is also zero. If there are several models, their prediction variances correspond to various points on the plots in Figures 1 and 2. If the predictions of several AR(1) models are averaged, this can be described as averaging of the models themselves (because the ARMA(p, q) model is linear). Thus, the averaged model can be considered a new ARMA(p, q) model with the same orders p, q but different coefficients.

The variance of the combined model is lower than the variances of the source models if one moves towards the minimum in Figures 1 and 2. If there are two models with variances situated on the same side of zero, the variance of their combination lies between them. So, in the case of two models, one of them will have higher variance and the other lower variance than the combined model. At the same time, the combined model will have a lower level of variance if the variances of the source models are situated on different sides of zero.

Basically, this reasoning remains true for many models. The variance function (8) is convex and has a single minimum. So, if all models are marked as points on the plots in Figures 1 and 2, the variance of the averaged model is the same function of the averaged parameter. Because the variance function is convex, the point marking the variance of the averaged model is situated under the line connecting the models with the extreme values of the parameter $\phi_1$ (minimum and maximum):
$$\min_i \operatorname{Var}_{M_i} \;\leq\; \operatorname{Var}_{\frac{1}{N}\sum_i M_i} \;\leq\; \max_i \operatorname{Var}_{M_i}, \qquad (9)$$
where $M_i$, $i = 1, \ldots, N$, enumerates the $N$ models of AR(1) type.
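A small numeric sketch of inequality (9) for the 2-step variance (8), under the paper's observation that averaging the predictions of linear AR(1) models amounts to averaging their $\phi_1$ coefficients; the coefficient values below are made up for illustration.

```python
import numpy as np

sigma = 1.0
phis = np.array([-0.6, 0.2, 0.5, 0.8])    # assumed AR(1) coefficients of 4 models

def var2(phi):
    # variance of the 2-step forecast, expression (8)
    return sigma**2 * (1 + phi**2)

individual = var2(phis)                    # variances of the source models
combined = var2(phis.mean())               # averaged model: AR(1) with mean phi_1
# inequality (9): the combined variance lies between the extremes
print(individual.min() <= combined <= individual.max())   # True
print(combined, individual)                # 1.0506 vs [1.36, 1.04, 1.25, 1.64]
```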
Figure 1: Forecast variance of AR(1) model at 2 timesteps ahead
Figure 2: Forecast variance of AR(1) model at 3 timesteps ahead
This combination yields the best model (with the lowest variance) only if equal numbers of models are situated to the left and to the right of zero. This situation can be seen in Figure 3. Thus, averaging leads to the best model (or at least to a "good" one, since there are many of them) only if the models are divided into two equal parts: those with negative values of $\phi_1$ and those with positive ones. Basically, this means that half of the models predict that the value of the time series is going to decrease while the others predict a tendency to grow. Such a situation can take place if the investigated time series has complex behaviour or researchers do not have enough data to make a prediction. In the "usual" situation there is always some model better (with lower variance) than the averaged one. But if there is no tool to choose the best model, the averaged one is better than the worse models. So, it can be used as another tool for prediction, and it has "good" quality.
Figure 3: Variance of handled AR(1) models (shown with transparent dots) and variance of the averaged model (black dot)
2.2. Variance of the prediction made with ARMA(1, q) model
The psi-weights of ARMA(1, q) models form a geometric progression starting from lag $q$:
$$\psi_0 = 1,\quad \psi_1 = \phi_1 + \theta_1,\quad \psi_2 = \phi_1 \psi_1 + \theta_2 = \phi_1(\phi_1 + \theta_1) + \theta_2,\quad \ldots,\quad \psi_{q+1} = \phi_1 \psi_q. \qquad (10)$$
Thus, while the forecast horizon does not exceed the order of the moving average part by more than one ($n \leq q + 1$), no geometric progression appears in the variance. For example, for the variance of the ARMA(1, 2) model over 3 timesteps one has:
$$\operatorname{Var}_{ARMA(1,2)}(\hat{x}_3 - x_3) = \sigma^2 \sum_{j=0}^{2} \psi_j^2 = \sigma^2 \left[ 1 + (\phi_1 + \theta_1)^2 + \big(\phi_1(\phi_1 + \theta_1) + \theta_2\big)^2 \right]. \qquad (11)$$
For longer horizons ($n > q + 1$) the variance includes the sum of a geometric progression starting from the $q$-th psi-weight:
$$\operatorname{Var}_{ARMA(1,q)}(\hat{x}_n - x_n) = \sigma^2 \sum_{j=0}^{n-1} \psi_j^2 = \sigma^2 \left[ 1 + (\phi_1 + \theta_1)^2 + \big(\phi_1(\phi_1 + \theta_1) + \theta_2\big)^2 + \ldots + \psi_q^2 \, \frac{1 - \phi_1^{2(n-q)}}{1 - \phi_1^2} \right]. \qquad (12)$$
Here the last term denotes the sum of the progression starting at the $q$-th psi-weight.
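The closed form (11) can be cross-checked numerically; the sketch below is our check, with assumed coefficients, comparing (11) against the psi-weights that statsmodels computes for an ARMA(1, 2) model.

```python
import numpy as np
from statsmodels.tsa.arima_process import arma2ma

phi1, th1, th2, sigma = 0.5, 0.3, -0.2, 1.0   # assumed ARMA(1, 2) coefficients

psi = arma2ma(np.r_[1.0, -phi1], np.r_[1.0, th1, th2], lags=3)
var_numeric = sigma**2 * np.sum(psi**2)

# expression (11): sigma^2 * [1 + (phi1+th1)^2 + (phi1*(phi1+th1)+th2)^2]
var_closed = sigma**2 * (1 + (phi1 + th1)**2 + (phi1*(phi1 + th1) + th2)**2)
print(np.isclose(var_numeric, var_closed))    # True
```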
2.3. Variance of the prediction made with ARMA(1, 1) model
The variances of predictions over 1, 2 and 3 timesteps of the ARMA(1, 1) model are shown in (13):
$$\operatorname{Var}_{ARMA(1,1)}(\hat{x}_1 - x_1) = \sigma^2,\quad
\operatorname{Var}_{ARMA(1,1)}(\hat{x}_2 - x_2) = \sigma^2 \left[ 1 + (\phi_1 + \theta_1)^2 \right],\quad
\operatorname{Var}_{ARMA(1,1)}(\hat{x}_3 - x_3) = \sigma^2 \left[ 1 + (\phi_1 + \theta_1)^2 + \phi_1^2 (\phi_1 + \theta_1)^2 \right]. \qquad (13)$$
First of all, one considers the variance of the prediction over 2 timesteps ahead. The extreme value of this function is situated where the first derivatives are zero:
$$\frac{\partial}{\partial \phi_1}\operatorname{Var}_{ARMA(1,1)}(\hat{x}_2 - x_2) = \frac{\partial}{\partial \theta_1}\operatorname{Var}_{ARMA(1,1)}(\hat{x}_2 - x_2) = 2\sigma^2 (\phi_1 + \theta_1) = 0. \qquad (14)$$
This happens if $\theta_1 = -\phi_1$; then the predictions of the model ($X_t = c + \phi_1 X_{t-1} - \phi_1 \varepsilon_{t-1} + \varepsilon_t$) have minimal variance (it can be seen in Figure 4). To check whether this is a minimum, one uses the second derivatives (writing $\operatorname{Var}_2$ for $\operatorname{Var}_{ARMA(1,1)}(\hat{x}_2 - x_2)$):
$$\frac{\partial^2 \operatorname{Var}_2}{\partial \phi_1^2} = \frac{\partial^2 \operatorname{Var}_2}{\partial \theta_1^2} = \frac{\partial^2 \operatorname{Var}_2}{\partial \phi_1 \partial \theta_1} = 2\sigma^2. \qquad (15)$$
The determinant of the Hessian of the variance is zero:
$$\det H_{ARMA(1,1)}(\hat{x}_2 - x_2) = \begin{vmatrix} 2\sigma^2 & 2\sigma^2 \\ 2\sigma^2 & 2\sigma^2 \end{vmatrix} = 0. \qquad (16)$$
The minors are $\Delta_1 = 2\sigma^2 > 0$, $\Delta_2 = 0$, so one cannot make definite conclusions by means of Sylvester's criterion. It means the function does not have a single isolated minimum. But taking a glance at Figure 4 one can see that the minimum is achieved by the models with $\theta_1 = -\phi_1$. At the same time this function is also convex, and the results for AR(1) remain true in this case.
Using the same approach, we consider the variance of the prediction over three timesteps, writing $\operatorname{Var}_3$ for $\operatorname{Var}_{ARMA(1,1)}(\hat{x}_3 - x_3)$. Extreme values of the variance are obtained by setting the first derivatives to zero:
$$\frac{\partial \operatorname{Var}_3}{\partial \phi_1} = 2\sigma^2 (\phi_1 + \theta_1)(2\phi_1^2 + \phi_1 \theta_1 + 1) = 0, \qquad
\frac{\partial \operatorname{Var}_3}{\partial \theta_1} = 2\sigma^2 (\phi_1 + \theta_1)(\phi_1^2 + 1) = 0. \qquad (17)$$
Here one can see the same result as in the case of the prediction over two timesteps: the second equation in (17) has the solution $\theta_1 = -\phi_1$.

The second derivatives for the prediction of the ARMA(1, 1) model over three timesteps are:
$$\frac{\partial^2 \operatorname{Var}_3}{\partial \phi_1^2} = 2\sigma^2 \big( 6\phi_1(\phi_1 + \theta_1) + \theta_1^2 + 1 \big),\quad
\frac{\partial^2 \operatorname{Var}_3}{\partial \theta_1^2} = 2\sigma^2 (\phi_1^2 + 1),\quad
\frac{\partial^2 \operatorname{Var}_3}{\partial \phi_1 \partial \theta_1} = 2\sigma^2 (3\phi_1^2 + 2\phi_1 \theta_1 + 1). \qquad (18)$$
If $\theta_1 = -\phi_1$, all derivatives in (18) are equal to $2\sigma^2 (\phi_1^2 + 1)$.

The Hessian has the form:
$$H_{ARMA(1,1)}(\hat{x}_3 - x_3) = \begin{pmatrix} 2\sigma^2 (6\phi_1(\phi_1 + \theta_1) + \theta_1^2 + 1) & 2\sigma^2 (3\phi_1^2 + 2\phi_1 \theta_1 + 1) \\ 2\sigma^2 (3\phi_1^2 + 2\phi_1 \theta_1 + 1) & 2\sigma^2 (\phi_1^2 + 1) \end{pmatrix}. \qquad (19)$$
The determinant $\Delta_2$ and the minor $\Delta_1$ have a complicated form in general, but at the extreme points $\theta_1 = -\phi_1$: $\Delta_1 = 2\sigma^2 (\phi_1^2 + 1) > 0$, $\Delta_2 = 0$. So one cannot confirm by the second-order test alone that the case $\theta_1 = -\phi_1$ leads to the model with minimal variance, but it is clearly seen in Figure 5.
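The derivatives (17)-(19) can be re-derived symbolically; the following sympy sketch is our verification of the algebra, not part of the paper's toolchain.

```python
import sympy as sp

phi1, th1, sigma = sp.symbols('phi1 theta1 sigma', real=True, positive=True)
# 3-step forecast variance of ARMA(1, 1), expression (13)
var3 = sigma**2 * (1 + (phi1 + th1)**2 + phi1**2 * (phi1 + th1)**2)

grad = [sp.factor(sp.diff(var3, v)) for v in (phi1, th1)]
H = sp.hessian(var3, (phi1, th1))

print(grad)                                    # factored forms of (17)
print(sp.simplify(H.det().subs(th1, -phi1)))   # 0: Hessian is degenerate there
print(sp.simplify(H.subs(th1, -phi1)))         # all entries 2*sigma**2*(phi1**2+1)
```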
Figure 4: Forecast variance of ARMA(1, 1) model at 2 timesteps ahead
Figure 5: Forecast variance of ARMA(1, 1) model at 3 timesteps ahead
2.4. Variance of the prediction made with ARMA(1, 2) model
The variances of predictions over 1, 2 and 3 timesteps of the ARMA(1, 2) model are shown in (20):
$$\operatorname{Var}_{ARMA(1,2)}(\hat{x}_1 - x_1) = \sigma^2,\quad
\operatorname{Var}_{ARMA(1,2)}(\hat{x}_2 - x_2) = \sigma^2 \left[ 1 + (\phi_1 + \theta_1)^2 \right],\quad
\operatorname{Var}_{ARMA(1,2)}(\hat{x}_3 - x_3) = \sigma^2 \left[ 1 + (\phi_1 + \theta_1)^2 + \big(\phi_1(\phi_1 + \theta_1) + \theta_2\big)^2 \right]. \qquad (20)$$
The variances of the forecasts over 1 and 2 timesteps are the same as in the case of the ARMA(1, 1) model (13). So, all calculations and conclusions made there remain true. The same result is repeated for ARMA(1, q) models with q > 2: for predictions over 1, 2 and 3 timesteps the variance functions are the same as in (20), and differences are expected only for forecasts over more timesteps. For the prediction over 3 timesteps the extreme point is situated where (writing $\operatorname{Var}_3$ for $\operatorname{Var}_{ARMA(1,2)}(\hat{x}_3 - x_3)$):
$$\frac{\partial \operatorname{Var}_3}{\partial \phi_1} = 2\sigma^2 \left[ (\phi_1 + \theta_1) + \big(\phi_1(\phi_1 + \theta_1) + \theta_2\big)(2\phi_1 + \theta_1) \right] = 0,$$
$$\frac{\partial \operatorname{Var}_3}{\partial \theta_1} = 2\sigma^2 \left[ (\phi_1 + \theta_1) + \phi_1 \big(\phi_1(\phi_1 + \theta_1) + \theta_2\big) \right] = 0, \qquad (21)$$
$$\frac{\partial \operatorname{Var}_3}{\partial \theta_2} = 2\sigma^2 \big( \phi_1(\phi_1 + \theta_1) + \theta_2 \big) = 0.$$
Substituting the last equation into the first one of the system (21), one gets $\theta_1 = -\phi_1$, and after that the last equation gives $\theta_2 = 0$. So, the structure of the model with minimal variance of prediction is the same as in the cases of ARMA(1, 1) over 3 timesteps and of ARMA(1, 2) over 2 timesteps.
The determinant of the Hessian becomes very large, and here only the second derivatives are shown in (22):
$$\frac{\partial^2 \operatorname{Var}_3}{\partial \phi_1^2} = 2\sigma^2 \left[ 1 + (2\phi_1 + \theta_1)^2 + 2\big(\phi_1(\phi_1 + \theta_1) + \theta_2\big) \right],\quad
\frac{\partial^2 \operatorname{Var}_3}{\partial \theta_1^2} = 2\sigma^2 (\phi_1^2 + 1),\quad
\frac{\partial^2 \operatorname{Var}_3}{\partial \theta_2^2} = 2\sigma^2, \qquad (22)$$
$$\frac{\partial^2 \operatorname{Var}_3}{\partial \phi_1 \partial \theta_1} = 2\sigma^2 \big( 3\phi_1^2 + 2\phi_1 \theta_1 + \theta_2 + 1 \big),\quad
\frac{\partial^2 \operatorname{Var}_3}{\partial \phi_1 \partial \theta_2} = 2\sigma^2 (2\phi_1 + \theta_1),\quad
\frac{\partial^2 \operatorname{Var}_3}{\partial \theta_1 \partial \theta_2} = 2\sigma^2 \phi_1.$$
After simplification, summing rows and columns with appropriate multipliers, the determinant of the Hessian takes the form (23):
$$\det H_{ARMA(1,2)}(\hat{x}_3 - x_3) = -8\sigma^6 \big( \phi_1(\phi_1 + \theta_1) + \theta_2 \big)^2. \qquad (23)$$
It’s positive except solutions of the equations (21). At extreme points it’s zero. This result is close
to the previous cases. There’s a hyperplane at which variation is minimal: 1 1 , 2 0. For models
ARMA(1, q), q>2, variance of predictions over 1, 2, 3 timesteps has got the same form as in cases
presented above. Differences are going to appear only in predictions over more timesteps.
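As a numeric sanity check (ours, with an arbitrary grid), the 3-step variance (20) of ARMA(1, 2) indeed bottoms out at $\sigma^2$ on the line $\theta_1 = -\phi_1$, $\theta_2 = 0$:

```python
import numpy as np

sigma = 1.0

def var3(phi1, th1, th2):
    # 3-step forecast variance of ARMA(1, 2), expression (20)
    return sigma**2 * (1 + (phi1 + th1)**2 + (phi1*(phi1 + th1) + th2)**2)

grid = np.linspace(-0.9, 0.9, 61)
vals = [var3(p, t1, t2) for p in grid for t1 in grid for t2 in grid]
print(min(vals))              # 1.0 == sigma**2, reached where th1 = -p and th2 = 0
print(var3(0.4, -0.4, 0.0))   # exactly sigma**2 on the minimising line
```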
3. Experiments
The Dynamic series of macroeconomic statistics of the Russian Federation (the monthly wage index and the monthly income index) [18] have been handled in the experimental part. The last 12 values of each time series have been used as the test set, while the preceding values (200 or 300, depending on the experiment) were used to train the models. In the previous part only ARIMA(1, d, q) models have been handled, so here they are used as parts of the combined model. At the same time the best model by information criteria values [16, 17] is also highlighted. The RMSE and MAE error metrics are used to compare forecasts:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_t \big(\tau(t) - ts(t)\big)^2}{N}}, \qquad \mathrm{MAE} = \frac{\sum_t \big|\tau(t) - ts(t)\big|}{N}. \qquad (24)$$
Here $N$ is the length of the test period, $\tau(t)$ denotes the predicted values of the processed time series, and $ts(t)$ is the part of the investigated time series over the test period (real data).
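A direct implementation of the metrics (24); the function names are ours.

```python
import numpy as np

def rmse(forecast, actual):
    """Root mean squared error over the test period, as in (24)."""
    return np.sqrt(np.mean((np.asarray(forecast) - np.asarray(actual))**2))

def mae(forecast, actual):
    """Mean absolute error over the test period, as in (24)."""
    return np.mean(np.abs(np.asarray(forecast) - np.asarray(actual)))
```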
Two experiments have been performed. In the first one, 200 values were used to train the models and 12 values to test them. The order of the moving average part was limited to 3, and the order of the autoregression part was set to 1. The results are shown in Table 1. The best model by information criterion value is the ARIMA(0, 1, 1) one.
Table 1
The ARIMA(1, d, q), q < 4, models of the wage index

ARIMA(p, d, q) model      Akaike information criterion   RMSE of forecast   MAE of forecast
ARIMA(1, 1, 0)            1518.43                        18.68              12.41
ARIMA(1, 1, 1)            1496.72                        17.94              11.62
ARIMA(1, 1, 2)            1498.41                        17.91              11.60
ARIMA(1, 1, 3)            1495.42                        17.75              11.46
ARIMA(0, 1, 1)            1494.6                         15.14              9.21
Combined model            1495.73                        17.75              11.46
Among the four tested models the signs of the terms are as follows: $\phi_1 > 0$ one time and $\phi_1 < 0$ three times; $\theta_1 > 0$ one time and $\theta_1 < 0$ two times (for ARIMA(1, 1, 0) it is zero); there is one positive $\theta_2$ term and one negative. So, theoretically, averaging is expected to achieve "good" results. One can see that the combined model looks very close to the ARIMA(1, 1, 3) model and is better than the other first-order models both by the Akaike information criterion value and by the quality of the forecast over the test period. In particular, it clearly exceeds the results of the worst model, ARIMA(1, 1, 0). The best model has different orders and was not used in the combination, but its results are close to those of the combined model and of the ARIMA(1, 1, 3) one.
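For reproducibility, here is a minimal sketch of the experimental pipeline as we read it: fit the ARIMA(1, 1, q) family with statsmodels, average the forecasts, and score the combination with the metric helpers defined above. The synthetic series below is a stand-in for the wage-index data [18], and the exact fitting options of the original experiments are assumptions on our part.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for the wage-index series [18]; real data loading is omitted.
rng = np.random.default_rng(0)
series = 100 + np.cumsum(rng.normal(size=212))

train, test = series[:-12], series[-12:]   # 12-value test period, as in the paper

forecasts, aics = [], []
for q in range(4):                         # the ARIMA(1, 1, q), q < 4, family
    res = ARIMA(train, order=(1, 1, q)).fit()
    forecasts.append(res.forecast(steps=len(test)))
    aics.append(res.aic)

combined = np.mean(forecasts, axis=0)      # simple prediction averaging
print(rmse(combined, test), mae(combined, test))
```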
In the second experiment the ARIMA(1, d, q), q < 6, models were used. The length of the training set was 300 values, and the test set again comprised 12 values. The results are shown in Table 2.
Table 2
The ARIMA(1, d, q), q < 6, models of the wage index

ARIMA(p, d, q) model      Akaike information criterion   RMSE of forecast   MAE of forecast
ARIMA(1, 1, 0)            942.44                         37.59              36.50
ARIMA(1, 1, 1)            911.05                         25.79              14.54
ARIMA(1, 1, 2)            901.17                         22.11              14.20
ARIMA(1, 1, 3)            901.51                         22.24              14.16
ARIMA(1, 1, 4)            903.51                         22.23              14.14
ARIMA(1, 1, 5)            905.48                         22.47              14.13
Combined model            906.7                          22.47              14.13
Here one can see that the combined model is better than the worst models but worse than the best ones. In this case almost all terms of the various models have the same signs; that is why averaging does not give results close to the best ones. Still, it is far better than the worst models used in the combination.
The same experiment as in Table 1 has been performed for the income index [18]. Among the four tested models with orders q < 4 the signs of the terms are: $\phi_1 > 0$ two times and $\phi_1 < 0$ also two times; $\theta_1 < 0$ three times (for ARIMA(1, 1, 0) it is zero); $\theta_2 > 0$ for both models with a non-zero MA(2) term. Here we expect averaging to be good practice because the signs of the AR(1) part vary. The results are shown in Table 3.
Table 3
The ARIMA(1, d, q), q < 4, models of the income index

ARIMA(p, d, q) model      Akaike information criterion   RMSE of forecast   MAE of forecast
ARIMA(1, 1, 0)            1742.29                        26.08              15.64
ARIMA(1, 1, 1)            1685.3                         27.02              16.61
ARIMA(1, 1, 2)            1676.16                        28.13              18.21
ARIMA(1, 1, 3)            1678.15                        28.13              18.21
ARIMA(2, 1, 2)            1678.15                        28.13              18.21
Combined model            1678.15                        28.13              18.21
Here four models (including the best one by information criteria values and the combined model) make predictions of almost the same quality. At the same time, the models with q < 2 make better predictions but have worse information criteria values, which can be explained by overfitting to the training data. Thus, a researcher obtains one more model that is close in quality to a number of the best ones. It can be used to analyze the signs of terms where they vary across the implemented models.
In the last experiment the ARIMA(1, d, q), q < 6, models were used. The length of the training set (income index) was 300 values, and the test set again comprised 12 values. The results are shown in Table 4.
Table 4
The ARIMA(1, d, q), q < 6, models of the income index

ARIMA(p, d, q) model      Akaike information criterion   RMSE of forecast   MAE of forecast
ARIMA(1, 1, 0)            1012.57                        29.16              15.28
ARIMA(1, 1, 1)            976.78                         28.91              16.49
ARIMA(1, 1, 2)            956.53                         29.60              14.66
ARIMA(1, 1, 3)            958.36                         29.89              14.58
ARIMA(1, 1, 4)            903.51                         29.34              14.74
ARIMA(1, 1, 5)            959.62                         29.44              14.74
ARIMA(0, 0, 1)            973.22                         29.60              20.84
Combined model            962.75                         29.44              14.74
There are a few models better than the best one (chosen by information criteria values). Again, given many models, the combined model shows results that are better than some models (the worst ones) and worse than others (the best ones by prediction quality). Also, one can again see that the parameters of the combined model are close to the traits of the model with the highest order of the moving average part. Various approaches to the averaging technique have also been tested in [2].
4. Conclusion
In this research the averaging of predictions of ARIMA(p, d, q) time series models is investigated. To describe the variance of various time series models, it has been expressed in terms of psi-weights for the AR(1) and ARMA(1, q) models. It should also be mentioned that MA(q) series already have the appropriate form: their coefficients are equal to the psi-weights. Having an explicit form of the prediction variance over 1, 2 and 3 periods of time, it is possible to find the models with the lowest variance and to evaluate whether averaging leads to an improvement of the prediction. The AR(1), ARMA(1, 1) and ARMA(1, 2) models have been considered. ARMA(1, q) models with q > 2 have the same variance of prediction as in the considered cases; differences in their structure appear only for predictions over more timesteps.

Given several models, the averaged one can be the best. This happens when the models predict various behaviours of the investigated time series. In the general case the averaged model has a variance better than the worst model and worse than the best one (expression (9)). These results are based on the form of the prediction variance function: it is shown that for ARMA(1, q) models it is convex. Thus, combined models give the best results, or results close to the best ones, when there are equal numbers of terms with opposite signs (terms with large absolute values are the most important because they have larger weights in averaging). This situation may take place if the investigated time series is difficult to analyze and forecast and the models at hand predict various behaviours (in the case of models with low orders; as the orders increase, some terms in the models naturally have various signs).

Time series from the Russian macroeconomic statistics [18] have been used as test data for the computational experiments. The combined models are always better than the worst models, so they can be used as another tool of analysis. In some cases the results of the combined models are close to the results of the best ones.

Nowadays this theme is very important, and there are papers on averaging and bagging of time series predictions and on bagging with the use of non-linear functions of time series terms [3, 4, 9-11, 19-21]. In future work, bagging of time series and averaging of ARMA(p, q) models with higher orders of the autoregression part (p > 1) are going to be investigated.
5. References
[1] P. Hansen, A. Lunde, J. Nason, The model confidence set, Econometrica 79.2 (2011) 453-497.
[2] D. Petrusevich, Improvement of time series forecasting quality by means of multiple models
prediction averaging, in: E. Semenkin, I. Kovalev (eds.), Proceedings of the III International
Workshop on Modeling, Information Processing and Computing (MIP: Computing-2021), Volume 2899, Krasnoyarsk, Russia, 2021, pp. 109-117. doi: 10.47813/dnit-mip3/2021-2899-109-117.
[3] W. Chen, H. Xu, Z. Chen, M. Jiang, A novel method for time series prediction based on error
decomposition and nonlinear combination of forecasters, Neurocomputing 426 (2021) 85-103. doi:
10.1016/j.neucom.2020.10.048.
[4] K. Chen, Y. Peng, S. Lu, B. Lin, X. Li, Bagging based ensemble learning approaches for modeling
the emission of PCDD/Fs from municipal solid waste incinerators, Chemosphere 274 (2021)
129802. doi: 10.1016/j.chemosphere.2021.129802.
[5] R. J. Hyndman, R. A. Ahmed, G. Athanasopoulos, H. L. Shang, Optimal combination forecasts
for hierarchical time series, Computational Statistics & Data Analysis 55.9 (2011) 2579-2589. doi:
10.1016/j.csda.2011.03.006.
[6] N. Shafik, G. Tutz, Boosting nonlinear additive autoregressive time series, Computational
Statistics & Data Analysis 53.7 (2009) 2453-2464. doi: 10.1016/j.csda.2008.12.006.
[7] J. M. Matías, M. Febrero-Bande, W. González-Manteiga, J.C. Reboredo, Boosting GARCH and
neural networks for the prediction of heteroskedastic time series, Mathematical and Computer
Modelling 51.3-4 (2010) 256-271. doi: 10.1016/j.mcm.2009.08.013.
[8] L. Breiman, Random forests, Machine Learning 45 (2001) 5-32. doi: 10.1023/A:1010933404324.
[9] M. H. Dal Molin Ribeiro, L. dos Santos Coelho, Ensemble approach based on bagging, boosting
and stacking for short-term prediction in agribusiness time series, Applied Soft Computing 86
(2020) 105837. doi: 10.1016/j.asoc.2019.105837.
[10] F. Petropoulos, R.J. Hyndman, C. Bergmeir, Exploring the sources of uncertainty: Why does
bagging for time series forecasting work?, European Journal of Operational Research 268.2 (2018)
545-554. doi: 10.1016/j.ejor.2018.01.045.
[11] E. Meira, F. L. C. Oliveira, J. Jeon, Treating and Pruning: New approaches to forecasting model
selection and combination using prediction intervals, International Journal of Forecasting 37.2
(2021) 547-568. doi: 10.1016/j.ijforecast.2020.07.005.
[12] S. Pellegrini, E. Ruiz, A. Espasa, Prediction intervals in conditionally heteroscedastic time series
with stochastic components, International Journal of Forecasting 27.2 (2011) 308-319. doi:
10.1016/j.ijforecast.2010.05.007.
[13] K. A. Boikov, M. S. Kostin, G. V. Kulikov, Radiosensory diagnostics of signal integrity in-circuit and peripheral architecture of microprocessor devices, Russian Technological Journal 9.4 (2021) 20-27. (In Russ.) doi: 10.32362/2500-316X-2021-9-4-20-27.
[14] N. M. Legkiy, N. V. Mikheev, Selection of location of radiators in a non-equidistant antenna array, Russian Technological Journal 8.6 (2020) 54-62. (In Russ.) doi: 10.32362/2500-316X-2020-8-6-54-62.
[15] H. Wold, A study in the analysis of stationary time series, 2nd edition, Almqvist and Wiksell Book Co., Uppsala, 1954.
[16] R. J. Hyndman, G. Athanasopoulos, Forecasting: principles and practice, 2nd edition, OTexts,
Melbourne, Australia, 2018.
[17] J. H. Stock, M.W. Watson, Introduction to Econometrics, Pearson, 2019.
[18] Dynamic series of macroeconomic statistics of the Russian Federation. Wage index, income index,
2021. URL: http://sophist.hse.ru/hse/nindex.shtml
[19] S. F. Stefenon, M. H. Dal Molin Ribeiro, A. Nied, K.-C. Yow, V. C. Mariani, L. dos Santos Coelho, L. O. Seman, Time series forecasting using ensemble learning methods for emergency prevention in hydroelectric power plants with dam, Electric Power Systems Research 202 (2022) 107584. doi: 10.1016/j.epsr.2021.107584.
[20] M. Larrea, A. Porto, E. Irigoyen, A.J. Barragán, J.M. Andújar, Extreme learning machine ensemble
model for time series forecasting boosted by PSO: Application to an electric consumption problem,
Neurocomputing, 452 (2021) 465-472. doi: 10.1016/j.neucom.2019.12.140.
[21] R. Godahewa, K. Bandara, G. I. Webb, S. Smyl, C. Bergmeir, Ensembles of localised models for
time series forecasting, Knowledge-Based Systems 233 (2021) 107518. doi:
10.1016/j.knosys.2021.107518.