The Use of The Kolmogorov–Wiener Filter for Prediction of
Heavy-Tail Stationary Processes
Vyacheslav Gorev, Alexander Gusev and Valerii Korniienko
Dnipro University of Technology, 19 Dmytra Yavornytskoho Ave., 49005 Dnipro, Ukraine


                 Abstract
                 We investigate the possibility of the practical use of the Kolmogorov–Wiener filter for the
                 prediction of a heavy-tail stationary random process. A discrete process and a discrete filter
                 are considered. Nowadays telecommunication traffic in telecommunication systems with data
                 packet transfer is considered to be a heavy-tail random process, so the problem under
                 consideration may be applied to the prediction of telecommunication traffic, which may be
                 important, for example, for the prevention of network congestion, for the maximization of the
                 network utilization rate and for cyber security, because a comparison of the actual traffic with
                 the predicted one may help to detect cyber-attacks. There are a lot of different and rather
                 sophisticated approaches to traffic prediction, for example, the ARIMA approach, neural
                 network approaches and so on, which may be applicable to the prediction of a non-stationary
                 traffic in various cases. However, in the rather simple case of stationary telecommunication
                 traffic, simpler approaches may be applied. One such simple prediction approach, the
                 Kolmogorov–Wiener filter, is not sufficiently developed in the literature for this purpose. In
                 this paper it is shown that if a stationary heavy-tail random process is smooth enough, then
                 the Kolmogorov–Wiener filter may be used for its practical prediction. The obtained results
                 may be taken into account for practical telecommunication traffic prediction in
                 telecommunication systems with data packet transfer.

                 Keywords
                 Kolmogorov–Wiener filter, prediction, heavy-tail stationary random process, power-law
                 correlation function, telecommunication traffic

1. Introduction and related works
    The problem of telecommunication traffic prediction is important for telecommunications. For
example, it is important for the prevention of network congestion and for the maximization of the
network utilization rate [1]; it is significant for understanding future market dynamics and reducing
the decision risks [2]. The telecommunication traffic prediction is also important for cyber security [3]
because the comparison of the actual traffic with the predicted one may help to detect cyber-attacks.
     There are a lot of different approaches to traffic prediction. For example, the following ones can
be indicated: Auto Regressive Integrated Moving Average (ARIMA), Markov Modulated Poisson
Process models (MMPP), Kalman filtering, Seasonal ARIMA (SA), a neural network approach
(including deep neural networks [4]), wavelet transforms [1], the least-squares support vector machine
(LSSVM), gray models [2], Holt-Winters models [3]. Of course, rather complicated approaches
should be used for non-stationary randomly fluctuating traffic prediction. But if the traffic is
stationary and rather smooth, sophisticated approaches may not be needed. For example, in [2] some
methods are presented for a description of rather simple cases. In [2] it is stressed that in stationary
cases the ARMA approach may be used too, and in the case of a smooth monotone process the gray
model may be applied.

IntelITSIS’2022: 3rd International Workshop on Intelligent Information Technologies and Systems of Information Security, March 23–25,
2022, Khmelnytskyi, Ukraine
EMAIL: lordjainor@gmail.com (V. Gorev); gusev1950@ukr.net (A. Gusev); vikor7@ukr.net (V. Korniienko)
ORCID: 0000-0002-9528-9497 (V. Gorev); 0000-0002-0548-728X (A. Gusev); 0000-0002-0800-3359 (V. Korniienko)
©️ 2021 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

    As is known [5], such a simple filter as the Kolmogorov–Wiener one may be used for the
prediction of stationary random processes. However, as far as we know, such an approach is not
sufficiently developed in the literature for traffic prediction even for rather simple cases. The
Kolmogorov–Wiener filter is widely used for signal extraction in different fields of knowledge [6]. It
is widely used in econometric analyses [7, 8] and in image restoration [9]. The theoretical
fundamentals of the Kolmogorov–Wiener filter for continuous telecommunication traffic prediction
are developed in our recent paper [10]. The paper [10] is dedicated to the solution of the Wiener–Hopf
integral equation in the unknown filter weight function for two telecommunication traffic models: the
power-law structure function model and the model of fractional Gaussian noise; the solutions based
on the truncated polynomial expansion method and the truncated trigonometric Fourier series method
are obtained.
    However, the possibility of using the Kolmogorov–Wiener filter for practical traffic prediction is
still under question. The aim of this work is to show that the Kolmogorov–Wiener filter may be
applicable to traffic prediction if the traffic is stationary and smooth enough. As is known [11, 12], the
telecommunication traffic in systems with data packet transfer is considered to be a self-similar
heavy-tail random process. So, if we show that the Kolmogorov–Wiener filter is applicable to the
prediction of simulated data of a stationary random self-similar heavy-tail process, then we will be
able to conclude that it may be applied to practical telecommunication traffic prediction. In this paper
we restrict ourselves to the investigation of a discrete process and a discrete filter. The corresponding
simulated data may be generated via the symmetric moving average approach [13]; the generated
process is in fact similar to the fractional Gaussian noise process, which may describe
telecommunication traffic, see [14].
    The paper is organized as follows. In Sec. 1 the introduction and the literature review are given. In
Sec. 2 the discrete Kolmogorov–Wiener filter and the symmetric moving average approach for
obtaining simulated stationary heavy-tail data are described. In Sec. 3 heavy-tail simulated data are
obtained. In Sec. 4 the prediction results are described, and in Sec. 5 conclusions are made.

2. Description of the discrete Kolmogorov–Wiener filter and of the method
   of generation of heavy-tail simulated data

    Let the filter input $x'_t$ be a stationary random process which is the sum of the signal $s_t$ and the
noise $n_t$:
                                            $x'_t = s_t + n_t$.                                     (1)
    The Kolmogorov–Wiener filter output $y_t$ should be "the closest" to the value $s_{t+z}$, where $z$ is the
number of points for which the prediction is made, so we have the following requirement:
                                       $\langle (y_t - s_{t+z})^2 \rangle \to \min$.                                  (2)
    The correlation function $R_{x'}(t)$ of the filter input $x'_t$ and the cross-correlation function
$R_{sx'}(t)$ of the processes $s_t$ and $x'_t$ are considered to be given. The Kolmogorov–Wiener filter is considered
to be a linear one, so the filter output is expressed in terms of the filter input as follows:
                                       $y_t = \sum_{i=0}^{T} h_i x'_{t-i}$,                                             (3)
where $h_i$ are the unknown filter weight coefficients and the input data are given for $t = 0, 1, 2, \ldots, T$.
The coefficients $h_i$ should minimize expression (2). The term $\langle s_{t+z}^2 \rangle$ is a constant and does not depend
on the weight coefficients $h_i$, so (2) can be rewritten as
                                    $\langle y_t^2 \rangle - 2\langle y_t s_{t+z} \rangle \to \min$,                                 (4)
which in view of (3) gives
           $\sum_{i,j=0}^{T} h_i h_j \langle x'_{t-i} x'_{t-j} \rangle - 2\sum_{i=0}^{T} h_i \langle x'_{t-i} s_{t+z} \rangle = f(h_0, h_1, \ldots, h_T) \to \min$.            (5)
   With account for the facts that
                                    〈𝑥′𝑡−𝑖 𝑥′𝑡−𝑗 〉 = 𝑅𝑥′ (𝑖 − 𝑗)                                  (6)
and
                                    〈𝑥′𝑡−𝑖 𝑠𝑡+𝑧 〉 = 𝑅𝑠𝑥′ (𝑖 + 𝑧)                                  (7)
one can finally write
            $\sum_{i,j=0}^{T} h_i h_j R_{x'}(i-j) - 2\sum_{i=0}^{T} h_i R_{sx'}(i+z) = f(h_0, h_1, \ldots, h_T) \to \min$.              (8)
   The function 𝑓(ℎ0 , ℎ1 , … , ℎ 𝑇 ) is a quadratic one, and thus it has one minimum, which is described
by the conditions
                           $\frac{\partial f(h_0, h_1, \ldots, h_T)}{\partial h_k} = 0$; $k = 0, 1, 2, \ldots, T$.                           (9)
These conditions with account for the evenness of the correlation function and the fact that
                                        $\frac{\partial h_i}{\partial h_j} = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}$                                             (10)
lead to
                        $\sum_{i=0}^{T} h_i R_{x'}(i-k) = R_{sx'}(k+z)$; $k = 0, 1, 2, \ldots, T$,                       (11)
which is a set of linear equations in the unknown coefficients ℎ𝑖 . In matrix form, this set may be
presented as
                                           𝑅𝑥′ ∙ ℎ = 𝑅𝑠𝑥′                                       (12)
where
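                $R_{x'} = \begin{pmatrix}
                R_{x'}(0) & R_{x'}(1) & R_{x'}(2) & \cdots & R_{x'}(T) \\
                R_{x'}(1) & R_{x'}(0) & R_{x'}(1) & \cdots & R_{x'}(T-1) \\
                R_{x'}(2) & R_{x'}(1) & R_{x'}(0) & \cdots & R_{x'}(T-2) \\
                \vdots & \vdots & \vdots & \ddots & \vdots \\
                R_{x'}(T) & R_{x'}(T-1) & R_{x'}(T-2) & \cdots & R_{x'}(0)
                \end{pmatrix}$                (13)
is the correlation matrix [5], $h$ is the vector column of the unknown weight coefficients, and $R_{sx'}$ is the
vector column of the free terms:
                $h = \begin{pmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_T \end{pmatrix}, \quad
                R_{sx'} = \begin{pmatrix} R_{sx'}(z) \\ R_{sx'}(z+1) \\ R_{sx'}(z+2) \\ \vdots \\ R_{sx'}(z+T) \end{pmatrix}$.                                    (14)
So, the vector column $h$ may be found as
                                          $h = R_{x'}^{-1} \cdot R_{sx'}$.                                             (15)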
Then the filter output may be obtained by formula (3).
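The step from (8) to (11) can be written out explicitly. The following sketch uses only the quantities already introduced above: the function $f$ from (8), the Kronecker delta from (10) and the evenness of the correlation function, $R_{x'}(k-j) = R_{x'}(j-k)$.

```latex
\begin{align*}
\frac{\partial f}{\partial h_k}
  &= \sum_{i,j=0}^{T} \left(\delta_{ik} h_j + h_i \delta_{jk}\right) R_{x'}(i-j)
     - 2\sum_{i=0}^{T} \delta_{ik} R_{sx'}(i+z) \\
  &= \sum_{j=0}^{T} h_j R_{x'}(k-j) + \sum_{i=0}^{T} h_i R_{x'}(i-k) - 2R_{sx'}(k+z) \\
  &= 2\sum_{i=0}^{T} h_i R_{x'}(i-k) - 2R_{sx'}(k+z) = 0 .
\end{align*}
```

Dividing the last line by 2 gives the set of equations (11).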
       It should be noticed that all the above-mentioned calculations are described in [6]. The
Kolmogorov–Wiener filter may be used both for the extraction of a signal from the sum of a signal
and a noise and for the signal prediction. In the case where the input signal is non-noisy, the
Kolmogorov–Wiener filter may be used for the prediction of the stationary process given at the filter
input. In the non-noisy case, the filter weight coefficients are given by formula (15) with account for
the fact that
                $R_{sx'} = (R_{x'}(z) \;\; R_{x'}(z+1) \;\; R_{x'}(z+2) \;\; \ldots \;\; R_{x'}(z+T))^{\mathsf{T}}$,                       (16)
where the superscript $\mathsf{T}$ denotes transposition.
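As an illustration of (13), (15) and (16), the weight coefficients for the non-noisy case may be computed as in the following minimal Python sketch. It is not the authors' implementation: the function names `solve_linear_system` and `kw_weights`, the plain Gaussian elimination solver and the exponential test correlation function are assumptions made only for this example.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * h = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [A | b], copied so that the inputs are not modified.
    aug = [list(matrix[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("correlation matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * h[c] for c in range(r + 1, n))
        h[r] = (aug[r][n] - tail) / aug[r][r]
    return h


def kw_weights(corr, T, z):
    """Weights h_0..h_T from (13), (15), (16); corr(t) returns R_x'(t) for t >= 0."""
    R = [[corr(abs(i - k)) for i in range(T + 1)] for k in range(T + 1)]
    rhs = [corr(k + z) for k in range(T + 1)]   # non-noisy case, formula (16)
    return solve_linear_system(R, rhs)


# Assumed test case: an exponential correlation function R(t) = rho**t.
rho = 0.6
weights = kw_weights(lambda t: rho ** t, T=4, z=2)
```

For this particular test correlation function the system (11) is satisfied by $h_0 = \rho^z$ with all other weights equal to zero, which gives a simple way to check the solver. For large $T$ a dedicated Toeplitz solver would be a more efficient choice than general elimination.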
       Now let us describe the method of the generation of heavy-tail simulated data, which is used in
the paper. In this paper we use the symmetric moving average approach, which is described in detail
in [13]. Such an approach was chosen because of its simplicity.
       Let 𝑉𝑡 be a stationary white noise process with an average value equal to zero and a variance
equal to 1. Then a heavy-tail process 𝑋𝑖 similar to the fractional Gaussian noise may be generated as
follows [13]:
                   $X_i = \sum_{j=-q}^{q} a_{|j|} V_{i+j} = a_q V_{i-q} + a_{q-1} V_{i-q+1} + \cdots + a_q V_{i+q}$.                      (17)
Theoretically, $q$ should be infinite; in practical calculations it may be a rather large but finite number.
The coefficients $a_j$ are as follows:
                                       $a_0 = \frac{\sqrt{(2-2H)\gamma_0}}{1.5 - H}$                                             (18)
and
                       $a_j = \frac{a_0}{2}\left((j+1)^{H+0.5} + (j-1)^{H+0.5} - 2j^{H+0.5}\right)$, $j \ge 1$,                               (19)
where $\gamma_0$ is the variance and $H$ is the Hurst exponent of the process $X_i$. The number $q$ may be very
large; it is estimated as follows [13]:
                             $q \ge \max\left(m, \left(\frac{H^2 - 0.25}{2\beta}\right)^{\frac{1}{1.5-H}}\right)$,                                     (20)
where $m$ is the number of correlation function points of the process $X_i$ which should be obtained and
the small number $\beta$ is in fact the given accuracy of the coefficients $a_j$ in (17): the values $a_j$ for $j > q$ should
be less than $\beta a_0$. The accuracy of this method depends on $q$, and the method is not exact even in the
case where $q \to \infty$. However, for a rather large $q$ the method may lead to good practical results [13].
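Formulas (18)–(20) are straightforward to evaluate numerically. The following Python sketch shows one possible way to do so; the function names are hypothetical, and the bound (20) is rounded up to an integer, which is an assumption of this example rather than a prescription of [13].

```python
import math


def sma_a0(H, gamma0):
    """Coefficient a_0 from (18)."""
    return math.sqrt((2.0 - 2.0 * H) * gamma0) / (1.5 - H)


def sma_coefficient(j, H, gamma0):
    """Coefficient a_j: formula (18) for j = 0 and formula (19) for j >= 1."""
    a0 = sma_a0(H, gamma0)
    if j == 0:
        return a0
    p = H + 0.5
    return 0.5 * a0 * ((j + 1) ** p + (j - 1) ** p - 2.0 * j ** p)


def sma_q_lower_bound(m, H, beta):
    """Right-hand side of (20), rounded up to an integer (an assumption here)."""
    tail = ((H * H - 0.25) / (2.0 * beta)) ** (1.0 / (1.5 - H))
    return max(m, math.ceil(tail))


# Example values: m = 10**5, beta = 1e-4, H = 0.8, gamma0 = 1.
q_min = sma_q_lower_bound(m=10**5, H=0.8, beta=1e-4)
a0 = sma_a0(H=0.8, gamma0=1.0)
a1 = sma_coefficient(1, H=0.8, gamma0=1.0)
```

For these example values the second argument of the maximum in (20) is smaller than $m$, so the bound is determined by $m$ itself.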

3. The generation of non-smooth and smooth heavy-tail simulated data

    $10^6$ points of the white noise process $V_t$ with an average value equal to 0 and a variance equal to 1
are generated on the basis of the generator built into the Wolfram Mathematica package. The following
parameters were chosen:
                       $m = 10^5$, $\beta = 10^{-4}$, $H = 0.8$, $\gamma_0 = 1$.                               (21)
The corresponding number $q = 3 \cdot 10^5$ is chosen. In fact, inequality (20) holds even for $q = 10^5$;
the value $q = 3 \cdot 10^5$ was chosen for a higher accuracy. On the basis of (17)–(19), $10^5$ points
of the process $X_i$ were generated as follows:
                                    $X_i = \sum_{j=-q}^{q} a_{|j|} V_{i+j+q}$.                                          (22)
Because $V_t$ is white noise, shifting its index by $q$ does not change the statistical properties of the
generated process, so it does not matter whether formula (17) or formula (22) is used; formula (22) is
chosen in order to avoid indices beyond the bounds of the array $V_i$. The coefficients $a_j$ are calculated
on the basis of (18) and (19).
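The generation step (22) may be sketched in Python as follows. This is only an illustration under stated assumptions: the paper uses the Wolfram Mathematica generator with $q = 3 \cdot 10^5$, whereas the sketch uses Python's standard `random` module, Gaussian white noise, a fixed seed and much smaller, purely illustrative values of the number of points and of `q` so that it runs quickly. The function name `generate_sma_process` is hypothetical.

```python
import math
import random


def generate_sma_process(n_points, q, H, gamma0, seed=0):
    """Generate n_points values of X by (22) from Gaussian white noise V (0-based lists)."""
    rng = random.Random(seed)
    a0 = math.sqrt((2.0 - 2.0 * H) * gamma0) / (1.5 - H)            # formula (18)
    p = H + 0.5
    a = [a0] + [0.5 * a0 * ((j + 1) ** p + (j - 1) ** p - 2.0 * j ** p)
                for j in range(1, q + 1)]                           # formula (19)
    # The shifted index i + j + q with j = -q..q never leaves an array of n_points + 2q values.
    V = [rng.gauss(0.0, 1.0) for _ in range(n_points + 2 * q)]
    X = []
    for i in range(n_points):
        X.append(sum(a[abs(j)] * V[i + j + q] for j in range(-q, q + 1)))
    return X


# Illustrative (small) sizes; the paper uses 10**5 points and q = 3 * 10**5.
X = generate_sma_process(n_points=2000, q=200, H=0.8, gamma0=1.0)
```

The direct double loop costs about `n_points * (2q + 1)` multiplications, so for the sizes used in the paper a convolution-based implementation would be preferable.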
    The average value of $X_i$ is close to zero. We have to construct simulated data that may describe
telecommunication traffic, which is obviously non-negative. So we build the array $x_i$ as follows:
                                 $x_i = X_i + |\min(X)| + 10^{-3}$,                                   (23)
where the small summand $10^{-3}$ is added in order to avoid obtaining an infinite value of the prediction mean
absolute percentage error (MAPE). The process $x_i$ is a non-negative random stationary heavy-tail
process; its graph is given in Fig. 1.
    Let us make sure that the generated process $x_i$ is a heavy-tail one. Let us introduce the
corresponding centralized process $\mathit{xc}_i$:
                                        $\mathit{xc}_i = x_i - \langle x \rangle$,                                         (24)
where the average value $\langle x \rangle$ is
                                      $\langle x \rangle = \frac{1}{10^5} \sum_{i=1}^{10^5} x_i$;                                              (25)
here we take into account the fact that the number of points of the generated array $x_i$ is equal to $10^5$.
The correlation function of the process $\mathit{xc}_i$ is built as follows:
                    $R_x(\tau) = \langle \mathit{xc}_i \cdot \mathit{xc}_{i+\tau} \rangle = \frac{1}{10^5 - \tau} \sum_{i=1}^{10^5 - \tau} (\mathit{xc}_i \cdot \mathit{xc}_{i+\tau})$.                             (26)
The corresponding correlation function and its least-square fit are given in Fig. 2.
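Steps (23)–(26) may be sketched in Python as follows. The function names and the short demonstration series are hypothetical and serve only to illustrate the formulas; lists are 0-based, so the sum in (26) runs over `i = 0 .. n - tau - 1`.

```python
def make_positive(X, offset=1e-3):
    """Shift the series as in (23): x_i = X_i + |min(X)| + offset."""
    shift = abs(min(X)) + offset
    return [value + shift for value in X]


def centre(x):
    """Return the centralized series (24) together with the average value (25)."""
    mean = sum(x) / len(x)
    return [value - mean for value in x], mean


def sample_correlation(xc, tau):
    """Sample correlation function (26) of a centralized series at lag tau."""
    n = len(xc)
    return sum(xc[i] * xc[i + tau] for i in range(n - tau)) / (n - tau)


X_demo = [0.5, -1.0, 2.0, -0.5, 1.5, -2.5]   # made-up numbers for illustration
x_demo = make_positive(X_demo)
xc_demo, mean_demo = centre(x_demo)
variance_demo = sample_correlation(xc_demo, 0)
```

The value at zero lag, `sample_correlation(xc, 0)`, is the sample variance of the process.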
Figure 1: The values of the simulated non-smooth heavy-tail non-negative random process


Figure 2: The correlation function of the simulated non-smooth heavy-tail random process and its
least-square power-law fit; t≥1.

The corresponding least-square fit is sought as
                                         $R_{\mathrm{fit}}(t) = a \cdot t^{b}$.                                     (27)
The following numerical coefficients were obtained:
                                   $a = 0.39$, $b = -0.44$,                                    (28)
where the coefficients are rounded off to two significant digits. So,
                                     $R_{\mathrm{fit}}(t) = 0.39 \cdot t^{-0.44}$,                                  (29)
and on the basis of formula (29) and Fig. 2 one can conclude that the correlation function exhibits a
power-law decay rather than an exponential one. So, indeed, the generated process is a heavy-tail one.
    It should also be noticed that according to [13] the following property should be valid for $t \ge 1$:
                                           $R_x(t) \sim t^{2H-2}$,                                         (30)
so, according to the least-square fit,
                                        $2H - 2 = -0.44$,                                           (31)
which leads to
                                             $H = 0.78$,                                            (32)
which is very close to the value 0.8, see (21). The variance of the process is equal to
                                           $R_x(0) = 0.93$,                                         (33)
which is rather close to the value $\gamma_0 = 1$, see (21). So one can conclude that the generated process is
close to the fractional Gaussian noise with the given variance and Hurst exponent.
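A power-law fit of the form (27) and the corresponding estimate of the Hurst exponent from (30)–(31) may be sketched as follows. The paper does not specify the fitting procedure, so the linear regression in log-log coordinates used here is an assumption; it requires positive values of the correlation function and weights the points differently from a direct least-square fit, so the coefficients may differ slightly. The function names and the synthetic data are hypothetical.

```python
import math


def fit_power_law(lags, values):
    """Least-squares fit of values ~ a * lag**b, done in log-log coordinates."""
    xs = [math.log(t) for t in lags]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def hurst_from_exponent(b):
    """Invert b = 2H - 2, see (30) and (31)."""
    return 1.0 + b / 2.0


lags = list(range(1, 51))
values = [0.39 * t ** (-0.44) for t in lags]   # synthetic data built from (29)
a_fit, b_fit = fit_power_law(lags, values)
H_est = hurst_from_exponent(b_fit)
```

On the synthetic data the fit recovers the coefficients of (29), and the exponent $b = -0.44$ corresponds to $H = 0.78$, in agreement with (32).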
   The generated process is non-smooth, i.e. it is highly fluctuating, so it is rather difficult to
predict. So it is reasonable to investigate smooth heavy-tail processes. In order to obtain smoother
processes, we use a very simple smoothing algorithm [15]:
                                    $\tilde{X}_i = \frac{1}{2l+1} \sum_{j=-l}^{l} X_{i+j}$,                                           (34)
where $\tilde{X}_i$ are the values of the smooth process; expression (34) is valid for every point except for the
first $l$ and the last $l$ ones. The first $l$ and the last $l$ points of the process $\tilde{X}_i$ may be obtained as the
corresponding linear least-square fit of the first $l$ and the last $l$ points of the process $X_i$, respectively.
The corresponding non-negative process may be expressed similarly to (23):
                                $\tilde{x}_i = \tilde{X}_i + |\min(\tilde{X})| + 10^{-3}$,                                      (35)
and the corresponding centralized process is
                                         $\widetilde{\mathit{xc}}_i = \tilde{x}_i - \langle \tilde{x} \rangle$,                                                         (36)
where the average value is
                                      $\langle \tilde{x} \rangle = \frac{1}{10^5} \sum_{i=1}^{10^5} \tilde{x}_i$.                                              (37)
The simulated data for the process $\tilde{x}_i$ for $l = 3$ are given in Fig. 3.
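The smoothing step (34) may be sketched in Python as follows. The treatment of the edges is an interpretation made for this example: a least-squares line is fitted through the first $l$ (last $l$) points of the original series and evaluated at those same points; for a single edge point the value itself is kept. The function names are hypothetical.

```python
def linear_fit(points):
    """Least-squares line through (index, value) pairs; returns (slope, intercept)."""
    n = len(points)
    mean_i = sum(i for i, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sii = sum((i - mean_i) ** 2 for i, _ in points)
    if sii == 0.0:
        return 0.0, mean_v           # a single point: horizontal line through it
    slope = sum((i - mean_i) * (v - mean_v) for i, v in points) / sii
    return slope, mean_v - slope * mean_i


def smooth(X, l):
    """Moving average (34) with a linear least-squares fit at both edges."""
    n = len(X)
    if n < 2 * l + 1:
        raise ValueError("series is too short for this smoothing window")
    window = 2 * l + 1
    smoothed = [0.0] * n
    for i in range(l, n - l):
        smoothed[i] = sum(X[i - l:i + l + 1]) / window
    for edge in (range(0, l), range(n - l, n)):
        if len(edge) == 0:           # l = 0: nothing to fit at the edges
            continue
        slope, intercept = linear_fit([(i, X[i]) for i in edge])
        for i in edge:
            smoothed[i] = slope * i + intercept
    return smoothed


X_demo = [float(i) for i in range(10)]        # a straight line, for illustration
X_smooth = smooth(X_demo, l=3)
```

A straight line is left unchanged by this procedure, which gives a simple sanity check; the non-negative and centralized processes (35)–(37) are then obtained from the smoothed series in the same way as in (23)–(25).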


Figure 3: The values of the simulated smooth heavy-tail non-negative random process for 𝑙 = 3


Figure 4: The correlation function of the simulated smooth heavy-tail random process and its least-
square power-law fit; t≥1.

   It should be stressed that the obtained smooth process $\tilde{x}_i$ is also a heavy-tail one. Let us consider
the corresponding correlation function:
                    $R_{\tilde{x}}(\tau) = \langle \widetilde{\mathit{xc}}_i \cdot \widetilde{\mathit{xc}}_{i+\tau} \rangle = \frac{1}{10^5 - \tau} \sum_{i=1}^{10^5 - \tau} (\widetilde{\mathit{xc}}_i \cdot \widetilde{\mathit{xc}}_{i+\tau})$.                                       (38)
   For example, for $l = 3$ the correlation function and its fit are shown in Fig. 4.
The least-square fit is sought in the form (27); the following numerical coefficients were obtained:
                                   $a = 0.43$, $b = -0.46$,                                      (39)
where the coefficients are rounded off to two significant digits. So,
                                    $R_{\mathrm{fit}}(t) = 0.43 \cdot t^{-0.46}$.                                    (40)
As can be seen from Fig. 4, the correlation function of the smooth process is also well described by a
power-law function, so the obtained smooth process $\tilde{x}_i$ is also a heavy-tail one, and, in fact, this process
may also be roughly considered as fractional Gaussian noise.

4. Prediction on the basis of the Kolmogorov–Wiener filter
   The prediction for non-smooth data is built as follows. In fact, the prediction for the centralized
process is used. The filter weight coefficients are built on the basis of (13)–(16); the corresponding
correlation function is taken in the form (26).
   First of all, the points 𝑥𝑐1 , 𝑥𝑐2 ,…, 𝑥𝑐𝑇+1 of the simulated process 𝑥𝑐 are taken as the filter input,
and the points 𝑥𝑐𝑇+2 , 𝑥𝑐𝑇+3 ,…, 𝑥𝑐𝑇+𝑧+1 are predicted. Then the points 𝑥𝑐2 , 𝑥𝑐3 , … , 𝑥𝑐𝑇+2 are taken
from the simulated data, and the points 𝑥𝑐𝑇+3 , 𝑥𝑐𝑇+4 ,…, 𝑥𝑐𝑇+𝑧+2 are predicted, and so on.
   At the 𝑖th iteration of the algorithm the prediction is calculated as follows. The filter input data are
                              𝑥′0 = 𝑥𝑐𝑖 , 𝑥′1 = 𝑥𝑐𝑖+1 , … , 𝑥′ 𝑇 = 𝑥𝑐𝑖+𝑇 ,                            (41)
so
                                              𝑥′𝑗 = 𝑥𝑐𝑖+𝑗 .                                           (42)
The filter output $y_t$ is the predicted value for $x'_{t+z}$ (the non-noisy case is investigated). According to
(3) we have
                                       $y_t = \sum_{k=0}^{t} h_k x'_{t-k}$,                                            (43)
where the upper bound of summation is changed in order to avoid obtaining indices beyond the array
of the input data. Such a change of the bound does not lead to a significant error for the prediction
under consideration. On the basis of (41)–(43) one can conclude that
                                   $\widehat{\mathit{xc}}_{i+t+z} = \sum_{k=0}^{t} h_k \, \mathit{xc}_{t+i-k}$,                                            (44)
where $\widehat{\mathit{xc}}_{i+t+z}$ is the predicted value of $\mathit{xc}_{i+t+z}$. Obviously, the prediction is made only for the values
$i + t + z = T + 1 + i, T + 2 + i, \ldots, T + z + i$. We should also remember that we should make the
prediction for the non-negative simulated data. So, the predicted non-negative data may be expressed
as
             $\hat{x}_{i+t+z} = \widehat{\mathit{xc}}_{i+t+z} + \langle x \rangle = \langle x \rangle + \sum_{k=0}^{t} h_k \, \mathit{xc}_{t+i-k}$, $t = T+1-z, \ldots, T$.                        (45)
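The MAPE and MAE errors for the corresponding prediction are calculated as
                        $\mathrm{MAPE} = \frac{1}{z} \sum_{t=T+1-z}^{T} \left| \frac{\hat{x}_{i+t+z} - x_{i+t+z}}{x_{i+t+z}} \right| \cdot 100\%$                       (46)
and
                             $\mathrm{MAE} = \frac{1}{z} \sum_{t=T+1-z}^{T} |\hat{x}_{i+t+z} - x_{i+t+z}|$,                                  (47)
where the summation runs over the $z$ predicted points, $t = T+1-z, \ldots, T$, in accordance with (45).
The corresponding prediction errors are calculated at each iteration.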
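One iteration of the procedure described by (41)–(47) may be sketched in Python as follows. The function names, the 0-based indexing (so that `xc[i]` plays the role of $x'_0$) and the toy data with the weights $h = (1, 0, \ldots, 0)$ are assumptions made only for this illustration; in practice the weights would come from (15) and (16).

```python
def predict_window(h, xc, mean, i, T, z):
    """Predictions (45) for iteration i; xc is 0-based, so xc[i] plays the role of x'_0."""
    predictions = []
    for t in range(T + 1 - z, T + 1):
        # Truncated sum (43)/(44): k runs from 0 to t, using inputs xc[i] .. xc[i + t].
        value = sum(h[k] * xc[i + t - k] for k in range(t + 1))
        predictions.append(mean + value)     # estimate of x[i + t + z]
    return predictions


def window_errors(predictions, x, i, T, z):
    """MAPE (46), in percent, and MAE (47) over the z predicted points."""
    actual = [x[i + t + z] for t in range(T + 1 - z, T + 1)]
    abs_errors = [abs(p - a) for p, a in zip(predictions, actual)]
    mape = 100.0 * sum(e / abs(a) for e, a in zip(abs_errors, actual)) / z
    mae = sum(abs_errors) / z
    return mape, mae


# Made-up example: h = (1, 0, ..., 0) simply repeats the most recent input value.
T, z = 4, 1
x = [2.0, 3.0, 4.0, 3.0, 2.0, 4.0, 5.0]
mean = sum(x) / len(x)
xc = [value - mean for value in x]
h = [1.0] + [0.0] * T
pred = predict_window(h, xc, mean, i=0, T=T, z=z)
mape, mae = window_errors(pred, x, i=0, T=T, z=z)
```

In this toy example the last input value is 2.0 and the actual next value is 4.0, so the absolute error is 2.0 and the percentage error is 50%. Repeating the two calls for successive values of `i` gives the sets of MAPE and MAE values discussed in the text.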
   Let us briefly explain why the above-mentioned change of the upper bound of summation has no
significant effect on the result. In order to make the prediction for $\widehat{\mathit{xc}}_{T+1+i}$, one should calculate the
sum of $T + 2 - z$ summands; in order to make the prediction for $\widehat{\mathit{xc}}_{T+2+i}$, one should calculate the
sum of $T + 3 - z$ summands, and so on. We obviously deal with the case where $T \gg z$, so the value
$T + 1 - z$ is rather close to $T + 1$, and the above-mentioned change of the upper bound is not
significant for the calculations.
    The prediction for the smooth heavy-tail process is made similarly. At the $i$th iteration
of the algorithm the prediction is calculated as
                     $\hat{\tilde{x}}_{i+t+z} = \langle \tilde{x} \rangle + \sum_{k=0}^{t} h_k \, \widetilde{\mathit{xc}}_{t+i-k}$, $t = T+1-z, \ldots, T$,                            (48)
and the corresponding MAPE and MAE errors are
                      $\mathrm{MAPE} = \frac{1}{z} \sum_{t=T+1-z}^{T} \left| \frac{\hat{\tilde{x}}_{i+t+z} - \tilde{x}_{i+t+z}}{\tilde{x}_{i+t+z}} \right| \cdot 100\%$                                (49)
and
                            $\mathrm{MAE} = \frac{1}{z} \sum_{t=T+1-z}^{T} |\hat{\tilde{x}}_{i+t+z} - \tilde{x}_{i+t+z}|$.                              (50)
   The MAPE and MAE are calculated for each above-mentioned iteration. In total, $10^5 - T - z$ MAPE
and MAE values are obtained both for the smooth and for the non-smooth processes. The following
parameters are chosen:
                              $T = 100$, $z = 1$.                                    (51)


Figure 5: The MAPE and MAE histograms for the prediction of a non-smooth heavy-tail process

                   Table 1
                   The prediction results for a smooth heavy-tail process
                       𝑙         〈𝑥̃〉     Average MAPE, %        Average MAE
                       1        2.98             9.11                0.235
                       2        2.52             6.26                0.142
                       3        2.34             4.85                0.103
                       4        2.31             3.92                0.081
                       5        2.22             3.37                0.067
                       6        2.11             2.98                0.057
                       7        2.04             2.68                0.050

   The following results are obtained. The MAPE and MAE histograms in the case of the non-smooth
process are shown in Fig. 5. The y-axes of the histograms indicate the number of MAPE and MAE
values that belong to the corresponding intervals. For the non-smooth process the average MAPE is
24.7%, and the average MAE is 0.70 (the average value of the process is $\langle x \rangle = 3.88$). It should also
be stressed that for some points the MAPE is more than 100%. So one can conclude that the
prediction accuracy is not high in the case of the non-smooth process: if the process is a highly
fluctuating one, then the prediction based on the Kolmogorov–Wiener filter may not lead to good
results.
   But if the process is rather smooth, the prediction results are much better. The corresponding
results are given in Table 1, where $l$ is the parameter used in (34), i.e. $2l + 1$ is the number of
smoothing points. As can be seen, the smoother the process is, the better the prediction results are:
the average MAPE decreases from 9.11% for $l = 1$ to 2.68% for $l = 7$. The corresponding histograms
for $l = 3$ are given in Fig. 6. The predictions for $l \ge 6$ have an average MAPE value less than 3%.


Figure 6: The MAPE and MAE histograms for the prediction of a smooth heavy-tail process for 𝑙 = 3

    For example, for 𝑙 = 3 the average MAPE is less than 5%. As can be seen from the corresponding
histogram, the MAPE for the overwhelming majority of points is less than 10%. For some very rare
points the MAPE may be rather high (up to 40%), but in our opinion this may be explained as
follows. As can be seen from Fig. 3, the values for some points of the process 𝑥̃ are rather close to
zero, and the MAPE may not be an adequate characteristic for the prediction of points close to zero.
So, one can conclude that the Kolmogorov–Wiener filter may give good results for the prediction of a
stationary heavy-tail random process if the process is smooth enough.

5. Conclusions and plans for the future
    The use of the Kolmogorov–Wiener filter for the prediction of stationary random heavy-tail
processes is considered. Attention is restricted to the discrete case. The problem under consideration
may be connected with the telecommunication traffic prediction, which is important, for example, for
cyber security, see [3]. There are many rather sophisticated approaches to telecommunication traffic
prediction [1]. For rather simple cases (stationary or smooth traffic) the ARMA or gray model
approaches may be used [2]. The traffic in telecommunication systems with data packet transfer is
considered to be a self-similar heavy-tail process, see [11]. Such a simple filter as the Kolmogorov–
Wiener one may be used in the prediction of stationary random processes [6]. However, as far as we
know, the corresponding approach for traffic prediction is not sufficiently developed in the literature.
    In this paper we generate data for a stationary heavy-tail process on the basis of the symmetric
moving average approach [13]. The corresponding non-smooth and smooth data are generated. The
prediction for 1 point forward on the basis of the previous 101 points is investigated. It is shown that
the Kolmogorov–Wiener filter is not good for non-smooth processes, but may give a good prediction
for a stationary random heavy-tail process if the process is rather smooth. So, if the traffic is
stationary and rather smooth, the Kolmogorov–Wiener filter may be used for its prediction. The
advantage of the corresponding approach is the simplicity of the method in contrast with, for example,
neural networks or ARIMA models.
    The plans for the future are as follows. In this paper only the values T = 100 and z = 1 are
investigated. So the prediction investigation for a wider range of parameters may be a plan for the
future. In our recent paper [10] the theoretical approach to the Kolmogorov–Wiener filter construction
in the continuous case is considered. In this paper we generated a large number of data points, which
may allow one to try to investigate the continuous case, so the investigation of the applicability of the
method [10] may be another plan for the future. This paper is based on the generation of simulated
data, so the investigation of real experimental traffic data may be another plan for the future. It should
also be stressed that the use of the Kolmogorov–Wiener filter for the prediction of stationary
processes may be useful not only in telecommunications, but also in other fields of knowledge, for
example, in electrical engineering, see [16].

6. References
[1] Q. H. Do, T. T. H. Doan, T. V. A. Nguyen, N. T. Duong, V. Van Linh, Prediction of Data Traffic
     in Telecom Networks based on Deep Neural Networks, Journal of Computer Science 16 (2020)
     1268-1277. doi: 10.3844/jcssp.2020.1268.1277.
[2] J.-X. Liu, Z.-H. Jia, Telecommunication Traffic Prediction Based on Improved LSSVM,
     International Journal of Pattern Recognition and Artificial Intelligence, 32, No. 3 (2018)
     1850007 (16 pages), doi: 10.1142/S0218001418500076.
[3] H. Brugner, Holt-Winters Traffic Prediction on Aggregated Flow Data, Proceedings of the
     Seminars Future Internet and Innovative Internet Technologies and Mobile Communication
     Focal Topic: Advanced Persistent Threats. Summer Semester 2017 (2017), 25-32. doi:
     10.2313/NET-2017-09-1_04.
[4] P. Kaushik, S. Singh, P. Yadav, Traffic Prediction in Telecom Systems Using Deep Learning,
     Proceedings of 7th International Conference on Reliability, Infocom Technologies and
     Optimization (ICRITO) (Trends and Future Directions), August 29-31, 2018, Noida, India
     (2018), 302-207, doi: 10.1109/ICRITO.2018.8748386.
[5] P. S. R. Diniz, Adaptive Filtering Algorithms and Practical Implementation, 5th ed., Springer
     Nature Switzerland AG, Cham, 2020, doi: 10.1007/978-3-030-29057-3.
[6] T. Bao, J. Duffy, Signal extraction: experimental evidence, Theory and Decision 90 (2021), 219–
     232. doi: 10.1007/s11238-020-09785-x
[7] S. G. Pollock, Filters, Waves and Spectra, Econometrics 6 (2018), 35 (33 pages). doi:
     10.3390/econometrics6030035
[8] S. G. Pollock, E. Mise, A Wiener–Kolmogorov Filter for Seasonal Adjustment and the Cholesky
     Decomposition of a Toeplitz Matrix, Computational Economics 59 (2022), 913–933. doi:
     10.1007/s10614-020-10087-1
[9] V. Pronina, F. Kokkinos, D.V. Dylov, S. Lefkimmiatis, Microscopy Image Restoration with
     Deep Wiener-Kolmogorov Filters, in: A. Vedaldi, H. Bischof, T. Brox, JM. Frahm (Eds.),
     Lecture Notes in Computer Science, vol 12365, Springer, Cham, 2020, pp. 185–201.
     doi:10.1007/978-3-030-58565-5_12
[10] V. Gorev, A. Gusev, V. Korniienko, M. Aleksieiev, Kolmogorov–Wiener Filter Weight Function
     for Stationary Traffic Forecasting: Polynomial and Trigonometric Solutions, in: P. Vorobiyenko,
     M. Ilchenko, I. Strelkovska (Eds.), Lecture Notes in Networks and Systems, vol 212, Springer,
     2021, pp. 111–129. doi:10.1007/978-3-030-76343-5_7
[11] D. Zhuang, C. Li, Loss Analysis for Networks based on Heavy-Tailed and Self-Similar Traffic,
     Journal of Physics: Conference Series 1584 (2020), 012054 (8 pages).
     doi: 10.1088/1742-6596/1584/1/012054.
[12] D. Radev, I. Lokshina, Advanced models and algorithms for self-similar IP network traffic
     simulations and performance analysis, Journal of Electrical Engineering 61, No. 6 (2010), 341-
     349. doi: 10.2478/v10187-010-0053-0.
[13] D. Koutsoyiannis, The Hurst phenomenon and fractional Gaussian noise made easy,
     Hydrological Sciences Journal, 47 (2002), 573-595. doi: 10.1080/02626660209492961.
[14] M. Li, Generalized fractional Gaussian noise and its application to traffic modeling, Physica A
     579 (2021), 126138 (22 pages). doi: 10.1016/j.physa.2021.126138.
[15] K. Molugaram, G. S. Rao, Statistical Techniques for Transportation Engineering, Butterworth-
     Heinemann (Elsevier), Oxford, 2017, doi: 10.1016/B978-0-12-811555-8.00012-X.
[16] Yu. A. Papaika, O. H. Lysenko, Ye. V. Koshelenko, I. H. Olishevskyi, Mathematical modeling
     of power supply reliability at low voltage quality, Naukovyi Visnyk Natsionalnoho Hirnychoho
     Universytetu, No. 2 (2021), 97-103. doi: 10.33271/nvngu/2021-2/097.