Forecasting Network Exchange Time Series
Aleksandr Moshnikov (a), Aleksandr Syrov (b)
(a) ITMO University, 49 Kronverksky Pr., St. Petersburg, Russian Federation
(b) ITMO University, 49 Kronverksky Pr., St. Petersburg, Russian Federation


Abstract
This article addresses the problem of forecasting network traffic with time series methods. The
distinguishing feature of the task is that the traffic of an individual network user is
considered. Five types of user traffic are examined, which differ in the volume and number of
transmitted packets. Thirteen forecasting models are compared. Their effectiveness is evaluated
using RMSE, MAPE, and p-value, and the Ljung-Box test is used for model assessment. The
computations are carried out in R with the forecast package. The results of an experiment with
the 13 models are reported.

Keywords
Network traffic forecasting, Network modelling, R programming language, Time series


1. Introduction
Modeling network traffic is a notoriously difficult task. This is primarily due to the
ever-increasing complexity of network traffic and the various ways in which the network can be
excited by user activity. The ongoing development of new network applications, protocols, and
usage profiles further necessitates models that can adapt to the specific networks in which they
are deployed [1]. Demand for telecommunications network traffic continues to grow exponentially
worldwide. Research has shown that the number of mobile cellular subscribers is growing
worldwide and is gradually approaching the size of the world's population [2].
   Sustaining this exponential growth requires effective planning and rapid expansion of
telecommunications systems, as well as the introduction of modern equipment. One approach open to
leading industry players is to develop and adopt appropriate forecasting models in support of
this agenda. Forecasting methods can be classified as long-term and short-term. According to [3],
forecasting over intervals of weeks, months, and years is long-term, while short-term forecasts
cover milliseconds, seconds, minutes, hours, and days. Time series modeling and forecasting are
widely used for analyzing telecommunications network traffic [4]. ARIMA models have been shown to
give stable forecasts for BitTorrent traffic [5]. In contrast, [6] indicated that ARIMA models
predict accurately only over a limited time interval. Data sets of HTTP network traffic have been
grouped by period of the day [7] and by activity [8].

Proceedings of the 12th Majorov International Conference on Software Engineering and Computer Systems, December
10–11, 2020, Online & Saint Petersburg, Russia
Email: moshnikov.alex@gmail.com (A. Moshnikov)
ORCID: 0000-0002-3689-2472 (A. Moshnikov); 0000-0003-1475-4105 (A. Syrov)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
  In addition to the traffic itself, the task of analyzing the reliability of a network structure
that provides information exchange is also important [9], as well as aspects related to the
computational reliability of operations performed [10].


2. Problem statement
The problem of forecasting network traffic with time series methods is considered. The
distinguishing feature of the task is that the traffic of a single network user is considered.
The following tasks were performed:

    • typical traffic sections (time series) characteristic of a particular network user behavior
      model were formed;
    • existing time series forecasting models were reviewed;
    • a time series forecasting method was selected that gives acceptable forecast results for
      various models of network user behavior.


3. Methodology
3.1. Description of initial data
To generate the time series, raw sockets are used to capture and analyze the frames sent or
received by the node in question. The time series are formed by measuring, at one-second
intervals, the amount of data arriving at the node over the TCP/IP protocol stack.
   Information about the frames is saved in a table (the format of this table is shown in
Fig. 1). The table is then converted to time series data.
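   As a rough illustration of this conversion, the Python sketch below sums captured frame sizes
into one-second bins. The record layout (a timestamp in seconds and a frame length in bytes) and
the function name frames_to_series are assumptions made for this example only; the actual table
format is the one shown in Fig. 1, and the authors' own processing is done in R.

```python
from collections import defaultdict
from math import floor

def frames_to_series(frames, step=1.0):
    """Sum frame sizes (bytes) into consecutive bins of `step` seconds.

    `frames` is an iterable of (timestamp_seconds, length_bytes) pairs;
    this layout is assumed for illustration only.
    """
    frames = list(frames)
    if not frames:
        return []
    start = min(t for t, _ in frames)  # bins are counted from the first frame
    bins = defaultdict(int)
    for t, length in frames:
        bins[floor((t - start) / step)] += length
    # Seconds without traffic become explicit zeros so the series is regular.
    return [bins[i] for i in range(max(bins) + 1)]

captured = [(10.1, 1500), (10.7, 60), (12.2, 400)]
series = frames_to_series(captured)
print(series)  # [1560, 0, 400]
```

Keeping the empty seconds as zeros matters for the later steps, because the forecasting models
expect equally spaced observations.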
   Typical traffic sections (time series) were formed for various network user behavior models.
The following models of network user behavior are considered (chosen on the basis of personal
experience):

    • clicking on links (time series R1);
    • downloading files (time series R2);
    • listening to music (time series R3);
    • watching video (time series R4);
    • the user's browser is running but not in use (time series R5).

  The resulting time series are shown in Fig. 2 and Fig. 3.

3.2. Overview of existing time series forecasting models
This review is based on [11] and [12]. The following notation is used: $y_T$ is the observed
(actual) value of the series at time $T$, and $\hat{y}_{T+h|T}$ is the predicted value of the
time series for time $T+h$.
   A summary of the time series forecasting models and their function parameters is presented in
Table 1.
   All calculations related to forecasting are performed in the R software environment. R is a
programming language for statistical data processing and graphics, as well as a free and
open-source computing environment within the GNU project. R provides tools for running several
parallel threads of computation (loading several processor cores simultaneously), which reduces
the modeling time severalfold [13]. The forecasting models provided in the forecast package are
considered.

Figure 1: Network monitoring data


Table 1
Series forecasting models
     Model name                                 Parameters   Function (forecast package)
     Average                                    -            meanf(ts)
     Last value                                 -            naive(ts)
     Drift model                                -            rwf(ts, drift=TRUE)
     Simple exponential smoothing model         α            ses(ts)
     Holt model                                 α, β         holt(ts)
     Holt model with a fading trend             α, β, φ      holt(ts, damped=TRUE)
     Holt-Winters model, with additive          α, β, γ      hw(ts, seasonal="additive")
     seasonality
     Holt-Winters model, with multiplicative    α, β, γ      hw(ts, seasonal="multiplicative")
     seasonality
     Holt-Winters model, with multiplicative    α, β, γ, φ   hw(ts, seasonal="multiplicative",
     seasonality and fading trend                            damped=TRUE)
     ARIMA model                                -
     Forecasting the components of a series                  stlf(ts, method)
     Autoregression based on neural networks                 nnetar(ts)

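
   For orientation, the point forecasts of the four simplest models in Table 1 can be written
out by hand. The Python sketch below is not the forecast package and is not the authors' code;
the function names are hypothetical. It mirrors the point forecasts of the average, last value,
and drift methods, and shows simple exponential smoothing with a smoothing parameter alpha that
is passed in by the caller and an initial level set to the first observation (both are
simplifying assumptions; ses in R estimates these values from the data).

```python
def mean_forecast(y, h):
    """Average method: every future value equals the historical mean."""
    return [sum(y) / len(y)] * h

def naive_forecast(y, h):
    """Last-value method: every future value equals the last observation."""
    return [y[-1]] * h

def drift_forecast(y, h):
    """Drift method: extend the line through the first and last observations."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + step * slope for step in range(1, h + 1)]

def ses_forecast(y, h, alpha):
    """Simple exponential smoothing with a fixed, caller-supplied alpha."""
    level = y[0]  # assumed initialisation
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * h

history = [2.0, 4.0, 6.0, 8.0]
print(mean_forecast(history, 2))      # [5.0, 5.0]
print(naive_forecast(history, 2))     # [8.0, 8.0]
print(drift_forecast(history, 2))     # [10.0, 12.0]
print(ses_forecast(history, 2, 0.5))  # [6.25, 6.25]
```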
Figure 2: Timing diagram of the traffic R1, R2, R3 series


4. Estimation of forecast accuracy
4.1. Evaluation of prediction accuracy
Forecast error refers to the difference between the observed value and its forecast:

$$e_{T+h} = y_{T+h} - \hat{y}_{T+h|T} \qquad (1)$$

  The root-mean-square error is used to estimate the prediction accuracy:

$$RMSE = \sqrt{\mathrm{mean}(e_t^2)} \qquad (2)$$
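
  The two accuracy measures reported later can be computed as in the following Python sketch.
RMSE follows equation (2). MAPE is not defined explicitly in the text, so the standard
definition (the mean of the absolute errors expressed as percentages of the observed values) is
assumed here. The function names and sample numbers are illustrative.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root-mean-square error of the forecast errors e_t = y_t - yhat_t."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sqrt(sum(e * e for e in errors) / len(errors))

def mape(actual, predicted):
    """Mean absolute percentage error; undefined if an observed value is 0."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(rmse(actual, predicted), 2))  # 12.91
print(round(mape(actual, predicted), 2))  # 6.67
```

Because MAPE divides by the observed value, it is undefined for zero observations and becomes
very large for observations close to zero, which should be kept in mind for per-second traffic
series that contain idle intervals.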

4.1.1. Schematic diagram of estimation of forecasting accuracy.
Figure 3: Timing diagram of the traffic R4, R5 series


The considered time series is divided into training and test parts. The parameters of the
forecasting model are determined from the training part. After the model parameters are
determined, the Ljung-Box test is performed to check for the absence of autocorrelation in the
model residuals. If the Ljung-Box test is not passed (there is strong autocorrelation in the
model residuals), the model is rejected. If the test is passed, the accuracy of the forecast
(RMSE value) is determined on the test part of the series. The algorithm is shown in Fig. 4.


Figure 4: Schematic diagram of estimation of forecasting accuracy
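
The scheme of Fig. 4 can be sketched in a few lines of Python. This is a minimal illustration
under stated assumptions rather than the authors' R implementation: the number of lags (10), the
80/20 split, and the 5% significance level are not given in the text and are chosen here for the
example, the degrees of freedom are not reduced by the number of fitted parameters, and
naive_model is a hypothetical stand-in for any model that returns training residuals and a
forecast.

```python
from math import exp, sqrt

def chi2_sf_even(x, dof):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    half, term, total = x / 2.0, 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return exp(-half) * total

def ljung_box(residuals, lags=10):
    """Return (Q, p-value); assumes non-constant residuals and an even `lags`."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    q = 0.0
    for k in range(1, lags + 1):
        rho = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        q += rho * rho / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf_even(q, lags)

def naive_model(train, h):
    """Hypothetical stand-in model: one-step residuals and a last-value forecast."""
    residuals = [b - a for a, b in zip(train, train[1:])]
    return residuals, [train[-1]] * h

def evaluate(series, model, train_share=0.8, alpha=0.05):
    """Return the test-part RMSE, or None if the Ljung-Box test rejects the model."""
    split = int(len(series) * train_share)
    train, test = series[:split], series[split:]
    residuals, forecast = model(train, len(test))
    _, p_value = ljung_box(residuals)
    if p_value < alpha:
        return None  # strong autocorrelation left in the residuals
    return sqrt(sum((y - f) ** 2 for y, f in zip(test, forecast)) / len(test))

# An alternating series leaves strongly autocorrelated naive residuals.
alternating = [float(i % 2) for i in range(100)]
print(evaluate(alternating, naive_model))  # None: the model is rejected
```

Returning None for a rejected model keeps the two outcomes of the scheme apart: only models whose
residuals pass the test receive an RMSE score.
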
4.1.2. Combination of forecasting models.
As an additional study, we consider the possibility of predicting time series by combining
several forecasting models. For a combination of forecasting models, the total forecast error is
calculated as:

$$e_{T+h} = \frac{1}{N} \sum_{i=1}^{N} e^{(i)}_{T+h} \qquad (3)$$

   where $N$ is the number of prediction models in the combination and $e^{(i)}_{T+h}$ is the
prediction error of the $i$-th forecasting model.
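
   Equation (3) is what one obtains when the combined forecast is the equal-weight average of
the individual forecasts; equal weights are an assumption of this reading. The short Python
sketch below, with made-up numbers and a hypothetical function name, checks that the error of
the averaged forecast coincides with the average of the individual errors.

```python
def combine_forecasts(forecasts):
    """Average N equally long point forecasts element by element."""
    n_models = len(forecasts)
    return [sum(values) / n_models for values in zip(*forecasts)]

actual = [10.0, 12.0]
model_forecasts = [[8.0, 12.0], [14.0, 15.0]]  # two illustrative models

combined = combine_forecasts(model_forecasts)
combined_error = [y - f for y, f in zip(actual, combined)]

# Averaging the individual error series gives the same result, as in equation (3).
individual_errors = [[y - f for y, f in zip(actual, fc)] for fc in model_forecasts]
mean_of_errors = combine_forecasts(individual_errors)

print(combined)        # [11.0, 13.5]
print(combined_error)  # [-1.0, -1.5]
print(mean_of_errors)  # [-1.0, -1.5]
```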


5. The results of evaluation of prediction accuracy for different
   models
The results of estimating the forecasting accuracy are illustrated for the R1 traffic type: the
original series and its partial autocorrelation function are shown in Fig. 5, and the residual
plot and fitted model are shown in Fig. 6.


Figure 5: Results of estimating the accuracy of forecasting: original series, the function of partial
auto-correlation

Figure 6: Results of estimating the accuracy of forecasting: the residual plot, fitted model


  For a number of models, it was not possible to calculate the RMSE, MAPE, and p-value
indicators because the sample size exceeded the parameters of the input file for the foreach
package procedure. The results of estimating the accuracy of forecasting time series R1-R5 are
shown in Table 2.
   Forecast plots for the R1 traffic type are shown in Fig. 7 for models 1-6 and in Fig. 8 for
models 7-13.
   Testing the forecasting models with the Ljung-Box test gave the following results:
    • the Average method, Simple exponential smoothing, Holt's linear trend method, Holt's
      linear trend method (damped), Holt-Winters' additive method, Holt-Winters' multiplicative
      method (damped), ARIMA, and Neural network models passed the test for time series R1;
    • Neural network models passed the test for time series R2;
    • ARIMA and Neural network models passed the test for time series R3;
    • ARIMA passed the test for time series R4;
    • the Seasonal naive method, STL with multiple seasonal periods, and Neural network models
      passed the test for time series R5.

   The Neural network models showed the greatest flexibility, passing the Ljung-Box test for the
largest number of traffic types.


Table 2
Results of estimating the accuracy of forecasting time series
       Forecasting model                 Metric   R1         R2       R3        R4        R5
       Average                           RMSE     9·10^4     6·10^5   3·10^4    2·10^5    3·10^4
                                         MAPE     17548      32       3·10^6    2·10^4    3·10^4
       Naive method                      RMSE     94519      951016   0         304040    20
                                         MAPE     17007      58       0         89        10
       Seasonal naive method             RMSE     443164     642443   3416      390408    375832
                                         MAPE     7383279    45       1308      39881     5173244
       Drift method                      RMSE     95905      946121   0         304100    20
                                         MAPE     17407      57       0         1304      10
       Simple exponential smoothing      RMSE     155610     788437   37169     269173    46863
                                         MAPE     32158      37       3482586   806723    4225309
       Holt's linear trend method        RMSE     278414     788405   37206     279849    74122
                                         MAPE     58258      37       3720658   504311    6677233
       Holt's linear trend method        RMSE     159568     788966   37206     269168    46912
       (damped)                          MAPE     33005      37       3720658   806727    4229920
       Holt-Winters' additive method     RMSE     254612     875499   -         331179    -
                                         MAPE     1906904    42       -         1809142   -
       Holt-Winters' multiplicative      RMSE     513598     856715   -         316473    -
       method                            MAPE     3870810    43       -         3377      -
       Holt-Winters' multiplicative      RMSE     714634     729813   -         325126    -
       method (damped)                   MAPE     13755727   48       -         1114028   -
       ARIMA                             RMSE     104660     749206   28447     279456    35522
                                         MAPE     371876     42       2746378   4775078   3104142
       STL with multiple seasonal        RMSE     179971     924319   135606    294120    112159
       periods                           MAPE     2208689    52       4851644   213482    3785941
       Neural network models             RMSE     280028     778150   14116     204480    31622
                                         MAPE     2570521    44       1353843   2543793   2480219


6. Conclusion
The article considers the problem of forecasting network traffic with time series methods. The
distinguishing feature of the task is that the traffic of a single network user is considered.
The following activities were used as traffic types: clicking on links (time series R1),
downloading files, for example movies (time series R2), and others.
   Typical, generally accepted models were used for forecasting, such as the drift model, the
Holt-Winters model, the ARIMA model, and others. The accuracy of the considered models was
estimated. The implementation was carried out entirely in R using the forecast package. Based on
the results obtained, we conclude that the most acceptable model for predicting the network
traffic of an individual user is a forecasting model based on decomposing the series into
separate components and forecasting them independently. Further work will focus on the use of
combinations of forecasting models for traffic forecasting.
Figure 7: Forecast results for R1: a) average, b) an ARIMA model, c) Drift model, d) Holt model with a
fading trend, e) Holt model, f) Holt-Winters model

Figure 8: Forecast results for R1: a) Holt-Winters model, with multiplicative seasonality, b)
Holt-Winters model, with multiplicative seasonality and fading trend, c) Last value, d)
Auto-regression based on neural networks, e) Simple exponential smoothing model, f) The last value
is seasonally adjusted (seasonal naive method), g) Neural network models


References
[1] Ntlangu, M., Baghai-Wadji, A. (2017). Modelling Network Traffic Using Time Series Analysis:
    A Review. BDIOT2017. doi: 10.1145/3175684.3175725.
[2] International Telecommunication Union (ITU). (2014). The World in 2014: ICT Facts and
    Figures, Technical Report.
[3] Gowrishankar, S. (2008). A Time Series Modelling and Prediction of Wireless Network
    Traffic, Georgian Electronic Scientific Journal: Computer Science and Telecommunications,
    Vol. 2, No. 16, pp. 40-52.
[4] Diaz-Aviles, E., Pinelli, F., Lynch, K., Nabi, Z., Gkoufas, Y., Bouillet, E. and Calabrese, F.
    (2015). Towards Real-time Customer Experience Prediction for Telecommunication Operators,
    IEEE International Conference on Big Data, pp. 1063-1072.
[5] KuanHoong, P., Tan, I.K.T. and YikKeong, C. (2012). Bit Torrent Network Traffic Forecasting
    with ARMA, International Journal of Computer Networks and Communications, Vol. 4, No. 4,
    pp. 143-156.
[6] Moussas, V. C., Daglis, M. and Kolega, E. (2005). Network Traffic Modelling and Prediction
    Using Multiplicative Seasonal ARIMA Models, 1st International Conference on
    Experiments/Process/System/Modelling/Simulation/Optimisation, 1st IC-EpsMsO, 6-9 July,
    Athens, pp. 1-7.
[7] Santos, A. (2011). Network traffic characterization based on Time Series Analysis and
    Computational Intelligence. Journal of Computational Interdisciplinary Sciences. 2.
    doi: 10.6062/jcis.2011.02.03.0046.
[8] Oduro-Gyimah, F., Boateng, K. (2018). Analysis and modelling of telecommunications network
    traffic: a time series approach.
[9] A. S. Moshnikov and V. S. Kolomoitcev, "Reliability Assessment of Distributed Control
    Systems with Network Structure," 2020 Wave Electronics and its Application in Information
    and Telecommunication Systems (WECONF), Saint-Petersburg, Russia, 2020, pp. 1-4, doi:
    10.1109/WECONF48837.2020.9131490.
[10] V. A. Bogatyrev, S. V. Bogatyrev and A. V. Bogatyrev, "Reliability and Probability of
    Timely Servicing in a Cluster of Heterogeneous Flow of Query Functionality," 2020 Wave
    Electronics and its Application in Information and Telecommunication Systems (WECONF),
    Saint-Petersburg, Russia, 2020, pp. 1-4, doi: 10.1109/WECONF48837.2020.9131165.
[11] Ruey S. Tsay. Analysis of Financial Time Series. 2nd edn. John Wiley and Sons, New
    Jersey (2005).
[12] Robert H. Shumway, David S. Stoffer. Time Series Analysis and Its Applications: With R
    Examples. Third edition, Springer (2017).
[13] Crawley, M. J. The R Book. 2nd ed. Wiley Publishing (2012).