Self-similar Traffic Research Experiment *

     Tatiana Tatarnikova1[0000-0002-6419-0072] and Ekaterina Poymanova2[0000-0001-9318-6454]

1 Russian State Hydrometeorological University, ul. Voronezhskaya, 79, 192007 St. Petersburg, Russia
2 Saint-Petersburg State University of Aerospace Instrumentation, Bolshaya Morskaya str. 67, 190000 St. Petersburg, Russia
                tm-tatarn@yandex.ru, e.d.poymanova@gmail.ru


         Abstract. The article discusses a procedure for studying network traffic with
         a self-similar structure. Such traffic is a mixture of voice, data and multimedia
         and covers very different time scales. The characteristic properties of
         self-similar traffic are considered in both geometric and statistical terms.
         Self-similar traffic is represented by the Pareto distribution. The characteristics
         of queuing systems are obtained by simulation. Quality-of-service estimates for
         self-similar traffic are obtained in terms of buffer length and time delay in the
         network node. The results presented in this article make it possible to
         substantiate prospective requirements for network node equipment. An experiment
         demonstrating the operability of the ARIMA(p,d,q) model in the study of
         self-similar traffic was conducted.
         Keywords: Network Traffic, Self-similarity, Long-range Dependence, Heavy-tailed
         Distribution, Slowly Decaying Variance, Hurst Index, Simulation, Buffer Storage
         Volume, Forecasting, Autoregressive Integrated Moving Average Model


1        Introduction

Packet network traffic integrates voice, data and multimedia. Such traffic covers very
different time scales, from microseconds to seconds and even minutes [1]. A mixture of
data streams that differ in content and properties gives rise to so-called self-similar
traffic.
   Self-similar traffic exhibits long-range dependence at any time scale: it contains
bursts, that is, periods of activity separated by less active periods [2].
   In classical models of information streams, such as the Poisson stream, the Erlang
stream and the gamma distribution, bursts are strongly smoothed on large time scales, so
the property of long-range dependence is lost. As a result, the classical models do not
allow the computing resources that a system needs to serve bursty traffic to be
estimated [3].


Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).


   For this reason, simulating network nodes with self-similar traffic at the input is a
topical task. The aim of this change in the input model is to revise the characteristics
obtained with classical models of information streams.
   In addition, self-similarity makes it possible to build forecast models on different
time scales, which allows long-term forecasting of incoming traffic.


2        Properties of self-similar traffic

Self-similarity can be characterized both geometrically and statistically.
   As a geometric concept, self-similarity means that the process keeps its structure on
different time scales.
   Figs. 1-4 show daily data for different traffic types recorded on 08.20.2018. The data
were provided by the mobile operator MTS in St. Petersburg. They demonstrate that the
traffic structure persists over time.
   In the statistical sense, self-similarity is characterized by the following properties:

─ slowly decaying variance;
─ long-range dependence;
─ a heavy-tailed distribution of the time intervals between two consecutive events
  [2, 3].


            Fig. 1. 2G traffic during 1440 min (axes: traffic volume, GByte, versus t, min)

            Fig. 2. CS Voice traffic during 1440 min (axes: traffic volume, MByte, versus t, min)

            Fig. 3. HSDPA traffic during 1440 min (axes: traffic volume, GByte, versus t, min)

            Fig. 4. Mix initial traffic during 1440 min (axes: traffic volume, GByte, versus t, min)

  The slowly decaying variance property means that, for H > 0.5, the variance of the
sample mean decays more slowly than the reciprocal of the sample size:

   $$D\left(X^{(n)}(t)\right) \sim \sigma^{2} n^{2H-2}, \quad n \to \infty, \qquad (1)$$

where σ² is the variance of the process X(t);
n is the sample size;
H is the Hurst index.
   For H > 0.5 the direction of the process dynamics most likely will not change;
   for H < 0.5 the process is predicted to change direction;
   for H = 0.5 there is uncertainty, as in Brownian motion.
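
   As a rough illustration of relation (1), H can be estimated from the slope of
log D(X^(n)) plotted against log n. This is commonly called the aggregated variance
method; it is used here as an assumption, since the text does not state which estimator
was applied. The sketch below uses only the Python standard library. The function name
aggregated_variance_hurst, the block sizes and the synthetic uncorrelated input are
illustrative choices, not data from the experiment.

```python
import math
import random
import statistics


def aggregated_variance_hurst(series, block_sizes):
    """Estimate H from the slope of log Var(X^(n)) versus log n, following (1)."""
    log_n, log_var = [], []
    for n in block_sizes:
        # Average the series over non-overlapping blocks of length n.
        blocks = [statistics.fmean(series[i:i + n])
                  for i in range(0, len(series) - n + 1, n)]
        log_n.append(math.log(n))
        log_var.append(math.log(statistics.pvariance(blocks)))
    # Least-squares slope of log_var on log_n; relation (1) gives slope = 2H - 2.
    mean_x = statistics.fmean(log_n)
    mean_y = statistics.fmean(log_var)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_var))
             / sum((x - mean_x) ** 2 for x in log_n))
    return 1.0 + slope / 2.0


# Synthetic input (assumption): uncorrelated Gaussian noise, for which the
# variance of the block mean falls as 1/n, i.e. H should come out near 0.5.
rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
h_estimate = aggregated_variance_hurst(white_noise, [1, 2, 4, 8, 16, 32, 64])
print(f"Estimated H for uncorrelated noise: {h_estimate:.2f}")
```

   Applied to a measured traffic series, an estimate noticeably above 0.5 would be
consistent with the slowly decaying variance described above.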
   Long-range dependence means that the self-similar process has a hyperbolically
decaying correlation function

   $$R(k) \cong k^{2H-2} A(k), \quad \forall k \ge 1, \; k \to \infty, \qquad (2)$$

where A(k) is a function slowly varying at infinity, for which

   $$\lim_{k \to \infty} \frac{A(kx)}{A(k)} = 1 \quad \text{for all } x > 0. \qquad (3)$$

   The heavy-tail property applies to a random variable X whose distribution satisfies

   $$P(X > x) \sim c\,x^{-\alpha}, \quad x \to \infty, \qquad (4)$$

where 0 < α < 2 is the shape parameter of the distribution (the smaller the value of α,
the heavier the "tail" of the distribution);
c is a positive constant.
  For α ≤ 2 the distribution has infinite variance; for α ≤ 1 it also has an infinite
mean.
  An adequate description of self-similar traffic is given by probability distributions
with heavy "tails", in particular by the Pareto distribution [3, 4].
  The Pareto distribution is defined by the distribution function

   $$F(t) = 1 - \left(\frac{K}{t}\right)^{\alpha}, \quad t \ge K, \qquad (5)$$

where α is the shape parameter (or simply the parameter);
K is the boundary parameter, which specifies the minimum value of the random variable x
and plays the role of a scale coefficient.
   In what follows, the Pareto distribution is denoted P(α, K) or simply P.
   The mathematical expectation M(x) of a Pareto-distributed random variable x is given
by the expression M(x) = αK/(α−1), which is finite for α > 1.
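
   The distribution function (5) can be inverted in closed form, which gives a simple way
to generate P(α, K) intervals for a simulation. The following Python sketch shows this
inverse-transform approach; it is an illustration, not the generator used in the
AnyLogic models. The function names pareto_sample and pareto_mean and the values
α = 1.5, K = 0.01 are assumptions chosen for the example.

```python
import random


def pareto_sample(alpha, k, rng):
    """Draw one value from P(alpha, K) by inverting F(t) = 1 - (K/t)**alpha."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so the division below is safe
    return k / u ** (1.0 / alpha)


def pareto_mean(alpha, k):
    """Theoretical mean alpha*K/(alpha - 1); it is finite only for alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the mean is infinite for alpha <= 1")
    return alpha * k / (alpha - 1.0)


# Illustrative parameters (assumptions, not taken from the experiment).
rng = random.Random(7)
alpha, k = 1.5, 0.01
samples = [pareto_sample(alpha, k, rng) for _ in range(100000)]

# Compare the empirical and theoretical distribution function at t = 2K.
empirical_cdf = sum(1 for s in samples if s <= 2 * k) / len(samples)
theoretical_cdf = 1.0 - (k / (2 * k)) ** alpha
print(f"F(2K): empirical {empirical_cdf:.3f}, theoretical {theoretical_cdf:.3f}")
print(f"Theoretical mean: {pareto_mean(alpha, k):.4f}")
```

   The check is made on the distribution function rather than on the sample mean because,
for α < 2, the variance is infinite and the sample mean converges slowly.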


3      Analysis of queuing systems M|M|1 and P|M|1

Buffering is known to be the main strategy for providing resources, and research
concentrates on the statistical characteristics of queues [5,6]. Self-similar traffic
clearly needs larger buffers than classical queue analysis predicts [7]. Let us show this
by estimating the characteristics of queuing systems (QS) of types M|M|1 and P|M|1. For
this purpose, simulation models of the QS were developed in AnyLogic.
The simulation experiment was carried out under the following conditions: buffer size
L = ∞; average processing time of a request in the service node T = 0.02 s; service node
load factor ρ ∈ [0.2; 1]; K = 0.01; α ∈ [1.1; 2].
   The difference between the characteristics of the M|M|1 and P|M|1 systems is evaluated
by comparing:

─ the average queue lengths LP and LE for Pareto and exponential distributions of the
  time intervals between two arrivals of traffic packets, respectively;
─ the maximum queue lengths L̂P and L̂E for Pareto and exponential distributions of the
  time intervals between two arrivals of traffic packets, respectively;
─ the average waiting time Tw and sojourn time Ts of packets in the M|M|1 and P|M|1
  systems for different loads.

   Since the different systems M|M|1 and P|M|1 are compared, it is necessary to use not
only the same random number generator and the same scale coefficient, but also the same
arrival intensity of traffic packets [8-10].

    For the P|M|1 system, the intensity can be set through the parameter α. If the
mathematical expectation of the exponential distribution is set equal to the mathematical
expectation of the Pareto distribution, then

   $$\frac{1}{\lambda} = \frac{\alpha K}{\alpha - 1}, \quad \text{so that} \quad \alpha = (1 - K\lambda)^{-1}.$$

Thus, if the intensity is λ = 25 packets/s and K = 0.01, then for the P|M|1 system with
the same intensity the parameter is α = 1.33, which does not contradict the heavy-tail
property.
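
    The matching of intensities can be tried out with a small simulation. The Python
sketch below is not the AnyLogic model: as an assumption, it replaces it with the Lindley
recursion for the waiting time in a single-server FIFO queue with an unbounded buffer,
and it reports only the mean waiting time. The names pareto_alpha_for_rate and mean_wait,
the seed and the number of packets are hypothetical choices for illustration.

```python
import random


def pareto_alpha_for_rate(lam, k):
    """Shape alpha that gives a Pareto mean of 1/lam: alpha = 1 / (1 - K*lam)."""
    if not 0.0 < k * lam < 1.0:
        raise ValueError("need 0 < K*lam < 1")
    return 1.0 / (1.0 - k * lam)


def mean_wait(interarrival, service, n_packets):
    """Mean queueing delay from the Lindley recursion
    W[n+1] = max(0, W[n] + S[n] - A[n+1]) for one FIFO server, infinite buffer."""
    wait, total = 0.0, 0.0
    for _ in range(n_packets):
        total += wait
        wait = max(0.0, wait + service() - interarrival())
    return total / n_packets


# Values from the text: lambda = 25 packets/s, K = 0.01, T = 0.02 s.
lam, k, t_service = 25.0, 0.01, 0.02
alpha = pareto_alpha_for_rate(lam, k)

rng = random.Random(42)  # one generator shared by both systems


def exp_arrivals():
    return rng.expovariate(lam)  # exponential intervals with mean 1/lam


def pareto_arrivals():
    return k / (1.0 - rng.random()) ** (1.0 / alpha)  # inverse transform of (5)


def exp_service():
    return rng.expovariate(1.0 / t_service)  # exponential service, mean T


wait_mm1 = mean_wait(exp_arrivals, exp_service, 200000)
wait_pm1 = mean_wait(pareto_arrivals, exp_service, 200000)
print(f"alpha = {alpha:.2f}")
print(f"mean wait M|M|1: {wait_mm1:.4f} s, P|M|1: {wait_pm1:.4f} s")
```

    For the M|M|1 case the result can be checked against the classical value
ρ/(μ(1−ρ)) = 0.02 s at ρ = 0.5, which gives some confidence in the recursion before the
Pareto input is examined.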
    The statistical characteristics obtained for the M|M|1 and P|M|1 queuing systems are
given in Table 1. The number of experiments is 5104. The ratios of average queue lengths
are rounded up to whole numbers.

                   Table 1. Statistical characteristics of the queues

                         α            LP/LE           L̂P/L̂E
                         1.1          1               3.70
                         1.2          1               3.75
                         1.3          2               3.71
                         1.4          2               3.86
                         1.5          3               3.89
                         1.6          4               3.82
                         1.7          5               4.00
                         1.8          10              4.21
                         1.9          21              9.23
                         2.0          216             29.93

Analysis of the results in Table 1 leads to the following conclusions.
   The difference in the required buffer length between the M|M|1 and P|M|1 systems is
evident starting from α = 1.1. For α ∈ [1.1; 1.7] this difference is stable when the
maximum queue lengths are compared, and it grows for α > 1.7. When the average queue
lengths are compared, the buffer length for P|M|1 is already twice as large for α > 1.2.
   The growth of the queue is influenced not so much by the distribution of the time
intervals as by the correlation structure of the process.


4      Traffic research experiment using the ARIMA(p,d,q) model

Self-similarity also makes it possible to predict the amount of data in the network,
which helps prevent data loss associated with denial of service.
   Based on [11–13], it was concluded that the autoregressive integrated moving average
(ARIMA) model can be used to predict self-similar traffic.
   An experiment was conducted to show the operability of the ARIMA(p,d,q) model in the
study of traffic. The experiment used LTE traffic data for six days received from MTS
(Fig. 5).
   The purpose of the experiment was to build an ARIMA forecast model for the sixth day
from the traffic data for the first five days and to compare the forecast with the actual
traffic data for that day.


The graph in Fig. 5 shows seasonality with a period of 24 hours: the series fluctuates
around a certain value and has a number of peak values. An autocorrelation analysis of
the time series was carried out, and the autocorrelation function was constructed for
48 lags (Fig. 6).


                              Fig. 5. MTS LTE traffic graph


                   Fig. 6. The initial time series autocorrelation function

The graph shows a peak at the first time lag. This peak was removed by taking a
first-order difference (D-1), and the autocorrelation function and the partial
autocorrelation function of the transformed time series were constructed (Fig. 7). The
parameters p, d, q were then determined.
   Since the differencing operation was applied once, d = 1. The parameter p = 1 was
chosen from the partial autocorrelation function (the first significant value of the
function). The parameter q = 1 was chosen similarly from the autocorrelation function.
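
   The differencing and autocorrelation steps can be sketched in a few lines of Python.
The example uses only the standard library and a synthetic series, because the MTS data
are not reproduced here. The hourly sampling, the sinusoidal daily pattern, the noise
level and the function names difference and acf are all assumptions made for
illustration. The partial autocorrelation function and the model fit itself are not
covered by this sketch.

```python
import math
import random


def difference(series, lag=1):
    """First-order differencing: y[t] = x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    c0 = sum(c * c for c in centered) / n
    return [sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n / c0
            for lag in range(max_lag + 1)]


# Synthetic stand-in for five days of hourly traffic (assumption): a 24-hour
# cycle around a constant level plus Gaussian noise.
rng = random.Random(3)
traffic = [100.0 + 40.0 * math.sin(2 * math.pi * t / 24) + rng.gauss(0.0, 5.0)
           for t in range(5 * 24)]

acf_raw = acf(traffic, 48)               # 48 lags, as in the text
acf_diff = acf(difference(traffic), 48)  # after one differencing step (d = 1)

# Approximate 95% band for an uncorrelated series; values outside are "significant".
band = 1.96 / math.sqrt(len(traffic))
print(f"lag-1 ACF before differencing: {acf_raw[1]:.2f}")
print(f"lag-1 ACF after differencing:  {acf_diff[1]:.2f}")
print(f"significance band: +/-{band:.2f}")
```

   On real data, the lags at which the autocorrelation values leave the band are the
candidates that the expert inspects when choosing q. The same inspection of the partial
autocorrelation function gives p.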


                 Fig. 7. Autocorrelation and partial autocorrelation functions

As a result, the ARIMA(1,1,1) model was built and a forecast for one period was ob-
tained (Fig. 8).


                                 Fig. 8. Data forecast model

Since, owing to self-similarity, the structure of the time series is preserved on
different time scales, the constructed model is also suitable for other periods of time
(month, year, etc.).
This experiment is an example of constructing an ARIMA model and illustrates the
forecasting procedure with this model. A feature of time series analysis with the ARIMA
model is the mandatory expert assessment of the obtained autocorrelation functions. On
the one hand, this is a drawback, because forecasting becomes labor-intensive and
requires the participation of a specialist; on the other hand, expert assessment allows a
more accurate forecast model to be built.


5      Conclusion

The growing share of multi-service traffic in networks makes the problem of meeting
customer requirements for the quality of network services more pressing.
   A simulation experiment was conducted for the M|M|1 and P|M|1 queuing systems.
Quality-of-service estimates for self-similar traffic were obtained in terms of the
buffer length and the time delay in the network node.


   The results presented in the article make it possible to substantiate the prospective
requirements for network node equipment.
   An experiment was conducted proving the operability of the ARIMA(p,d,q) model
in the study of self-similar traffic.


References
 1. Bogatyrev, V.A., Bogatyrev, S. V., Parshutina, Bogatyrev A. V.: Model and Interaction Ef-
    ficiency of Computer Nodes Based on Transfer Reservation at Multipath Routing. In: 2019
    Wave Electronics and its Application in Information and Telecommunication Systems
    (WECONF), pp. 1-4 (2019). doi: 10.1109/WECONF.2019.8840647
 2. Tatarnikova, T.M. Statistical methods for studying network traffic. In Informatsionno-Up-
    ravliaiushchie Sistemy, vol. 96, no.5, pp. 35-43 (2018). doi: 10.31799/1684-8853-2018-5-
    35-43
 3. Tanenbaum, A., Wetherall, D.: Computer Networks. 5th ed. Prentice Hall (2010).
 4. Kutuzov O. I., Tatarnikova T. M. Model of a self-similar traffic generator and evaluation
    of buffer storage for classical and fractal queuing system. In Moscow Workshop on Elec-
    tronic and Networking Technologies, MWENT 2018 - Proceedings 1, pp. 1-3 (2018).
 5. Leland, W.E., Taqqu, M.S., Willinger, W., Wilson, D.V. On the Self-Similar Nature of
    Ethernet Traffic. In Proc. ACM SIGCOMM'93, pp. 183-193. San Francisco (1993).
 6. Zwart, A. P.: Queueing Systems with Heavy Tails. Eindhoven University of Technology
    Publ. (2001).
 7. Poymanova, E.D., Tatarnikova, T.M. Models and Methods for Studying Network Traffic.
    In 2018 Wave Electronics and its Application in Information and Telecommunication
    Systems (WECONF), pp. 1-5 (2018). doi: 10.1109/WECONF.2018.8604470
 8. Bogatyrev, V.A. Protocols for dynamic distribution of requests through a bus with variable
    logic ring for reception authority transfer. In Automatic Control and Computer Sciences,
    vol. 33, no. 3, pp. 57-63 (1999).
 9. Tatarnikova, T., Kolbanev, M. Statement of a task corporate information networks interface
    centers structural synthesis. In IEEE EUROCON 2009, pp. 1883-1887. St. Petersburg
    (2009).
10. Bogatyrev, V. A., Vinokurova M. S. Control and Safety of Operation of Duplicated Com-
    puter Systems. In Communications in Computer and Information Science, IET – 2017, vol.
    700, pp. 331-342 (2017).
11. Budko, M.Yu., Budko, M.B., Girik, A.V. Application of the autoregressive integrated
    moving average in congestion control algorithms of streaming data transfer protocols.
    In Scientific and Technical Journal of Information Technologies, Mechanics and
    Optics, no. 39, pp. 319-323 (2007) (in Russian).
12. Kryukov, Yu.A., Chernyagin, D.V. ARIMA: a model for forecasting traffic values. In
    Information Technologies and Computing Systems, no. 2, pp. 41-49 (2011) (in Russian).
13. Solovyev, A.Yu. On the problem of forecasting self-similar network processes. In III
    International Scientific Conference "Modern Problems of Informatization in Modeling,
    Programming and Telecommunication Systems" (in Russian). URL:
    http://econf.rae.ru/article/4745