Self-similar Traffic Research Experiment*

Tatiana Tatarnikova¹ [0000-0002-6419-0072] and Ekaterina Poymanova² [0000-0001-9318-6454]

¹ Russian State Hydrometeorological University, ul. Voronezhskaya, 79, 192007 St. Petersburg, Russia
² Saint-Petersburg State University of Aerospace Instrumentation, Bolshaya Morskaya str. 67, 190000 St. Petersburg, Russia
tm-tatarn@yandex.ru, e.d.poymanova@gmail.ru

Abstract. The article discusses a procedure for studying network traffic with a self-similar structure, that is, a mixture of voice, data and multimedia traffic. Self-similar traffic spans very different time scales. The characteristic properties of self-similar traffic are considered in both geometric and statistical terms. Self-similar traffic is represented by the Pareto distribution. The characteristics of queuing systems are obtained by simulation. The results presented in this article make it possible to substantiate prospective requirements for network node equipment. Quality-of-service estimates for self-similar traffic are obtained in terms of buffer length and time delay in a network node. An experiment demonstrating the applicability of the ARIMA(p,d,q) model to the study of self-similar traffic was conducted.

Keywords: Network Traffic, Self-similarity, Long-range Dependence, Heavy-tailed Distribution, Slowly Decaying Variance, Hurst Index, Simulation, Buffer Storage Volume, Forecasting, Autoregressive Integrated Moving Average Model

1 Introduction

Packet network traffic integrates voice, data and multimedia. Such traffic spans very different time scales, from microseconds to seconds and even minutes [1]. Mixing data streams that differ in content and properties produces so-called self-similar traffic. At any time scale, self-similar traffic exhibits long-range dependence, which shows up as bursts: periods of activity separated by less active periods [2].
In classical models of information streams, such as the Poisson stream and the Erlang and gamma distributions, bursts are strongly smoothed out on large time scales, so the property of long-range dependence is lost. As a result, classical models do not make it possible to estimate the computing resources a system needs to serve bursty traffic [3].

* Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Accordingly, a topical task today is the simulation of network nodes with self-similar traffic at the input. The aim is to revisit the characteristics obtained with classical models of information streams. In addition, self-similarity makes it possible to build forecast models on different time scales, which enables long-term forecasting of incoming traffic.

2 Properties of self-similar traffic

Self-similarity can be characterized both geometrically and statistically.

As a geometric concept, self-similarity means that a process keeps its structure on different time scales. Figs. 1-4 show daily data for different traffic types on August 20, 2018, provided by the mobile operator MTS in St. Petersburg. They demonstrate that the traffic structure persists over time.

As a statistical concept, self-similarity is characterized by the following properties:
─ slowly decaying variance;
─ long-range dependence;
─ a heavy-tailed distribution of the time intervals between two consecutive events [2, 3].

Fig. 1. 2G traffic during 1440 min (traffic volume, GByte, versus t, min)

Fig. 2. CS Voice traffic during 1440 min (traffic volume, MByte, versus t, min)

Fig. 3. HSDPA traffic during 1440 min (traffic volume, GByte, versus t, min)

Fig. 4. Mixed initial traffic during 1440 min (traffic volume, GByte, versus t, min)

The property of slowly decaying variance means that the variance of the sample mean decays more slowly than the reciprocal of the sample size:

$$D\left[X^{(n)}(t)\right] \sim \sigma^{2} n^{2H-2}, \quad n \to \infty, \tag{1}$$

where σ² is the variance of the process X(t), X^(n)(t) is the mean over samples of size n, n is the sample size, and H is the Hurst index. For 0.5 < H < 1 the exponent 2H − 2 lies between −1 and 0, so the variance decays more slowly than the 1/n law that holds for independent samples. For H > 0.5 the direction of the process dynamics will most likely persist; for H < 0.5 the process is likely to change direction; H = 0.5 corresponds to no preferred direction, as in Brownian motion.

Long-range dependence means that a self-similar process has a hyperbolically decaying correlation function:

$$R(k) \sim k^{2H-2} A(k), \quad k \ge 1,\ k \to \infty, \tag{2}$$

where A(k) is a slowly varying function at infinity, i.e.

$$\lim_{k \to \infty} \frac{A(kx)}{A(k)} = 1 \quad \text{for all } x > 0. \tag{3}$$

A random variable X has a heavy-tailed distribution if

$$P(X > x) \sim c\,x^{-\alpha}, \quad x \to \infty, \tag{4}$$

where 0 < α < 2 is the shape parameter (the smaller α, the heavier the tail of the distribution) and c is a positive constant. For α ≤ 2 the distribution has infinite variance; for α ≤ 1 it also has an infinite mean.

Self-similar traffic is adequately described by heavy-tailed probability distributions, in particular by the Pareto distribution [3, 4]. The Pareto distribution function is

$$F(t) = 1 - \left(\frac{K}{t}\right)^{\alpha}, \quad t \ge K, \tag{5}$$

where α is the shape parameter and K is the boundary parameter, which specifies the minimum value of the random variable and plays the role of a scale coefficient. Since 1 − F(t) = K^α t^(−α), distribution (5) satisfies (4) with c = K^α.
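Formula (5) can be inverted in closed form, which gives the usual way to generate Pareto-distributed time intervals: if U is uniform on (0, 1], then t = K·U^(−1/α) has distribution function (5). The Python sketch below illustrates this; it is not the generator used in the article's AnyLogic models, and the function names `pareto_sample` and `empirical_tail` and the parameter values are assumptions made for illustration. It compares the empirical tail with P(X > x) = (K/x)^α, i.e. (4) with c = K^α, and the sample mean with αK/(α − 1), which holds for α > 1.

```python
import random


def pareto_sample(alpha: float, k: float, rng: random.Random) -> float:
    """Draw one value from P(alpha, K) by inverting F(t) = 1 - (K/t)**alpha."""
    u = 1.0 - rng.random()  # uniform on (0, 1]; avoids division by zero
    return k * u ** (-1.0 / alpha)


def empirical_tail(samples: list[float], x: float) -> float:
    """Fraction of samples exceeding x, an estimate of P(X > x)."""
    return sum(1 for s in samples if s > x) / len(samples)


if __name__ == "__main__":
    alpha, k = 1.5, 0.01  # illustrative values with 1 < alpha < 2
    rng = random.Random(42)  # fixed seed for reproducibility
    samples = [pareto_sample(alpha, k, rng) for _ in range(200_000)]

    for x in (0.02, 0.05, 0.1):
        theory = (k / x) ** alpha  # tail law (4) with c = K**alpha
        print(f"x={x}: empirical {empirical_tail(samples, x):.4f}, theory {theory:.4f}")

    # The mean exists for alpha > 1 but converges slowly: the variance is infinite.
    print("sample mean:", sum(samples) / len(samples), "theory:", alpha * k / (alpha - 1))
```

With α < 2 the sample mean still fluctuates noticeably between seeds even for large samples, which is the practical face of the infinite variance noted above.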
Hereafter, the Pareto distribution is denoted by P(α, K) or simply P. For α > 1, the mathematical expectation of a Pareto-distributed random variable x is M(x) = αK/(α − 1).

3 Analysis of queuing systems M|M|1 and P|M|1

Buffering is known to be the main strategy for providing resources, and research concentrates on the statistical characteristics of queues [5, 6]. Self-similar traffic requires larger buffers than classical queueing analysis predicts [7]. We show this by estimating the characteristics of queuing systems (QS) of types M|M|1 and P|M|1. For this purpose, simulation models of these QS were developed in AnyLogic.

The simulation experiment was carried out with the following parameters: buffer size L = ∞; mean service time of a request at the service node T = 0.02 s; service node load factor ρ ∈ [0.2; 1]; K = 0.01; α ∈ [1.1; 2].

The difference between the characteristics of QS M|M|1 and P|M|1 is evaluated as ratios of the following quantities:
─ the mean queue lengths $L_P$ and $L_E$ for Pareto and exponential distributions, respectively, of the time intervals between two packet arrivals;
─ the maximum queue lengths $\hat{L}_P$ and $\hat{L}_E$ for the same two distributions;
─ the mean waiting time $T_w$ and sojourn time $T_s$ of packets in QS M|M|1 and P|M|1 under different loads.

Since QS M|M|1 and P|M|1 are being compared, it is necessary to use not only the same random number generator and the same scale coefficient but also the same packet arrival intensity [8-10].

For QS P|M|1, the intensity can be set through the parameter α. Requiring the mean of the exponential distribution, 1/λ, to equal the mean of the Pareto distribution, αK/(α − 1), gives

$$\frac{1}{\lambda} = \frac{\alpha K}{\alpha - 1} \quad \Rightarrow \quad \alpha = \frac{1}{1 - K\lambda}.$$

Thus, for an intensity λ = 25 packets/s and K = 0.01, QS P|M|1 with the same intensity has α = 1/(1 − 0.25) ≈ 1.33, which does not contradict the heavy-tail property (1 < α < 2).

The statistical characteristics of QS M|M|1 and P|M|1 are given in Table 1. The number of experiments is 5104. The ratios of mean queue lengths are rounded up to whole numbers.

Table 1. Statistical characteristics of the queues

α   | $L_P / L_E$ | $\hat{L}_P / \hat{L}_E$
1.1 | 1           | 3.70
1.2 | 1           | 3.75
1.3 | 2           | 3.71
1.4 | 2           | 3.86
1.5 | 3           | 3.89
1.6 | 4           | 3.82
1.7 | 5           | 4.00
1.8 | 10          | 4.21
1.9 | 21          | 9.23
2.0 | 216         | 29.93

Analysis of Table 1 leads to the following conclusions. The difference in required buffer length between QS M|M|1 and P|M|1 is evident starting from α = 1.1. For α ∈ [1.1; 1.7], this difference is stable when the maximum queue lengths are compared (the ratio stays between 3.70 and 4.00), and it grows for α > 1.7. When the mean queue lengths are compared, the buffer length for P|M|1 is already at least twice as large for α > 1.2. Queue growth is influenced not so much by the distribution of the time intervals as by the correlation structure of the process.

4 Traffic research experiment using the ARIMA(p,d,q) model

Another feature of self-similarity is that it makes it possible to predict the amount of data in the network, which helps prevent data loss caused by denial of service.

Based on [11-13], it was concluded that the autoregressive integrated moving average (ARIMA) model can be used to predict self-similar traffic. An experiment was conducted to show that the ARIMA(p,d,q) model is applicable to traffic research. The experiment used six days of LTE traffic data provided by MTS (Fig. 5). The purpose of the experiment was to build an ARIMA forecast model from five days of traffic data, forecast the sixth day, and compare the forecast with the actual traffic data for that day.
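The procedure (fit on five days, forecast the sixth, compare with the observed values) can be sketched in pure Python for the ARIMA(1,1,1) case that the experiment arrives at. This is an illustrative sketch under stated assumptions, not the authors' implementation, since the article does not name the software or estimator it used. The differenced series w_t = y_t − y_(t−1) is modelled as w_t − μ = φ(w_(t−1) − μ) + ε_t + θε_(t−1); φ and θ are chosen by a coarse grid search that minimizes the conditional sum of squared residuals, which is simpler but less precise than the maximum-likelihood fitting of statistical packages. Hourly sampling, the synthetic data generator and the MAPE comparison metric are assumptions, and all function names are hypothetical.

```python
import random


def difference(series):
    """First-order differencing (d = 1): w_t = y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def css_residuals(w, mu, phi, theta):
    """Residuals of the ARMA(1,1) recursion for the demeaned differences,
    started from a zero residual (conditional sum of squares)."""
    resid = [0.0]
    for t in range(1, len(w)):
        pred = phi * (w[t - 1] - mu) + theta * resid[-1]
        resid.append((w[t] - mu) - pred)
    return resid


def fit_arima111(y, step=0.05):
    """Coarse grid search for (phi, theta) in [-0.95, 0.95] minimising the CSS."""
    w = difference(y)
    mu = sum(w) / len(w)
    grid = [-0.95 + i * step for i in range(int(round(1.9 / step)) + 1)]
    best = None
    for phi in grid:
        for theta in grid:
            sse = sum(e * e for e in css_residuals(w, mu, phi, theta))
            if best is None or sse < best[0]:
                best = (sse, phi, theta)
    return best[1], best[2], mu


def forecast_arima111(y, phi, theta, mu, horizon):
    """Multi-step forecast: future shocks are set to zero and the predicted
    differences are added back onto the last observed level (integration)."""
    w = difference(y)
    last_w = w[-1]
    last_e = css_residuals(w, mu, phi, theta)[-1]
    level, out = y[-1], []
    for h in range(horizon):
        w_next = mu + phi * (last_w - mu) + (theta * last_e if h == 0 else 0.0)
        level += w_next
        out.append(level)
        last_w = w_next
    return out


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def simulate_arima111(n, phi, theta, mu, sigma, rng, start=1000.0):
    """Synthetic ARIMA(1,1,1) path, used here only as a stand-in for real traffic data."""
    y, w_prev, e_prev = [start], 0.0, 0.0
    for _ in range(n - 1):
        e = rng.gauss(0.0, sigma)
        w = mu + phi * (w_prev - mu) + e + theta * e_prev
        y.append(y[-1] + w)
        w_prev, e_prev = w, e
    return y


if __name__ == "__main__":
    rng = random.Random(2019)
    # Hypothetical stand-in for six days of hourly samples (6 x 24 = 144 points).
    y = simulate_arima111(144, phi=0.5, theta=0.2, mu=0.0, sigma=5.0, rng=rng)
    train, test = y[:120], y[120:]  # fit on five days, compare with the sixth
    phi, theta, mu = fit_arima111(train)
    forecast = forecast_arima111(train, phi, theta, mu, horizon=len(test))
    print(f"phi={phi:.2f}, theta={theta:.2f}, MAPE={mape(test, forecast):.2f}%")
```

Replacing `simulate_arima111` with a real hourly series and keeping the 120/24 split mirrors the five-day/one-day design; the grid step trades estimation accuracy for speed.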
It can be seen from Fig. 5 that the series has a seasonality with a period of 24 hours, fluctuates around a certain level, and contains a number of peak values. An autocorrelation analysis of the time series was carried out, and the autocorrelation function was computed for 48 lags (Fig. 6).

Fig. 5. MTS LTE traffic graph

Fig. 6. Autocorrelation function of the initial time series

The graph shows a peak at the first time lag. This peak was removed by first-order differencing, after which the autocorrelation function and the partial autocorrelation function of the transformed series were computed (Fig. 7).

The parameters p, d, q were then determined. Since differencing was applied once, d = 1. The parameter p was chosen from the partial autocorrelation function as p = 1 (the first significant value). The parameter q was chosen similarly from the autocorrelation function, q = 1.

Fig. 7. Autocorrelation and partial autocorrelation functions

As a result, the ARIMA(1,1,1) model was built and a forecast for one period was obtained (Fig. 8).

Fig. 8. Data forecast model

Since self-similarity preserves the structure of the time series on different time scales, the constructed model is also suitable for other periods of time (month, year, etc.).

This experiment is an example of constructing an ARIMA model and illustrates the forecasting procedure with this model. A feature of time-series analysis with the ARIMA model is the mandatory expert assessment of the obtained autocorrelation functions. On the one hand, this is a drawback, because forecasting becomes a labor-intensive process that requires the participation of a specialist; on the other hand, expert assessment makes it possible to build a more accurate forecast model.
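The identification step described above (difference once, then read p from the PACF and q from the ACF) can be supported numerically. The Python sketch below computes the sample ACF, the PACF via the Durbin-Levinson recursion, and the approximate 95% significance bound ±1.96/√n commonly drawn on correlograms, then counts the leading significant lags as candidates for p and q. The helper names and the counting rule are assumptions of this sketch, which is meant as an aid to the expert assessment rather than a substitute for it; the 48-lag window follows the article.

```python
import math
import random


def difference(series, d=1):
    """Apply the differencing operator d times (d is the 'I' order in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (divides by n, as is conventional)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations phi_kk computed with the Durbin-Levinson recursion."""
    r = [1.0] + acf(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def leading_significant_lags(values, n):
    """Count consecutive lags, starting at lag 1, that lie outside +/-1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(n)
    count = 0
    for v in values:
        if abs(v) <= bound:
            break
        count += 1
    return count


if __name__ == "__main__":
    rng = random.Random(5)
    # Hypothetical integrated AR(1) series standing in for the traffic data.
    level, w, y = 0.0, 0.0, []
    for _ in range(500):
        w = 0.7 * w + rng.gauss(0.0, 1.0)
        level += w
        y.append(level)
    dw = difference(y, d=1)
    print("p candidate (PACF):", leading_significant_lags(pacf(dw, 48), len(dw)))
    print("q candidate (ACF):", leading_significant_lags(acf(dw, 48), len(dw)))
```

For a pure AR(1) series like this one, the PACF cuts off after lag 1 while the ACF decays gradually, so the ACF count comes out large; interpreting such patterns is where the expert judgment mentioned above still matters.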
5 Conclusion

The growing share of multiservice traffic in networks makes the problem of meeting customer requirements for the quality of network services increasingly relevant.

A simulation experiment was conducted for QS M|M|1 and P|M|1. Quality-of-service estimates for self-similar traffic were obtained in terms of buffer length and time delay in the network node.

The results presented in the article make it possible to substantiate prospective requirements for network node equipment. An experiment was conducted demonstrating the applicability of the ARIMA(p,d,q) model to the study of self-similar traffic.

References

1. Bogatyrev, V.A., Bogatyrev, S.V., Parshutina, Bogatyrev, A.V.: Model and Interaction Efficiency of Computer Nodes Based on Transfer Reservation at Multipath Routing. In: 2019 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), pp. 1-4 (2019). doi: 10.1109/WECONF.2019.8840647
2. Tatarnikova, T.M.: Statistical methods for studying network traffic. Informatsionno-Upravliaiushchie Sistemy, vol. 96, no. 5, pp. 35-43 (2018). doi: 10.31799/1684-8853-2018-5-35-43
3. Tanenbaum, A., Wetherall, D.: Computer Networks. 5th edn. Prentice Hall (2010)
4. Kutuzov, O.I., Tatarnikova, T.M.: Model of a self-similar traffic generator and evaluation of buffer storage for classical and fractal queuing system. In: Moscow Workshop on Electronic and Networking Technologies (MWENT 2018), Proceedings 1, pp. 1-3 (2018)
5. Leland, W.E., Taqqu, M.S., Willinger, W., Wilson, D.V.: On the Self-Similar Nature of Ethernet Traffic. In: Proc. ACM SIGCOMM'93, pp. 183-193. San Francisco (1993)
6. Zwart, A.P.: Queueing Systems with Heavy Tails. Eindhoven University of Technology (2001)
7. Poymanova, E.D., Tatarnikova, T.M.: Models and Methods for Studying Network Traffic. In: 2018 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), pp. 1-5 (2018). doi: 10.1109/WECONF.2018.8604470
8. Bogatyrev, V.A.: Protocols for dynamic distribution of requests through a bus with variable logic ring for reception authority transfer. Automatic Control and Computer Sciences, vol. 33, no. 3, pp. 57-63 (1999)
9. Tatarnikova, T., Kolbanev, M.: Statement of a task corporate information networks interface centers structural synthesis. In: IEEE EUROCON 2009, pp. 1883-1887. St. Petersburg (2009)
10. Bogatyrev, V.A., Vinokurova, M.S.: Control and Safety of Operation of Duplicated Computer Systems. In: Communications in Computer and Information Science, IET-2017, vol. 700, pp. 331-342 (2017)
11. Budko, M.Yu., Budko, M.B., Girik, A.V.: Application of the autoregressive integrated moving average in congestion control algorithms of streaming data transfer protocols. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, no. 39, pp. 319-323 (2007) (in Russian)
12. Kryukov, Yu.A., Chernyagin, D.V.: ARIMA: a model for forecasting traffic values. Information Technologies and Computing Systems, no. 2, pp. 41-49 (2011) (in Russian)
13. Solovyev, A.Yu.: On the problem of forecasting self-similar network processes. In: III International Scientific Conference "Modern Problems of Informatization in Modeling, Programming and Telecommunication Systems". http://econf.rae.ru/article/4745 (in Russian)