468 Estimation of Buffer Capacity for Weibull Traffic* Oleg I. Kutuzov 1 [0000-0001-9318-6454] and Tatiana M. Tatarnikova 1 [0000-0002-6419-0072] 1Saint Petersburg Electrotechnical University "LETI", ul. Professora Popova 5, 197376 St. Pe- tersburg, Russia kutuzov-oleg@mail.ru tm-tatarn@yandex.ru Abstract. The solution to the problem of estimating the probability of exceeding a given value by a random value is discussed, in particular, estimating the small and very small probabilities of packet loss due to buffer overflows. It is shown that statistics of extreme values is an effective means of solving the problems of this class. With the appropriate choice of the structure of the extrapolation for- mula, which underlies the extreme values, it is possible to significantly accelerate the computational experiment. The use of the Extreme Value Theory method in combination with the Monte Carlo method for estimating small and very small values of the probabilities of random variables allows us to reduce the simulation time by 4-5 tens compared to the classical Monte Carlo method. Demonstrated the solution to the problem of calculating the buffer capacity for Weibull traffic based on the admissible probability 10-6 of its overflow. Keywords: Network Traffic, Fractal Traffic, Distribution with Heavy «Tails», Weibull, Buffer Storage Volume, Probability of Loss, Extreme Value Theory, Simulation, Experiment Acceleration. 1 Introduction A feature of the traffic of modern packet-switched info-communication networks is its large-scale invariance (fractality). Fractal traffic is described by long-term dependen- cies (LTD). The concept of a pulsating structure is often used in this context [2]. An adequate description of fractal traffic is given by probability distributions with heavy tails [1], [2]. One of these distributions used to describe the packet structure of network traffic is the Weibull distribution [3], [4] The Weibull distribution, hereinafter W(c, b), is widely used in the technique and it is distributions with heavy tails with the distribution function of y values: F ( y )  1  e  ( y ) , y  0. c (1) * Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 469 The distribution W(c, b) can be obtained as a function of the exponential distribution. Let X be a random variable that has an exponential distribution Fx  X   1  ex . Then the random variable Y has a Weibull distribution Y  X 1/ с . Really   Fy (Y )  P(Y  y )  P  X 1/ с  y   P  X  y c   1  e y  1  e c c  y1/ c . (2) The resulting final expression is a function for the Weibull distribution. Let us compare the values of the tails W(c, b) and exponentially distributed random variables with the same probability of their observation    e x   y1/ c   x  y c  x  y  x1/ c . c  y1/ c c e (3) From the relation, y  x1/ c it follows that for values x> 1 and c <1 value y> x, that is, the “tail” of the W(c, b) distribution is heavier than the “tail” of the exponential distribution. “Heavy tail” means that in distributions with heavy tails, significant (large) values of a random variable are more likely (more often) than the same values, for example, in an exponential distribution. The fractal properties of traffic have a significant effect on the characteristics of the network [1-5]. The solution to many problems of the theory and technology of infor- mation networks requires an assessment of such characteristics as the probability of a random value exceeding a given value. These may be tasks related to the estimation of small and very small probabilities of packet loss due to buffer overflow. For such prob- lems, a relatively small part of the random variable values given by the tail of the dis- tribution function becomes decisive [3], [ 4]. An effective means of solving problems of this class is the statistics of extreme val- ues [6]. This is an extrapolation method based on knowledge of the asymptotic behavior of the probability distribution. The "tail" of the probability distribution of values ran- dom variable is approximated by a dependence, the parameters of which are estimated from the results of direct modeling by the Monte Carlo method [7]. With the proper choice of the structure of the extrapolation formula, one can significantly accelerate the computational experiment [8]. The purpose of the paper is to present a solution to the problem of calculating the buffer capacity for "heavy" traffic, based on the admissible probability Pover≤10-6 of its overflow. 2 Node Model and Problem Statement We consider the network node as a queuing system (QS) of class G|G|1|m. To describe the recurrent flow of applications and/or service time, we use the probability distribu- tion with a heavy tail - the Weibull distribution. With the ON-OFF model, the Pareto distribution is more often considered. However, the Weibull distribution with the form 470 parameter c<1 has all finite moments. This makes the distribution W(c, b) suitable for modeling the convergence of tail parts in a heavy multiplexing environment [3]. We set the task of calculating the required buffer capacity as the problem of finding the dependence P  (m) (4) the probability P of failure (loss of claim) on the size of the buffer m (m=0, 1, 2, ...). This will allow for any predetermined admissible 1 loss probability P* to determine the corresponding smallest admissible buffer size m* by the formula following from (4): m*  –1 ( P * ), (5) where -1 is the inverse of the function . Assuming that the function  is monoton- ically decreasing2, we can state that the inverse function -1 exists3. Thus, having solved the main problem, we can, for any given quality level (that is, for any maximum admis- sible probability of losses P*), by the formula (5) determine the smallest buffer size m* that provides it. In solving this problem, we will rely on the method of extreme values (Extreme Value Theory, EVT) [6]. 3 Extreme Value Method The theory of extreme values is based on the postulate that for a large number of initial distributions F(x), the distribution of maxima Fn(x) taken from samples of size n tends to an asymptote of the form lim F n (bn  an )  exp(  exp(  x)), (6) where an and bn are parameters estimated from independent sample values of a contin- uous random variable X. A class of distributions converging to the asymptote (6) is called exponential. Weibull distribution belongs to this class. The EVT method was developed to analyze continuous random variables. The classical expression obtained using EVT for approximating the "tails" of con- tinuous distributions belonging to the exponential class has the form [6] 1 In practice, it is required that the probability of loss of the application is very small (for ex- ample, the admissible probability of losses can be 10 –6–10–9). 2 We believe that the proportion of P requests lost due to lack of queue space cannot increase with increasing buffer size m. 3 The function  is defined in formula (4) on the discrete set {m} = {0, 1, ...} of values of the argument m. Therefore, the inverse function -1, strictly speaking, is defined on a discrete set – the set {P} = { (0),  (1), ...}. 471 1  x  bn  Q( x)  exp  , (7) Eτ n  an  where Eτ is the average length of the sample sequences of values of the random variable X, generated serially; n is a fixed length of a series of sample sequences; an and bn are parameters calculated from the results of the general sample. The accuracy of the approximation can be estimated by comparing the results ob- tained by the EVT method with the results calculated from the corresponding theoreti- cal distributions To this end, a computer model was constructed to simulate Weibull and exponential distributions. The following results were obtained from the experiment with the model sampling. In the Table. 1 shows the results of modeling the “tail” of the Weibull distribution e ( x / b ) with parameters c=0,5; b=1. c Table 1. Weibull distribution tail results x 200 400 600 800 1000 1200 1600 2000 e  ( x / b )c 8,4910-4 4,510-5 4,810-6 7,210-7 1,3610-7 3,010-8 2,0610-9 1,910-10 Simu- 9,110-4 5,510-5 6,110-6 9,210-7 1,710-7 3,6410-8 2,2710-9 1,910-10 lation ( x / m) Table 2 presents the results of modeling the “tail” of the exponential distribution e with parameter m=3,37. Table 2. Exponential distribution tail results x 25 35 45 50 55 60 65 70 e( x / m) 6,110 3,0810 1,5810 3,610-7 -4 -5 -6 8,1610 1,8910 4,210 9,5210-10 -8 -8 -9 Simu- 5,610-4 2,810-5 1,4410-6 3,2410-7 7,310-8 1,6410-8 3,710-9 8,3310-10 lation The results of an experimental study of the use of the EVT method for estimating the tail of known theoretical distributions show a rather high approximation accuracy – for the same tail values the probabilities are of the same order of smallness. Therefore, it can be argued that the function constructed by the EVT method and approximating the tail of the distribution is equivalent to the theoretical function. Also, a comparison of the values of the Weibull “tails” and the exponential distribu- tion for probabilities of the same order of smallness shows how much the Weibull “tail” is heavier than the exponential one. 472 4 The Model of the Process of Losses in the Buffer of the Limit Capacity The process of receipt and servicing of requirements in the QS with a limited queue (hereinafter referred to as the limited buffer – LB ) can be considered as a regenerative process [9] of occupation and discharge of the system. During the busy interval, requirements arrive and leave the system. The regeneration point in the model coincides with the moment the request arrives, which catches the service node unoccupied [9]. A separate busy interval ends when the entire buffer is freed. The number of escalation processes and queue reduction processes occurring in the same occupation period is the same. From the analysis of the QS model with an unlimited queue (hereinafter referred to as the unlimited buffer - UB) [7], it follows that the number of requirements lost in the LB due to the overflow of its capacity is equal to the difference between the maximum value of the queue formed in the UB during the busy period and the value of the LB capacity. Due to the stationary nature of the QS, the analysis of events in one employment interval is equivalent for all intervals. The maximum value of the queue Li in the UB achieved during the escalation process at the i-th employment interval does not depend on the number of escalation and reduc- tion processes, nor the number of requests served in this interval and does not depend on the length of the employment gap. The value of Li depends only on the time intervals between arrivals and service in- tervals. Since these times are random variables, the value Li for  i is also a random variable. When observing an infinite number of employment intervals, it can be argued that the maximum value of the queue L has a probability distribution function of its values. This allows us to finally recognize that the probability of losing requests in an LB with a capacity of N (probability of buffer overflow) is a function of the tail of the distribution of the values of the maximum value L, starting with the value L=N+1. Therefore, if the distribution function for L=N is F(x), then the probability of losses is P loss( N )  f Q( N  1)  f 1  F ( N ). (8) that is, it is a distribution function of the tail of the probabilities of the values of the maximum queue length in the UB. Experimental estimates of the required capacity of the limit buffer The queue in the buffer is a discrete random variable. To describe the probability function of the values of the “tails” of discrete random values, for expression (4) is obtained extension in the form [10] an x  bn  1 Q  x  R exp( ), (9) nK an were an and bn are the coefficients calculated from the sample data; 473 K – the average number of applications received at the entrance of the QS for the regeneration period; n is the number of values in the sequences into which the sampling sequence is com- posed of the maximum values of the queue at the regeneration intervals; R  exp  x an  – the constant R depends on the value of the interval 2x=hi-hi-1 between the values hi and hi-1 of the discrete values x. Let Ploss_ad be the admissible value of the probability of losses due to the excess of the buffer capacity N. We set the value of Ploss_ad and solve (9) concerning the variable x=N. We obtain an expression for estimating the value of the buffer capacity N, which provides the value of the probability of losses Ploss (N) ≤ Ploss_ad (N), in the form  an R   N   an ln    bn  1 (10)  nKPloss_ad  where  x  is the nearest integer not less than x. To use expression (10) to determine the required value of the capacitance of the LB, it is necessary to calculate the coefficients, an and bn from the sample data K , ab, and bn. For this purpose, a simulation model of a single-phase QS with LB was realized. The simulation model reproduced the periods of regeneration of receipt of applications in the buffer and their servicing. The maximum values of the queues and their processing were taken into account. The model corresponded to four types of QS: M|W|1, W|W|1, W|M|1, and M|M|1 with load factors = 0,5; 0,7; 0,9 for each type of QS. The simulation was carried out for the given values of the probability of losses Ploss_ad: 10-4, 10-6, 10-8, 10-10. Tables 1-4 show the simulation results for this source data and PM is the value of the probability of losses obtained from the Monte Carlo simulation with the found values of the buffer capacity N. The corresponding values of Ploss_ad and PM presented in the tables are of the same order of smallness, which confirms the sufficient accuracy of the calculated results by expression (10). Also, a comparative analysis of the calculated data in Tables 1-3 with the data in Table 4 confirms the fact that the packet traffic, in which receiving of requests are described as distribution with heavy tails, requires significantly more significant amounts of the buffer memory in network nodes. Table 3. Simulation results for QS M|W|1 N PM Ploss_ad =0,5 =0,7 =0,9 =0,5 =0,7 =0,9 10-4 42 66 180 6,710-5 1,710-4 1,910-4 10-6 69 113 300 - 3,010-6 - 10-8 94 154 430 - - - 10-10 122 203 576 - - - 474 Table 4. Simulation results for QS W|M|1 N PM Ploss_ad =0,5 =0,7 =0,9 =0,5 =0,7 =0,9 10-4 53 81 198 1,410-4 1,710-4 1,210-4 10-6 81 128 343 6,010-6 2,610-6 4,610-6 10-8 121 179 482 - - - 10-10 150 227 603 - - - Table 5. Simulation results for QS W|W|1 N PM Ploss_ad =0,5 =0,7 =0,9 =0,5 =0,7 =0,9 10-4 83 128 311 1,310-4 1,610-4 2,810-4 10-6 128 217 546 - 1,310-6 5,910-6 10-8 184 279 780 - - - 10-10 242 366 1026 - - - Table 6. Simulation results for QS M|M|1 N PM Ploss_ad =0,5 =0,7 =0,9 =0,5 =0,7 =0,9 10-4 12 21 60 5,910-5 1,110-4 1,310-4 10-6 18 32 103 8,010-7 2,710-6 1,710-6 10-8 23 49 147 3,010-7 - - 10-10 32 61 187 - - - The use of the analytical-statistical method and expression (10) in the case of multi- server traffic allows us to calculate the values of the LB capacity based on the admissi- ble value of the probability of losses Ploss_ad <<10-9. Conclusion Fractal models are difficult to analytically interpret. The presented method for calcu- lating the buffer capacity under the conditions of “heavy” traffic described by the Weibull distribution is relatively simple and efficient. The use of the EVT method in combination with the Monte Carlo method for estimating small and very small values of the probabilities of random variables allows us to reduce the simulation time by 4–5 tens in comparison with the classical Monte Carlo method [8]. The EVT method was developed to approximate extreme statistics of distributions of an exponential class [6]. However, the case of small values of the shape parameter (c<1) is of particular in- terest since Weibull distributions with such parameters occupy an intermediate place between distributions with an exponential decrease in tails (exponential distribution, 475 gamma distribution) and “heavy-tailed” distributions with power-law descending tails of the Zipf-Pareto type [11]. In the same work [11], the possibility of expanding the convergence of the distribu- tions of statistics of a fairly general form, based on samples of random volume, to the Weibull distribution was shown. In our opinion, this is a significant point, since it expands the Weibull distribution capabilities to describe real traffic. References 1. Shelukhin O. I. Multifractals. Infocommunication applications. Moscow: Hot line - Tele- com, (2011). 2. Stolings W. Modern computer networks. 2nd ed. – St. Petersburg: Peter, (2003). 3. Arfeen M.A., Pawlikowski K., McNickle N., Willig A. The role of the Weibull Distribution in Internet traffic modeling. In 25th International Teletraffic Congress (ITC). – Shanghai, 2013. pp. 1–8. DOI: 10. 1109/ITC.2013.6662948 4. Ramirez-Cobo R., Lillo R.E., Wiper M. P. Bayesian analysis of a gueeuing system with a long-tailed arrival process. Communication in Statistics, vol. 37, 697–712 (2008). 5. Kutuzov O. I., Tatarnikova T. M. Model of a self-similar traffic generator and evaluation of buffer storage for classical and fractal queuing system// Moscow Workshop on Electronic and Networking Technologies, MWENT 2018 - Proceedings 1, pp. 1-3, 2018. 6. Galambos Janos. Asymptotic theory of extreme order statistics. Moscow: Nauka, (1984). 7. Kutuzov O. I., Haddad M. Analytical and statistical method for calculating the capacity of the drive / Equipment in ecology and human activity. Materials of the International Confer- ence "IENS - 2002". St. Petersburg, 9 p. (2002). 8. Francesco Bernabei, Roberto Ferretti, Marco Listanti, Giuseppe Zingrillo. A methodology for buffer design in ATM switches // J.European transactions on telecommunications and related technologia vol. 2, no. 4, 367-379 (1991). 9. Crane M., Lemoine O. Introduction to the regenerative method of model analysis. - M.: Nauka, 1982. 10. Haddat Mufid. Development of a method for calculating the characteristics of the exchange of information in wide-band networks for integrated maintenance of automated control sys- tems. The dissertation for the degree of candidate of technical sciences. St. Petersburg State Electrotechnical Institute (LETI). St. Petersburg, 1994. 11. Korolev V.Yu., Sokolov I.A. On the conditions for convergence of distributions of extreme order statistics to the Weibull distribution. Informatics and its application, no3(8), 3-11 (2014).