<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The Regression Analysis of the Data to Determine the Buffer Size When Serving a Self-Similar Packets Flow</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Gennadiy</forename><surname>Linets</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">North Caucasus Federal University</orgName>
								<address>
									<addrLine>2 Kulakova str</addrLine>
									<postCode>355029</postCode>
									<settlement>Stavropol</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Roman</forename><surname>Voronkin</surname></persName>
							<email>roman.voronkin@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">North Caucasus Federal University</orgName>
								<address>
									<addrLine>2 Kulakova str</addrLine>
									<postCode>355029</postCode>
									<settlement>Stavropol</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Svetlana</forename><surname>Govorova</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">North Caucasus Federal University</orgName>
								<address>
									<addrLine>2 Kulakova str</addrLine>
									<postCode>355029</postCode>
									<settlement>Stavropol</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ilya</forename><surname>Palkanov</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">North Caucasus Federal University</orgName>
								<address>
									<addrLine>2 Kulakova str</addrLine>
									<postCode>355029</postCode>
									<settlement>Stavropol</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Carlos</forename><surname>Grilo</surname></persName>
							<email>carlos.grilo@ipleiria.pt</email>
							<affiliation key="aff1">
								<orgName type="institution">Instituto Politécnico de Leiria</orgName>
								<address>
									<addrLine>Rua General Norton de Matos</addrLine>
									<postBox>Apartado 4133</postBox>
									<postCode>2411-901</postCode>
									<settlement>Leiria</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The Regression Analysis of the Data to Determine the Buffer Size When Serving a Self-Similar Packets Flow</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">0E0CAE53EB45A1BDAE9F47EC9F41FEB3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:32+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Telecommunication network, self-similar traffic, Hurst exponent, Pareto distribution, packet loss, regression analysis, quality metrics, penalty score, machine learning</term>
					<term>0000-0002-2279-3887 (Gennadiy Linets)</term>
					<term>0000-0002-7345-579X (Roman Voronkin)</term>
					<term>0000-0002-3225-1088 (Svetlana Govorova)</term>
					<term>0000-0003-0751-3928 (Ilya Palkanov)</term>
					<term>0000-0001-9727-905X (Carlos Grilo)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Using regression analysis on simulation data, we construct a model for predicting the queue size required when an input self-similar packet flow, distributed according to the Pareto law, is transformed into a flow with an exponential distribution. Since the amount of losses in the general case gives no information about how efficiently the buffer memory is used during the transformation of a self-similar packet flow, a quality metric (penalty score) was introduced to assess the trained models; it is a composite score that accounts both for packet loss during the functional transformation and for inefficient use of buffer space in switching nodes. The best model for predicting the queue size when servicing a self-similar packet flow was selected using the following characteristics: the coefficient of determination, the root-mean-square regression error, the mean absolute error, and the penalty score. By these characteristics, the best models are those based on isotonic regression and support vector regression.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The main reason leading to buffer overflow is the presence of a long-term dependence in network traffic due to its self-similarity, as a result of which the total cumulative effect over a wide range of delays can differ significantly from that observed in a short-term dependent process <ref type="bibr" target="#b0">[1]</ref>. To eliminate the self-similarity of network traffic, various models and traffic transformation devices are used, one of which is the asynchronous simulation model described in <ref type="bibr" target="#b2">[2]</ref><ref type="bibr" target="#b3">[3]</ref><ref type="bibr" target="#b4">[4]</ref>, for which a software implementation exists <ref type="bibr" target="#b5">[5]</ref>.</p><p>An important indicator of the operation of this model is the queue size used in the traffic transformation process. Since, due to limited computer resources, the queue cannot have an infinite size, the problem arises of predicting the queue size depending on the measure of self-similarity of the input traffic, which is the Hurst exponent.</p><p>The solution to the problem of finding the optimal buffer size for a given value of the Hurst exponent H can be found using the methods of regression analysis, based on simulation data obtained with the developed software <ref type="bibr" target="#b5">[5]</ref>.</p><p>Since machine learning encompasses many methods, at the initial stage, for later comparison with more complex models built, in particular, with deep learning methods, it is advisable to consider only methods of pairwise regression analysis, isotonic regression and support vector machines.</p><p>We introduce a quality metric (penalty), which is a composite score that considers both packet loss during traffic transformation and inefficient use of buffer space.</p><p>Next, we choose the best model for predicting the queue size, depending on the Hurst exponent of the input flow, using the following quality metrics: the coefficient of determination; the root mean square error of regression; the mean absolute error; the penalty score.</p><p>When setting the problem, special attention should be paid to testing the resulting models. The classical approach of dividing the entire data set into training and test samples is not applicable here: on the test sample we would obtain only an estimated number of lost packets and, therefore, an estimated penalty based on the difference between the predicted and actual buffer sizes. Instead, the obtained models must be tested by simulating traffic transformation with a queue-size limitation, based on the results of applying the tested model to the sequence being converted. This task is not trivial and is not discussed in this article.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The solution of the problem</head><p>The simulation model presented in <ref type="bibr" target="#b2">[2]</ref> transforms the input flow of packets, which is assumed to be self-similar, into a flow with a given distribution law, in particular, an exponential one. The object of transformation is the one-dimensional distribution density of the time intervals between packets of the input flow. Using the developed model, 11,000 tests were carried out and data were obtained for statistical analysis.</p><p>Since the amount of losses in the general case gives no information about how efficiently the queue is used during traffic transformation, to assess the quality of the resulting model we introduce a quality metric, the penalty score, which takes into account not only the amount of losses but also irrational use of buffer memory.</p><p>Let yᵢ denote the true value of the queue size in the sample and ŷᵢ the predicted value corresponding to yᵢ. If ŷᵢ &lt; yᵢ, we penalize the learning system by α(yᵢ − ŷᵢ). If ŷᵢ &gt; yᵢ, the amount of the penalty depends on the difference</p><formula xml:id="formula_0">Δᵢ = ŷᵢ − yᵢ:</formula><p>for Δᵢ &gt; δ the amount of the penalty is β(Δᵢ − δ), and 0 otherwise. Let us illustrate this with an example (Figure <ref type="figure" target="#fig_0">1</ref>). In the first case ŷ &lt; y₁, so the penalty is α(y₁ − ŷ). In the second case ŷ − y₂ ≤ δ and the penalty is 0; ŷ is considered an acceptable queue size for y₂. In the third case ŷ − y₃ &gt; δ, and the amount of the penalty is determined from the expression</p><formula xml:id="formula_1">β(ŷ − y₃ − δ).</formula><p>Thus, the amount of the penalty score is determined from the equation:</p><formula xml:id="formula_2">pᵢ = α(yᵢ − ŷᵢ), if ŷᵢ &lt; yᵢ; pᵢ = β(ŷᵢ − yᵢ − δ), if ŷᵢ − yᵢ &gt; δ; pᵢ = 0, otherwise.</formula><p>The total penalty over all trials is the arithmetic mean of the penalties for each trial:</p><formula xml:id="formula_3">p = (1/n) Σᵢ₌₁ⁿ pᵢ,</formula><p>where n is the number of tests. In the process of training the model, it is necessary to ensure the minimum value of the penalty over all tests, in other words p → min. The presented system of penalties introduces three hyperparameters: α, β and δ, where α, β &gt; 0 and δ ≥ 0. Let us set the hyperparameter values as follows:</p><p>.</p></div>
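As a sketch, the penalty score defined above can be implemented directly; the values of alpha, beta and delta below are placeholders, since the concrete hyperparameter settings are not reproduced here.

```python
import numpy as np

# Sketch of the penalty score defined above. alpha, beta and delta are the
# three hyperparameters; the defaults here are placeholders, not the
# values chosen in the paper.
def penalty_score(y_true, y_pred, alpha=1.0, beta=1.0, delta=0.0):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_pred - y_true
    per_trial = np.where(
        diff < 0,                       # under-prediction: packets are lost
        alpha * (-diff),
        np.where(diff > delta,          # over-prediction beyond the tolerance
                 beta * (diff - delta),
                 0.0))
    return per_trial.mean()             # arithmetic mean over all trials
```

For example, with alpha = beta = 1 and delta = 1, predictions [8, 10, 13] against true sizes [10, 10, 10] incur per-trial penalties 2, 0 and 2, giving a mean penalty of 4/3.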
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">The initial data analysis</head><p>Figure <ref type="figure" target="#fig_1">2</ref> shows a scatter plot of the queue size as a function of the Hurst exponent. The figure clearly shows a certain correlation between the Hurst exponent and the buffer size <ref type="bibr" target="#b4">[4]</ref>.</p><p>Let us first group the tests by the value of the Hurst exponent and then select 30 groups to estimate the spread of the queue size. Next, we build a box plot for each group. It follows from Figure <ref type="figure" target="#fig_2">3</ref> that the largest number of upper outliers is observed in the first 10 groups, which correspond to Hurst exponent values close to 0.5. Consequently, at these values of the Hurst exponent, losses may occur because the required buffer size will exceed the predicted one.</p><p>For groups 28 to 30, there are significant lower outliers, which leads to inefficient use of buffer memory.</p></div>
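The grouping step above can be sketched as follows. The data here is a synthetic stand-in (the real inputs come from the 11,000 simulation runs), and the generating formula is purely illustrative.

```python
import numpy as np

# Synthetic stand-in for the simulation results: Hurst exponents in (0.5, 1)
# and queue sizes that grow nonlinearly with H (illustrative only).
rng = np.random.default_rng(0)
H = rng.uniform(0.5, 1.0, 11_000)
queue = 50 * np.exp(4 * (H - 0.5)) * rng.lognormal(0.0, 0.3, H.size)

# Split the tests into 30 equal-width groups by Hurst exponent and estimate
# the spread of the queue size (quartiles) within each group, as one would
# before drawing a box plot per group.
edges = np.linspace(0.5, 1.0, 31)
group = np.digitize(H, edges[1:-1])          # group index 0..29
spread = [np.percentile(queue[group == g], [25, 50, 75]) for g in range(30)]
```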
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">The regression analysis</head><p>Machine learning is a subfield of artificial intelligence that studies algorithms able to learn without being explicitly programmed. Linear regression is a typical representative of machine learning algorithms <ref type="bibr" target="#b7">[7]</ref>.</p><p>Machine learning addresses the following classes of tasks: supervised learning, unsupervised learning, reinforcement learning, active learning, knowledge transfer, etc. Regression (as well as classification) belongs to the class of supervised learning problems, in which a target variable must be predicted from a given set of features of the observed object. As a rule, in supervised learning problems the experience E is represented as a set of feature–target pairs:</p><formula xml:id="formula_4">D = {(xᵢ, yᵢ)}, i = 1…n.</formula><p>In the case of linear regression, the feature description of an object is a real vector x ∈ Rᵐ, where R is the set of real numbers, and the target variable is a scalar y ∈ R. The simplest quality measure L for the regression problem is</p><formula xml:id="formula_5">L(y, ŷ) = (y − ŷ)²,</formula><p>where ŷ is an estimate of the real value of the target variable <ref type="bibr" target="#b7">[7,</ref><ref type="bibr" target="#b8">8]</ref>.</p><p>Let us restore the dependence shown in Figure <ref type="figure" target="#fig_1">2</ref> using the methods of regression analysis. The basis of regression analysis is the method of least squares (OLS), according to which the function y = f(x) is taken as the regression equation such that the sum of the squared differences satisfies</p><formula xml:id="formula_6">S = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² → min.</formula><p>Using the methods of pairwise regression analysis, we carry out a statistical analysis of the data obtained by transforming an input self-similar flow distributed according to the Pareto law into a flow with an exponential distribution. We examine methods widely used in practice that allow finding the buffer size for an input flow with a given Hurst exponent.</p></div>
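The least-squares fit described above can be sketched with NumPy alone; the paper itself uses statsmodels for its model reports, and the data below is synthetic, so the coefficients are illustrative.

```python
import numpy as np

# Ordinary least squares for a pairwise linear model y = b0 + b1*H.
# Synthetic data stands in for the simulation results.
rng = np.random.default_rng(1)
H = rng.uniform(0.5, 1.0, 500)
y = 100 + 400 * H + rng.normal(0.0, 10.0, H.size)

# Design matrix with an intercept column; lstsq minimizes sum((y - X b)^2).
X = np.column_stack([np.ones_like(H), H])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

# Coefficient of determination R^2 of the fit.
y_hat = b0 + b1 * H
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```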
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">The linear regression analysis</head><p>In this case, the relationship between the Hurst exponent H and the queue size ŷ is determined according to the linear equation: (1) The result of the fitting for the current model is shown in Figure <ref type="figure" target="#fig_4">4</ref>. Thus we obtained a statistically significant result. Table <ref type="table" target="#tab_0">1</ref> shows the quality metric values for the obtained linear regression model. The obtained coefficient of determination suggests that the linear model explains only about 58% of the variation in the queue size caused by changes in the Hurst exponent. This result is unsatisfactory for practice; therefore, in the simplest case, it makes sense to consider other methods that linearize nonlinear dependencies. A nonlinear dependence can thus be reduced to a linear one, after which the least squares method can be applied.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">The hyperbolic regression</head><p>For the hyperbolic regression, the relationship between H and ŷ can be described as follows: We obtain the regression equation using the least squares method:</p><formula xml:id="formula_7">ŷ = 875.438 − 489.379/H. (2)</formula><p>The result of the fitting for the current model is shown in Figure <ref type="figure">5</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 5:</head><p>The first hyperbolic model report, built using the statsmodels package of the Python programming language</p><p>Thereby, we obtained a statistically significant result. Table <ref type="table" target="#tab_1">2</ref> shows the quality metric values for the obtained first hyperbolic regression model. The obtained coefficient of determination suggests that this model explains only about 45% of the variation in the queue size, which is much worse than the linear model. For this reason, it makes sense to consider a different hyperbolic regression model: The result of the fitting for the current model is shown in Figure <ref type="figure">6</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 6:</head><p>The second hyperbolic model report, built using the statsmodels package of the Python programming language. Accordingly, we obtained a statistically significant result. Table <ref type="table" target="#tab_2">3</ref> shows the quality metric values for the obtained second hyperbolic regression model. The obtained coefficient of determination is about 59%, which is slightly better than the linear model.</p></div>
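The linearizing substitution z = 1/H used for the hyperbolic models can be sketched as follows; the data and the true coefficients here are synthetic and illustrative, not the paper's fitted values.

```python
import numpy as np

# Hyperbolic regression y = b0 + b1/H is linearized by substituting
# z = 1/H, after which ordinary least squares applies directly.
# Synthetic data; the generating coefficients are illustrative.
rng = np.random.default_rng(2)
H = rng.uniform(0.5, 1.0, 500)
y = 800 - 450 / H + rng.normal(0.0, 10.0, H.size)

z = 1.0 / H                                   # the new regressor
X = np.column_stack([np.ones_like(z), z])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
```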
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">The power regression</head><p>In the case of the power regression, the relationship between H and ŷ is: The power function is intrinsically linear; therefore, estimates of the unknown parameters of its linearized form can be calculated using the classical least squares method. The regression equation is: (4) The result of the fitting for the current model is shown in Figure <ref type="figure">7</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 7:</head><p>The power model report, built using the statsmodels package of the Python programming language. Thus we obtained a statistically significant result. Table <ref type="table" target="#tab_3">4</ref> shows the quality metric values for the obtained power regression model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The obtained coefficient of determination is about 70%, which is much better than that of the linear model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6.">The exponential regression</head><p>For the exponential regression, the relationship between H and ŷ is: In this way we obtained a statistically significant result. Table <ref type="table" target="#tab_4">5</ref> shows the quality metric values for the obtained exponential regression model. The obtained coefficient of determination indicates that the exponential model explains about 74% of the variation in the queue size, which is the best result among the methods of paired regression analysis. An analysis of the amount of the penalty gives the same result.</p><p>Let us carry out a comparative analysis of the results obtained and build graphs of the regression equations (1)–(5) (Figure <ref type="figure">9</ref>). It is evident that the exponential regression fits the relationship between the Hurst exponent and the buffer size most closely.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 9: The comparative analysis of the results of paired regression analysis</head><p>The trivial paired regression models described above do not adequately describe the dependence of the queue size on the Hurst exponent, so we complicate the model. One possible approach is to use isotonic regression.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.7.">The isotonic regression</head><p>In statistics, isotonic regression or monotonic regression is a method of fitting a free-form line to a sequence of observations under the following constraints: the fitted free-form line must be non-decreasing (or non-increasing) over the domain and must lie as close as possible to the observations <ref type="bibr" target="#b13">[13]</ref>. In the process of constructing an isotonic curve, the following problem is solved <ref type="bibr" target="#b13">[13]</ref>:</p><formula xml:id="formula_8">Σᵢ wᵢ(yᵢ − ŷᵢ)² → min,</formula><p>where the weighting factors satisfy wᵢ ≥ 0. This yields a vector of non-decreasing elements that is closest to the observations in terms of the root mean square error. In practice, this list of elements forms a piecewise linear function.</p><p>Let us train the isotonic regression model using the scikit-learn package of the Python 3 programming language <ref type="bibr" target="#b9">[9]</ref> and build the graph corresponding to the fitted model (Figure <ref type="figure" target="#fig_11">10</ref>).</p><p>Table <ref type="table" target="#tab_5">6</ref> shows the quality metric values for the obtained isotonic regression model. The obtained coefficient of determination suggests that this model explains about 92% of the variation in the queue size, which is much better than the models built using paired regression methods. Moreover, the penalty value for the isotonic regression is less than half the corresponding value for paired regression.</p></div>
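A minimal sketch of this step with scikit-learn's IsotonicRegression, on synthetic data standing in for the simulation results:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Synthetic stand-in data: queue size grows nonlinearly with H, plus noise.
rng = np.random.default_rng(3)
H = rng.uniform(0.5, 1.0, 2000)
queue = 50 * np.exp(4 * (H - 0.5)) + rng.normal(0.0, 20.0, H.size)

# Fit a non-decreasing piecewise-linear curve to the observations.
iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
y_hat = iso.fit_transform(H, queue)

# The fitted values are non-decreasing when ordered by H.
order = np.argsort(H)
assert np.all(np.diff(y_hat[order]) >= -1e-9)
```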
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.8.">The support vector regression</head><p>The Support Vector Machine (SVM) is a linear algorithm used in classification and regression problems (for regression problems it is called SVR, Support Vector Regression). The main idea of the method is to construct a hyperplane that separates the sampled objects in an optimal way <ref type="bibr" target="#b10">[10]</ref><ref type="bibr" target="#b11">[11]</ref><ref type="bibr" target="#b12">[12]</ref>.</p><p>Support vector machines maximize the margin between objects, which is closely related to minimizing the probability of overfitting. Moreover, the method makes it easy to move to a nonlinear separating surface by means of the kernel trick <ref type="bibr" target="#b10">[10,</ref><ref type="bibr" target="#b11">11]</ref>.</p><p>Let us train the model based on SVR. The nonlinear nature of the relationship between the Hurst exponent and the queue size indicates the need to choose a radial basis function kernel for the SVR model. The model was trained using the scikit-learn package of the Python 3 programming language <ref type="bibr" target="#b12">[12]</ref>. Figure <ref type="figure" target="#fig_12">11</ref> shows the graph of the relationship between the queue size and the Hurst exponent. Table <ref type="table" target="#tab_6">7</ref> shows the quality metric values for the obtained support vector regression model. The obtained coefficient of determination is about 90%, which is slightly worse than that of the isotonic regression. However, the penalty for this method is lower than for the isotonic regression.</p></div>
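The corresponding SVR fit can be sketched as follows; the kernel choice (RBF) follows the text, while C and epsilon are illustrative hyperparameters and the data is synthetic.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in data: queue size grows nonlinearly with H, plus noise.
rng = np.random.default_rng(4)
H = rng.uniform(0.5, 1.0, 2000)
queue = 50 * np.exp(4 * (H - 0.5)) + rng.normal(0.0, 20.0, H.size)

# SVR with a radial basis function kernel; C and epsilon control the
# trade-off between flatness of the fit and the tolerated deviation.
svr = SVR(kernel="rbf", C=1000.0, epsilon=5.0)
svr.fit(H.reshape(-1, 1), queue)

r2 = svr.score(H.reshape(-1, 1), queue)   # coefficient of determination
```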
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.9.">Comparative analysis of models</head><p>The research results for estimating and choosing the best method of predicting the queue size from the Hurst exponent are presented in Table <ref type="table" target="#tab_7">8</ref>.</p><p>Based on the data of the pivot table, it can be concluded that the best predictive ability according to the introduced quality metric belongs to the model built using the support vector machine. Within the framework of this study, it can also be concluded that complicating SVR by the transition to a kernel-induced feature space does not lead to an improvement in the quality of learning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>Thus, we investigated seven models, built using regression analysis methods, that predict the queue size when an input flow with a Pareto distribution is transformed into an output flow with an exponential distribution, depending on the Hurst exponent of the input flow.</p><p>The classical approach of dividing the entire dataset into training and test samples is shown to be unsuitable here: within the framework of the set task, the obtained models must instead be tested by simulation modeling of traffic transformation with a limit on the queue size, based on the results of applying the tested model to the sequence being converted.</p><p>Since the amount of losses in the general case gives no information about how efficiently the queue is used during traffic conversion, a penalty was introduced to assess the quality of the resulting models; it takes into account not only the amount of losses but also irrational use of buffer memory.</p><p>By the selected quality metrics, the best models are the isotonic regression and the support vector regression; for these models the penalty score was reduced by more than a factor of two compared with the trivial linear model. The use of these models will make it possible to use the buffer space of the RAM in telecommunication network switching nodes more efficiently. Nevertheless, the obtained models are not strong machine learning models; therefore, additional research is required using decision-tree ensembles and neural networks.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Graph of the dependence of the amount of the penalty on the buffer volume</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: The scatter plot of queue size as a function of the Hurst exponent</figDesc><graphic coords="3,62.30,342.70,238.80,191.05" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: The box-plot of the 30 analyzed groups</figDesc><graphic coords="3,301.10,341.05,231.80,192.70" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>We obtain the regression equation using the least squares method:</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: The linear model report is built using the statsmodels package of the Python programming language</figDesc><graphic coords="4,126.00,415.30,357.35,174.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head></head><label></label><figDesc>the hyperbolic equation is achieved by replacing 1 H with a new variable, which we denote by z [6]. Then the hyperbolic regression equation takes the form 01 ˆ. y b b z </figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>Using the least squares method, we obtain the regression equation for this model:</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>1 b</head><label>1</label><figDesc>and belongs to the class of regression models that can be reduced to linear form using transformations<ref type="bibr" target="#b6">[6</ref></figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>1 b</head><label>1</label><figDesc>This equation is non-linear with respect to the coefficient and belongs to the class of regression models that are reduced to a linear form using transformations [6]. The exponential function is intrinsically linear; therefore, estimates of the unknown parameters of its linearized form can be calculated using the classical least squares method. The regression equation is fitted accordingly; the result of the fitting for the current model is shown in Figure 8.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: The exponential model report is built using statsmodels package of the Python programming language</figDesc><graphic coords="8,106.90,56.65,381.25,185.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: Plotting an isotonic curve to a dataset</figDesc><graphic coords="9,141.60,465.60,312.00,253.90" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_12"><head>Figure 11 :</head><label>11</label><figDesc>Figure 11: Plot corresponding to trained support vector machine</figDesc><graphic coords="10,143.75,226.90,307.55,250.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Linear regression model quality metrics</figDesc><table><row><cell cols="2">Quality Metric</cell><cell>Value</cell></row><row><cell cols="2">Coefficient of determination R2</cell><cell>0.584</cell></row><row><cell cols="2">Root mean square error of regression RMSE</cell><cell>130.908</cell></row><row><cell cols="2">Mean absolute error MAE</cell><cell>96.808</cell></row><row><cell>Penalty score</cell><cell>p</cell><cell>55.710</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Quality metrics of a hyperbolic regression model</figDesc><table><row><cell cols="2">Quality Metric</cell><cell>Value</cell></row><row><cell cols="2">Coefficient of determination R2</cell><cell>0.453</cell></row><row><cell cols="2">Root mean square error of regression RMSE</cell><cell>150.218</cell></row><row><cell cols="2">Mean absolute error MAE</cell><cell>110.511</cell></row><row><cell>Penalty score</cell><cell>p</cell><cell>63.841</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Quality metrics of the modified hyperbolic model</figDesc><table><row><cell cols="2">Quality Metric</cell><cell>Value</cell></row><row><cell cols="2">Coefficient of determination R2</cell><cell>0.591</cell></row><row><cell cols="2">Root mean square error of regression RMSE</cell><cell>223.798</cell></row><row><cell cols="2">Mean absolute error MAE</cell><cell>77.543</cell></row><row><cell>Penalty score</cell><cell>p</cell><cell>39.537</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Power regression model quality metrics</figDesc><table><row><cell>Quality Metric</cell><cell>Value</cell></row><row><cell>Coefficient of determination R2</cell><cell>0.699</cell></row><row><cell>Root mean square error of regression RMSE</cell><cell>128.675</cell></row><row><cell>Mean absolute error MAE</cell><cell>72.823</cell></row><row><cell>Penalty score p</cell><cell>53.042</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5</head><label>5</label><figDesc>Exponential regression model quality metrics</figDesc><table><row><cell>Quality Metric</cell><cell>Value</cell></row><row><cell>Coefficient of determination R2</cell><cell>0.745</cell></row><row><cell>Root mean square error of regression RMSE</cell><cell>112.443</cell></row><row><cell>Mean absolute error MAE</cell><cell>65.199</cell></row><row><cell>Penalty score p</cell><cell>46.768</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6</head><label>6</label><figDesc>Isotonic regression quality metrics</figDesc><table><row><cell cols="2">Quality Metric</cell><cell>Value</cell></row><row><cell cols="2">Coefficient of determination R2</cell><cell>0.928</cell></row><row><cell cols="2">Root mean square error of regression RMSE</cell><cell>54.437</cell></row><row><cell cols="2">Mean absolute error MAE</cell><cell>39.501</cell></row><row><cell>Penalty score</cell><cell>p</cell><cell>21.269</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 7</head><label>7</label><figDesc>Support vector model quality metrics</figDesc><table><row><cell>Quality Metric</cell><cell>Value</cell></row><row><cell>Coefficient of determination R2</cell><cell>0.901</cell></row><row><cell>Root mean square error of regression RMSE</cell><cell>63.868</cell></row><row><cell>Mean absolute error MAE</cell><cell>52.506</cell></row><row><cell>Penalty score p</cell><cell>18.374</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 8</head><label>8</label><figDesc>Comparison of the considered regression methods for 0.5 &lt; H &lt; 1</figDesc><table><row><cell>Regression model</cell><cell>Coefficient of determination R2</cell><cell>Root mean square error of regression RMSE</cell><cell>Mean absolute error MAE</cell><cell>Penalty score p</cell></row><row><cell>Linear regression</cell><cell>0.584</cell><cell>130.908</cell><cell>96.808</cell><cell>55.710</cell></row><row><cell>Hyperbolic regression 1</cell><cell>0.453</cell><cell>150.218</cell><cell>110.511</cell><cell>63.841</cell></row><row><cell>Hyperbolic regression 2</cell><cell>0.591</cell><cell>223.798</cell><cell>77.543</cell><cell>39.537</cell></row><row><cell>Power regression</cell><cell>0.699</cell><cell>128.675</cell><cell>72.823</cell><cell>53.042</cell></row><row><cell>Exponential regression</cell><cell>0.745</cell><cell>112.443</cell><cell>65.199</cell><cell>46.768</cell></row><row><cell>Isotonic regression</cell><cell>0.928</cell><cell>54.437</cell><cell>39.501</cell><cell>21.269</cell></row><row><cell>Support vector machine SVR</cell><cell>0.901</cell><cell>63.868</cell><cell>52.506</cell><cell>18.374</cell></row></table></figure>
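The quality metrics compared in Tables 3–8 (R2, RMSE, MAE) can be reproduced for any of the candidate models with scikit-learn. The following is a minimal sketch on synthetic data only — the Hurst-exponent samples and the hyperbolic buffer-size relationship used here are illustrative assumptions, not the paper's dataset — showing how the two best-scoring models, isotonic regression and SVR, are fitted and scored:

```python
# Illustrative sketch (synthetic data, NOT the paper's dataset): fit two of
# the compared models -- isotonic regression and SVR -- and compute the
# quality metrics used in Tables 3-8: R2, RMSE, and MAE.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

rng = np.random.default_rng(0)
H = np.sort(rng.uniform(0.5, 0.95, 200))        # Hurst exponent, 0.5 < H < 1
buf = 50.0 / (1.0 - H) + rng.normal(0, 25, H.size)  # assumed buffer-size trend

models = {
    # Isotonic regression accepts 1-D inputs directly.
    "isotonic": IsotonicRegression(out_of_bounds="clip").fit(H, buf).predict(H),
    # SVR expects a 2-D feature matrix.
    "SVR": SVR(kernel="rbf", C=100.0)
        .fit(H.reshape(-1, 1), buf).predict(H.reshape(-1, 1)),
}
for name, pred in models.items():
    r2 = r2_score(buf, pred)
    rmse = mean_squared_error(buf, pred) ** 0.5   # root of the MSE
    mae = mean_absolute_error(buf, pred)
    print(f"{name}: R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

Because the synthetic trend is monotone in H, the isotonic fit tracks it closely; on the paper's real measurements the same scoring procedure produced the values in Tables 6 and 7.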
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">I</forename><surname>Shelukhin</surname></persName>
		</author>
		<title level="m">Fractal processes in telecommunications</title>
				<editor>
			<persName><forename type="first">O</forename><forename type="middle">I</forename><surname>Shelukhin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Tenyakshev</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Osin</surname></persName>
		</editor>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Ed</forename><forename type="middle">O I</forename><surname>Shelukhin</surname></persName>
		</author>
		<imprint>
			<publisher>Radiotekhnika</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page">479</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Simulation model of asynchronous transformation of self-similar traffic in switching nodes using a queue</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">I</forename><surname>Linets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Govorova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Voronkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">P</forename><surname>Mochalov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Infocommunication technologies</title>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="293" to="303" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Security methods for a group of mobile robots according to the requirements of Russian and foreign legislation</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Basan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">S</forename><surname>Basan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Lapina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">N</forename><surname>Kormakova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">G</forename><surname>Lapin</surname></persName>
		</author>
		<idno type="DOI">10.1088/1757-899X/873/1/012031</idno>
	</analytic>
	<monogr>
		<title level="j">IOP Conference Series: Materials Science and Engineering</title>
		<imprint>
			<biblScope unit="volume">873</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">012031</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Decrease energy consumption of transport telecommunication networks due to the usage of stage-by-stage controlling procedure</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">I</forename><surname>Linets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Melnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Govorova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Medenec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Lapina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings REMS 2018 - Proceedings of the 2018 Multidisciplinary Symposium on Computer Science and ICT</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="181" to="190" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">A program for generating a dataset to study the statistical characteristics of a self-similar traffic transformation model</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">I</forename><surname>Linets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">V</forename><surname>Govorova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Voronkin</surname></persName>
		</author>
		<imprint>
			<biblScope unit="page">2019619275</biblScope>
		</imprint>
	</monogr>
	<note>Certificate of state registration of a computer program. Register date 15</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<idno type="DOI">10.1007/978-3-662-46221-8</idno>
		<ptr target="https://doi.org/10.1007/978-3-662-46221-8" />
		<title level="m">Handbook of Mathematics</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">N</forename><surname>Bronshtein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Semendyayev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Musiol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mühlig</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<ptr target="https://habr.com/ru/company/ods/blog/322076/" />
		<title level="m">Basic principles of machine learning on the example of linear regression</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Building machine learning systems in Python</title>
		<author>
			<persName><forename type="first">Luis</forename><forename type="middle">Pedro</forename><surname>Coelho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Willi</forename><surname>Richert</surname></persName>
		</author>
		<editor>A. A. Slinkin</editor>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>DMK Press</publisher>
			<biblScope unit="page">302</biblScope>
		</imprint>
	</monogr>
	<note>2nd edition, translated from English</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<ptr target="https://scikit-learn.org/stable/modules/isotonic.html" />
		<title level="m">Isotonic regression</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Sjardin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Massaron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Boschetti</surname></persName>
		</author>
		<title level="m">Large-scale machine learning with Python, translated from English</title>
				<editor>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Logunova</surname></persName>
		</editor>
		<imprint>
			<publisher>DMK Press</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page">358</biblScope>
		</imprint>
	</monogr>
	<note>in Russian</note>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Python and machine learning, translated from English</title>
		<author>
			<persName><forename type="first">S</forename><surname>Raschka</surname></persName>
		</author>
		<editor>A.V. Logunova</editor>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>DMK Press</publisher>
			<biblScope unit="page">418</biblScope>
		</imprint>
	</monogr>
	<note>in Russian</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<ptr target="https://scikit-learn.org/stable/auto_examples/svm/plot_svm_regression.html?highlight=svr" />
		<title level="m">Support Vector Regression (SVR) using linear and non-linear kernels</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Causal isotonic regression</title>
		<author>
			<persName><forename type="first">T</forename><surname>Westling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gilbert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Carone</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/1810.03269" />
		<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
