=Paper= {{Paper |id=Vol-2842/paper_8 |storemode=property |title=The Regression Analysis of the Data to Determine the Buffer Size when Serving a Self-similar Packets Flow |pdfUrl=https://ceur-ws.org/Vol-2842/paper_7.pdf |volume=Vol-2842 |authors=Gennadiy Linets,Roman Voronkin,Svetlana Govorova,Ilya Palkanov,Carlos Grilo }} ==The Regression Analysis of the Data to Determine the Buffer Size when Serving a Self-similar Packets Flow== https://ceur-ws.org/Vol-2842/paper_7.pdf
The Regression Analysis of the Data to Determine the Buffer Size
When Serving a Self-Similar Packets Flow

Gennadiy Linets a, Roman Voronkin a, Svetlana Govorova a, Ilya Palkanov a, Carlos Grilo b
a North Caucasus Federal University, 2 Kulakova str., Stavropol, 355029, Russia
b Instituto Politécnico de Leiria, Rua General Norton de Matos, Apartado 4133, 2411-901 Leiria, Portugal

                 Abstract
                 Using regression analysis methods on the basis of simulation data, a model is constructed for
                 predicting the queue size of an input self-similar packet flow, distributed according to the
                 Pareto law, when it is transformed into a flow having an exponential distribution. Since the
                 amount of losses in the general case gives no information about how efficiently the buffer
                 memory space is used in the process of transforming a self-similar packet flow, a quality
                 metric (penalty) was introduced to assess the quality of the models after training; it is a
                 complex score that considers both packet loss during functional transformations and
                 ineffective use of the buffer space in switching nodes. The best model for predicting the
                 queue size when servicing a self-similar packet flow was chosen using the following
                 characteristics: the coefficient of determination, the root-mean-square regression error, the
                 mean absolute error, and the penalty score. The best models in terms of the investigated
                 characteristics are those using isotonic regression and support vector regression.

                 Keywords
                 Telecommunication network, self-similar traffic, Hurst exponent, Pareto distribution, packet loss,
                 regression analysis, quality metrics, penalty score, machine learning.

1. Introduction
   The main reason leading to a buffer overflow is the presence of long-term dependence in network traffic
due to its self-similarity, as a result of which the total cumulative effect over a wide range of delays can
differ significantly from that observed in a short-range dependent process [1]. To eliminate the self-similarity
of network traffic, various models and traffic transformation devices are used; one of them is the asynchronous
simulation model described in [2-4], for which there is a software implementation [5].
   An important indicator of the operation of this model is the queue size used in the traffic transformation
process. Since, due to limited computer resources, the queue cannot have an infinite size, the problem arises
of predicting the queue size depending on the measure of self-similarity of the input traffic, which is the
Hurst exponent.
   The solution to the problem of finding the optimal buffer size for a given value of the Hurst exponent H
can be found using the methods of regression analysis, based on simulation data obtained using the
developed software [5].

2. Statement of the problem
   Using machine learning methods, it is necessary to develop a model to predict the queue size depending
on the Hurst exponent value based on the data obtained when performing the transformation of an input self-
similar flow distributed according to the Pareto law into a flow having an exponential distribution.


YRID-2020: International Workshop on Data Mining and Knowledge Engineering, October 15-16, 2020, Stavropol, Russia
EMAIL: kbytw@mail.ru (Gennadiy Linets); roman.voronkin@gmail.com (Roman Voronkin); mitnik2@yandex.ru (Svetlana Govorova);
ilya0693@yandex.ru (Ilya Palkanov); carlos.grilo@ipleiria.pt (Carlos Grilo)
ORCID: 0000-0002-2279-3887 (Gennadiy Linets); 0000-0002-7345-579X (Roman Voronkin); 0000-0002-3225-1088 (Svetlana Govorova);
0000-0003-0751-3928 (Ilya Palkanov); 0000-0001-9727-905X (Carlos Grilo)
            ©️ 2020 Copyright for this paper by its authors.
            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
            CEUR Workshop Proceedings (CEUR-WS.org)
    Since machine learning includes many methods, at the initial stage it is advisable to consider only the
methods of paired regression analysis, isotonic regression and support vector machines, for further comparison
with more complex models built, in particular, using deep learning methods.
    Let us introduce a quality metric (penalty), a complex score that considers both packet loss during
traffic transformation and inefficient use of buffer space.
    Next, we choose the best model for predicting the queue size, depending on the Hurst exponent of the
input flow, using the following quality metrics:
     • coefficient of determination;
     • root mean square error of regression;
     • mean absolute error;
     • penalty score value.
    When setting the problem, special attention should be paid to testing the resulting models. In this case,
the classical approach of dividing the entire data set into training and test samples is not applicable. Since
on the test sample we would obtain only an estimated number of lost packets, and therefore an estimated penalty
based on the difference between the predicted and actual buffer sizes, the obtained models must instead be
tested by simulating traffic transformation with a queue size limitation, where the tested model determines
the queue size for the sequence being converted. This task is not trivial and will not be discussed in this
article.

3. The solution of the problem
    The simulation model presented in [2] transforms the input flow of packets, which is known to be
self-similar, to a given distribution law, in particular, an exponential one. The object of transformation is
the one-dimensional distribution density of the time intervals between packets of the input flow. Using the
developed model, 11,000 tests were carried out and data were obtained for statistical analysis.
    Since the amount of losses in the general case gives no information about how efficiently the queue is
used in the process of traffic transformation, to assess the quality of the resulting model we introduce a
quality metric, the penalty score, which takes into account not only the amount of losses but also irrational
use of buffer memory.
    Let us define yᵢ as the true value of the queue size in the sample and ŷᵢ as the predicted value of the
queue size corresponding to the true value yᵢ. If yᵢ > ŷᵢ, we penalize the learning system by α(yᵢ − ŷᵢ).
If yᵢ ≤ ŷᵢ, the amount of the penalty depends on the difference Δᵢ = ŷᵢ − yᵢ: when Δᵢ > δ the penalty is
β(Δᵢ − δ), and 0 otherwise. Let us illustrate this with an example (Figure 1).
   [Figure 1 plots the penalty p against the true queue size y: the branch α(y − ŷ) for y > ŷ, zero penalty
for ŷ − δ ≤ y ≤ ŷ, and the branch β(ŷ − y − δ) for y < ŷ − δ, with sample points y₃ < y₂ < ŷ < y₁ marked on
the axis.]

Figure 1: Graph of the dependence of the amount of the penalty on the buffer volume

    Consider three cases, each of which corresponds to the true queue sizes y₁, y₂ and y₃. Suppose the
predicted values of the queue sizes coincide in each of the three cases, in other words ŷ₁ = ŷ₂ = ŷ₃ = ŷ.
Then, in the first case, y₁ > ŷ and the amount of the penalty is α(y₁ − ŷ). In the second case
ŷ − δ ≤ y₂ ≤ ŷ and the penalty is 0; it is assumed that ŷ is an acceptable queue size for y₂. In the third
case y₃ < ŷ − δ, and the amount of the penalty is determined from the expression β(ŷ − y₃ − δ).
   Thus, the amount of the penalty score is determined from the equation:

               ⎧ α(yᵢ − ŷᵢ),       if yᵢ > ŷᵢ,
          pᵢ = ⎨ β(ŷᵢ − yᵢ − δ),   if ŷᵢ − yᵢ > δ,
               ⎩ 0,                otherwise.

   The total penalty for all trials is determined as the arithmetic mean of the penalties for each trial:

                                      p = (1/n) · Σᵢ₌₁ⁿ pᵢ,

where n is the number of tests. In the process of training the model, it is necessary to ensure the minimum
value of the penalty over all tests, in other words p → min.
   The presented system of penalties introduces three hyperparameters: α, β and δ, where α, β > 0 and δ > 0.
Fixed values of these hyperparameters were used in all the experiments below.

3.1. The initial data analysis
   Figure 2 shows a scatter plot of the queue size as a function of the Hurst exponent. The figure clearly
shows that there is a certain correlation between the Hurst exponent and the buffer size [4].
   Let us first group the tests by the value of the Hurst exponent and then select 30 groups to estimate the
spread of the queue size.




Figure 2: The scatter plot of the queue size as a           Figure 3: The box-plot of the 30 analyzed groups
function of the Hurst exponent

   Next, we can build a box-plot for each group. It follows from Figure 3 that the largest number of upper
outliers is observed for the first 10 groups, which correspond to Hurst exponent values close to 0.5.
Consequently, at these values of the Hurst exponent, losses may occur because the required buffer size will be
greater than the predicted one.
   For groups 28 to 30, there are significant lower outliers, which leads to inefficient use of buffer
memory.

3.2. The regression analysis
    Machine learning is a subfield of artificial intelligence that studies algorithms capable of learning
without being explicitly programmed. Linear regression is a typical representative of machine learning
algorithms [7].
    Machine learning covers the following classes of problems: supervised learning, unsupervised learning,
reinforcement learning, active learning, knowledge transfer, etc. Regression (as well as classification)
belongs to the class of supervised learning problems, in which a certain target variable must be predicted for a
given set of features of the observed object. As a rule, in supervised learning problems, experience E is
represented as a set of pairs of features and target variables: D = {(xᵢ, yᵢ)}, i = 1…n. In the case of linear
regression, the feature description of an object is a real vector x ∈ Rᵐ, where R is the set of real numbers,
and the target variable is a scalar y ∈ R. The simplest measure of the quality L for the regression problem is

                                      L(y, ŷ) = (y − ŷ)²,

where ŷ is an estimate of the real value of the target variable [7, 8].
    Let us restore the dependence shown in Figure 2 using the methods of regression analysis.
    The basis of regression analysis is the method of least squares (OLS), according to which the function
ŷ = f(x) is taken as the regression equation such that the sum of the squares of the differences satisfies

                                  S = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² → min.
   Using the methods of pairwise regression analysis, we will carry out a statistical analysis of the data
obtained by transforming an input self-similar flow distributed according to the Pareto law into a flow having
an exponential distribution. Let us examine the methods widely used in practice, which allow finding the
buffer size for the input flow with a given Hurst exponent.

3.3. The linear regression analysis

   In this case, the relationship between the Hurst exponent H and the queue size ŷ is determined according
to the linear equation:

                                          ŷ = b₀ + b₁H.

   We obtain the regression equation (1) using the least squares method; its estimated coefficients can be
read from the fitted model report.
   The result of the fitting for the current model is shown in Figure 4.




Figure 4: The linear model report built using the statsmodels package of the Python programming language

    Thus, we obtained a statistically significant result. Table 1 shows the quality metric values for the
obtained linear regression model.

Table 1
Linear regression model quality metrics
                   Quality Metric                                                 Value
          Coefficient of determination R2                                          0.584
    Root mean square error of regression RMSE                                    130.908
             Mean absolute error MAE                                              96.808
                  Penalty score p                                                 55.710
    The obtained value of the coefficient of determination suggests that only about 58% of the variation in
the queue size is explained by the Hurst exponent within the linear model. This result is unsatisfactory for
practice; therefore, in the simplest case, it makes sense to consider other methods based on the linearization
of nonlinear dependencies. A nonlinear dependence can then be reduced to a linear one, after which the least
squares method can be applied.
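The linear fit and the quality metrics used throughout this paper can be sketched with a closed-form OLS on synthetic data. The generated (H, q) pairs below merely stand in for the real 11,000-trial dataset from [5], which is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in data: H in (0.5, 1), queue size roughly exponential
# in H plus noise (shape chosen for illustration only).
H = rng.uniform(0.5, 0.99, 500)
q = 2.9 * np.exp(5.1 * H) + rng.normal(0, 30, H.size)

# Closed-form OLS for the paired linear model q_hat = b0 + b1*H.
X = np.column_stack([np.ones_like(H), H])
b, *_ = np.linalg.lstsq(X, q, rcond=None)
q_hat = X @ b

# Quality metrics used throughout the paper.
ss_res = np.sum((q - q_hat) ** 2)
ss_tot = np.sum((q - q.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                 # coefficient of determination
rmse = np.sqrt(np.mean((q - q_hat) ** 2))  # root mean square error
mae = np.mean(np.abs(q - q_hat))           # mean absolute error
```

On data with exponential growth in H, such a linear fit leaves a visible systematic residual, which mirrors the modest R² reported in Table 1.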

3.4. The hyperbolic regression

   For the hyperbolic regression, the relationship between H and ŷ can be described as follows:

                                          ŷ = b₀ + b₁/H.

   The linearization of the hyperbolic equation is achieved by replacing 1/H with a new variable, which we
denote by z [6]. Then the hyperbolic regression equation takes the form ŷ = b₀ + b₁·z. We obtain the
regression equation using the least squares method:

                                   ŷ = 875.438 − 489.379/H.                                    (2)

   The result of the fitting for the current model is shown in Figure 5.
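The linearizing substitution z = 1/H can be sketched as follows; the noiseless data are generated from the rounded coefficients of equation (2), purely to show that OLS on the transformed variable recovers the hyperbola:

```python
import numpy as np

# Noiseless synthetic data generated from equation (2) (rounded coefficients).
H = np.linspace(0.55, 0.95, 50)
q = 875.4 - 489.4 / H

# Substitution z = 1/H turns the hyperbola into a line q_hat = b0 + b1*z,
# which ordinary least squares fits directly.
z = 1.0 / H
X = np.column_stack([np.ones_like(z), z])
b, *_ = np.linalg.lstsq(X, q, rcond=None)
# On noiseless data, b recovers (b0, b1) = (875.4, -489.4) exactly.
```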




Figure 5: The first hyperbolic model report built using the statsmodels package of the Python programming
language

   Thus, we obtained a statistically significant result. Table 2 shows the quality metric values for the
obtained first hyperbolic regression model.

Table 2
Quality metrics of a hyperbolic regression model
                    Quality Metric                                              Value
          Coefficient of determination R2                                      0.453
    Root mean square error of regression RMSE                                 150.218
             Mean absolute error MAE                                          110.511
                   Penalty score p                                            63.841

   The obtained coefficient of determination of about 45% is much worse than that of the linear model. For
this reason, it makes sense to consider a different hyperbolic regression model:

                                       ŷ = 1/(b₀ + b₁H).

   Using the least squares method, we obtain the regression equation for this model:

                                  ŷ = 1/(0.0399 − 0.0397·H).                                  (3)

   The result of the fitting for the current model is shown in Figure 6.




Figure 6: The second hyperbolic model report built using the statsmodels package of the Python programming
language

   Again, we obtained a statistically significant result. Table 3 shows the quality metric values for the
obtained second hyperbolic regression model.

Table 3
Quality metrics of the modified hyperbolic model
                    Quality Metric                                         Value
          Coefficient of determination R2                                  0.591
    Root mean square error of regression RMSE                            223.798
             Mean absolute error MAE                                      77.543
                   Penalty score p                                         39.537
    The obtained value of the coefficient of determination is about 59%, which is slightly better than the
linear model.

3.5. The power regression

   In the case of the power regression, the relationship between H and ŷ is:

                                          ŷ = b₀·H^b₁.

    This equation is nonlinear in the coefficient b₁ and belongs to the class of regression models that can be
reduced to linear form using transformations [6]:

                                   ln ŷ = ln b₀ + b₁·ln H.

    The power function is intrinsically linear; therefore, estimates of the unknown parameters of its
linearized form can be calculated using the classical least squares method, which gives regression
equation (4).
    The result of the fitting for the current model is shown in Figure 7.
Figure 7: The power model report built using the statsmodels package of the Python programming language

    Thus, we obtained a statistically significant result. Table 4 shows the quality metric values for the
obtained power regression model.

Table 4
Power regression model quality metrics
                    Quality Metric                                            Value
           Coefficient of determination R2                                    0.699
    Root mean square error of regression RMSE                               128.675
             Mean absolute error MAE                                         72.823
                   Penalty score p                                           53.042

   The obtained value of the coefficient of determination is 70%, which is much better than the coefficient
of determination of the linear model.
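The log-log linearization of the power model can be sketched as follows; the coefficients b₀ = 700 and b₁ = 2.5 are arbitrary illustrative values, since the fitted coefficients of equation (4) are not reproduced here:

```python
import numpy as np

# Noiseless synthetic data from an assumed power law q = b0 * H**b1.
H = np.linspace(0.55, 0.95, 50)
q = 700.0 * H ** 2.5

# Fit the linearized form ln q = ln b0 + b1*ln H by ordinary least squares,
# then transform the intercept back: b0 = exp(c0).
X = np.column_stack([np.ones_like(H), np.log(H)])
c, *_ = np.linalg.lstsq(X, np.log(q), rcond=None)
b0, b1 = np.exp(c[0]), c[1]   # recovers (700.0, 2.5) on noiseless data
```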

3.6. The exponential regression

   For the exponential regression, the relationship between H and ŷ is:

                                          ŷ = b₀·e^(b₁H).

    This equation is nonlinear with respect to the coefficient b₁ and belongs to the class of regression
models that can be reduced to a linear form using transformations [6]:

                                     ln ŷ = ln b₀ + b₁H.

    The exponential function is intrinsically linear; therefore, estimates of the unknown parameters of its
linearized form can be calculated using the classical least squares method. The regression equation is:

                                     ŷ = 2.926·e^(5.089H).                                    (5)

    The result of the fitting for the current model is shown in Figure 8.

   In this way, we obtained a statistically significant result. Table 5 shows the quality metric values for
the obtained exponential regression model.
Figure 8: The exponential model report built using the statsmodels package of the Python programming
language

Table 5
Exponential regression model quality metrics
                   Quality Metric                                               Value
          Coefficient of determination R2                                      0.745
    Root mean square error of regression RMSE                                 112.443
             Mean absolute error MAE                                          65.199
                  Penalty score p                                             46.768

   The obtained coefficient of determination indicates that the exponential model explains about 74% of the
variation in the queue size, which is the best result among the paired regression methods. An analysis of the
amount of the penalty gives the same result.
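The semi-log linearization behind the exponential fit can be checked against the fitted equation (5): generating noiseless data from it and refitting recovers its coefficients.

```python
import numpy as np

# Noiseless data generated from the fitted equation (5): q = 2.926*exp(5.089*H).
H = np.linspace(0.55, 0.95, 50)
q = 2.926 * np.exp(5.089 * H)

# Fit the linearized form ln q = ln b0 + b1*H by ordinary least squares.
X = np.column_stack([np.ones_like(H), H])
c, *_ = np.linalg.lstsq(X, np.log(q), rcond=None)
b0, b1 = np.exp(c[0]), c[1]   # recovers 2.926 and 5.089
```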
   Let us carry out a comparative analysis of the results obtained and build the graphs of the regression
equations (1)-(5) (Figure 9). It is evident that the exponential regression fits the relationship between the
Hurst exponent and the buffer size most closely.




Figure 9: The comparative analysis of the results of paired regression analysis
   The trivial paired regression models described above do not adequately describe the dependence of the
queue size on the Hurst exponent, so we complicate the model. One possible approach is to use isotonic
regression.

3.7. The isotonic regression
    In statistics, isotonic (monotonic) regression is a method of fitting a free-form line to a sequence of
observations under the following constraints: the fitted line must be non-decreasing (or non-increasing) over
the domain and must lie as close as possible to the observations [13]. In the process of constructing an
isotonic curve, the following problem is solved [13]:

                                   Σᵢ wᵢ(yᵢ − ŷᵢ)² → min,

subject to ŷ₁ ≤ ŷ₂ ≤ … ≤ ŷₙ, where the weighting factors satisfy wᵢ > 0. The solution is a vector of
non-decreasing elements that are closest to the observations in terms of the root mean square error. In
practice, this list of elements forms a piecewise linear function.
   Let us train the isotonic regression model using the scikit-learn package of the Python 3 programming
language [9] and build the graph corresponding to the fitted model (Figure 10).
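A minimal sketch of the pool-adjacent-violators algorithm that solves the isotonic problem above (in practice, `sklearn.isotonic.IsotonicRegression` does the same job, as used in the paper):

```python
import numpy as np

def pava(y, w=None):
    """Pool-adjacent-violators sketch: returns the non-decreasing vector
    minimizing sum w_i*(y_i - yhat_i)^2, i.e. the isotonic problem above."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    # Each block holds (weighted mean, total weight, number of points pooled).
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])
```

For example, `pava([1, 3, 2])` pools the violating pair (3, 2) into their mean, giving the non-decreasing fit `[1, 2.5, 2.5]`.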
   In Table 6 the quality metrics values are shown for the obtained isotonic regression model.

Table 6
Isotonic regression quality metrics
                    Quality Metric                                           Value
           Coefficient of determination R2                                  0.928
     Root mean square error of regression RMSE                              54.437
              Mean absolute error MAE                                       39.501
                   Penalty score p                                          21.269

   The obtained coefficient of determination of about 92% shows that this model describes the dependence of
the queue size on the Hurst exponent much better than the models built using paired regression methods.
Moreover, the penalty value for isotonic regression is roughly half the corresponding value for paired
regression.




Figure 10: Plotting an isotonic curve to a dataset
3.8. The support vector regression
    Support Vector Machines (SVM) is a linear algorithm used in classification and regression problems (for
regression problems it is called SVR, Support Vector Regression). The main idea of the method is to construct
a hyperplane that separates the sampled objects in an optimal way [10-12].
    Support vector machines maximize the margin between the objects and the separating hyperplane, which is
closely related to minimizing the likelihood of overfitting. Moreover, the kernel trick makes it easy to move
to the construction of a nonlinear separating surface [10, 11].
    Let us train the model based on SVR. The nonlinear nature of the relationship between the Hurst exponent
and the queue size suggests choosing a radial basis function kernel for the SVR model. The model was trained
using the scikit-learn package of the Python 3 programming language [12]. Figure 11 shows the graph of the
relationship between the queue size and the Hurst exponent.
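Such a fit can be sketched with scikit-learn's SVR and an RBF kernel. The synthetic data and the C/epsilon hyperparameter values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
# Synthetic stand-in for the simulation data: H in (0.5, 1), queue size
# roughly exponential in H plus noise.
H = rng.uniform(0.5, 0.99, (300, 1))   # feature matrix, one column for H
q = 2.9 * np.exp(5.1 * H.ravel()) + rng.normal(0, 20, 300)

# SVR with a radial basis function kernel, as chosen in the text;
# C and epsilon here are illustrative, not tuned values.
model = SVR(kernel="rbf", C=1e4, epsilon=5.0)
model.fit(H, q)
r2 = model.score(H, q)   # coefficient of determination on the fitted data
```

The RBF kernel lets the linear SVR machinery capture the nonlinear growth of the queue size with H without an explicit feature transformation.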




Figure 11: Plot corresponding to trained support vector machine

   In Table 7 the quality metrics values are shown for the obtained support vector regression model.

Table 7
Support vector model quality metrics
                   Quality Metric                                            Value
          Coefficient of determination R2                                   0.901
    Root mean square error of regression RMSE                               63.868
             Mean absolute error MAE                                        52.506
                  Penalty score p                                           18.374


   The obtained coefficient of determination, about 90%, is slightly worse than that of isotonic regression.
However, the penalty score for this method is lower than for isotonic regression.

3.9. Comparative analysis of models
   The research results for estimating and choosing the best method for predicting the queue size from the
Hurst exponent are presented in Table 8.
    Based on the data of the pivot table, it can be concluded that the model with the best predictive ability,
according to the introduced quality metric, is the one built using the support vector machine. Within the
framework of this study, it can also be concluded that complicating SVR by the transition to the
kernel-induced feature space does not lead to an improvement in the quality of learning.

Table 8
The comparison between considered regression methods for 0.5