<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluating Load Adjusted Learning Strategies for Client Service Levels Prediction from Cloud-hosted Video Servers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Obinna Izima</string-name>
          <email>Obinna.Izima@mydit.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruairi de Frein</string-name>
          <email>ruairi.defrein@dit.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Davis</string-name>
          <email>mark.davis@dit.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dublin Institute of Technology</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Network managers that succeed in improving the accuracy of client video service-level predictions, where the video is deployed in a cloud infrastructure, will have the ability to deliver responsive, SLA-compliant service to their customers. Meeting up-time guarantees, achieving rapid first-call resolution, and minimizing time-to-recovery after video service outages will maintain customer loyalty. To date, regression-based models have been applied to generate these predictions for client machines using the kernel metrics of a server cluster. The effect of time-varying loads on cloud-hosted video servers, which arises due to dynamic user requests, has not been leveraged to improve prediction using regularized learning algorithms such as the LASSO and Elastic Net, or using Random Forest. We evaluate the performance of load-adjusted learning strategies using a number of learning algorithms and demonstrate that improved predictions are achieved irrespective of the learning approach. A secondary benefit of the load-adjusted learning approach is that it reduces the computational cost, as long as the load is not constant. Finally, we demonstrate that Random Forest significantly improves on the prediction performance of the best performing linear regression variant, the Elastic Net.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Streaming video content over wired and wireless communication networks will
be a major contributor to future internet traffic, as can be inferred from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which
predicts that global IP video traffic will account for about 82 percent of all
consumer traffic by 2021. To safeguard revenues, network providers must be able
to proactively monitor customer experience. The authors of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] adopt a Machine
Learning (ML) approach in their work on video service-level prediction using
the kernel metrics of the server delivering the video, which is a first step towards meeting
this goal.
      </p>
      <p>Fig. 1 depicts the set-up of the system under study in this paper, which is
representative of real-world systems. It is composed of a cloud-based server
infrastructure servicing dynamic, time-varying client requests received over
a network. Server resources are shared between multiple clients, with the video
service (RHS) delivering video to target client machines (LHS). The number of
users accessing the server resources changes rapidly, and this poses a challenge in
predicting the target client's video quality. A video server of this form must be
able to handle time-varying loads, as users can start and stop videos at arbitrary
times simultaneously.</p>
      <p>
        We seek a model that characterizes the effect of the time-varying loads on
the client's video quality, provided we have knowledge of the kernel metrics of
the server delivering the video. The client machine and the server clocks
are synchronized to match up observations. Samples are drawn from the client
and server machines every second. A VLC media player services the
Video-on-Demand (VoD) requests in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and extracts the RTP packet rate, Video Frame
Rate (VFR) and Audio Buffer Rate (ABR), yi, at time i. RTP, VFR and ABR are
the client's service-level metrics we seek to predict. The System Activity Report
(SAR) function on the server extracts the feature set, x. Here, features are
operating-system metrics, for example, the number of active TCP connections
on the server. The authors of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] characterize the system we seek to investigate;
their investigation examines the effect of time-varying requests on the
system resources.
      </p>
      <p>
        We contribute adaptive learning techniques which reduce the computational
complexity and provide more accurate predictions than the baseline approach in
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
1. We demonstrate this by considering the performance of linear regression
methods and non-linear methods using the baseline approach. The linear
methods we evaluate are Linear Regression (LR) and members of the
family of shrinkage methods: Ridge Regression (RR), LASSO and Elastic
Net. We compare the performance of the best performing linear method
with a non-linear method, Random Forest;
2. We evaluate the efficacy of load-adjusted (LA) learning (i.e. using the
TCP socket count in our learning algorithms to improve performance) on
two different traces, which vary periodically and according to a flashcrowd
behaviour;
3. We determine whether the load-adjusted technique works better in linear or
non-linear learning algorithms.
      </p>
      <p>This paper is organized as follows. In Section 2, we introduce the load-adjusted
learning technique and the Machine Learning (ML) techniques. Section 3
introduces the model fitting procedures and the evaluation framework. In Section 4
we evaluate the efficacy of each of the approaches. Section 5 places our
contribution in the context of related literature, and we conclude our work in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Learning Strategies</title>
      <p>The server in Fig. 1 collects device statistics, x, using SAR. The number of active
clients, the load signal at time i, can be measured with the TCPSCK feature of
x. A load generator dynamically allocates client requests for video to the server
under two load patterns, a periodic-load pattern and a flashcrowd-load pattern.
In the periodic-load pattern, clients are started following a Poisson process with
an average arrival rate of 30 clients per minute. This arrival rate is modulated
by a sinusoidal function with a period of an hour and an amplitude of 20
clients. The flashcrowd-load pattern starts clients with a Poisson process with a 5
clients per minute average arrival rate and peaks at randomly created events at
a rate of 10 events per hour. During flash events, the average arrival rate jumps
to 50 clients per minute for a minute and then gradually reduces to 5 clients per
minute over the next 4 minutes.</p>
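      <p>The two load patterns can be summarized as time-varying arrival-rate functions. The following Python sketch illustrates them under stated assumptions: the testbed's actual load generator is not given in code form, and the linear decay of the flash rate back to the baseline is our assumption.</p>

```python
import math

def periodic_rate(t_sec):
    """Arrival rate (clients/minute) for the periodic-load pattern:
    a 30 clients/min average modulated by a sinusoid with a
    one-hour period and an amplitude of 20 clients."""
    return 30.0 + 20.0 * math.sin(2.0 * math.pi * t_sec / 3600.0)

def flashcrowd_rate(t_sec, flash_starts):
    """Arrival rate (clients/minute) for the flashcrowd-load pattern:
    5 clients/min baseline; during a flash event the rate jumps to
    50 clients/min for one minute, then falls back to 5 clients/min
    over the next four minutes (a linear decay is assumed here)."""
    rate = 5.0
    for s in flash_starts:
        dt = t_sec - s
        if dt >= 0.0 and 60.0 > dt:       # first minute of the flash
            rate = max(rate, 50.0)
        elif dt >= 60.0 and 300.0 > dt:   # assumed linear decay over 4 min
            rate = max(rate, 50.0 - 45.0 * (dt - 60.0) / 240.0)
    return rate
```

      <p>Feeding these rates into a Poisson arrival generator would reproduce the qualitative behaviour of the two traces.</p>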
      <p>Using the device statistics computed from SAR, service-level metrics can be
computed at the clients. In this paper, we are interested in predicting the (i)
Video Frame Rate (VFR), the number of displayed video frames at time
i; (ii) Audio Buffer Rate (ABR), the number of audio buffers played at time i; and (iii) RTP
packet count, the number of received RTP packets at time i. Fig. 2 illustrates a
plot of the RTP packet count and the TCPSCK count, the load proxy, recorded
over a period of 15000 seconds for both load patterns. As can be inferred from
Fig. 2, the TCPSCK count rises with an increase in load, and this may result in
a reduction in the RTP packets received at the client because the system has
limited resources.</p>
      <sec id="sec-2-1">
        <title>Load-adjusted Learning (LA)</title>
        <p>The effect of concurrent requests on the server kernel is examined. In Fig. 2,
we plot the RTP packet count recorded over 15000 seconds and the TCPSCK
kernel parameter over the same period of time. From visual inspection of the
plot, the RTP packet count has the same periodic pattern as the TCPSCK. It
is also evident that as the load increases, the TCPSCK count increases and may
lead to a decrease in the RTP packet count at the client because the
system does not have unlimited resources.</p>
        <p>
          Load-adjusted model: Let θ_n represent the amount of the n-th resource one video client currently
uses. The response of the server's n-th kernel feature to a single request for a video at time i is the sum of the resource one user uses
and a deviation signal specific to that feature [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]:
x_i[n] = θ_n + ε_i[n; 1],   (1)
where i ∈ ℤ and x_i[n], θ_n ∈ ℝ. An additional request for resources, by the current user or a new client, would
invoke a feature response of the form
x_i[n] = 2θ_n + ε_i[n; 1] + ε_i[n; 2],   (2)
where ε_i[n; 2] denotes the deviation from the ideal performance arising from the second user.
Assume that at any time i the number of users requesting the service
is K[i]; for example, when there are five client requests for server resources, K[i] = 5. The
response of the n-th feature to the time-varying load is then
x_i[n] = θ_n K[i] + Σ_{k=1}^{K[i]} ε_i[n; k].   (3)
The load signal θ_n K[i] is the number of active users at time i times the
resources one user uses, θ_n. The TCPSCK count serves as K[i].</p>
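        <p>The feature-response model above can be simulated directly. A minimal sketch, assuming zero-mean Gaussian deviation terms (the distribution of the ε_i[n; k] is not specified in the text):</p>

```python
import random

def feature_response(theta_n, K_i, noise_scale=0.1, rng=None):
    """Response of the n-th kernel feature to K_i concurrent requests:
    x_i[n] = theta_n * K_i + the sum of K_i per-request deviation
    terms (Gaussian deviations are an illustrative assumption)."""
    rng = rng or random.Random(0)
    deviations = sum(rng.gauss(0.0, noise_scale) for _ in range(K_i))
    return theta_n * K_i + deviations
```

      <p>With the deviation scale set to zero, the response reduces to the pure load signal θ_n K[i].</p>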
        <p>Un-adjusted Learning (UA): Previous attempts at predicting service-level
metrics from device statistics do not model the effect of the time-varying load;
they implicitly assume that K[i] is a constant, C.
Problem Statement: Our objective is to learn a model that predicts
service-level QoS metrics, y: RTP packet rates, video frame rate and audio buffer rate,
using the features x_i[n] given a time-varying load K[i]. Using both the
flashcrowd-load and periodic-load traces, we test the hypothesis that a learning algorithm
which takes the load value into consideration produces better predictions than
models obtained using algorithms which ignore the load.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Machine Learning Techniques</title>
        <p>In this section, we briefly introduce the different ML techniques we adopted for
our experiments.</p>
        <p>Linear Regression: We start with Linear Regression (LR), a baseline for many
ML techniques. LR models a linear relationship between the metrics we want
to predict, y, the dependent variables, and the independent predictor variables,
x, as a linear function of the form
ŷ_i = Σ_{n=1}^{N} x_i[n] θ_n,   (4)
where x_i[1] represents the intercept and the remaining features represent the
feature space of the predictors. The variable ŷ_i represents the estimate of y_i.
The model coefficients, which we use for prediction, are θ_n, where n = 1, ..., N.
LR computes the coefficients that minimize the residual sum of squares (RSS).
Ridge Regression (RR): In a second approach, we use Ridge Regression (RR),
a variant of LR which imposes an ℓ2-norm penalty on the coefficients
and maintains a small amount of energy in each coefficient [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. RR seeks the model
coefficients that best suit the data by minimizing the RSS of the LR equation
with the addition of a regularization term
λ Σ_n θ_n², where λ ≥ 0.   (5)
Least Absolute Shrinkage and Selection Operator (LASSO): We then
apply the LASSO, another LR variant, which imposes an ℓ1-norm penalty on the regression
coefficients. Similar to RR, the LASSO solves the LR model with the addition
of a regularization term λ Σ_n |θ_n|, where λ ≥ 0. In contrast to RR, the
ℓ1-norm in the LASSO performs a form of automatic variable selection and continuous
shrinkage by forcing some model coefficients to zero, effectively turning off
some features. The feature space we examine is high dimensional, and the
LASSO is known to obtain sparse linear models in such cases [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
Elastic Net (EN): We apply the Elastic Net (EN) model owing to its ability
to perform shrinkage and variable selection just like the LASSO. The
LASSO is known to suffer from poor prediction accuracy when there are high correlations
between the features, especially in cases where the number of observations is
greater than the dimension of the feature space, as is the case in our data set [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The EN combines
the LASSO and RR penalties, that is, it applies a mixture of ℓ1-norm and ℓ2-norm
penalties on the coefficients. The EN automates the choice of the regularization
parameters and produces sparse solutions just like the LASSO. However, the EN
tends to select more features than the LASSO does, as the EN overcomes the
grouping-effect limitation of the LASSO: the
LASSO tends to select only one feature from a group of features with high
pairwise correlations.
        </p>
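        <p>The ℓ2 and ℓ1 penalties above can be made concrete with a short sketch. The paper fits these models with glmnet in R; the numpy fragment below shows the closed-form ridge solution and the soft-thresholding operator that coordinate-descent LASSO/EN solvers apply per coefficient. It is an illustration, not the authors' implementation.</p>

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: minimizes RSS + lam * sum(theta**2)
    by solving (X'X + lam*I) theta = X'y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def soft_threshold(z, lam):
    """Soft-thresholding operator used by coordinate descent for the
    l1 penalty: shrinks z toward zero and forces it exactly to zero
    when abs(z) does not exceed lam, which is how the LASSO and EN
    switch features off."""
    return np.sign(z) * np.maximum(abs(z) - lam, 0.0)
```

      <p>Note how growing λ in ridge_fit shrinks every coefficient smoothly, while soft_threshold zeroes small coefficients outright; the EN mixes the two effects.</p>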
        <p>
          Random Forest (RF): We apply a non-linear method, the Random Forest
algorithm, an ensemble method which builds multiple decision trees
and consolidates their results to obtain stable and more accurate predictions. In
simple terms, the RF estimates ŷ for the metric y using the average of the predictions
from a large number of regression trees [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
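        <p>The RF prediction rule just described, averaging over trees grown on bootstrap samples, can be sketched as follows. Each 'tree' is abstracted here as a callable; this illustrates only the resampling and averaging steps, not a full tree-growing implementation.</p>

```python
import random

def bootstrap_sample(data, rng):
    """Draw a bootstrap sample (sampling with replacement, same size
    as the data), as used when growing each tree of the forest."""
    return [rng.choice(data) for _ in data]

def forest_predict(trees, x):
    """Random Forest regression estimate: the average of the
    predictions of the individual trees."""
    preds = [tree(x) for tree in trees]
    return sum(preds) / len(preds)

# Each tree would be trained on its own bootstrap sample, e.g.:
rng = random.Random(0)
sample = bootstrap_sample([1.0, 2.0, 3.0], rng)
```
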
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments, Model Fitting and Evaluation Procedure</title>
      <p>
        We perform model computations using the traces made publicly available in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
All four evaluation frameworks were implemented in RStudio [version1.1.423].
The traces we utilize for our experiments are the periodic-load pattern and
ashcrowd-load pattern. The periodic-load contains 51043 observations for 297
features while the ashcrowd-load pattern has 275 features with 15150
observations. We start by pre-processing the data sets to remove all non-numeric and
constant value features. Using the ML techniques above we learn models to
predict the service-level metrics ABR, VFR and RTP using the device statistics
with the un-adjusted method. We perform two sets of experiments. One set of
experiments with the periodic-load trace and the second set of experiments with
the ashcrowd-load trace.
      </p>
      <p>
        Un-Adjusted Method (UA): For the UA learning, we adopt the technique
used in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we generate the train and test data using any sample from the data
regardless of the load value. We also adopt the validation set approach [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for
all model building and evaluation with 60% of the trace in the training set and
40% in the testing set. The 60/40 split was done for both traces with 60% set
aside for training the models and 40% for test data and prediction. Using the
UA approach, we learn LR, RR, LASSO and EN models for the service-level
metrics.
      </p>
      <p>The regularization techniques RR, LASSO and EN require a method for
selecting the regularization parameter, λ, for the penalty function. RR, LASSO
and EN use an ℓ2-norm, an ℓ1-norm and a combination of both norms, respectively, as a penalty term weighted
by λ. The entire regularization path for these models was
calculated using path-wise cyclical coordinate descent algorithms. Computationally
efficient and effective approaches for solving these convex optimization
problems were implemented using the glmnet package in R.</p>
      <p>To obtain the value of λ, we employed a 10-fold cross-validation (CV) approach
for both learning approaches. This value was used in subsequent learning and
prediction experiments. The 10-fold CV was implemented for training the models
and during testing using both traces. Different values of λ were determined for
the UA and LA algorithms. A sequence of λ values between 0.0001 and 1 was
selected, and cross-validation was applied to select optimal λ values for the regularized
models. EN outperformed the other three linear methods evaluated and was
adopted for our model performance comparison between the LA and UA models.
Load-Adjusted Method (LA): We obtain subsets of the entire data set for
which the load value, the TCPSCK, is fixed and has more than 500 samples.
To ensure that we have enough data to split between train and test samples,
we use the top subsets with the most samples in them for the LA learning.
Using the validation set approach, we divide the traces into two: 60% of the
trace in the training set and 40% in the test set. We apply the same
cross-validation procedures used for the UA models to find the best λ values for the
EN algorithm using the LA method. We learn EN models for each subset of the
data based on the TCPSCK value. We then apply EN to the traces to learn
UA prediction models for VFR, ABR and RTP packet rate. We refer to these
models as Un-adjusted Elastic Net (UA-EN) models. We learn Load-adjusted
Elastic Net (LA-EN) models for our service-level metrics with the same number
of samples as was used for the UA-EN models.</p>
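      <p>The load-adjusted subsetting step described above can be sketched as follows: group samples by their TCPSCK value, keep only load levels with more than 500 samples, and split each subset 60/40 for training and testing. Function names here are illustrative.</p>

```python
from collections import defaultdict

def load_adjusted_subsets(samples, loads, min_samples=500):
    """Group (feature, target) samples by their TCPSCK load value and
    keep only the load levels with more than min_samples
    observations; a separate model is then trained per subset."""
    groups = defaultdict(list)
    for sample, load in zip(samples, loads):
        groups[load].append(sample)
    return {k: v for k, v in groups.items() if len(v) > min_samples}

def split_60_40(subset):
    """Validation-set approach: first 60% for training, 40% for test."""
    cut = int(0.6 * len(subset))
    return subset[:cut], subset[cut:]
```
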
      <p>Using the same samples used for the UA-EN and LA-EN models, we then
apply Random Forest to learn Load-adjusted Random Forest (LA-RF) and
Un-adjusted Random Forest (UA-RF) models for the service-level metrics.</p>
      <p>All models are evaluated in terms of two accuracy measures. The first is the
Root Mean Squared Error (RMSE), computed as √((1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²). The best
model is the model with the lowest RMSE. The second measure is the R-squared,
a statistical measure of the goodness of fit of our
regression models. R-squared achieves this by comparing our regression models
with a baseline model, one which simply predicts the average of the observed responses
of the dependent feature. The R-squared is computed as R² = 1 − SSE/SST, where
the Sum of Squared Errors (SSE) of our model is computed as Σ_{i=1}^{n} (y_i − ŷ_i)²
and the Sum of Squared Total (SST) of the baseline model is computed as
Σ_{i=1}^{n} (y_i − ȳ)². The model with the highest R-squared value is the best, and a
model with an R-squared value of 1 is a perfect model. We report only the test
RMSE and R-squared values.</p>
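      <p>The two accuracy measures above translate directly into code; a minimal transcription of the formulas:</p>

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: the square root of the mean squared
    residual between observed and predicted responses."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """R-squared: 1 - SSE/SST, comparing the model against a baseline
    that always predicts the mean of the observed responses."""
    mean = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - sse / sst
```
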
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>We present three results: (1) we compare the performance of the four linear
models we examined using the UA approach; (2) we compare the performance
of the best performing linear model with the non-linear model, the Random
Forest algorithm; (3) finally, we compare the performance of our LA models
with the UA approach.</p>
      <p>UA Linear Models: Table 2 lists the performance of the linear models LR, RR,
LASSO and EN on the test data for both the Periodic-load and
Flashcrowd-load traces using the UA technique.
1. The EN gives the best result of all four linear models for both traces. The
RMSE for all three predicted metrics is lowest for the EN, and the R-squared
is highest for the EN models in both traces (bold font in the table).
2. The EN performance is closely matched by the LASSO and the LR for both
traces across all three metrics. The EN offers the best prediction accuracy
due to its ability to overcome the limitations of the LASSO by automatically
tuning the loss function based on the data. The results indicate that the EN
does well in both traces but offers lower RMSE values for the VFR
and ABR using the Flashcrowd trace.
3. The RR algorithm offers the worst predictions across all three metrics for
both traces.</p>
      <p>[Table 2: test RMSE and R-squared of the un-adjusted LR, RR, LASSO, EN and Random Forest models for the Periodic-load and Flashcrowd-load traces.]</p>
      <sec id="sec-4-5">
        <title>Non-linear</title>
        <p>4. The performance of the Random Forest algorithm using the UA approach
is listed in Table 2 for both load traces. The RF algorithm offers a large
improvement in RMSE and R-squared over the EN. The RF performance
gain over the EN indicates that non-linear methods perform significantly better
than linear models on this data.</p>
        <p>Comparison of LA and UA: We have listed results for the EN and RF models
learned using the LA and UA approaches in Table 3.
1. The LA models for EN and RF outperform the UA models for both the
Periodic-load and Flashcrowd-load traces across all three service-level
metrics. The LA-EN estimates are over 5 audio buffers/second better than
the UA-EN estimates in both load traces; the LA-EN estimates for VFR and RTP
indicate similar improvements in both load traces.
2. The RF algorithm offers better RMSE and R-squared values for the LA
technique than the LA-EN; the LA-RF shows a large improvement in estimates.
We compare the average RTP prediction to illustrate what the RMSE values
imply. For instance, the LA-RF estimates for the RTP using the Flashcrowd
trace lie between 90 and 449 RTP packets/second. The
UA-RF estimates for the same trace lie between 8 and 347 RTP
packets/second. True RTP values lie between 83 and 545 RTP packets/second.
Expressed in percentages, the average improvement in prediction performance
is 50% to 60% for LA-RF learning.
3. The LA models in the Flashcrowd-load trace offer better prediction metrics
than in the Periodic-load trace. Fig. 3 illustrates the accuracy of the LA predictions
of RTP packets/second for both traces, for comparison with the UA
predictions.</p>
        <p>[Fig. 3: true versus predicted RTP packets/second over time for the LA-EN, UA-EN, LA-RF and UA-RF models on the Flashcrowd and Periodic traces.]</p>
        <p>
Yanggratoke et al. in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] applied Machine Learning using a UA approach for
service-level prediction from cloud-hosted device statistics. We have
demonstrated that it is possible to improve the accuracy of the predictions achieved in
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Similarly, the authors of [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] investigated the problem of service-level
estimation using ML for another cloud hosted service, Voldemort, a Key-Value store.
We posit that our LA approach may also work in this scenario.
        </p>
        <p>
          The authors of [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] applied a signal processing approach for prediction of
service-level metrics from cloud-based device statistics. In their work, the
authors developed an initial system load model to aid subsequent service-level
prediction. This technique is called load-adjusted learning. It provides the
foundation for the approach undertaken in this paper. The load-adjusted technique
trains prediction weights conditional on the load value. The work in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] was
limited to regression models. We have also demonstrated that Random Forest
models give better predictions when load adjusted.
        </p>
        <p>
          Our results are of relevance to networking professionals. Our load adjusted
approach is computationally cheap than UA learning. We consider subset of the
data based on the load value which improves prediction accuracy and reduces
computation. From the perspective of the network service provider, the authors
of [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] evaluated monitoring of the quality of a compressed video transmitted in
a lossy packet network using bitstream measurements only. In their work, they
adopted the Mean Squared Error (MSE) as an estimation of the video quality.
They examined three di erent techniques for MSE estimation NoParse (NP),
QuickParse (QP) and FullParse (FP). The FP method extracts detailed
information regarding e ects of packet loss on the video; the QP method is only
concerned with extracting high-level details about the video bitstream quality
and as a result requires less computational time than the FP. The NP method
estimates the MSE based using network-level measurements only. They concluded
that the FP was the most accurate of the methods examined. In a practical
network system spanning multiple Internet Service Providers over a broad
geographical area, there may be instances when there are no available measurements
for in-network video quality estimation except for the packet loss rate and
bitrate; in such cases, the NP could be a handy tool. However, our LA approach
using readily available device statistics of the server(s) delivering the video
resources can learn the client video without any detailed knowledge of the system
or the video.
        </p>
        <p>
          There is significant momentum behind the concept of Software Defined
Networks (SDN), which will lay the foundation for our future work, particularly
how we will deploy our learning engine. The authors of [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] proposed an
approach to measure different video Quality of Experience (QoE) metrics running
on clients' devices in order to improve QoE. They also explored the possibility of
dynamic routing of requests, or designation of the best available delivery node, based
on predetermined network conditions. Using a light-weight plugin they created
for an HTML5 video player, they were able to monitor various QoE factors (e.g.
buffering state and video resolution at the target). With these they were able to
analyze user-perceived experience while the video is streaming. This setup
points us towards how we might extend our testbed.
        </p>
        <p>The LA approach achieves these performance improvements while making no additional assumptions about
the data. We will investigate how weakened forms of the independence
assumptions made by these models can improve prediction.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We introduced a method for improving predictions of service-level metrics using
a load-adjusted learning technique. We provided evidence that the EN algorithm
provides the best prediction performance among the linear regression variants
using the baseline UA approach. We also presented evidence which shows that
LA learning improves on the UA prediction performance for all three
metrics under study. We further demonstrated that the Random Forest
predictions outperform the EN estimates using the load-adjusted approach. The LA
method offers significant improvements in prediction accuracy and reduces
the computational requirements of the system delivering the resources.
Acknowledgement. This publication has emanated from research conducted
with the financial support of Science Foundation Ireland (SFI) under Grant
Number 15/SIRG/3459.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <source>Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2016-2021</source>
          . White Paper, Cisco Systems,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R.</given-names>
            <surname>Yanggratoke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ardelius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Flinta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Johnsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gillblad</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Stadler</surname>
          </string-name>
          .
          <article-title>Predicting real-time service-level metrics from device statistics</article-title>
          .
          <source>In IFIP/IEEE Int. Sym. on Int. Net. Man. (IM)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. R. de Frein.
          <article-title>E ect of system load on video service metrics</article-title>
          .
          <source>IEEE Irish Signals &amp; Systems Conference</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. R. de Frein.
          <article-title>Take o a load: Load-adjusted video quality prediction and measurement</article-title>
          .
          <source>In IEEE Inter. Conf. on Comp. and IT</source>
          , pages 1886-1894, Oct
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <source>The Elements of Statistical Learning</source>
          . Springer New York Inc., New York, USA,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          .
          <article-title>Regression Shrinkage and Selection Via the LASSO</article-title>
          .
          <source>J. Roy. Stat. Soc. Series B (Methodological)</source>
          ,
          <volume>58</volume>
          (
          <issue>1</issue>
          ):
          <fpage>267</fpage>
          –
          <lpage>288</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>H.</given-names>
            <surname>Zou</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          .
          <article-title>Regularization and variable selection via the Elastic-Net</article-title>
          .
          <source>J. Roy. Stat. Soc.</source>
          ,
          <volume>67</volume>
          (
          <issue>2</issue>
          ):
          <fpage>301</fpage>
          –
          <lpage>320</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <article-title>Random forests</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          –
          <lpage>32</lpage>
          , Oct.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>R.</given-names>
            <surname>Stadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pasquini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Fodor</surname>
          </string-name>
          .
          <article-title>Learning from network device statistics</article-title>
          .
          <source>J. Netw. Syst. Manage.</source>
          ,
          <volume>25</volume>
          (
          <issue>4</issue>
          ):
          <fpage>672</fpage>
          –
          <lpage>698</lpage>
          , Oct.
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>R.</given-names>
            <surname>de Frein</surname>
          </string-name>
          .
          <article-title>Source separation approach to video quality prediction in computer networks</article-title>
          .
          <source>IEEE Comm. Lett.</source>
          ,
          <volume>20</volume>
          (
          <issue>7</issue>
          ):
          <fpage>1333</fpage>
          –
          <lpage>1336</lpage>
          , Jul.
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Reibman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Vaishampayan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sermadevi</surname>
          </string-name>
          .
          <article-title>Quality monitoring of video over a packet network</article-title>
          .
          <source>IEEE Trans. on Multim.</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>327</fpage>
          –
          <lpage>334</lpage>
          , Apr.
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>H.</given-names>
            <surname>Nam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Schulzrinne</surname>
          </string-name>
          .
          <article-title>Towards QoE-aware video streaming using SDN</article-title>
          .
          <source>In 2014 IEEE Globecom</source>
          , pages
          <fpage>1317</fpage>
          –
          <lpage>1322</lpage>
          , Dec.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>