4th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2016)

Predicting Quality of Service (QoS) Parameters using Extreme Learning Machines with Various Kernel Methods

Lov Kumar, NIT Rourkela, India, lovkumar505@gmail.com
Santanu Kumar Rath, NIT Rourkela, India, skrath@nitrkl.ac.in
Ashish Sureka, ABB Corporate Research, India, ashish.sureka@in.abb.com

Abstract—Web services are language- and platform-independent, self-contained, web-based distributed application components represented by their interfaces, and they can have different Quality of Service (QoS) characteristics such as performance, reliability and scalability. One of the major objectives of a web service provider and implementer is to be able to estimate and improve the QoS parameters of their web service, as client applications depend on the overall quality of the service. We hypothesize that the QoS parameters are correlated with several source code metrics and hence can be estimated by analyzing the source code. We investigate the predictive power of 37 different software metrics (Chidamber and Kemerer, Harry M. Sneed, Baski & Misra) for estimating 15 QoS attributes. We develop QoS prediction models using Extreme Learning Machines (ELM) with various kernel methods. Since the performance of the classifiers depends on the software metrics used to build the prediction model, we also examine two feature selection techniques, Principal Component Analysis (PCA) and Rough Set Analysis (RSA), for dimensionality reduction and for removing irrelevant features. The performance of the QoS prediction models is compared using three types of performance parameters: MAE, MMRE and RMSE. Our experimental results demonstrate that the model developed by the extreme learning machine with RBF kernel achieves better results than the other models in terms of predictive accuracy.

Index Terms—Extreme Learning Machines, Predictive Modeling, Quality of Service (QoS) Parameters, Software Metrics, Source Code Analysis, Web Service Definition Language (WSDL)

I. RESEARCH MOTIVATION AND AIM

Web services are distributed web application components which can be implemented in different languages, deployed on different client and server platforms, are represented by interfaces and communicate using open protocols [1][2]. Web service implementers and providers need to comply with common web service standards so that the services are language and platform independent and can be discovered and used by other applications [1][2]. Applications and business solutions using web services (which integrate and combine several services) expect high Quality of Service (QoS), such as performance, scalability and reliability, as the application depends on the service. Measuring the QoS attributes and characteristics of web services and understanding their relationship with source code metrics can help developers control and estimate maintainability by analyzing the source code [3][4][5][6]. The work presented in this paper is motivated by the need to investigate the correlation between QoS attributes such as response time, availability, throughput, reliability, modularity, testability and interoperability, and source code metrics such as the classic object-oriented metrics (Chidamber and Kemerer) as well as other well-known metrics such as the Baski & Misra and Harry M. Sneed metrics. Specifically, our research aim is to study the correlation between 15 web service quality attributes and 37 source code metrics and then build machine-learning-based predictive models for estimating the quality of a given service from the computed source code metrics. Our aim is to conduct experiments on a real-world dataset and also examine the extent to which feature selection techniques such as Principal Component Analysis (PCA) and Rough Set Analysis (RSA) can be used to reduce dimensionality and filter out irrelevant features.

II. RELATED WORK, RESEARCH CONTRIBUTIONS AND RESEARCH FRAMEWORK

Related Work: Coscia et al. investigate the potential of obtaining more maintainable services by exploiting object-oriented (OO) metric values from the source code implementing services [3]. Their approach proposed the use of OO metrics as early indicators to guide software developers towards obtaining more maintainable services [3]. Coscia and Crasso et al. present a statistical correlation analysis demonstrating that classic software engineering metrics (such as WMC, CBO, RFC, CAM, TPC, APC and LCOM) can be used to predict the most relevant quality attributes of WSDL documents [4]. Mateos et al. found a high correlation between well-known object-oriented metrics taken in the code implementing services and the occurrences of anti-patterns in their WSDLs [5]. Kumar et al. use different object-oriented software metrics and Support Vector Machines with different types of kernels for predicting the maintainability of services [6]. Their experimental results demonstrate that maintainability in the SOC paradigm can be predicted by applying 11 object-oriented metrics [6]. Olatunji et al. develop an extreme learning machine (ELM) maintainability prediction model for object-oriented software systems [7].

Research Contributions: The main research contribution of this study is the application of 37 source code metrics (Chidamber and Kemerer, Harry M. Sneed, Baski & Misra) for predicting 15 Quality of Service (QoS) or maintainability parameters of web services, employing Extreme Learning Machines (ELM) with various kernel methods and two feature selection techniques (Principal Component Analysis and Rough Set Analysis). To the best of our knowledge, the research presented in this paper is the first such in-depth empirical study on a publicly available, well-known dataset.

Research Framework: Figure 1 displays our research framework and methodology. The framework consists of multiple steps.
As shown in Figure 1, we first compute the QoS parameters for the web services in our dataset. We compute 37 source code metrics belonging to 3 different metrics suites. We apply two feature selection methodologies (Rough Set Analysis and Principal Component Analysis) for dimensionality reduction and removal of irrelevant features. We apply Extreme Learning Machines (ELM) with three different kernel functions (linear, polynomial and RBF). We thus combine 6 sets of metrics, 2 feature selection techniques and 3 kernel functions and evaluate the performance of all the combinations, resulting in a comprehensive and in-depth experimental evaluation. Finally, we evaluate the performance of the various models using widely used estimator evaluation metrics and conduct statistical tests to identify the best learning algorithms.

Fig. 1. Research Methodology and Framework

III. EXPERIMENTAL DATASET

We use a subset of the QWS Dataset1 for our experimental analysis. The QWS Dataset provided by Al-Masri et al. includes a set of 2507 web services and their 9 QWS parameters (such as response time, availability, throughput, compliance and latency), which are measured using web service benchmark tools [8][9]. Al-Masri et al. collect the web services using their Web Service Crawler Engine (WSCE), and the majority of the web services are obtained from public sources. We observe that 524 out of the 2507 web services have a corresponding WSDL file. Baski et al. present a suite of metrics to evaluate the quality of an XML web service in terms of its maintainability [10]. We apply the Baski and Misra metrics suite tool on the 524 WSDL files and obtain successful parsing for 200 files. We use the metrics proposed by Baski et al. as predictor variables. We could not include 324 WSDL files in our experimental dataset as we were unable to parse them for computing the Baski and Misra metrics. Hence, we finally use 200 web services for the experiments presented in this paper. Redistribution of the data on the web is not permitted according to the dataset usage guidelines, and hence we provide a list2 of the 200 web services used in our study so that our research can be reproduced and replicated for benchmarking or comparison.

Figure 2 shows a scatter plot of the number of Java files for the 200 WSDL files in our dataset. The X-axis represents the WSDL file ID and the Y-axis represents the number of Java files. Figure 2 shows that several web services are implemented using more than 100 Java files.

Fig. 2. Scatter Plot of the Number of Java Files for the 200 WSDL Files in the Experimental Dataset (X-axis: WSDL ID, 0 to 200; Y-axis: No. of Classes, 0 to 700)

1 http://www.uoguelph.ca/~qmahmoud/qws/
2 http://bit.ly/1S8020w
IV. DEPENDENT VARIABLES: QOS PARAMETERS

Table I shows the descriptive statistics of the 9 QoS parameters provided by the creators of the QWS dataset. The owners of the QWS dataset provide QoS parameter values for all 2507 web services; however, Table I displays the descriptive statistics computed by us for the 200 web services used in our experimental dataset. Table I reveals substantial variation or dispersion in the parameter values across the 200 web services, which shows variability in the quality across services. Sneed et al. describe a tool-supported method for measuring web service interfaces [11]. The extended version of their tool can be used to compute the maintainability, modularity, reusability, testability, interoperability and conformity of web services. We calculate these values for the 200 web services in our dataset and assign them as dependent variables. Table II displays the descriptive statistics for the QoS parameters calculated using Sneed's tool. Hence, we have a total of 15 dependent variables.

TABLE I: DESCRIPTIVE STATISTICS OF QOS PARAMETERS PROVIDED BY QWS DATASET

Parameter        Min    Max      Mean    Median  Std Dev  Skewness  Kurtosis
Response Time    57.00  1664.62  325.11  252.20  289.33    3.15      13.12
Availability     13.00   100.00   86.65   89.00   12.57   -2.63      12.06
Throughput        0.20    36.90    7.04    4.00    6.94    1.57       5.79
Successability   14.00   100.00   90.19   96.00   13.61   -2.57      10.81
Reliability      33.00    83.00   66.64   73.00    9.61   -0.60       2.96
Compliance       67.00   100.00   92.19  100.00    9.78   -0.90       2.61
Best Practices   57.00    93.00   78.81   82.00    7.70   -0.68       2.67
Latency           0.74  1337.00   42.81   12.20  106.23    9.56     112.68
Documentation     1.00    96.00   29.37   32.00   26.97    1.06       3.31

TABLE II: DESCRIPTIVE STATISTICS OF QOS PARAMETERS CALCULATED USING SNEED'S TOOL

Parameter         Min   Max    Mean   Median  Std Dev  Skewness  Kurtosis
Maintainability   0.00  77.67  31.07  28.17   24.29     0.37      2.02
Modularity        0.10   0.81   0.22   0.17    0.13     2.02      7.10
Reusability       0.10   0.90   0.38   0.35    0.17     0.32      2.94
Testability       0.10   0.66   0.19   0.16    0.09     2.58     10.71
Interoperability  0.14   0.90   0.51   0.41    0.23     0.65      2.01
Conformity        0.43   0.98   0.79   0.87    0.15    -0.47      1.57
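The descriptive statistics reported in Tables I and II (and in the metric tables that follow) can be reproduced with standard numerical libraries. Below is a minimal sketch, assuming the values are available as columns of a CSV file; the file and column names are hypothetical. The kurtosis values in the tables appear consistent with the Pearson (non-excess) definition, hence fisher=False.

```python
# Minimal sketch: descriptive statistics of the kind reported in Tables I-V.
# Assumes a CSV with one row per web service; file/column names are hypothetical.
import pandas as pd
from scipy import stats

qos = pd.read_csv("qws_200_services.csv")  # hypothetical file name

def describe(series):
    """Return the seven statistics used throughout the paper."""
    return {
        "Min": series.min(),
        "Max": series.max(),
        "Mean": series.mean(),
        "Median": series.median(),
        "Std Dev": series.std(),
        "Skewness": stats.skew(series),
        "Kurtosis": stats.kurtosis(series, fisher=False),  # non-excess: normal is about 3
    }

for column in ["ResponseTime", "Availability", "Throughput"]:  # hypothetical names
    print(column, describe(qos[column]))
```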
V. PREDICTOR VARIABLES: SOURCE CODE METRICS

Chidamber and Kemerer Metrics: We compute several size and structure software metrics from the bytecode of the compiled Java files in our experimental dataset using CKJM extended3 [12][13]. CKJM extended is an extended version of a tool for calculating the Chidamber and Kemerer Java metrics and many other metrics, such as weighted methods per class, coupling between object classes, lines of code, measure of functional abstraction, average method complexity and McCabe's cyclomatic complexity. We use the WSDL2Java Axis2 code generator4, which comes built-in with an Eclipse plug-in, to generate Java class files from the 200 WSDL files in our experimental dataset. We then compile the Java files to generate the bytecode for computing the size and structure software metrics using the CKJM extended tool. The minimum number of Java files is 7 and the maximum is 605; the mean, median, standard deviation, skewness and kurtosis are 52.39, 45.50, 59.06, 5.43 and 43.94 respectively. Table III displays the descriptive statistics for the 19 size and structure software metrics computed using the CKJM extended tool for the 200 web services in our dataset. The mean AMC value of 61.94 means that the average method size of each class, calculated in terms of the number of Java binary codes in the methods, is about 62. We compute the standard deviation for all 19 metrics to quantify the amount of dispersion and spread in the values. We observe (refer to Table III) that a few metrics such as DIT, NOC, MFA, CAM, IC and CBM have a low standard deviation, which means that the data points are close to the mean. However, LCOM, LCO, AMC and CC have relatively high standard deviations, which means that the data points are dispersed over a wider range of values.

TABLE III: DESCRIPTIVE STATISTICS OF OBJECT-ORIENTED METRICS

Metrics  Min    Max     Mean    Median  Std Dev  Skewness  Kurtosis
WMC       9.48   13.57   11.01   10.96   0.48      0.81      6.54
DIT       0.87    1.02    0.98    0.98   0.02     -2.07      9.72
NOC       0.00    0.13    0.01    0.01   0.02      2.64     12.55
CBO       4.10   12.55   10.70   11.01   1.33     -1.15      5.18
RFC      12.78   44.55   40.35   41.48   4.13     -3.13     15.15
LCOM     74.03  405.49  120.94  108.67  45.70      2.99     13.96
Ca        0.64    3.92    2.91    2.99   0.62     -0.50      2.72
Ce        3.49    9.50    8.24    8.37   0.90     -1.30      6.01
NPM       4.88    9.27    6.55    6.47   0.48      1.07      7.90
LCOM3     1.18    1.50    1.32    1.31   0.06      0.27      2.75
LCO      76.14  493.64  399.18  411.50  54.03     -2.61     11.71
DAM       0.21    0.45    0.37    0.37   0.04     -0.45      4.53
MOA       0.02    2.28    0.60    0.53   0.28      2.00     11.03
MFA       0.00    0.02    0.00    0.00   0.00      1.92      8.43
CAM       0.39    0.43    0.40    0.40   0.01      0.22      4.54
IC        0.00    0.05    0.01    0.01   0.01      0.79      3.68
CBM       0.00    0.05    0.01    0.01   0.01      0.79      3.68
AMC       7.68   82.86   61.94   64.37  10.75     -1.69      6.97
CC       18.17   71.39   42.77   43.74   9.58     -0.29      3.67

Harry M. Sneed Metrics: Sneed's tool implements metrics for the quantity, quality and complexity of web service interfaces. The values of all the metrics are statically computed from a service interface in WSDL, as the suite of metrics is based on WSDL schema element occurrences [11]. We compute six interface complexity metrics for all 200 web services in our dataset. The six interface complexity metrics are computed on a scale from 0.0 to 1.0: a value between 0.0 and 0.4 represents low complexity, a value between 0.4 and 0.6 indicates average complexity, and a value of more than 0.6 falls in the range of high complexity, wherein any value above 0.8 reveals major issues with the code design [4][11]. Table IV shows the minimum, maximum, mean, median and standard deviation of the complexity values for all the web services in our dataset. In addition to the 6 interface complexity metrics, we measure 6 more metrics using the extended version of the tool provided to us by the author himself: object point, data point, function point, and major, medium and minor rule violations. Table IV displays the descriptive statistics for these 12 metrics for all the web services in our dataset.

TABLE IV: DESCRIPTIVE STATISTICS OF HARRY M. SNEED'S METRICS SUITE

Metrics                Min    Max      Mean    Median  Std Dev  Skewness  Kurtosis
Data Complexity         0.10     0.81    0.28    0.27    0.17     0.60      2.59
Relation Complexity     0.10     0.90    0.87    0.90    0.07    -7.72     83.58
Format Complexity       0.14     0.72    0.60    0.64    0.09    -1.05      5.24
Structure Complexity    0.15     0.90    0.61    0.63    0.17    -0.13      2.63
Data Flow Complexity    0.10     0.90    0.87    0.90    0.10    -5.64     39.14
Language Complexity     0.16     0.88    0.61    0.56    0.21     0.03      1.78
Object Point           42.00  4581.00  299.32  200.00  483.31     5.67     41.67
Data Point             29.00  3124.00  222.75  152.00  347.48     5.16     34.85
Function Point          6.00   776.00   53.73   32.00   94.21     5.36     35.33
Major Rule Violation    2.00   109.00   26.39   10.00   26.13     0.62      1.97
Medium Rule Violation   2.00    16.00    5.02    5.00    1.94     0.56      6.71
Minor Rule Violation    2.00   586.00   49.51   35.50   63.14     4.26     30.60

3 http://gromit.iiar.pwr.wroc.pl/p_inf/ckjm/
4 https://axis.apache.org/axis2/java/core/tools/eclipse/wsdl2java-plugin.html
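The metric-extraction pipeline described above (generate Java sources with WSDL2Java, compile, then run CKJM extended over the bytecode) can be scripted. The sketch below is only an outline: the script name wsdl2java.sh, the jar name ckjm_ext.jar and their command-line arguments are assumptions that depend on the installed tool versions and may need adjusting.

```python
# Rough outline of the WSDL-to-CK-metrics pipeline described above.
# Assumptions: Axis2's wsdl2java script and the ckjm-extended jar are on the
# local path; exact flags and jar names vary by version (placeholders here).
import glob
import subprocess

def extract_ck_metrics(wsdl_path, workdir):
    # 1. Generate Java sources from the WSDL (Axis2 code generator).
    subprocess.run(["wsdl2java.sh", "-uri", wsdl_path, "-o", workdir], check=True)
    # 2. Compile the generated sources to bytecode.
    sources = glob.glob(f"{workdir}/**/*.java", recursive=True)
    subprocess.run(["javac", "-d", f"{workdir}/classes"] + sources, check=True)
    # 3. Run CKJM extended over the class files (one metrics row per class).
    classes = glob.glob(f"{workdir}/classes/**/*.class", recursive=True)
    result = subprocess.run(["java", "-jar", "ckjm_ext.jar"] + classes,
                            capture_output=True, text=True, check=True)
    return result.stdout  # WMC, DIT, NOC, CBO, RFC, LCOM, ... per class

print(extract_ck_metrics("service_001.wsdl", "build/service_001"))
```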
Baski and Misra Metrics: We compute the 6 metrics proposed by Baski and Misra [10]. Their metrics are based on the analysis of the structure of the exchanged messages described in WSDL, which becomes the basis for measuring data complexity. Their metrics suite is based on WSDL and XSD schema element occurrences. Table V reveals the descriptive statistics of the 6 metrics: Data Weight of a WSDL (DW), Distinct Message Ratio (DMR), Distinct Message Count (MDC), Message Entropy (ME), Message Repetition Scale (MRS) and Operations Per Service (OPS).

TABLE V: DESCRIPTIVE STATISTICS OF BASKI AND MISRA METRICS SUITE

Metrics  Min   Max      Mean    Median  Std Dev  Skewness  Kurtosis
OPS      0.00   108.00    7.76   5.00    13.74    5.41     35.37
DW       0.00  2052.00  114.63  62.00   216.46    6.56     53.80
MDC      0.00    17.00    4.40   5.00     2.62    1.97      8.87
DMR      0.00     1.00    0.53   0.50     0.22    0.50      3.32
ME       0.00     3.80    1.73   2.12     0.73   -0.43      3.58
MRS      0.00    72.00    3.65   2.60     6.25    7.99     79.12

VI. CODE METRICS: CORRELATION ANALYSIS

We compute the association between the 37 metrics, consisting of dependent and independent variables, using Pearson's correlation coefficient (r). The coefficient r measures the strength and direction of the linear relationship between two variables. Figure 3 displays our experimental results on the correlation analysis between the 37 metrics. In Figure 3, a black circle represents an r value between 0.7 and 1.0, indicating a strong positive linear relationship. A white circle represents an r value between 0.3 and 0.7, indicating a weak positive linear relationship. A black square represents an r value between -1 and -0.7, indicating a strong negative linear relationship. A white square represents an r value between -0.7 and -0.3, indicating a weak negative linear relationship. A blank cell represents no linear relationship between the two variables. For example, based on Figure 3, we infer that there is a strong positive linear relationship between OPS and four other variables: MRS, OP, DP and FP. On the other hand, we observe a weak linear relationship between ILC and IDC as well as between ISC and IDC. Figure 3 reveals associations between different suites of metrics and not just associations between metrics within the same suite. For example, DMR, which is part of the Baski and Misra metrics suite, has a strong negative correlation with ISC (Structure Complexity), OP (Object Point), DP (Data Point), FP (Function Point) and MeRV (Medium Rule Violation), which are part of the Harry M. Sneed metrics suite.

Fig. 3. Pearson's Correlation Coefficient between 37 Metrics
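The analysis behind Figure 3 reduces to computing Pearson's r for every pair of metric columns and bucketing the values with the thresholds of the figure legend. A minimal sketch, assuming the 37 metrics are columns of a DataFrame (the file name is hypothetical):

```python
# Minimal sketch of the Figure 3 analysis: Pearson correlation between all
# metric pairs, bucketed with the same thresholds as the figure legend.
import pandas as pd

metrics = pd.read_csv("metrics_200_services.csv")  # hypothetical: 37 metric columns
r = metrics.corr(method="pearson")                 # 37 x 37 correlation matrix

def bucket(value):
    if value >= 0.7:
        return "strong positive"
    if value >= 0.3:
        return "weak positive"
    if value <= -0.7:
        return "strong negative"
    if value <= -0.3:
        return "weak negative"
    return "none"

# Example: relationships of OPS with the other metrics.
for name, value in r["OPS"].drop("OPS").items():
    print(f"OPS vs {name}: r = {value:.2f} ({bucket(value)})")
```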
VII. FEATURE EXTRACTION AND SELECTION USING PCA AND RSA

We investigate the application of Principal Component Analysis (PCA) and Rough Set Analysis (RSA) as a data pre-processing step for feature extraction and selection [14]. Our objective in using PCA and RSA is to identify features which are relevant in terms of high predictive power and impact on the dependent variable, and to filter out irrelevant features which have little or no impact on the classifier accuracy [14]. We apply PCA with the varimax rotation method on all source code metrics. The experimental results of the PCA analysis are shown in Table VI, which reveals the relationship between the original source code metrics and the domain metrics. For each principal component (PC), we provide the eigenvalue, variance percentage, cumulative percentage and the source code metrics interpretation (refer to Table VI). In PCA, ordering the eigenvalues from highest to lowest ranks the principal components in order of significance. Among all principal components, we select only those with an eigenvalue greater than 1. Our analysis reveals that 9 PCs have an eigenvalue greater than 1 (refer to Table VI). Table VI shows the mapping of each component to the most important metrics for that component.

TABLE VI: FEATURE EXTRACTION USING PRINCIPAL COMPONENT ANALYSIS - DESCRIPTIVE STATISTICS

PC   Eigenvalue  % Variance  Cumulative %  Metrics Interpretation
PC1  6.40        17.30       17.30         CBO, RFC, Ca, Ce, LCOM3, LCO, DAM, CAM
PC2  5.80        15.76       33.06         OPS, MRS, IRC, IDFC, OP, DP, FP
PC3  3.67         9.94       43.00         DW, MDC, MeRV, MiRV, CC, ME
PC4  3.39         9.16       52.17         DMR, IDC, ISC, ILC
PC5  3.34         9.03       61.20         IC, CBM, MOA
PC6  2.50         6.77       67.98         IFC, DIT, NOC, MFA
PC7  2.23         6.02       74.00         WMC, NPM
PC8  2.14         5.79       79.79         MRV, AMC
PC9  1.36         3.70       83.50         LCOM

Table VII shows, for every dependent variable, the optimal subset of features derived from the original set of 37 source-code-metric features after applying RSA. We apply the RSA procedure 15 times (once for each dependent variable). Table VII reveals that it is possible to reduce the number of features substantially, and several features from the original set are found to be uncorrelated.

TABLE VII: SOURCE CODE METRICS (FEATURE) SELECTION OUTPUT USING ROUGH SET ANALYSIS (RSA)

QoS               Selected Metrics
Response Time     DMR, SC, LC, WMC, Ca, LCOM3, MFA, CAM, IC, CC
Availability      FC, SC, LC, MeRV, MiRV, WMC, Ca, LCOM3, MFA, CAM, IC, CC
Throughput        ME, FC, SC, LC, MRV, MeRV, MiRV, Ce, MOA, MFA, CAM, CBM, CC
Successability    ME, FC, SC, DFC, LC, MRV, WMC, LCOM3, LCO, DAM, MOA, CAM
Reliability       FC, SC, DFC, LC, WMC, LCOM3, LCO, MOA, MFA, CAM, CBM
Compliance        ME, FC, SC, DFC, LC, MRV, WMC, MiRV, Ca, CC, DAM, MOA, CAM, NPM
Best Practices    ME, FC, SC, DFC, LC, MRV, MiRV, WMC, Ca, NPM, MOA, MFA, CAM, CC
Latency           DMR, ME, DC, FC, DFC, LC, MRV, NOC, NPM, LCO, MOA, CAM, IC
Documentation     ME, FC, SC, DFC, LC, MRV, MeRV, WMC, Ca, NPM, CAM, IC, CC
Maintainability   DP, Ce, LCOM3, MOA, MFA, CAM, CBM
Modularity        DMR, ME, SC, DFC, LC, DP, MRV, MiRV, WMC, Ca, MOA, IC, AMC
Reusability       MDC, DMR, FC, SC, DFC, LC, LCOM, LCOM3
Testability       ME, FC, SC, LC, MiRV, DIT, NOC, CC, RFC
Interoperability  SC, LC, MeRV, MiRV, WMC, DIT, CBO, MFA, CC
Conformity        ME, FC, DFC, LC, MRV, WMC, Ca, CAM
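The PCA step amounts to an eigendecomposition of the correlation matrix of the standardized metrics, retaining the components whose eigenvalue exceeds 1 (the Kaiser criterion applied above). The sketch below omits the varimax rotation additionally applied in the paper; the input file name is hypothetical.

```python
# Sketch of the PCA feature-extraction step: standardize the 37 metrics,
# eigendecompose their correlation matrix, and keep components with
# eigenvalue > 1, as in Table VI. Varimax rotation is omitted for brevity.
import numpy as np

X = np.loadtxt("metrics_200x37.txt")            # hypothetical 200 x 37 matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each metric
corr = np.corrcoef(Z, rowvar=False)             # 37 x 37 correlation matrix

eigvals, eigvecs = np.linalg.eigh(corr)         # eigh returns ascending order
order = np.argsort(eigvals)[::-1]               # sort descending by eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = eigvals > 1.0                            # Kaiser criterion
print("retained PCs:", keep.sum())              # the paper retains 9
print("variance %:", 100 * eigvals[keep] / eigvals.sum())
print("cumulative %:", 100 * np.cumsum(eigvals[keep]) / eigvals.sum())

scores = Z @ eigvecs[:, keep]                   # projected features for the models
```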
VIII. APPLICATION OF EXTREME LEARNING MACHINES (ELMS)

Huang et al. mention that Extreme Learning Machines (ELMs) have been shown to outperform computational intelligence techniques such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) in terms of learning speed and computational scalability [15]. ELM has demonstrated good potential for solving regression and classification problems [15], and our objective is to investigate whether ELMs can be successfully applied in the domain of web service QoS prediction using source code metrics. Selecting an appropriate kernel function for the application domain and dataset is an important and core issue [16]. Ding et al. mention that, as in traditional neural networks, the generalization and learning performance correlate with the kernel function [16]. Hence, we investigate the performance of the ELM-based classifier using three different kernel functions: linear, polynomial and RBF. ELMs can be used with different kernel functions, and one can also create hybrid kernel functions. The most basic, simplest and fastest is the linear kernel function, which is used as a baseline for comparison with more complicated kernel functions such as the polynomial and RBF kernels. Table VIII shows the performance of the ELM-based classifier with the linear kernel function, and Table IX shows the performance of the ELM-based predictive model with a second-degree polynomial kernel. The polynomial kernel is more sophisticated than the linear kernel: it uses non-linear instead of linear equations for regression and classification and is expected to result in better accuracy than the classifier with the linear kernel. The Radial Basis Function (RBF or Gaussian) kernel is a popular kernel function widely used in the Support Vector Machine (SVM) learning algorithm. We use the linear kernel to investigate whether the data is linearly separable, and the polynomial and RBF kernels to handle the case where our data is not linearly separable (computing a non-linear decision boundary).

We employ four performance metrics (MAE, MMRE, RMSE and r-value) to study the accuracy of the classifiers. The Mean Absolute Error (MAE) measures the difference between the predicted or forecasted values and the actual values (the average of the absolute errors). Tables VIII and IX reveal that the forecasts of several predictive models are very accurate, as the MAE value is less than 0.05; for example, the MAE value for the HMS, AM and PCA metrics for predicting conformity is 0.03. Tables VIII and IX also reveal that, in general, the predictive accuracy for response time, latency, modularity and conformity is better than the predictive accuracy for the other QoS parameters. Kitchenham et al. mention that the Mean Magnitude of Relative Error (MMRE) is a widely used assessment criterion for evaluating the predictive accuracy and overall performance of competing software prediction models, particularly software estimation models [17]. MMRE computes the difference between the actual and predicted values relative to the actual value. Table X shows that the MMRE values for ELM with the RBF kernel are between 0.30 and 0.35 for response time, availability and successability, indicating good estimation ability of the classifier. Tables VIII and IX reveal that the MMRE values for the conformity QoS parameter are as low as 0.05, 0.06, 0.10 and 0.11. The Root Mean Square Error (RMSE), or root-mean-square deviation, computes the sample standard deviation of the differences between the values predicted by the estimator and the actual values. From Table VIII, we infer that the best RMSE values in the case of ELM with the linear kernel are for the response time and latency QoS parameters; the minimum RMSE value obtained with the linear kernel is 0.08, for the HMS metrics and the conformity parameter. From Table IX, we observe that, with the polynomial kernel, the PCA-based feature extraction technique performs better than the RSA-based feature selection technique for some parameters, and similarly RSA performs better than PCA for other parameters. We do not observe a dominant approach between PCA and RSA.
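For an ELM with a kernel, the output weights have a closed-form solution (Huang et al. [15]): with training kernel matrix K, regularization parameter C and target vector T, the weights are beta = (I/C + K)^(-1) T, and a new sample x is predicted as [K(x, x_1), ..., K(x, x_N)] beta. The sketch below implements this for the RBF kernel together with the four performance measures used in Tables VIII-X; the hyperparameters C and gamma and the toy data are illustrative only.

```python
# Sketch of kernel-ELM regression (closed form from Huang et al. [15]) and the
# four performance measures used in Tables VIII-X. C and gamma are illustrative.
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    return np.exp(-gamma * d2)

def elm_fit(X_train, y_train, C=100.0, gamma=0.1):
    K = rbf_kernel(X_train, X_train, gamma)
    return np.linalg.solve(np.eye(len(X_train)) / C + K, y_train)  # beta

def elm_predict(X_train, beta, X_test, gamma=0.1):
    return rbf_kernel(X_test, X_train, gamma) @ beta

def evaluate(y_true, y_pred):
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mmre = np.mean(np.abs(err) / np.abs(y_true))   # error relative to actual values
    rmse = np.sqrt(np.mean(err ** 2))
    r = np.corrcoef(y_true, y_pred)[0, 1]          # Pearson r, actual vs predicted
    return mae, mmre, rmse, r

# Toy usage on synthetic data (a stand-in for the metric/QoS matrices).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 37))
y = np.abs(X[:, 0]) + 2.0 + 0.1 * rng.normal(size=200)   # kept away from zero for MMRE
beta = elm_fit(X[:150], y[:150])
print(evaluate(y[150:], elm_predict(X[:150], beta, X[150:])))
```

A linear kernel corresponds to K = A B^T and a second-degree polynomial kernel to K = (A B^T + 1)^2, so the same closed form covers all three kernels compared in the tables below.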
TABLE VIII: PERFORMANCE MATRIX FOR ELM WITH LINEAR KERNEL

In Tables VIII-X, the 15 columns are, from left to right: Response Time, Availability, Throughput, Successability, Reliability, Compliance, Best Practices, Latency, Documentation, Maintainability, Modularity, Reusability, Testability, Interoperability, Conformity.

MAE
BMS 0.11 0.19 0.18 0.21 0.25 0.28 0.19 0.11 0.23 0.20 0.11 0.16 0.12 0.16 0.18
HMS 0.10 0.16 0.18 0.19 0.25 0.27 0.19 0.11 0.22 0.15 0.08 0.15 0.07 0.14 0.06
OOM 0.09 0.14 0.18 0.17 0.25 0.25 0.18 0.11 0.22 0.12 0.15 0.15 0.14 0.22 0.18
AM  0.10 0.15 0.18 0.18 0.26 0.27 0.20 0.11 0.21 0.11 0.07 0.13 0.06 0.11 0.06
PCA 0.10 0.16 0.19 0.18 0.26 0.26 0.21 0.12 0.21 0.16 0.09 0.14 0.10 0.14 0.09
RSA 0.11 0.17 0.18 0.20 0.25 0.26 0.19 0.11 0.23 0.27 0.16 0.17 0.15 0.25 0.26

MMRE
BMS 0.37 0.35 0.73 0.38 0.66 0.61 0.37 0.55 0.90 0.76 0.41 0.61 0.46 0.35 0.28
HMS 0.36 0.33 0.73 0.36 0.69 0.61 0.39 0.56 0.93 0.60 0.29 0.60 0.26 0.31 0.11
OOM 0.34 0.32 0.74 0.35 0.71 0.61 0.37 0.55 0.89 0.47 0.54 0.57 0.55 0.50 0.33
AM  0.36 0.34 0.71 0.37 0.71 0.64 0.47 0.54 0.90 0.42 0.22 0.46 0.20 0.25 0.10
PCA 0.34 0.34 0.74 0.36 0.69 0.59 0.49 0.60 0.88 0.60 0.30 0.59 0.37 0.33 0.16
RSA 0.36 0.34 0.71 0.37 0.69 0.57 0.36 0.54 0.88 1.09 0.60 0.67 0.54 0.52 0.44

RMSE
BMS 0.16 0.26 0.22 0.28 0.29 0.33 0.24 0.15 0.29 0.26 0.17 0.22 0.17 0.22 0.22
HMS 0.15 0.22 0.22 0.24 0.29 0.31 0.23 0.15 0.28 0.21 0.12 0.19 0.11 0.18 0.08
OOM 0.15 0.21 0.22 0.23 0.29 0.30 0.22 0.14 0.28 0.16 0.22 0.19 0.21 0.28 0.23
AM  0.15 0.22 0.22 0.24 0.29 0.32 0.24 0.14 0.27 0.14 0.11 0.16 0.10 0.15 0.08
PCA 0.15 0.22 0.22 0.25 0.29 0.31 0.25 0.16 0.27 0.21 0.13 0.20 0.14 0.19 0.13
RSA 0.17 0.26 0.22 0.29 0.30 0.31 0.23 0.14 0.29 0.34 0.22 0.23 0.21 0.31 0.32

r-value
BMS 0.34 0.30 0.30 0.45 0.50 0.49 0.47 0.45 0.41 0.84 0.81 0.50 0.90 0.88 0.79
HMS 0.35 0.33 0.39 0.38 0.21 0.28 0.62 0.21 0.23 0.88 0.90 0.83 0.94 0.88 0.98
OOM 0.66 0.62 0.57 0.27 0.65 0.48 0.64 0.48 0.33 0.94 0.74 0.78 0.51 0.67 0.77
AM  0.32 0.35 0.37 0.26 0.34 0.30 0.34 0.58 0.27 0.94 0.98 0.90 0.93 0.92 0.99
PCA 0.40 0.32 0.37 0.39 0.41 0.40 0.29 0.36 0.37 0.87 0.94 0.79 0.86 0.90 0.99
RSA 0.29 0.77 0.49 0.45 0.37 0.48 0.55 0.31 0.43 0.11 0.36 0.48 0.70 0.77 0.43

TABLE IX: PERFORMANCE MATRIX FOR ELM WITH POLYNOMIAL KERNEL

MAE
BMS 0.10 0.14 0.18 0.17 0.24 0.26 0.19 0.11 0.23 0.17 0.11 0.16 0.11 0.15 0.12
HMS 0.11 0.15 0.19 0.17 0.26 0.28 0.19 0.11 0.23 0.12 0.05 0.11 0.04 0.10 0.03
OOM 0.12 0.15 0.21 0.18 0.27 0.28 0.19 0.11 0.23 0.12 0.13 0.14 0.15 0.18 0.12
AM  0.13 0.20 0.20 0.23 0.28 0.33 0.23 0.13 0.28 0.10 0.06 0.12 0.05 0.11 0.03
PCA 0.11 0.15 0.20 0.18 0.24 0.26 0.19 0.12 0.23 0.10 0.06 0.13 0.07 0.11 0.03
RSA 0.10 0.14 0.18 0.16 0.24 0.26 0.18 0.11 0.22 0.25 0.14 0.16 0.14 0.23 0.23

MMRE
BMS 0.36 0.32 0.75 0.35 0.68 0.62 0.39 0.58 0.94 0.71 0.41 0.67 0.44 0.36 0.20
HMS 0.38 0.33 0.76 0.35 0.71 0.64 0.40 0.57 0.93 0.52 0.19 0.40 0.18 0.19 0.05
OOM 0.42 0.34 0.81 0.37 0.77 0.64 0.37 0.56 0.94 0.44 0.48 0.49 0.57 0.41 0.22
AM  0.48 0.41 0.82 0.46 0.73 0.70 0.54 0.67 1.18 0.42 0.25 0.41 0.18 0.21 0.05
PCA 0.36 0.35 0.80 0.37 0.69 0.59 0.47 0.60 0.93 0.38 0.24 0.50 0.28 0.23 0.06
RSA 0.36 0.31 0.74 0.33 0.68 0.59 0.36 0.57 0.91 1.04 0.52 0.67 0.53 0.51 0.41

RMSE
BMS 0.15 0.20 0.22 0.23 0.28 0.31 0.22 0.15 0.29 0.23 0.16 0.22 0.17 0.20 0.16
HMS 0.16 0.21 0.23 0.23 0.30 0.33 0.23 0.16 0.29 0.18 0.08 0.17 0.07 0.13 0.04
OOM 0.18 0.22 0.26 0.25 0.32 0.34 0.23 0.15 0.30 0.15 0.20 0.19 0.21 0.25 0.18
AM  0.19 0.27 0.25 0.32 0.33 0.41 0.28 0.17 0.36 0.16 0.10 0.16 0.07 0.17 0.05
PCA 0.15 0.22 0.24 0.24 0.28 0.31 0.24 0.16 0.29 0.13 0.11 0.18 0.11 0.16 0.06
RSA 0.16 0.20 0.22 0.22 0.28 0.31 0.22 0.15 0.28 0.31 0.20 0.21 0.20 0.28 0.26

r-value
BMS 0.46 0.30 0.39 0.16 0.36 0.16 0.38 0.07 0.09 0.86 0.88 0.45 0.93 0.89 0.89
HMS 0.18 0.31 0.19 0.38 -0.02 0.19 0.24 0.46 0.37 0.94 0.97 0.90 1.00 0.95 1.00
OOM 0.08 0.29 0.38 0.22 0.36 0.16 0.34 0.42 0.46 0.96 0.77 0.72 0.59 0.75 0.92
AM  0.12 0.31 0.40 -0.01 0.40 0.19 0.42 0.50 0.37 0.95 0.99 0.91 0.98 0.94 1.00
PCA 0.18 -0.04 0.46 0.19 0.29 0.39 0.58 0.51 0.51 0.94 0.98 0.90 0.99 0.95 1.00
RSA 0.25 0.37 0.54 0.26 0.39 0.24 0.42 0.13 0.26 0.60 0.75 0.45 0.51 0.70 0.62

IX. COMPARING ALGORITHMS USING STATISTICAL SIGNIFICANCE TESTING

Our objective is to compare several learning algorithms and assess which algorithm is better. Dietterich et al. review 5 approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task and dataset [18].
We apply the 10-fold cross-validated paired t-test as described in the paper by Dietterich et al. [18]. We have several combinations of metric subsets and ELM kernel functions as learning algorithms. We consider 6 different sets of metrics: All Metrics (AM), Only Object-Oriented Metrics (OOM), Harry M. Sneed's Metrics (HMS), Baski and Misra Metrics (BMS), metrics derived after executing PCA, and metrics derived after executing RSA. We use these 6 sets of metrics as input to develop models predicting the 15 different QoS parameters, and we investigate the application of the extreme learning machine with three different types of kernel functions (linear, polynomial and radial basis function), each evaluated with three different performance parameters. Hence, for each subset of metrics, a total of three sets (one for each performance measure) are used, each with 45 data points (3 kernels multiplied by 15 QoS parameters).

TABLE X: PERFORMANCE MATRIX FOR ELM WITH RBF KERNEL

MAE
BMS 0.10 0.14 0.18 0.17 0.24 0.26 0.18 0.11 0.22 0.25 0.16 0.16 0.14 0.24 0.25
HMS 0.10 0.14 0.18 0.17 0.24 0.26 0.18 0.11 0.22 0.25 0.16 0.18 0.14 0.25 0.29
OOM 0.09 0.13 0.18 0.16 0.24 0.25 0.18 0.11 0.22 0.24 0.17 0.16 0.15 0.25 0.27
AM  0.09 0.13 0.19 0.17 0.24 0.26 0.20 0.11 0.21 0.24 0.17 0.17 0.14 0.24 0.31
PCA 0.09 0.13 0.19 0.16 0.24 0.25 0.20 0.11 0.21 0.24 0.17 0.17 0.14 0.24 0.30
RSA 0.10 0.13 0.18 0.16 0.24 0.25 0.18 0.11 0.22 0.26 0.16 0.16 0.14 0.26 0.26

MMRE
BMS 0.34 0.31 0.74 0.35 0.67 0.60 0.39 0.56 0.91 1.03 0.60 0.71 0.55 0.52 0.47
HMS 0.34 0.32 0.74 0.35 0.66 0.61 0.38 0.56 0.92 1.05 0.58 0.76 0.53 0.53 0.65
OOM 0.33 0.31 0.73 0.34 0.70 0.58 0.38 0.56 0.89 1.03 0.61 0.65 0.53 0.56 0.49
AM  0.32 0.32 0.74 0.35 0.68 0.60 0.48 0.56 0.90 0.97 0.59 0.72 0.50 0.54 0.69
PCA 0.31 0.30 0.75 0.34 0.68 0.59 0.49 0.58 0.88 0.96 0.60 0.73 0.49 0.54 0.68
RSA 0.34 0.30 0.73 0.33 0.68 0.57 0.37 0.55 0.90 1.12 0.61 0.70 0.55 0.55 0.47

RMSE
BMS 0.15 0.20 0.21 0.22 0.27 0.29 0.22 0.15 0.28 0.31 0.21 0.22 0.20 0.29 0.27
HMS 0.15 0.20 0.22 0.22 0.27 0.30 0.22 0.15 0.28 0.31 0.21 0.24 0.20 0.29 0.31
OOM 0.14 0.19 0.22 0.21 0.28 0.29 0.22 0.14 0.27 0.30 0.23 0.21 0.21 0.30 0.28
AM  0.14 0.20 0.22 0.22 0.28 0.29 0.23 0.14 0.27 0.30 0.23 0.23 0.20 0.29 0.32
PCA 0.14 0.19 0.22 0.21 0.28 0.29 0.24 0.16 0.27 0.30 0.23 0.23 0.20 0.29 0.32
RSA 0.15 0.19 0.21 0.21 0.27 0.29 0.21 0.14 0.27 0.31 0.22 0.22 0.20 0.30 0.28

r-value
BMS 0.31 0.50 0.19 0.02 0.08 0.24 0.50 0.46 0.15 0.87 0.81 0.47 0.81 0.78 0.93
HMS 0.34 0.18 0.43 0.38 0.29 0.33 0.33 0.38 0.34 0.93 0.96 0.48 0.91 0.78 0.97
OOM 0.57 0.34 0.35 0.44 0.55 0.37 0.64 0.29 0.26 0.82 0.58 0.71 0.53 0.70 0.83
AM  0.63 0.39 0.29 0.20 0.48 0.28 0.32 0.30 0.34 0.96 0.94 0.62 0.83 0.91 0.95
PCA 0.50 0.20 0.32 0.00 0.44 0.41 0.64 0.26 0.24 0.92 0.73 0.68 0.75 0.85 0.98
RSA 0.21 0.39 0.36 0.42 0.38 0.46 0.61 0.35 0.43 0.42 0.59 0.20 0.44 0.56 0.57
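A sketch of the 10-fold cross-validated paired t-test [18] used here: both learners are trained and evaluated on identical folds, the per-fold differences in error are collected, and a one-sample t-test with 9 degrees of freedom is applied to the differences. The learner and error arguments are illustrative callables, not the authors' implementation.

```python
# Sketch of the 10-fold cross-validated paired t-test (Dietterich [18]).
# learner_a / learner_b are callables fit on (X_train, y_train) and returning
# predictions for X_test; error() is a performance measure such as MAE.
import numpy as np
from scipy import stats

def cv_paired_ttest(X, y, learner_a, learner_b, error, k=10, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    diffs = []
    for i in range(k):
        test = folds[i]
        train = np.hstack([folds[j] for j in range(k) if j != i])
        ea = error(y[test], learner_a(X[train], y[train], X[test]))
        eb = error(y[test], learner_b(X[train], y[train], X[test]))
        diffs.append(ea - eb)                       # paired per-fold difference
    t_stat, p_value = stats.ttest_1samp(diffs, popmean=0.0)  # df = k - 1
    return np.mean(diffs), t_stat, p_value

mae = lambda y_true, y_pred: np.mean(np.abs(y_true - y_pred))
```

Applying this procedure to every pair of metric sets, and to every pair of kernels, yields mean-difference and p-value matrices of the kind reported in Tables XI and XII.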
Table XI displays the results of the 10-fold cross-validated paired t-test analysis between the different sets of metrics. For each of the 3 kernels (linear, polynomial and RBF), the 6 different subsets of metrics are considered as input with three different performance parameters: Mean Absolute Error (MAE), Mean Magnitude of Relative Error (MMRE) and Root Mean Squared Error (RMSE). Hence, for each kernel, a total of three sets (one for each performance measure) are used, each with 90 data points (six subsets of metrics multiplied by 15 QoS parameters). The experimental results of the t-test analysis for the different performance parameters (MAE, MMRE and RMSE) and the three ELM kernels are summarized in Table XII, which contains two parts.

TABLE XI: EXPERIMENTAL RESULTS OF T-TESTS BETWEEN DIFFERENT SETS OF METRICS
(Rows and columns in each matrix: BMS, HMS, OOM, AM, PCA, RSA)

MAE
Mean Difference
BMS   0.000  0.014  0.000  0.009  0.011 -0.013
HMS  -0.014  0.000 -0.014 -0.004 -0.002 -0.026
OOM   0.000  0.014  0.000  0.009  0.011 -0.013
AM   -0.009  0.004 -0.009  0.000  0.002 -0.022
PCA  -0.011  0.002 -0.011 -0.002  0.000 -0.024
RSA   0.013  0.026  0.013  0.022  0.024  0.000
p-value
BMS   NaN    0.004  0.986  0.140  0.008  0.008
HMS   0.004  NaN    0.017  0.164  0.175  0.003
OOM   0.986  0.017  NaN    0.155  0.025  0.026
AM    0.140  0.164  0.155  NaN    0.570  0.035
PCA   0.008  0.175  0.025  0.570  NaN    0.005
RSA   0.008  0.003  0.026  0.035  0.005  NaN

MMRE
Mean Difference
BMS   0.000  0.037  0.000  0.026  0.027 -0.037
HMS  -0.037  0.000 -0.037 -0.010 -0.010 -0.074
OOM   0.000  0.037  0.000  0.026  0.027 -0.037
AM   -0.026  0.010 -0.026  0.000  0.000 -0.063
PCA  -0.027  0.010 -0.027  0.000  0.000 -0.064
RSA   0.037  0.074  0.037  0.063  0.064  0.000
p-value
BMS   NaN    0.011  0.990  0.182  0.051  0.009
HMS   0.011  NaN    0.039  0.346  0.199  0.005
OOM   0.990  0.039  NaN    0.199  0.081  0.078
AM    0.182  0.346  0.199  NaN    0.967  0.048
PCA   0.051  0.199  0.081  0.967  NaN    0.014
RSA   0.009  0.005  0.078  0.048  0.014  NaN

RMSE
Mean Difference
BMS   0.000  0.018 -0.001  0.010  0.013 -0.014
HMS  -0.018  0.000 -0.020 -0.008 -0.005 -0.033
OOM   0.001  0.020  0.000  0.012  0.014 -0.013
AM   -0.010  0.008 -0.012  0.000  0.003 -0.025
PCA  -0.013  0.005 -0.014 -0.003  0.000 -0.027
RSA   0.014  0.033  0.013  0.025  0.027  0.000
p-value
BMS   NaN    0.002  0.781  0.205  0.008  0.005
HMS   0.002  NaN    0.010  0.066  0.038  0.002
OOM   0.781  0.010  NaN    0.167  0.021  0.050
AM    0.205  0.066  0.167  NaN    0.575  0.045
PCA   0.008  0.038  0.021  0.575  NaN    0.004
RSA   0.005  0.002  0.050  0.045  0.004  NaN
The first part of Table XII shows the mean difference values, and the second part shows the p-values, between the different pairs of kernels. Table XII reveals that there is no significant difference between the linear and polynomial kernel functions (the p-value is greater than 0.05), whereas the RBF kernel differs significantly from both. By closely examining the mean difference values, the polynomial kernel yields better results than the other kernel functions, i.e., the linear and RBF kernel functions.

TABLE XII: EXPERIMENTAL RESULTS OF T-TESTS BETWEEN THE THREE DIFFERENT KERNELS
(Rows and columns in each matrix: Linear, Polynomial, RBF)

Mean Difference
             MAE                     MMRE                    RMSE
Linear       0.000  0.007 -0.021     0.000  0.005 -0.084     0.000  0.005 -0.018
Polynomial  -0.007  0.000 -0.028    -0.005  0.000 -0.090    -0.005  0.000 -0.023
RBF          0.021  0.028  0.000     0.084  0.090  0.000     0.018  0.023  0.000

p-value
             MAE                     MMRE                    RMSE
Linear       NaN    0.130  0.001     NaN    0.467  0.000     NaN    0.121  0.007
Polynomial   0.130  NaN    0.000     0.467  NaN    0.000     0.121  NaN    0.005
RBF          0.001  0.000  NaN       0.000  0.000  NaN       0.007  0.005  NaN

X. CONCLUSION

We develop predictive models to estimate the QoS parameters of web services using metrics of the source code implementing the services. We experiment with six different sets of metrics as input to develop the prediction models. The performance of these sets of metrics is evaluated using Extreme Learning Machines (ELM) with various kernel functions: the linear, polynomial and RBF kernel functions. From the correlation analysis between the metrics, we observe that there exists a high correlation between the object-oriented metrics and the WSDL metrics. From the t-test analysis, we infer that in most cases the difference between the various sets of metrics in terms of estimator performance is not substantial but moderate. We observe that the predictive model developed using the Harry M. Sneed (HMS) metrics yields better results than other sets of metrics such as all metrics and the Baski and Misra metrics. From the t-test analysis, we can also interpret that the difference between the three kernel functions in terms of their influence on the predictive accuracy is moderate. We conclude that neither feature selection technique dominates the other: one feature selection method is better than the other for some QoS parameters, and vice versa. By assessing the mean difference values, we infer that the polynomial kernel for ELM yields better results than the other kernel functions, i.e., the linear and RBF kernel functions. From the performance results, it is observed that the performance of the predictive model or estimator varies with the different sets of software metrics, the feature selection technique and the kernel function. Finally, we conclude that it is possible to estimate the QoS parameters using ELM and source code metrics.

REFERENCES

[1] F. Curbera, M. Duftler, R. Khalaf, W. Nagy, N. Mukhi, and S. Weerawarana, "Unraveling the web services web: an introduction to SOAP, WSDL, and UDDI," IEEE Internet Computing, vol. 6, no. 2, p. 86, 2002.
[2] E. Newcomer and G. Lomow, Understanding SOA with Web Services. Addison-Wesley, 2005.
[3] J. L. O. Coscia, M. Crasso, C. Mateos, A. Zunino, and S. Misra, Predicting Web Service Maintainability via Object-Oriented Metrics: A Statistics-Based Approach. Springer Berlin Heidelberg, 2012, pp. 29-39.
[4] J. L. O. Coscia, M. Crasso, C. Mateos, and A. Zunino, "Estimating web service interface quality through conventional object-oriented metrics," CLEI Electron. J., vol. 16, 2013.
[5] C. Mateos, M. Crasso, A. Zunino, and J. L. O. Coscia, "Detecting WSDL bad practices in code-first web services," Int. J. Web Grid Serv., vol. 7, no. 4, pp. 357-387, Jan. 2011.
[6] L. Kumar, M. Kumar, and S. K. Rath, "Maintainability prediction of web service using support vector machine with various kernel methods," International Journal of System Assurance Engineering and Management, pp. 1-18, 2016.
[7] S. O. Olatunji, Z. Rasheed, K. Sattar, A. Al-Mana, M. Alshayeb, and E. El-Sebakhy, "Extreme learning machine as maintainability prediction model for object-oriented software systems," Journal of Computing, vol. 2, no. 8, pp. 49-56, 2010.
[8] E. Al-Masri and Q. H. Mahmoud, "QoS-based discovery and ranking of web services," in Proceedings of the 16th International Conference on Computer Communications and Networks (ICCCN 2007), 2007, pp. 529-534.
[9] E. Al-Masri and Q. H. Mahmoud, "Investigating web services on the world wide web," in Proceedings of the 17th International Conference on World Wide Web, 2008, pp. 795-804.
[10] D. Baski and S. Misra, "Metrics suite for maintainability of extensible markup language web services," IET Software, vol. 5, no. 3, pp. 320-341, 2011.
[11] H. M. Sneed, "Measuring web service interfaces," in 12th IEEE International Symposium on Web Systems Evolution (WSE), 2010, pp. 111-115.
[12] M. Jureczko and D. Spinellis, Using Object-Oriented Design Metrics to Predict Software Defects, ser. Monographs of System Dependability, vol. Models and Methodology of System Dependability, 2010, pp. 69-81.
[13] D. Spinellis, "Tool writing: a forgotten art? (software tools)," IEEE Software, vol. 22, no. 4, pp. 9-11, 2005.
[14] R. W. Swiniarski and A. Skowron, "Rough set methods in feature selection and recognition," Pattern Recognition Letters, vol. 24, no. 6, pp. 833-849, 2003.
[15] G.-B. Huang, D. H. Wang, and Y. Lan, "Extreme learning machines: a survey," International Journal of Machine Learning and Cybernetics, vol. 2, no. 2, pp. 107-122, 2011.
[16] S. Ding, Y. Zhang, X. Xu, and L. Bao, "A novel extreme learning machine based on hybrid kernel function," Journal of Computers, vol. 8, no. 8, pp. 2110-2117, 2013.
[17] B. A. Kitchenham, L. M. Pickard, S. G. MacDonell, and M. J. Shepperd, "What accuracy statistics really measure [software estimation]," IEE Proceedings - Software, vol. 148, no. 3, pp. 81-85, 2001.
[18] T. G. Dietterich, "Approximate statistical tests for comparing supervised classification learning algorithms," Neural Computation, vol. 10, no. 7, pp. 1895-1923, Oct. 1998.