An effort allocation method to optimal code sanitization for quality-aware energy efficiency improvement Marco Bessi∗ , Gabriella Carrozza† , Roberto Pietrantuono‡ , Stefano Russo‡ , ∗ CAST Software, Via San Vittore n. 49, 20123 Milan, Italy † Accenture Operations - Infrastructure Services Manager, Piazzale dell’Industria n. 40, 00144, Rome, Italy ‡ Università degli Studi di Napoli Federico II, DIETI, Via Claudio 21, 80125, Naples, Italy. Email: M.Bessi@castsoftware.com, gabriella.carrozza@accenture.com, {roberto.pietrantuono, stefano.russo}@unina.it Abstract—Software energy efficiency has been shown to re- multi-facet attribute, including reliability, usability, portability, markably affect the energy consumption of IT platforms. Besides security, performance, efficiency [3], which are all affected by “performance” of the code in efficiently accomplishing a task, its how the software is designed and ultimately implemented. In “correctness” matters too. Software containing defects is likely to fail and the computational cost to complete an operation becomes this position paper, we first explore the relationship between much higher if the user encounters a failure. Both performance- code quality and performance-related energy consumption as related energy efficiency of software and its defectiveness are im- well as between code quality and software defectiveness. Code pacted by the quality of the code. Exploiting the relation between quality is expressed in terms of degree of adherence to pre- code quality and energy/defectiveness attributes is the main idea defined sets of well-programming rules. The violation of rules behind this position paper. Starting from the authors’ previous experience in this field, we define a method to first predict the is checked by means of a well-known tool, the Automatic applications of a software system more likely to impact energy Static Analysis (ASA) [4]. Some preliminary results are pre- consumption and with higher residual defectiveness, and then sented in this regard. Then, a method is proposed to: to exploit the prediction for optimally scheduling the effort for code sanitization – thus supporting, by quantitative figures, the quality assurance teams’ decision-makers. • Predict the applications of a software system more likely Index Terms—Reliability allocation, effort allocation, test plan- to impact performance-related energy consumption as ning, quality assurance, static analysis, prediction, optimization well as the applications more likely to contain defects. This is accomplished by machine learning algorithms to train predictive models inferring the relation between I. I NTRODUCTION ASA rule violations and energy/defectiveness attributes. Energy consumption of IT systems is increasingly becom- By such a knowledge, quality assurance team leaders can ing a concern, especially for large infrastructure managers. rank the applications according to the criticality (both Research on Green IT is developing at a rapid pace, spanning in terms of performance-related energy consumption and from fields centered on hardware device and architectural defectiveness) before the testing stage starts, and plan design, to software design and development, as well as to actions to improve the code where needed. operations management and usage practices. • Optimize the allocation of the effort for code improve- Software energy efficiency has been shown to remarkably ment to each application, based on the prediction results. affect energy consumption of all the other infrastructural levels Effort allocation models are a very useful practice for [1] [2]. The efficiency of the code is indeed related to the com- quality assurance planning on quantitative bases, espe- putational expense needed to perform the intended task. We cially in large systems [5] [6]. We formulate a multi- claim that the overall energy efficiency due to software is not objective optimization model suggesting the best dis- only related to how fast the code performs a task, but also to tribution of effort i) across applications and ii) across the correctness of its execution. A software containing defects ASA rules, to sanitize the code under specific objectives is likely to fail upon a request is made; the computational cost in terms of: cost to fix violations, expected impact on to accomplish a requested operation becomes much higher if energy, expected defectiveness reduction. the user encounters a failure and must retry the operation, or, even worse, if the system has to be restarted or switched off in favour of a replica. In the following, we first present our experience on both Both performance-related efficiency of software and its energy efficiency measurement and software defects analysis defectiveness are related to the quality of the code. Quality is a (Section II), with preliminary results about correlation with ASA rule violations; then, the method is presented in Section † G. Carrozza was with CAST Software when this paper was prepared. III; a roadmap is in Section IV. II. PAST E XPERIENCE AND P RELIMINARY R ESULTS Resource usage by the application can be estimated by dynamic analysis, namely by means of profiling during testing. A. ASA-Energy Relationship However, there would be three problems with this approach: i) Measurement (and, in our case, prediction) of energy ef- the result would depend on the goodness of test cases, namely ficiency is as much important as underestimated. Previous on their ability to well represent the operational usage; ii) the studies shown that 86% of ICT departments in UK do not result does not provide direct hints about how it is possible know their CO2 emissions and 80% of companies do not know to optimize the code for performance improvement; iii) the their electric bill [1]. In the work we have done about this topic result of such an evaluation could be provided too late, as [7] [8], we introduced and tested a model to measure software (operational) testing is done at the end of the lifecycle, whereas application energy consumption that decouples the usage of developers would like to benefit from the feedback earlier. the computational resources from their energy consumption. We therefore explore the performance indications that can Such a method allows estimating the energy efficiency of be given by ASA on the source code. In fact, the adherence software applications independently of the hardware infras- or violation to predefined programming rules and patterns is tructure. Let us model the usage of resources by monitoring able to suggest possible performance issues. This, in turn, is three key entities: the CPU, the storage (e.g., let us assume likely to impact energy consumption, as software with higher a DB that is the common storage means on large systems), performance is expected to be more energy-efficient. We are the network. According to our model, energy consumption is exploring if there is any relation between Automatic Static expressed as: code Analysis (ASA) rules and the usage pattern factor γ, ET OT = Eidle + ECP U + EDB + EN ET which is the most relevant contributor to ET OT . As prelim- (1) inary result we are obtaining, we have observed that some = Pidle · tex + γ · βu + DDB · eDB + DN ET · eN ET ASA rules, defined by means of the CAST c tool, are highly correlated to the γ metric, as reported in Table I. in which: • Pidle is the power absorption of the CPU in the idle state TABLE I: Spearman correlation among # violations with γ for each group and tex is the time in which the system is idle; of rules. Text in bold indicates confidence > 95% • γ is the usage of the CPU, and can be written as: Programming Rules Spearman ρ 1 t2 R f t1 U% (t)dt · ω(t) where: U% is the percentage usage Avoid using Driver Manager 0.93 of the CPU at time t, ω is a function denoting the Avoid the use of “Instanceof” inside loops 0.87 application-dependent consumption pattern, which may Avoid using Hashing Table 0.85 Avoid String initialization with String objects 0.85 depend on many factors (e.g., number of concurrent users, Avoid String concatenation in loops 0.75 type of operations, type of application, state of the DB, Avoid using Dynamic instantiation 0.72 etc.); f is the clock frequency of the CPU; t1 and t2 are the initial and final measurement time; Based on this result, we are confident that the infringements • βu is the power absorption function of the CPU per in terms of code rule violations as obtained by ASA are percent usage (it is approximated by a constant assuming potentially correlated with energy efficiency. This is going to the power grows linearly with usage); be exploited for our method formulation. • DDB is the amount of data (in number bytes) exchanged with the DB, while eDB is the energy consumed to B. ASA-Defects Relationship exchange 1 byte with the DB; The quality of software can be well assessed by ASA. In • DN ET is the amount of data (in number bytes) exchanged our previous experience [9], we have studied the quality of with the network, while eN ET is the energy consumed large-scale mission-critical software systems produced by a big to exchange 1 byte with the network. Italian company, leader in the market of defence, homeland In this expression, the factors γ, DDB and DN ET are security and protection, information security, avionics and intrinsic characteristics of the software application, while βu , aerospace. The quality as perceived by the end user can be, eDB and eN ET depend on the hardware infrastructure. The however, different. User perceives system’s failures: one could sole estimation of the former parameters allows computing the have a very poor (internal) code quality in terms of violations, energy consumption of the software application. In previous but a good user experience. In fact, ASA violations are not studies [1], it was shown that the consumption of storage is necessarily defects: they can be either false warnings, when a almost independent of usage, as dynamic RAMs are constantly violation does not cause the software to fail, or can be actually refreshed and most of the energy consumed by disks is used for a defect, whose activation would lead the software to a failure. spinning. In addition, focusing on CPU-intensive applications, In [10], we therefore also explored the quality of such systems the EDB and EN ET are negligible. Thus we assume that in terms of actual defectiveness – an exercise that allowed us energy mainly depends on the usage pattern γ. In [7], we to assess not only the product but also the process quality. have tested the model by using both ammeter clamp directly Since defects are detected at later stages (i.e, during testing connected to the server’s CPU electric source and specific HP or even operation) compared to violations (detected soon Performance Center’s instrumentations. after coding), their treatment is much more expensive. Thus, it makes sense to try anticipating the knowledge about the The steps for prediction are: presence of defects as earlier as possible. To this aim, we • Retrieving data from ASA reports about a set of appli- are exploring any possible relationship between programming cations of a tested or an operational system. Tracking of rules violation and the presence of software defects. There defects found during testing and of energy consumption is indeed a reasonable expectation that ASA violations are measurements1 should be available. correlated to defects (as proven in some past papers [4] [11]), • Training of prediction models. We mean to test several and that some types of violations (namely, some rules) are different models, by varying both machine learning algo- more correlated than others. In Table II, we report preliminary rithms (we opt for ranking algorithms, such as regression- results about such correlation we are obtaining on a dataset based techniques [12], since binary classifiers provide no of 29 components of the same mission-critical system that we information about the relative weight of each sample), have studied in [9]. There is a significant correlation (with and the variables of interest (e.g., relying on absolute 95% of confidence) in several cases. Results can however be numbers or relative to the code size). improved by considering rules at finer grain (i.e., not by group) • Test of the models by 10-fold cross-validation, repeated than what we did so far. As in the previous case, this output N = 30 times per each classifier (for statistical validity), is going to be exploited for the method formulation. comparison and choice of the best model. Criteria for comparison can be several (the most used ones are the TABLE II: Spearman correlation among # violations with defects for each response variable value in the top 20% of samples and group of rules. Text in bold indicates a confidence > 95% Fault Percentile Average (FPA) [12]). Programming Rules Spearman ρ • Actual prediction on applications for which we know Metrics Rules 0.85 only the ASA violations but anything about energy con- Naming Convention Rules 0.46 Possible Bugs 0.45 sumption (and defects) yet. The output is a list of appli- Coding Convention Rules 0.45 cations rated by their expected energy consumption and Formatting Rules 0.45 a list rated by expected defectiveness. The ranking-based Memory and Resource 0.44 algorithms associate a ranking score with each sample in Comments Rules 0.41 MISRA Rules 0.37 the list. We call it pj (and qj , for energy consumption) and pj −min(pj ) OOP Rules 0.29 normalize it in [0,1]: ωdj = max(p j )−min(pj ) (similarly Optimization Rules 0.29 for qj , obtaining ωej ). Security 0.27 Threads & Synchronization 0.27 • Rules rating. We analyze the impact of attribute se- Resources 0.17 lection, by an attribute ranking algorithm such as the Initialization Rules 0.06 Information Gain, which rates an attribute according Exceptions Rules 0.01 to the gain of information obtained by knowing that attribute. The top rules useful for prediction are derived, III. O PTIMAL CODE SANITIZATION and each rule i is assigned a score si (hi for energy consumption list) denoting the importance of that rule for Based on the preliminary results about correlation, we prediction. The scores are then normalized in [0,1] to get propose the following allocation method. The method includes si −min(si ) the relative weight of each rule: ρdi = max(s i )−min(si ) two steps: i) a prediction step, in which we run machine hi −min(hi ) learning algorithms to build predictive models able of spotting and ρei = max(h i )−min(hi ) , in the two cases of defec- the most relevant applications and ASA rules, from both tiveness and energy consumption, respectively. energy and defectiveness point of view; ii) an optimization Prediction information (ωdj , ρdi and ωej , ρei ) is used to step, in which we exploit the output of prediction to orient the parametrize the optimization model. The steps we are going fixing of ASA rule violations in terms of: which applications to implement for optimization are in the next Section. and which rules should be prioritized in the correction. The B. Optimization goal is to provide quality team leaders with a quantitative means to figure out how to sanitize the code to pursue well- Optimization of violation fixing is conceived to exploit the defined objectives of energy efficiency, quality and cost. above predictions to prioritize applications and rules to fix, according to predefined objectives and constraints. We aim at A. Prediction providing the team leader with sets of alternative solutions The goal of prediction is twofold: for which the expected impact on cost, quality and energy efficiency is known. Thus a multi-objective optimization • Predict applications more likely containing defects and approach is formulated. The main objective of the proposed applications more prone to consume energy before the approach can be stated as follows: suggest the number and type integration/system/acceptance testing starts; of violations to remove and the applications from which they • Predict which rules more likely suggest the presence of should be removed in order to get the best tradeoffs among: i) defects and which ones suggest inefficiency bottlenecks (besides the side feedback of identifying the most critical 1 See [7] for a discussion about measurement of energy consumption; for rules that programmers are keen on violating). our purpose, a profiling tool can suffice to get realistic patterns of γ. minimizing the expected residual defectiveness (i.e., maximize consumption if sanitization actions are taken as suggested. quality), ii) minimizing the expected energy consumption (i.e., Note that this is just an instance of multi-objective models that maximize energy efficiency), iii) minimizing the expected cost, can be formulated, and that can give raise to numerous variants under user-defined constraints (e.g., maximum budget for code targeting specific needs. More generally, the core message is sanitization). Suppose vi,j denotes the number of violation of that we can setup code sanitization plans supported by quan- type i in the j-th application. Let us denote with Cf ixi the titative reasoning, by exploiting prediction and considering average expected cost to remove violations of type j (that quality and energy objectives together in the formulation. can be obtained from historical data about violation fixes). IV. ROADMAP Considering the indication about correlation weights, ρdi and ρei , each violation type is characterized by Cf ixi , ρdi and As first step, we are going to validate the prediction and the ρei . Furthermore, each rule is assigned a “severity” (Sdi and optimization steps separately. As for the prediction, we mean Sei , in the defect and energy case, respectively) by domain to extend the set of test applications (across multiple domains) expert, who usually wants to give priorities to the removal of to improve the generality of predictive models, and to extend violations of some types, according to their expected impact the set of metrics to improve their accuracy (e.g., with metrics (we, discretize the severity levels in the [0,1] range, by equally from version control systems [13]). As for optimization, we spacing n levels: S1 = 1/n, S2 = 2/n ...Sn = 1)2 will first validate the model numerically to get feedback and It would be extremely expensive to eliminate all violations refine the model (e.g., by comparing metaheuristics on sample of all types from all applications. The algorithm finds the best problems). Then, the whole method will be experimented on tradeoffs optimizing the objectives. Let us denote the decision real case studies. As we did in the past for the mentioned variables xi,j , being the number of violations of the i-th type cases, we aim at exploiting the strong collaborations with our that the algorithm proposes to eliminate from application j. industrial partners in order to create a long-term testbed where The three objectives are expressed as follows: this kind of models can be validated empirically. n h m X X i R EFERENCES Min! C= (xi,j · Cf ixi ) [1] E. Capra, G. Formenti, F. C., and S. Gallazzi, “The impact of mis soft- j i=1 ware on it energy consumption,” in European Conference on Information n XhX m i Systems (ECIS), 2010. Max! Q= (xi,j · ρdi · Sdi · ωdj ) (2) [2] E. Capra, C. Francalanci, and S. Slaughter, “Is software “green”? application development environments and energy efficiency in open j i=1 n m source applications,” Information and Software Technology, vol. 54, XhX i no. 1, pp. 60–71, 2012. Max! E= (xi,j · ρei · ·Sei ωej ) [3] “ISO/IEC 25010 - Software Product Quality.” http://iso25000. j i=1 com/index.php/en/iso-25000-standards/iso-25010, 2011. [4] N. Nagappan and T. Ball, “Static analysis tools as early indicators of subject to 0 ≤ xi,j ≤ vi,j and to the following bounds: pre-release defect density,” in Proc. 27th Int. Conference on Software C < C∗ (Maximum Budget) Engineering, pp. 580–586, ACM, 2005. [5] R. Pietrantuono, S. Russo, and K. Trivedi, “Software reliability and test- Q > Q∗ (Minimum Quality) ing time allocation: An architecture-based approach,” IEEE Transactions (3) on Software Engineering, vol. 36, no. 3, pp. 323–337, 2010. E > E∗ (Minimum Energy Reduction) [6] G. Carrozza, R. Pietrantuono, and S. Russo, “Dynamic test planning: a study in an industrial context,” International Journal on Software Tools Further constraints on violations can be added, such as: for Technology Transfer, vol. 16, no. 5, pp. 593–607, 2014. Xn [7] M. Bessi, E. Capra, and C. Francalanci, “A benchmarking methodology 1 Pm to assess the energy performance of mis applications,” in 34th Interna- N i=1 (vi,j − xi,j ) ≤ v̄max , meaning that the residual tional Conference on Information Systems, 2013. j=1 [8] G. Agosta, M. Bessi, E. Capra, and C. Francalanci, “Automatic mem- number of average violations must be less than (or equal oization for energy efficiency in financial applications,” Sustainable to) a target v̄max . The first objective function minimizes the Computing: Informatics and Systems, vol. 2, no. 2, pp. 105 – 115, 2012. expected cost for violation removal. The second objective IEEE International Green Computing Conference (IGCC 2011). [9] G. Carrozza, M. Cinque, U. Giordano, R. Pietrantuono, and S. Russo, function assesses the expected quality increase (to maximize) “Prioritizing correction of static analysis infringements for cost-effective as sum of violations xi,j to remove weighted by: the likelihood code sanitization,” in Proceedings of the Second International Workshop that violation of type i is actually correlated to defects, the on Software Engineering Research and Industrial Practice, SER&IP ’15, (Piscataway, NJ, USA), pp. 25–31, IEEE Press, 2015. likelihood that the interested application j is actually defect- [10] G. Carrozza, R. Pietrantuono, and S. Russo, “Defect analysis in mission- prone, and by the severity of the rule assigned by domain critical software systems: a detailed investigation,” Journal of Software: expert. Similarly, the third objective assesses the reduction Evolution and Process, vol. 27, no. 1, pp. 22–49, 2015. [11] R. Plosch, H. Gruber, A. Hentschel, G. Pomberger, and S. Schiffer, “On of energy consumption (to maximize) if we remove xi,j the relation between external software quality and static code analysis,” violations. The solution provides a matrix with the amount in 32nd IEEE Software Engineering Workshop, pp. 169–174, 2008. of violations engineers should remove for each type and for [12] X. Yang, K. Tang, and X. Yao, “A learning-to-rank approach to software defect prediction,” IEEE Transactions on Reliability, vol. 64, no. 1, each application, as well as the estimate of removal cost, the pp. 234–246, 2015. expected quality in terms of defects and the expected energy [13] D. G. Cavezza, R. Pietrantuono, and S. Russo, “Performance of de- fect prediction in rapidly evolving software,” in Release Engineering 2 For instance, reliability rules might be assigned a higher defectiveness- (RELENG), 2015 IEEE/ACM 3rd International Workshop on, pp. 8–11, related severity; similarly, for performance rules in the energy case. 2015.