An effort allocation method to optimal code
                 sanitization for quality-aware
                energy efficiency improvement
                        Marco Bessi∗ , Gabriella Carrozza† , Roberto Pietrantuono‡ , Stefano Russo‡ ,
                                     ∗ CAST Software, Via San Vittore n. 49, 20123 Milan, Italy
          † Accenture Operations - Infrastructure Services Manager, Piazzale dell’Industria n. 40, 00144, Rome, Italy
                   ‡ Università degli Studi di Napoli Federico II, DIETI, Via Claudio 21, 80125, Naples, Italy.

   Email: M.Bessi@castsoftware.com, gabriella.carrozza@accenture.com, {roberto.pietrantuono, stefano.russo}@unina.it


   Abstract—Software energy efficiency has been shown to re-           multi-facet attribute, including reliability, usability, portability,
markably affect the energy consumption of IT platforms. Besides        security, performance, efficiency [3], which are all affected by
“performance” of the code in efficiently accomplishing a task, its     how the software is designed and ultimately implemented. In
“correctness” matters too. Software containing defects is likely to
fail and the computational cost to complete an operation becomes       this position paper, we first explore the relationship between
much higher if the user encounters a failure. Both performance-        code quality and performance-related energy consumption as
related energy efficiency of software and its defectiveness are im-    well as between code quality and software defectiveness. Code
pacted by the quality of the code. Exploiting the relation between     quality is expressed in terms of degree of adherence to pre-
code quality and energy/defectiveness attributes is the main idea      defined sets of well-programming rules. The violation of rules
behind this position paper. Starting from the authors’ previous
experience in this field, we define a method to first predict the      is checked by means of a well-known tool, the Automatic
applications of a software system more likely to impact energy         Static Analysis (ASA) [4]. Some preliminary results are pre-
consumption and with higher residual defectiveness, and then           sented in this regard. Then, a method is proposed to:
to exploit the prediction for optimally scheduling the effort for
code sanitization – thus supporting, by quantitative figures, the
quality assurance teams’ decision-makers.                                • Predict the applications of a software system more likely
   Index Terms—Reliability allocation, effort allocation, test plan-       to impact performance-related energy consumption as
ning, quality assurance, static analysis, prediction, optimization         well as the applications more likely to contain defects.
                                                                           This is accomplished by machine learning algorithms
                                                                           to train predictive models inferring the relation between
                        I. I NTRODUCTION
                                                                           ASA rule violations and energy/defectiveness attributes.
   Energy consumption of IT systems is increasingly becom-                 By such a knowledge, quality assurance team leaders can
ing a concern, especially for large infrastructure managers.               rank the applications according to the criticality (both
Research on Green IT is developing at a rapid pace, spanning               in terms of performance-related energy consumption and
from fields centered on hardware device and architectural                  defectiveness) before the testing stage starts, and plan
design, to software design and development, as well as to                  actions to improve the code where needed.
operations management and usage practices.                               • Optimize the allocation of the effort for code improve-
   Software energy efficiency has been shown to remarkably                 ment to each application, based on the prediction results.
affect energy consumption of all the other infrastructural levels          Effort allocation models are a very useful practice for
[1] [2]. The efficiency of the code is indeed related to the com-          quality assurance planning on quantitative bases, espe-
putational expense needed to perform the intended task. We                 cially in large systems [5] [6]. We formulate a multi-
claim that the overall energy efficiency due to software is not            objective optimization model suggesting the best dis-
only related to how fast the code performs a task, but also to             tribution of effort i) across applications and ii) across
the correctness of its execution. A software containing defects            ASA rules, to sanitize the code under specific objectives
is likely to fail upon a request is made; the computational cost           in terms of: cost to fix violations, expected impact on
to accomplish a requested operation becomes much higher if                 energy, expected defectiveness reduction.
the user encounters a failure and must retry the operation, or,
even worse, if the system has to be restarted or switched off
in favour of a replica.                                                In the following, we first present our experience on both
   Both performance-related efficiency of software and its             energy efficiency measurement and software defects analysis
defectiveness are related to the quality of the code. Quality is a     (Section II), with preliminary results about correlation with
                                                                       ASA rule violations; then, the method is presented in Section
  † G. Carrozza was with CAST Software when this paper was prepared.   III; a roadmap is in Section IV.
      II. PAST E XPERIENCE AND P RELIMINARY R ESULTS                    Resource usage by the application can be estimated by
                                                                     dynamic analysis, namely by means of profiling during testing.
A. ASA-Energy Relationship
                                                                     However, there would be three problems with this approach: i)
   Measurement (and, in our case, prediction) of energy ef-          the result would depend on the goodness of test cases, namely
ficiency is as much important as underestimated. Previous            on their ability to well represent the operational usage; ii) the
studies shown that 86% of ICT departments in UK do not               result does not provide direct hints about how it is possible
know their CO2 emissions and 80% of companies do not know            to optimize the code for performance improvement; iii) the
their electric bill [1]. In the work we have done about this topic   result of such an evaluation could be provided too late, as
[7] [8], we introduced and tested a model to measure software        (operational) testing is done at the end of the lifecycle, whereas
application energy consumption that decouples the usage of           developers would like to benefit from the feedback earlier.
the computational resources from their energy consumption.              We therefore explore the performance indications that can
Such a method allows estimating the energy efficiency of             be given by ASA on the source code. In fact, the adherence
software applications independently of the hardware infras-          or violation to predefined programming rules and patterns is
tructure. Let us model the usage of resources by monitoring          able to suggest possible performance issues. This, in turn, is
three key entities: the CPU, the storage (e.g., let us assume        likely to impact energy consumption, as software with higher
a DB that is the common storage means on large systems),             performance is expected to be more energy-efficient. We are
the network. According to our model, energy consumption is           exploring if there is any relation between Automatic Static
expressed as:                                                        code Analysis (ASA) rules and the usage pattern factor γ,
        ET OT = Eidle + ECP U + EDB + EN ET
                                                                     which is the most relevant contributor to ET OT . As prelim-
                                                               (1)   inary result we are obtaining, we have observed that some
        = Pidle · tex + γ · βu + DDB · eDB + DN ET · eN ET           ASA rules, defined by means of the CAST c tool, are highly
                                                                     correlated to the γ metric, as reported in Table I.
in which:
  • Pidle is the power absorption of the CPU in the idle state       TABLE I: Spearman correlation among # violations with γ for each group
    and tex is the time in which the system is idle;                 of rules. Text in bold indicates confidence > 95%
  • γ is the usage of the CPU, and can be written as:                                  Programming Rules                     Spearman ρ
     1 t2
      R
    f t1 U% (t)dt · ω(t) where: U% is the percentage usage                         Avoid using Driver Manager                   0.93
    of the CPU at time t, ω is a function denoting the                      Avoid the use of “Instanceof” inside loops          0.87
    application-dependent consumption pattern, which may                            Avoid using Hashing Table                   0.85
                                                                           Avoid String initialization with String objects      0.85
    depend on many factors (e.g., number of concurrent users,                  Avoid String concatenation in loops              0.75
    type of operations, type of application, state of the DB,                   Avoid using Dynamic instantiation               0.72
    etc.); f is the clock frequency of the CPU; t1 and t2 are
    the initial and final measurement time;                            Based on this result, we are confident that the infringements
  • βu is the power absorption function of the CPU per               in terms of code rule violations as obtained by ASA are
    percent usage (it is approximated by a constant assuming         potentially correlated with energy efficiency. This is going to
    the power grows linearly with usage);                            be exploited for our method formulation.
  • DDB is the amount of data (in number bytes) exchanged
    with the DB, while eDB is the energy consumed to                 B. ASA-Defects Relationship
    exchange 1 byte with the DB;                                        The quality of software can be well assessed by ASA. In
  • DN ET is the amount of data (in number bytes) exchanged          our previous experience [9], we have studied the quality of
    with the network, while eN ET is the energy consumed             large-scale mission-critical software systems produced by a big
    to exchange 1 byte with the network.                             Italian company, leader in the market of defence, homeland
   In this expression, the factors γ, DDB and DN ET are              security and protection, information security, avionics and
intrinsic characteristics of the software application, while βu ,    aerospace. The quality as perceived by the end user can be,
eDB and eN ET depend on the hardware infrastructure. The             however, different. User perceives system’s failures: one could
sole estimation of the former parameters allows computing the        have a very poor (internal) code quality in terms of violations,
energy consumption of the software application. In previous          but a good user experience. In fact, ASA violations are not
studies [1], it was shown that the consumption of storage is         necessarily defects: they can be either false warnings, when a
almost independent of usage, as dynamic RAMs are constantly          violation does not cause the software to fail, or can be actually
refreshed and most of the energy consumed by disks is used for       a defect, whose activation would lead the software to a failure.
spinning. In addition, focusing on CPU-intensive applications,       In [10], we therefore also explored the quality of such systems
the EDB and EN ET are negligible. Thus we assume that                in terms of actual defectiveness – an exercise that allowed us
energy mainly depends on the usage pattern γ. In [7], we             to assess not only the product but also the process quality.
have tested the model by using both ammeter clamp directly              Since defects are detected at later stages (i.e, during testing
connected to the server’s CPU electric source and specific HP        or even operation) compared to violations (detected soon
Performance Center’s instrumentations.                               after coding), their treatment is much more expensive. Thus,
it makes sense to try anticipating the knowledge about the                  The steps for prediction are:
presence of defects as earlier as possible. To this aim, we                 • Retrieving data from ASA reports about a set of appli-
are exploring any possible relationship between programming                   cations of a tested or an operational system. Tracking of
rules violation and the presence of software defects. There                   defects found during testing and of energy consumption
is indeed a reasonable expectation that ASA violations are                    measurements1 should be available.
correlated to defects (as proven in some past papers [4] [11]),             • Training of prediction models. We mean to test several
and that some types of violations (namely, some rules) are                    different models, by varying both machine learning algo-
more correlated than others. In Table II, we report preliminary               rithms (we opt for ranking algorithms, such as regression-
results about such correlation we are obtaining on a dataset                  based techniques [12], since binary classifiers provide no
of 29 components of the same mission-critical system that we                  information about the relative weight of each sample),
have studied in [9]. There is a significant correlation (with                 and the variables of interest (e.g., relying on absolute
95% of confidence) in several cases. Results can however be                   numbers or relative to the code size).
improved by considering rules at finer grain (i.e., not by group)           • Test of the models by 10-fold cross-validation, repeated
than what we did so far. As in the previous case, this output                 N = 30 times per each classifier (for statistical validity),
is going to be exploited for the method formulation.                          comparison and choice of the best model. Criteria for
                                                                              comparison can be several (the most used ones are the
TABLE II: Spearman correlation among # violations with defects for each       response variable value in the top 20% of samples and
group of rules. Text in bold indicates a confidence > 95%
                                                                              Fault Percentile Average (FPA) [12]).
                  Programming Rules          Spearman ρ
                                                                            • Actual prediction on applications for which we know
                      Metrics Rules              0.85                         only the ASA violations but anything about energy con-
               Naming Convention Rules           0.46
                      Possible Bugs              0.45                         sumption (and defects) yet. The output is a list of appli-
                Coding Convention Rules          0.45                         cations rated by their expected energy consumption and
                    Formatting Rules             0.45                         a list rated by expected defectiveness. The ranking-based
                 Memory and Resource             0.44                         algorithms associate a ranking score with each sample in
                    Comments Rules               0.41
                     MISRA Rules                 0.37                         the list. We call it pj (and qj , for energy consumption) and
                                                                                                                     pj −min(pj )
                        OOP Rules                0.29                         normalize it in [0,1]: ωdj = max(p         j )−min(pj )
                                                                                                                                       (similarly
                   Optimization Rules            0.29                         for qj , obtaining ωej ).
                         Security                0.27
               Threads & Synchronization         0.27                       • Rules rating. We analyze the impact of attribute se-
                        Resources                0.17                         lection, by an attribute ranking algorithm such as the
                   Initialization Rules          0.06                         Information Gain, which rates an attribute according
                    Exceptions Rules             0.01
                                                                              to the gain of information obtained by knowing that
                                                                              attribute. The top rules useful for prediction are derived,
               III. O PTIMAL CODE SANITIZATION                                and each rule i is assigned a score si (hi for energy
                                                                              consumption list) denoting the importance of that rule for
   Based on the preliminary results about correlation, we
                                                                              prediction. The scores are then normalized in [0,1] to get
propose the following allocation method. The method includes                                                                      si −min(si )
                                                                              the relative weight of each rule: ρdi = max(s           i )−min(si )
two steps: i) a prediction step, in which we run machine                                     hi −min(hi )
learning algorithms to build predictive models able of spotting                and ρei = max(h   i )−min(hi )
                                                                                                              , in the two cases of defec-
the most relevant applications and ASA rules, from both                        tiveness and energy consumption, respectively.
energy and defectiveness point of view; ii) an optimization               Prediction information (ωdj , ρdi and ωej , ρei ) is used to
step, in which we exploit the output of prediction to orient the          parametrize the optimization model. The steps we are going
fixing of ASA rule violations in terms of: which applications             to implement for optimization are in the next Section.
and which rules should be prioritized in the correction. The
                                                                          B. Optimization
goal is to provide quality team leaders with a quantitative
means to figure out how to sanitize the code to pursue well-                 Optimization of violation fixing is conceived to exploit the
defined objectives of energy efficiency, quality and cost.                above predictions to prioritize applications and rules to fix,
                                                                          according to predefined objectives and constraints. We aim at
A. Prediction                                                             providing the team leader with sets of alternative solutions
  The goal of prediction is twofold:                                      for which the expected impact on cost, quality and energy
                                                                          efficiency is known. Thus a multi-objective optimization
  • Predict applications more likely containing defects and
                                                                          approach is formulated. The main objective of the proposed
    applications more prone to consume energy before the
                                                                          approach can be stated as follows: suggest the number and type
    integration/system/acceptance testing starts;
                                                                          of violations to remove and the applications from which they
  • Predict which rules more likely suggest the presence of
                                                                          should be removed in order to get the best tradeoffs among: i)
    defects and which ones suggest inefficiency bottlenecks
    (besides the side feedback of identifying the most critical             1 See [7] for a discussion about measurement of energy consumption; for
    rules that programmers are keen on violating).                        our purpose, a profiling tool can suffice to get realistic patterns of γ.
minimizing the expected residual defectiveness (i.e., maximize                   consumption if sanitization actions are taken as suggested.
quality), ii) minimizing the expected energy consumption (i.e.,                  Note that this is just an instance of multi-objective models that
maximize energy efficiency), iii) minimizing the expected cost,                  can be formulated, and that can give raise to numerous variants
under user-defined constraints (e.g., maximum budget for code                    targeting specific needs. More generally, the core message is
sanitization). Suppose vi,j denotes the number of violation of                   that we can setup code sanitization plans supported by quan-
type i in the j-th application. Let us denote with Cf ixi the                    titative reasoning, by exploiting prediction and considering
average expected cost to remove violations of type j (that                       quality and energy objectives together in the formulation.
can be obtained from historical data about violation fixes).
                                                                                                          IV. ROADMAP
Considering the indication about correlation weights, ρdi and
ρei , each violation type is characterized by Cf ixi , ρdi and                      As first step, we are going to validate the prediction and the
ρei . Furthermore, each rule is assigned a “severity” (Sdi and                   optimization steps separately. As for the prediction, we mean
Sei , in the defect and energy case, respectively) by domain                     to extend the set of test applications (across multiple domains)
expert, who usually wants to give priorities to the removal of                   to improve the generality of predictive models, and to extend
violations of some types, according to their expected impact                     the set of metrics to improve their accuracy (e.g., with metrics
(we, discretize the severity levels in the [0,1] range, by equally               from version control systems [13]). As for optimization, we
spacing n levels: S1 = 1/n, S2 = 2/n ...Sn = 1)2                                 will first validate the model numerically to get feedback and
   It would be extremely expensive to eliminate all violations                   refine the model (e.g., by comparing metaheuristics on sample
of all types from all applications. The algorithm finds the best                 problems). Then, the whole method will be experimented on
tradeoffs optimizing the objectives. Let us denote the decision                  real case studies. As we did in the past for the mentioned
variables xi,j , being the number of violations of the i-th type                 cases, we aim at exploiting the strong collaborations with our
that the algorithm proposes to eliminate from application j.                     industrial partners in order to create a long-term testbed where
The three objectives are expressed as follows:                                   this kind of models can be validated empirically.
                            n h m
                            X  X                         i                                                      R EFERENCES
             Min!     C=                 (xi,j · Cf ixi )                         [1] E. Capra, G. Formenti, F. C., and S. Gallazzi, “The impact of mis soft-
                              j    i=1                                                ware on it energy consumption,” in European Conference on Information
                              n
                            XhX     m                                 i               Systems (ECIS), 2010.
             Max!      Q=                 (xi,j · ρdi · Sdi · ωdj )        (2)    [2] E. Capra, C. Francalanci, and S. Slaughter, “Is software “green”?
                                                                                      application development environments and energy efficiency in open
                              j     i=1
                              n      m                                                source applications,” Information and Software Technology, vol. 54,
                            XhX                                  i
                                                                                      no. 1, pp. 60–71, 2012.
             Max!      E=                 (xi,j · ρei · ·Sei ωej )
                                                                                  [3] “ISO/IEC 25010 - Software Product Quality.” http://iso25000.
                              j     i=1                                               com/index.php/en/iso-25000-standards/iso-25010, 2011.
                                                                                  [4] N. Nagappan and T. Ball, “Static analysis tools as early indicators of
subject to 0 ≤ xi,j ≤ vi,j and to the following bounds:                               pre-release defect density,” in Proc. 27th Int. Conference on Software
              C < C∗              (Maximum Budget)                                    Engineering, pp. 580–586, ACM, 2005.
                                                                                  [5] R. Pietrantuono, S. Russo, and K. Trivedi, “Software reliability and test-
              Q > Q∗              (Minimum Quality)                                   ing time allocation: An architecture-based approach,” IEEE Transactions
                                                                           (3)
                                                                                      on Software Engineering, vol. 36, no. 3, pp. 323–337, 2010.
              E > E∗              (Minimum Energy Reduction)                      [6] G. Carrozza, R. Pietrantuono, and S. Russo, “Dynamic test planning: a
                                                                                      study in an industrial context,” International Journal on Software Tools
Further constraints on violations can be added, such as:                              for Technology Transfer, vol. 16, no. 5, pp. 593–607, 2014.
  Xn                                                                              [7] M. Bessi, E. Capra, and C. Francalanci, “A benchmarking methodology
1
      Pm
                                                                                      to assess the energy performance of mis applications,” in 34th Interna-
N       i=1 (vi,j − xi,j ) ≤ v̄max , meaning that the residual
                                                                                      tional Conference on Information Systems, 2013.
   j=1
                                                                                  [8] G. Agosta, M. Bessi, E. Capra, and C. Francalanci, “Automatic mem-
number of average violations must be less than (or equal                              oization for energy efficiency in financial applications,” Sustainable
to) a target v̄max . The first objective function minimizes the                       Computing: Informatics and Systems, vol. 2, no. 2, pp. 105 – 115, 2012.
expected cost for violation removal. The second objective                             IEEE International Green Computing Conference (IGCC 2011).
                                                                                  [9] G. Carrozza, M. Cinque, U. Giordano, R. Pietrantuono, and S. Russo,
function assesses the expected quality increase (to maximize)                         “Prioritizing correction of static analysis infringements for cost-effective
as sum of violations xi,j to remove weighted by: the likelihood                       code sanitization,” in Proceedings of the Second International Workshop
that violation of type i is actually correlated to defects, the                       on Software Engineering Research and Industrial Practice, SER&IP ’15,
                                                                                      (Piscataway, NJ, USA), pp. 25–31, IEEE Press, 2015.
likelihood that the interested application j is actually defect-                 [10] G. Carrozza, R. Pietrantuono, and S. Russo, “Defect analysis in mission-
prone, and by the severity of the rule assigned by domain                             critical software systems: a detailed investigation,” Journal of Software:
expert. Similarly, the third objective assesses the reduction                         Evolution and Process, vol. 27, no. 1, pp. 22–49, 2015.
                                                                                 [11] R. Plosch, H. Gruber, A. Hentschel, G. Pomberger, and S. Schiffer, “On
of energy consumption (to maximize) if we remove xi,j                                 the relation between external software quality and static code analysis,”
violations. The solution provides a matrix with the amount                            in 32nd IEEE Software Engineering Workshop, pp. 169–174, 2008.
of violations engineers should remove for each type and for                      [12] X. Yang, K. Tang, and X. Yao, “A learning-to-rank approach to software
                                                                                      defect prediction,” IEEE Transactions on Reliability, vol. 64, no. 1,
each application, as well as the estimate of removal cost, the                        pp. 234–246, 2015.
expected quality in terms of defects and the expected energy                     [13] D. G. Cavezza, R. Pietrantuono, and S. Russo, “Performance of de-
                                                                                      fect prediction in rapidly evolving software,” in Release Engineering
   2 For instance, reliability rules might be assigned a higher defectiveness-        (RELENG), 2015 IEEE/ACM 3rd International Workshop on, pp. 8–11,
related severity; similarly, for performance rules in the energy case.                2015.