Actionable Measurements – Improving The Actionability of Architecture Level Software Quality Violations

Wojciech Czabański
Institute for Informatics, University of Amsterdam
Amsterdam, the Netherlands
Email: wojciech.czabanski@gmail.com

Magiel Bruntink
Software Improvement Group
Amsterdam, the Netherlands
Email: m.bruntink@sig.eu

Paul Martin
Institute for Informatics, University of Amsterdam
Amsterdam, the Netherlands
Email: p.w.martin@uva.nl




Abstract—When system components become more coupled over time, more effort must be dedicated to software architecture refactoring. Tightly coupled components present higher risk: they make the system more difficult to understand, test and modify. In order to allocate the refactoring effort effectively, it is crucial to identify how severely components are coupled and which areas of the system involve the most risk to modify.
In this paper we apply the concept of architecture hotspots together with the Software Improvement Group Maintainability Model to identify violations in architecture design. We propose a prototype tool that can identify and present architecture smells to developers as refactoring recommendations. We then apply the tool to open-source projects, validating our approach through interviews with developers. Developers found the hotspots comprehensible and relevant, but there is room for improvement with regard to actionability.

I. INTRODUCTION

Software maintainability is an internal quality of a software product that describes the effort required to maintain a software system. Low maintainability is connected with low comprehensibility. Glass argues that the most challenging part of maintaining software is understanding the existing product [1]. It follows that code which is hard to understand is also difficult to modify in a controlled manner and to test for defects. If changes are difficult to introduce and the code is hard to understand, the probability of introducing bugs is very high, which raises the cost of further developing and maintaining the system.

We focus on the Maintainability Model developed by the Software Improvement Group (SIG) [2]. From this model, ten guidelines have been derived to help developers quickly evaluate the maintainability of their code and provide actionable suggestions to increase its quality, such as keeping the complexity of units low and interfaces small. The guidelines are implemented in the Better Code Hub tool [3], which applies them to provide feedback to developers. In particular we look at component independence because, based on user feedback, it is considered the most challenging guideline to improve on. Our goal is to provide developers with more actionable feedback in addition to the diagnosis provided by Better Code Hub, so that they can improve the maintainability of their code.

Currently, Better Code Hub provides an overview of components and the interactions between them, such as incoming and outgoing calls to and from modules in other components. It does not, however, provide specific guidance as to how the developer can reduce component coupling and improve component independence. Attempts have been made to generate suggestions for improving modularity by suggesting move module refactoring operations, framing the problem as a multi-objective search [4]. Such refactoring operations, however, may either not improve modularity or make the codebase less semantically consistent. Identifying patterns in poorly modularized code can be a starting point for devising better recommendations as to how components can be decoupled. We therefore apply the architecture hotspot patterns described by Mo et al. [5] in a study of open source projects, evaluating whether hotspots can be found in such projects and used to provide refactoring recommendations. Furthermore, we investigate whether presenting quality violations based on hotspots helps developers decrease coupling between components. In order to validate our approach, we construct a hotspot detector, integrate it with the Better Code Hub analysis tool and visualise the hotspots. Based on initial feedback from developers, indicating that the suggestions are comprehensible and relevant, we finally consider how to build upon our work in future, looking to improve the tool by adding more detailed information about what triggers the hotspot detection.

II. BACKGROUND

Program analysis is the process of analysing the behaviour of a software program with regard to certain properties such as correctness, robustness or maintainability [6]. A number of means of program analysis have already been defined in the research literature, including static and dynamic analysis, maintainability guidelines and the detection of 'code smells'. We survey a few of these approaches below.
A. Static and dynamic program analysis

Source code is commonly used as input for static analysis tools; in certain cases other inputs, such as revision history, are used as well. Syntactic analysis and software metrics computation involve analysing the source code of a system, often represented as a graph. Examples of tools for obtaining the source code graph and metrics include Understand (https://scitools.com/), the Software Analysis Toolkit from SIG (https://www.sig.eu/) and SonarQube [7]. Our intention was to improve the actionability of measurements. In this respect, SonarQube is aimed at integration within a CI/CD pipeline, which made it difficult to use in our research setting: the existing pipeline and the time limitations of the project made it unfeasible to modify. Understand exported the dependency graph to a commonly used format and supported a variety of programming languages, but was challenging to integrate with the SIG infrastructure, which led us to choose the Software Analysis Toolkit to pre-process source code for further analysis.

We also investigated dynamic analysis methods, reviewing tools such as Valgrind [8], Google Sanitizers [9] and Daikon [10]. We chose, however, to focus on analysing source code only: to use dynamic analysis, the executable for every project would need to be built locally. In addition, the reviewed tools detect possible faults in the code as opposed to analysing maintainability.

B. SIG Maintainability Model

SIG developed a maintainability model for software based on empirical software engineering research [11]. They use an in-house tool, the Software Analysis Toolkit, to conduct static analyses of software systems. The Maintainability Model is also accessible to GitHub developers through the Better Code Hub tool, which conducts an analysis of a repository and evaluates it against the ten SIG guidelines [3]. In our paper we focus on the 'Couple Architecture Components Loosely' guideline, which advises minimising the number of throughput components. These have high fan-in/fan-out values [12]. Similarly to modules, components that act as intermediaries are more tightly coupled and more difficult to maintain in isolation.
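To make the notion of a throughput component concrete, the following minimal sketch flags components whose cross-component fan-in and fan-out both exceed a threshold. The graph encoding and the threshold value are illustrative assumptions, not the actual definitions used by the SIG model:

    from collections import defaultdict

    def throughput_components(calls, threshold=5):
        """calls: iterable of (source_component, target_component) pairs.
        Returns components whose fan-in and fan-out both exceed threshold."""
        fan_in, fan_out = defaultdict(int), defaultdict(int)
        for src, dst in calls:
            if src != dst:  # only cross-component calls contribute
                fan_out[src] += 1
                fan_in[dst] += 1
        return [c for c in set(fan_in) | set(fan_out)
                if fan_in[c] > threshold and fan_out[c] > threshold]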
C. Software design defect classifications

Low maintainability can manifest itself in the presence of certain antipatterns, called 'code smells'. The concept of 'code smells' as software design defects was popularised by Fowler [13]. We looked into both architecture and design smells. Suryanarayana and Sharma proposed that architecture smells represent design violations impacting both the component and system levels [14]. Sharma provided the definition of a design smell [15]; Fontana et al. investigated Java projects for recurring patterns of architecturally relevant code anomalies [16]. Architecture hotspot smells are code anomalies, introduced by Mo et al., which are related to an increased software defect density [5]. Macia et al. designed a DSL for describing architectural concerns and code anomalies [17]; in addition to the source code and metrics, they use the metadata defined using the DSL to detect code anomalies in a tool (SCOOP). Martin focused on framing component design guidelines using three principles; violating those principles constitutes an architectural smell [18]. Garcia et al. define four architecture smells [19].

We believe that connecting maintainability measurements with architecture smells will allow us to provide more actionable and relevant refactoring recommendations for developers using Better Code Hub than relying on metrics alone. It will also make it possible to offer advice on how to deal with the detected type of smell. Only the classification from Mo et al. draws a clear connection between architectural smells and maintainability, which is why we chose it to enhance the refactoring recommendations generated by Better Code Hub [5].

III. EXPERIMENTS

A. Data sources

To validate our approach we selected a number of GitHub repositories that are available as open source projects and contain the majority of their code in languages supported by Better Code Hub. The projects we targeted needed to have between 50k and 200k lines of code, be at least three years old and be written in a strongly and statically typed language supporting inheritance (e.g. Java, C# or C++).
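Expressed as a filter, the selection criteria read as follows. The repository record fields are hypothetical; the thresholds are the ones stated above:

    SUPPORTED_LANGUAGES = {"Java", "C#", "C++"}  # strongly, statically typed

    def is_candidate(repo):
        """repo: dict with 'loc', 'age_years' and 'language' keys (assumed)."""
        return (50_000 <= repo["loc"] <= 200_000
                and repo["age_years"] >= 3
                and repo["language"] in SUPPORTED_LANGUAGES)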
B. Hotspot distribution

We used the Understand code analyser to generate source graphs, which we then fed into an existing tool called Titan [20], which identifies architecture hotspots. We aggregated the hotspot quantities and types per analysed system in Table I. The file count indicates how many distinct files contain hotspots; a file can be a part of multiple hotspots, but we count each file only once.
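The aggregation behind Table I can be sketched as follows: hotspot instances are counted per type, while affected files are deduplicated with a set so that a file taking part in several hotspots is counted once. The data shapes are assumptions for illustration:

    def aggregate(hotspots):
        """hotspots: iterable of (hotspot_type, affected_files) pairs."""
        instances, files = {}, {}
        for kind, affected in hotspots:
            instances[kind] = instances.get(kind, 0) + 1    # one per instance
            files.setdefault(kind, set()).update(affected)  # files counted once
        return {kind: (instances[kind], len(files[kind])) for kind in instances}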
Table I
HOTSPOT INSTANCES OVERVIEW IN SELECTED SYSTEMS

System                Language   LOC (k)   Unhealthy              Cross-module      Package
                                           Inheritances (files)   cycles (files)    cycles (files)
Bitcoin               C++        120       16 (75)                31 (108)           49 (117)
Jenkins               Java       100       80 (170)               10 (403)          513 (372)
jME                   Java       240       69 (436)               59 (402)          335 (410)
JustDecompileEngine   C#         115       79 (290)                8 (205)           92 (89)
nunit                 C#          59       24 (94)                 6 (62)            62 (74)
openHistorian         C#          72       12 (37)                31 (89)            63 (114)
OpenRA                C#         110       19 (150)               35 (273)          202 (206)
pdfbox                Java       150       64 (252)               23 (379)          261 (301)
Pinta                 C#          54       17 (57)                12 (112)          109 (91)
ShareX                C#          95       11 (76)                38 (205)          189 (248)

In order to reason about the impact of hotspots on the overall maintainability of projects, we compare the number of files affected by hotspots with the number of code files in the project. Mo et al. show that files containing hotspots are more bug-prone and exhibit maintenance problems, from which we infer that a higher percentage of files affected by hotspots makes a codebase less maintainable [5]. The percentage of files affected by hotspots is then juxtaposed in Table II with the Component Independence metric (CI), measured by Better Code Hub (BCH): the percentage of source lines of code which are interface code, i.e. code that is called from outside the component in which it resides and that also calls code outside of that component.

Table II
HOTSPOT IMPACT ON SELECTED SYSTEMS

System                Files       Files affected   % affected     CI measured
                      analyzed    by hotspots      by hotspots    by BCH
Bitcoin                675        117              17.33%         0.9894
Jenkins               1112        403              36.24%         0.9868
jME                   2077        436              20.99%         0.6812
JustDecompileEngine    814        290              35.62%         0.8311
nunit                  781         94              12.03%         0.6329
openHistorian          726        114              15.70%         0.9572
OpenRA                1157        273              23.60%         0.8362
pdfbox                1279        379              29.63%         0.6283
Pinta                  400        112              28.00%         0.7421
ShareX                 677        248              36.63%         0.6842

Discussion: We expected the percentage of files affected by hotspots to be negatively correlated with component independence (CI) (see Table II). The correlation coefficient is -0.0162, indicating no correlation. This suggests that the overall impact of hotspots on the codebase may not be measurable using Better Code Hub's Component Independence metric.
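The reported coefficient is Pearson's r over the two rightmost columns of Table II, and can be reproduced with a few lines of Python (statistics.correlation requires Python 3.10 or later):

    from statistics import correlation  # Pearson's r, Python 3.10+

    pct_affected = [17.33, 36.24, 20.99, 35.62, 12.03,
                    15.70, 23.60, 29.63, 28.00, 36.63]
    ci = [0.9894, 0.9868, 0.6812, 0.8311, 0.6329,
          0.9572, 0.8362, 0.6283, 0.7421, 0.6842]

    print(round(correlation(pct_affected, ci), 4))  # -0.0162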
C. Prototype

The research environment imposed limitations on the inputs and tools that we could use; we therefore decided to implement a detector for Better Code Hub based on the state-of-the-art hotspot approach described in [5]. However, we used the source code graph created by the Software Analysis Toolkit as opposed to the Understand source code analyser.

Overview: The prototype consists of a detector part and a visualisation part. The visualisation is part of the Edge front-end component and only consumes the hotspot data produced by the detector. The detector itself is part of the GuidelineChecker component. In addition, the Scheduler is an orchestrating component which starts a Jenkins task and notifies the Edge component that the analysis is finished. The Jenkins component clones the source code repository and invokes the Software Analysis Toolkit, which outputs a source code graph generated from the repository; the GuidelineChecker then checks the source graph against the guidelines. Our detector is invoked as part of the guideline check. Finally, the analysis result is stored in a MongoDB database, where it can be reached by the Edge component and presented by the visualisation part of the prototype.

Detector: The control flow of the detector is as follows: first, the class hierarchies are extracted from the source code graph as separate subgraphs; second, each hierarchy is checked for the presence of two types of the Unhealthy Inheritance hotspot: internal and external. An internal Unhealthy Inheritance hotspot is a class hierarchy in which at least one base class depends on or refers to a derived class. An external Unhealthy Inheritance hotspot is a class hierarchy which has client classes that refer to both base and derived classes of the hierarchy at the same time. While detecting internal hotspots we investigate the classes and edges that belong to the hierarchy. For external hotspots we also check the neighbourhood of the class hierarchy: its clients, being classes which have a dependency on any of the classes in the analysed hierarchy.
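The two checks can be sketched as follows. The graph encoding (dictionaries of sets) and the function names are assumptions for illustration; the prototype itself operates on the source graph produced by the Software Analysis Toolkit:

    def internal_unhealthy(ancestors, depends_on):
        """ancestors: dict mapping each class to the set of its base classes.
        depends_on: dict mapping each class to the classes it references.
        True if some base class depends on one of its own derived classes."""
        return any(cls in depends_on.get(base, set())
                   for cls in ancestors
                   for base in ancestors[cls])

    def external_unhealthy(members, bases, depends_on, all_classes):
        """members: all classes of the hierarchy; bases: its base classes.
        True if a client (non-member) refers to both a base and a derived class."""
        derived = members - bases
        for client in all_classes - members:
            refs = depends_on.get(client, set())
            if refs & bases and refs & derived:
                return True
        return False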




IV. PROTOTYPE EVALUATION

The prototype evaluation involved integration with Better Code Hub and the gathering of feedback via structured interviews with developers who used the prototype. We intended to evaluate the comprehensibility, relevance and actionability of the refactoring recommendations by asking scaled questions on a Likert scale [21]. Furthermore, we asked developers what they would need to make the feedback more actionable and how they would address the problem. The integration of the hotspot detection into the existing system involved two steps: generating refactoring recommendations and visualisation.

Refactoring recommendations are generated in two stages. First, the detector part identifies hotspots and generates a recommendation for every source node that is part of a hotspot. Secondly, the recommendations are filtered and ordered.
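A minimal sketch of this two-stage pipeline is given below. The paper does not fix the filtering and ordering criteria; ranking nodes by the number of hotspots they take part in is an assumption:

    def recommend(hotspots):
        """hotspots: iterable of (hotspot_id, member_nodes) pairs."""
        per_node = {}
        for hid, nodes in hotspots:
            for node in nodes:  # stage 1: one recommendation per source node
                per_node.setdefault(node, []).append(hid)
        # stage 2: filter empty entries and order by hotspot involvement
        return sorted(((n, h) for n, h in per_node.items() if h),
                      key=lambda item: len(item[1]), reverse=True)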
The visualisation for the user contains three additions to Better Code Hub: information about the number of hotspots in the refactoring candidate, a visualisation of the hotspot in the refactoring candidate, and contextual information about a specific hotspot: what causes it, what its consequences are and suggested resolution strategies. We chose to visualise the hotspot as edges and vertices. This allows the user to manipulate the graph by rearranging the nodes. Edges and vertices also make it easier to convey more information visually, such as the type of dependency (inheritance, dependency, violating dependency) or the type of source node (parent, child, client). The user process is thus as follows:
   1) As the user logs into Better Code Hub, a list of repositories is revealed.
   2) Once the user enters the repository analysis screen, a list of guidelines is shown with a tick or cross beside each, indicating whether the code in the repository is compliant.
   3) As the user reviews the details of a specific guideline, a list of refactoring candidates is provided for review.
   4) In the hotspot visualisation screen the user can see the graph representing the hotspot, visualised as a dynamic force simulation (https://github.com/d3/d3-force) which can then be manipulated.
   5) Finally, we present the hotspot description, which our prototype provides upon the user pressing the question mark button in the upper left corner of the visualisation.

[Figure 1. An example of the visualisation of an external Unhealthy Inheritance hotspot.]

In Figure 1 we present an example of an external Unhealthy Inheritance hotspot, a violation where a client class of the hierarchy refers to the base class and all of its children. In this case the client class is UriConsoleAnnotator, the base class is AbstractMarkupText and the child classes are MarkupText and SubText. The violations in this case are the references from UriConsoleAnnotator to all the classes in the hierarchy.

V. DISCUSSION

For the evaluation we interviewed experienced developers. They had no prior experience with Better Code Hub or with the codebase on which they were assigned to evaluate the prototype. Our aim was to devise recommendations which can be useful to a user who does not yet have intimate knowledge of the system architecture and implementation.
We only had time to evaluate the approach on a few systems. We attempted to choose systems representing different domains, architectures and languages; a broader test would be necessary to make sure that the conclusions do not stem from selection bias. We hypothesise that the findings should be applicable to any strongly typed language that supports packages, modules and inheritance.

Even though, before applying the method, we found a strong correlation between hotspot density and the number of interface lines of code in a component, we did not find a causal link between removing hotspots and a decrease in the component interface code as measured by the Software Analysis Toolkit. However, Mo et al. did show that the presence of hotspots indicates areas of code which are especially prone to introducing bugs; therefore, even if the removal of hotspots is not reflected in the measurement, it would still improve the maintainability of the system [5].

VI. CONCLUSION

To improve the actionability of architecture level code quality violations we created a prototype tool that identifies structural problems in the codebase which impact modularity. We then provided refactoring recommendations based on the identified problems and interviewed developers to gather their feedback on the comprehensibility, actionability and relevance of the presented recommendations. The prototype refactoring tool provides the following contributions:
   • Detection of architecture smells in source code graphs.
   • Refactoring recommendations to the user based on the presence of hotspots.
   • Visualisation of hotspots, emphasising those dependencies negatively impacting modularity.
   • Guidance for the users regarding the impact and structure of hotspots occurring in the analysed codebase.
As part of our analysis of repositories, we performed:
   • A study of the reliability of hotspot detection on statically typed languages (Java, C# and C++).
   • An analysis of the overall impact of code containing hotspots on the system's modularity.
A number of areas of future work have been identified:
   a) More structural problems: We limited our detector to one kind of hotspot. We also chose to use our own detector as opposed to Titan, with a different source code analyser, which means that there may be a mismatch between the results [20].
   b) More detailed information about the violation: We only outline violating classes and dependencies. Using the same data, the feedback can be improved by providing the exact calls along with the code snippets that trigger the violation.
   c) Co-evolutionary coupling reasons: Co-evolutionary coupling refers to classes which change together over time. Co-evolutionary hotspots are much more difficult to address. Firstly, co-evolutionary relationship data contains more noise; for example, a copyright header update will create a coupling between all the files in the project. Also, co-evolutionary relationships remain in the history of the project. Secondly, it is more challenging to reason about the intention of the developer in changing files together without a structural coupling. Nevertheless, it would be interesting to identify whether there are common reasons for co-evolutionary hotspot pattern occurrences.
   d) Hotspot prioritization: We did not explicitly prioritise hotspots. However, this could be useful, as the budget to address technical debt (e.g. architecture smells) is usually limited and decisions need to be made as to which issues should be addressed. A prioritisation could be used to suggest fixing first those hotspots which exhibit a balance between the effort needed to fix them and their impact on maintainability, as in the sketch below.
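One possible form such a prioritisation could take is ranking by estimated impact per unit of fixing effort. Both estimates and the scoring itself are hypothetical; the paper leaves the criteria open:

    def prioritise(hotspots):
        """hotspots: iterable of (name, impact, effort) tuples, effort > 0.
        Orders hotspots by maintainability impact gained per unit of effort."""
        return sorted(hotspots, key=lambda h: h[1] / h[2], reverse=True)

    # Illustrative usage with made-up estimates:
    print(prioritise([("hotspot-A", 8.0, 2.0), ("hotspot-B", 5.0, 1.0)]))
    # hotspot-B (5.0 per unit) ranks above hotspot-A (4.0 per unit)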
Based on a preliminary evaluation, conducted through interviews with a panel of experts and the analysis of open source repositories, we can say that users see the complementary information as a promising starting point for further investigation, but that additional work is needed to make the recommendations actionable.
REFERENCES

[1] Robert L. Glass. Facts and Fallacies of Software Engineering. Addison-Wesley Professional, 2002.
[2] Ilja Heitlager, Tobias Kuipers, and Joost Visser. A practical model for measuring maintainability. In Quality of Information and Communications Technology, 2007. QUATIC 2007. 6th International Conference on the, pages 30–39. IEEE, 2007.
[3] Joost Visser, Sylvan Rigal, Rob van der Leek, Pascal van Eck, and Gijs Wijnholds. Building Maintainable Software, Java Edition: Ten Guidelines for Future-Proof Code. O'Reilly Media, Inc., 2016.
[4] Teodor Kurtev. Extending actionability in Better Code Hub, suggesting move module refactorings. Master's thesis, University of Amsterdam, July 2017.
[5] Ran Mo, Yuanfang Cai, Rick Kazman, and Lu Xiao. Hotspot patterns: The formal definition and automatic detection of architecture smells. In Software Architecture (WICSA), 2015 12th Working IEEE/IFIP Conference on, pages 51–60. IEEE, 2015.
[6] Flemming Nielson, Hanne R. Nielson, and Chris Hankin. Principles of Program Analysis. Springer, 2015.
[7] Daniel Guaman, P. A. Sarmiento, L. Barba-Guamán, P. Cabrera, and L. Enciso. SonarQube as a tool to identify software metrics and technical debt in the source code through static analysis. In 7th International Workshop on Computer Science and Engineering, WCSE, pages 171–175, 2017.
[8] Nicholas Nethercote and Julian Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In ACM SIGPLAN Notices, volume 42, pages 89–100. ACM, 2007.
[9] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitriy Vyukov. AddressSanitizer: A fast address sanity checker. In USENIX Annual Technical Conference, pages 309–318, 2012.
[10] Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, and Chen Xiao. The Daikon system for dynamic detection of likely invariants. Science of Computer Programming, 69(1-3):35–45, 2007.
[11] T. Kuipers, I. Heitlager, and J. Visser. A practical model for measuring maintainability. In 6th International Conference on the Quality of Information and Communications Technology (QUATIC 2007), volume 00, pages 30–39, 09 2007.
[12] Eric Bouwers, Arie van Deursen, and Joost Visser. Quantifying the encapsulation of implemented software architectures. In Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on, pages 211–220. IEEE, 2014.
[13] Martin Fowler and Kent Beck. Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional, 1999.
[14] Tushar Sharma. Does your architecture smell?, 2017. Last accessed: 2018-06-03.
[15] Tushar Sharma. Designite: A customizable tool for smell mining in C# repositories. In 10th Seminar on Advanced Techniques and Tools for Software Evolution, Madrid, Spain, 2017.
[16] Francesca Arcelli Fontana, Ilaria Pigazzini, Riccardo Roveda, and Marco Zanoni. Automatic detection of instability architectural smells. In Software Maintenance and Evolution (ICSME), 2016 IEEE International Conference on, pages 433–437. IEEE, 2016.
[17] Isela Macia, Roberta Arcoverde, Elder Cirilo, Alessandro Garcia, and Arndt von Staa. Supporting the identification of architecturally-relevant code anomalies. In Software Maintenance (ICSM), 2012 28th IEEE International Conference on, pages 662–665. IEEE, 2012.
[18] Robert C. Martin. Clean Architecture: A Craftsman's Guide to Software Structure and Design. Prentice Hall Press, 2017.
[19] Joshua Garcia, Daniel Popescu, George Edwards, and Nenad Medvidovic. Toward a catalogue of architectural bad smells. In International Conference on the Quality of Software Architectures, pages 146–162. Springer, 2009.
[20] Lu Xiao, Yuanfang Cai, and Rick Kazman. Titan: A toolset that connects software architecture with quality analysis. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 763–766. ACM, 2014.
[21] Rensis Likert. A technique for the measurement of attitudes. Archives of Psychology, 1932.