How Should We Measure the Relationship Between
   Code Quality and Software Sustainability?
                    Aseel Aldabjan                                Robert Haines                               Caroline Jay
           School of Computer Science                    School of Computer Science                   School of Computer Science
           University of Manchester, UK                  University of Manchester, UK                University of Manchester, UK
                                                                                                     caroline.jay@manchester.ac.uk


Abstract: Software sustainability has been proposed as a non-functional requirement of a codebase. The aim of sustainable research software development is to produce reliable code that supports reproducible results, and can be reused in future projects. At present, research software is often not developed in a sustainable manner, partly due to the funding environment within which it exists, but also because there is no concrete metric with which to measure software sustainability, nor any concrete guidance on how to achieve it. We propose that empirical studies determining the relationship between measurable aspects of a project and its active life (a period we define using a metric of software sustainment) are a strong means of understanding how requirements encapsulating software sustainability should ultimately be defined. Here, we report the results of a sustainment analysis of projects in GitHub, and describe the opportunities and challenges of understanding the relationship between sustainment and code quality.

I. INTRODUCTION

Sustainable research software is defined as software that can be reused, in whole or in part, in future projects (footnote 1). It is a truism that good coding practices will lead to more sustainable software, a view which is supported by research software engineers [1], and it has been proposed that sustainability should be considered as a requirement of a project [2, 3]. However, there is currently no concrete definition of software sustainability, nor any concrete guidance on how to achieve it [4]. This research aims to contribute to an understanding of what makes software sustainable, by measuring the relationship between characteristics of a project’s codebase and the project’s active life, which we term software sustainment.

We collect data for this study from GitHub, selecting a subset of repositories based on their start date and the language they are written in. As GitHub repositories include a number of software engineering artefacts, such as an issue tracker, documentation wiki, web pages and collaboration data, we consider a repository as a proxy for a software project (footnote 2).

In Section II we define our software sustainment metric, and in Section III we report on the distribution of open source Java projects according to this metric. Section IV describes the metrics we will use to measure code quality, and finally Section V discusses the challenges of linking code quality to sustainment, to ultimately determine the characteristics of software that are key to fulfilling the requirement of sustainability.

Footnote 1: http://software.ac.uk/
Footnote 2: Data and analysis code are available here: https://github.com/hainesr/sustainment-analysis

II. A METRIC FOR SOFTWARE SUSTAINMENT

We take an empirical approach to understanding what makes a project sustainable, by examining how aspects of a project, in this case its code quality, vary as a function of its sustainment, or active life. Our definition of software sustainment is the time period from the initial creation of the software in a repository (the first commit) through to the last commit in the original repository (see equation 1):

$$ S = t_{\mathrm{last\_commit}} - t_{\mathrm{initial\_commit}} \tag{1} $$

where S is our software sustainment metric, measured in days. This measured difference reflects the period over which the project is actively maintained or developed.

We calculate S for the default branch of the repository, as indicated in the meta-data we mine from GitHub, as we recognise that not all repositories use the ‘master’ branch as their default. We are only considering the default branch of the original project when calculating our sustainment metric, as simply picking the most recently updated branch, or fork, has a high chance of selecting incomplete, untested and non-working versions of the code. Nevertheless, there is an argument that if the code lives on in subsequent forks it has been sustained, even if the original project has not, so we will consider forks, and how they relate to the sustainability of both the project and the code, in a future study.

III. THE SUSTAINMENT OF JAVA PROJECTS IN GITHUB

Projects were mined from GitHub according to the following criteria: they were created between 1st January and 31st December 2009; they had at least one commit; the first commit occurred after 1st January 2009; they were written in Java. Projects were retrieved on 26th July 2016, so S was calculated for each project at that point in time.

Figure 1 shows the distribution of projects as a function of S, in days. Of 3113 projects in total, 22% (682) had an S value of 0, and 35% (1076) had an S value < 7, indicating that over a third of the projects were sustained for less than a week. A cursory inspection reveals some of these projects to be quite large, so it is likely that in these cases the development period was longer than the calculated sustainment metric, and that the project was only put into Git version control some time after its real start date. After the steep drop off at around seven days, the curve gradually flattens over time.
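
As an illustration of equation 1 in Section II, the sustainment of a single repository can be sketched in Python as below. This is a minimal sketch, not the authors' analysis code: the function name `sustainment_days`, the sample timestamps, and the choice to truncate partial days to whole days are all assumptions made for the example.

```python
from datetime import datetime, timezone


def sustainment_days(commit_times):
    """Return S in whole days: time of last commit minus time of first commit.

    Assumption: partial days are truncated; the paper does not say how
    fractions of a day are handled.
    """
    if not commit_times:
        raise ValueError("a repository needs at least one commit")
    return (max(commit_times) - min(commit_times)).days


# Hypothetical commit timestamps on the default branch of one repository.
commits = [
    datetime(2009, 3, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2009, 3, 4, 17, 30, tzinfo=timezone.utc),
    datetime(2009, 3, 11, 8, 0, tzinfo=timezone.utc),
]

print(sustainment_days(commits))      # 9 (9 days and 23 hours, truncated)
print(sustainment_days(commits[:1]))  # 0 (a single commit gives S = 0)
```

A repository with a single commit, or with all commits on one day, yields S = 0 under this sketch, which matches the kind of project counted in the S = 0 group in Section III.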


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
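
The percentages reported in Section III follow directly from the project counts. The sketch below rechecks that arithmetic and shows how the same shares could be computed from a list of S values; the helper name `share_below` and the sample list are hypothetical and are not taken from the study's data.

```python
def share_below(s_values, threshold):
    """Fraction of projects whose sustainment S (days) is strictly below threshold."""
    if not s_values:
        raise ValueError("no projects supplied")
    return sum(1 for s in s_values if s < threshold) / len(s_values)


# Counts reported in Section III.
TOTAL, ZERO_DAYS, UNDER_A_WEEK = 3113, 682, 1076
print(round(100 * ZERO_DAYS / TOTAL))     # 22
print(round(100 * UNDER_A_WEEK / TOTAL))  # 35

# Hypothetical S values for eight projects, for illustration only.
sample = [0, 0, 3, 6, 7, 30, 400, 2500]
print(share_below(sample, 1))  # 0.25 (projects with S = 0)
print(share_below(sample, 7))  # 0.5  (projects with S < 7)
```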
[Figure 1: plot of number of projects (y-axis, 0 to 2000) against sustainment in days (x-axis, 0 to 2700).]

Fig. 1. Projects in GitHub as a function of their software sustainment in days.

IV. CODE QUALITY METRICS

We hypothesize that the following static analytic metrics [5] are related to sustainability:

Lines of Code (LOC) is an indication of class size, where a higher value means longer and potentially more complex code. It is advisable to treat this metric in relative, rather than absolute terms, as lines of code may vary with programming language, or the individual style of a programmer.

Number of Local Methods (NOM), an indicator of interface complexity, measures the number of methods locally declared in a class. As the interface grows, the class becomes more complex, and more difficult to test. The optimum value for this metric is considered to be between 3 and 7. If there are fewer than 3, the class might simply be a data holder; if there are more than 7, the class might be in need of decomposition.

Depth of Inheritance Tree (DIT) calculates the complexity of a software entity based on the distance between a node and its root down the inheritance tree. As the code goes down the inheritance tree, testing becomes more difficult as the control flow becomes more complicated. A value between 0 and 4 is generally considered to indicate an adequate balance between complexity and the use of inheritance.

Coupling Between Objects (CBO) calculates the complexity of a class through its dependencies: a class is considered well designed when it is loosely coupled. Classes with a large number of dependencies are more difficult to maintain and test. The reusability of classes is limited by high levels of coupling because if a class depends on other classes, it is difficult to reuse it in another system. A value of CBO greater than 4 is generally considered undesirable because it indicates a high number of dependencies.

Improvement of Lack of Cohesion in Methods (ILCOM) provides a measure of class cohesion, by calculating the number of connected components in a class. High cohesion is a desirable characteristic within a class in object-oriented languages, as it is usually harder to test classes that do not have cohesion between their components. A value of zero in the ILCOM metric indicates a lack of methods in the class, while a value of one represents a high level of cohesion. A value greater than one indicates cohesion is low, and the class may benefit from being divided into separate classes.

Lack of Documentation (LOD) was chosen as an interesting metric that considers comments in the code, with at least one comment per method and one per class as a minimum target. Comments often make the purpose of methods and classes clearer, increasing maintainability and facilitating the reuse of the code. Comments in Java code can also be used to automatically build API documentation for a project, so one might expect well-maintained code to include at least one comment per method and per class for this purpose. A caveat is that the content of the comments is not considered.

V. LINKING CODE QUALITY TO SUSTAINABILITY

We have proposed a simple metric with which to measure sustainment, and suggested a static analytic approach to quantifying code quality. Although these measures provide values that can be compared quantitatively, there remain considerable challenges in determining the relationships between them:

• Refining the sustainment metric. At present S only accounts for the time that activity on the project occurs in GitHub. Should we filter projects further, to ensure S is a true representation of the lifetime of the project?
• Reconciling units of assessment. How should we compare class-level software quality metrics with a project-level sustainment metric? For example, should we use median/mean values as input to the analysis, or look at the proportion of classes that meet a certain criterion?
• Determining the appropriate point for assessment. Is the final/current state of the code enough to draw conclusions? If not, should we combine metrics from various points in the project’s history or monitor changes? How do we select those points in the project history?
• Determining the appropriate statistical procedures for assessment. Several of the code quality metrics are non-linear in terms of their optimal values, so a simple correlation may not be the best way to assess their relationship with sustainability. What is the best approach to take in these cases?

REFERENCES

[1] M. R. de Souza et al., “Defining Sustainability through Developers’ Eyes: Recommendations from an Interview Study,” in WSSSPE 2, 2014.
[2] C. Venters et al., “The blind men and the elephant: Towards an empirical evaluation framework for software sustainability,” JORS, vol. 2, no. 1, 2014.
[3] R. Chitchyan et al., “Sustainability design in requirements engineering,” in ICSE 2016, 2016.
[4] C. Venters et al., “Software sustainability: The modern Tower of Babel,” in RE4SuSy, 2014.
[5] R. Lincke and W. Löwe, “Compendium of Software Quality Standards and Metrics,” 2005. [Online]. Available: http://www.arisa.se/compendium/
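
The value ranges quoted in Section IV for NOM, DIT, CBO and ILCOM can be gathered into a simple per-class check. The following is a minimal sketch that assumes the metric values have already been produced by some static analysis tool; `ClassMetrics` and `out_of_range` are hypothetical names. LOC and LOD are left out because the text gives no numeric cut-off for the former and no formula for the latter.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    """Metric values for one class (hypothetical container)."""
    nom: int    # Number of Local Methods
    dit: int    # Depth of Inheritance Tree
    cbo: int    # Coupling Between Objects
    ilcom: int  # Improvement of Lack of Cohesion in Methods


def out_of_range(m):
    """List the metrics of a class that fall outside the ranges in Section IV."""
    flags = []
    if m.nom < 3:
        flags.append("NOM < 3: class might simply be a data holder")
    elif m.nom > 7:
        flags.append("NOM > 7: class might need decomposition")
    if not 0 <= m.dit <= 4:
        flags.append("DIT outside 0 to 4")
    if m.cbo > 4:
        flags.append("CBO > 4: high number of dependencies")
    if m.ilcom == 0:
        flags.append("ILCOM = 0: class has no methods")
    elif m.ilcom > 1:
        flags.append("ILCOM > 1: low cohesion")
    return flags


print(out_of_range(ClassMetrics(nom=5, dit=2, cbo=3, ilcom=1)))   # []
print(out_of_range(ClassMetrics(nom=12, dit=6, cbo=9, ilcom=3)))  # four flags
```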

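
The "units of assessment" question in Section V contrasts two ways of lifting class-level values to the project level: a central value (mean or median) and the proportion of classes meeting a criterion. The sketch below computes both for NOM on made-up data. The function name `project_summary` and the sample values are hypothetical, and the sketch does not suggest which option the analysis should adopt.

```python
from statistics import mean, median


def project_summary(nom_per_class, low=3, high=7):
    """Summarise per-class NOM values for one project in three candidate ways."""
    if not nom_per_class:
        raise ValueError("project has no classes")
    in_range = sum(1 for n in nom_per_class if low <= n <= high)
    return {
        "mean": mean(nom_per_class),
        "median": median(nom_per_class),
        "share_in_range": in_range / len(nom_per_class),
    }


# Hypothetical NOM values for the five classes of one project.
summary = project_summary([1, 4, 5, 6, 20])
print(summary["median"])          # 5
print(summary["share_in_range"])  # 0.6
print(summary["mean"])            # pulled above the 3 to 7 range by one large class
```

In this made-up project the median sits inside the 3 to 7 range while the mean does not, which shows how the choice of aggregate can change what is fed into a project-level analysis.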

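
Section V notes that several metrics have an optimal range rather than a "more is better" direction, so a raw value cannot be related to sustainment monotonically. One possible preprocessing step, offered only as an assumption and not as the authors' method, is to replace each value with its distance from the optimal range. `distance_from_range` is a hypothetical helper.

```python
def distance_from_range(value, low, high):
    """Return 0 inside [low, high], otherwise the distance to the nearest bound."""
    if low > high:
        raise ValueError("low must not exceed high")
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0


# NOM has an optimal range of 3 to 7 (Section IV): values below and above the
# range both map to a positive distance, and values inside it map to 0.
for nom in (1, 3, 5, 7, 10):
    print(nom, distance_from_range(nom, 3, 7))
```

After this transformation a smaller number always means "closer to the recommended range", which is one way to make such a metric usable as input to a monotonic measure of association.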