How Should We Measure the Relationship Between Code Quality and Software Sustainability?

Aseel Aldabjan, Robert Haines, Caroline Jay
School of Computer Science, University of Manchester, UK
caroline.jay@manchester.ac.uk

Abstract: Software sustainability has been proposed as a non-functional requirement of a codebase. The aim of sustainable research software development is to produce reliable code that supports reproducible results, and can be reused in future projects. At present, research software is often not developed in a sustainable manner, partly due to the funding environment within which it exists, but also because there is no concrete metric with which to measure software sustainability, nor any concrete guidance on how to achieve it. We propose that empirical studies determining the relationship between measurable aspects of a project and its active life (a period we define using a metric of software sustainment) are a strong means of understanding how requirements encapsulating software sustainability should ultimately be defined. Here, we report the results of a sustainment analysis of projects in GitHub, and describe the opportunities and challenges of understanding the relationship between sustainment and code quality.

I. INTRODUCTION

Sustainable research software is defined as software that can be reused, in whole or in part, in future projects (footnote 1). It is a truism that good coding practices will lead to more sustainable software, a view which is supported by research software engineers [1], and it has been proposed that sustainability should be considered as a requirement of a project [2, 3]. However, there is currently no concrete definition of software sustainability, nor any concrete guidance on how to achieve it [4]. This research aims to contribute to an understanding of what makes software sustainable, by measuring the relationship between characteristics of a project's codebase and the project's active life, which we term software sustainment.

We collect data for this study from GitHub, selecting a subset of repositories based on their start date and the language they are written in. As GitHub repositories include a number of software engineering artefacts, such as an issue tracker, documentation wiki, web pages and collaboration data, we consider a repository as a proxy for a software project (footnote 2).

In Section II we define our software sustainment metric, and in Section III we report on the distribution of open source Java projects according to this metric. Section IV describes the metrics we will use to measure code quality, and finally Section V discusses the challenges of linking code quality to sustainment, to ultimately determine the characteristics of software that are key to fulfilling the requirement of sustainability.

Footnote 1: http://software.ac.uk/
Footnote 2: Data and analysis code are available here: https://github.com/hainesr/sustainment-analysis

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

II. A METRIC FOR SOFTWARE SUSTAINMENT

We take an empirical approach to understanding what makes a project sustainable, by examining how aspects of a project (in this case its code quality) vary as a function of its sustainment, or active life. Our definition of software sustainment is the time period from the initial creation of the software in a repository (the first commit) through to the last commit in the original repository (see equation 1):

$$S = t_{\text{last\_commit}} - t_{\text{initial\_commit}} \tag{1}$$

where S is our software sustainment metric, measured in days. This measured difference reflects the period over which the project is actively maintained or developed.

We calculate S for the default branch of the repository, as indicated in the meta-data we mine from GitHub, as we recognise that not all repositories use the 'master' branch as their default. We consider only the default branch of the original project when calculating our sustainment metric, because simply picking the most recently updated branch, or fork, carries a high chance of selecting incomplete, untested and non-working versions of the code. Nevertheless, there is an argument that if the code lives on in subsequent forks it has been sustained, even if the original project has not, so we will consider forks, and how they relate to the sustainability of both the project and the code, in a future study.

III. THE SUSTAINMENT OF JAVA PROJECTS IN GITHUB

Projects were mined from GitHub according to the following criteria: they were created between 1st January and 31st December 2009; they had at least one commit; the first commit occurred after 1st January 2009; they were written in Java. Projects were retrieved on 26th July 2016, so S was calculated for each project at that point in time.

Figure 1 shows the distribution of projects as a function of S, in days. Of 3113 projects in total, 22% (682) had an S value of 0, and 35% (1076) had an S value < 7, indicating that over a third of the projects were sustained for less than a week. A cursory inspection reveals some of these projects to be quite large, so it is likely that in these cases the development period was longer than the calculated sustainment metric, and that the project was only put into Git version control some time after its real start date. After the steep drop off at around seven days, the curve gradually flattens over time.

[Figure 1: x-axis: Sustainment (days), 0 to 2700; y-axis: Number of projects, 0 to 2000]
Fig. 1. Projects in GitHub as a function of their software sustainment in days.

IV. CODE QUALITY METRICS

We hypothesize that the following static analytic metrics [5] are related to sustainability:

Lines of Code (LOC) is an indication of class size, where a higher value means longer and potentially more complex code. It is advisable to treat this metric in relative, rather than absolute terms, as lines of code may vary with programming language, or the individual style of a programmer.

Number of Local Methods (NOM), an indicator of interface complexity, measures the number of methods locally declared in a class. As the interface grows, the class becomes more complex, and more difficult to test. The optimum value for this metric is considered to be between 3 and 7. If there are fewer than 3, the class might simply be a data holder; if there are more than 7, the class might be in need of decomposition.

Depth of Inheritance Tree (DIT) calculates the complexity of a software entity based on the distance between a node and its root down the inheritance tree. The deeper a class sits in the inheritance tree, the more difficult testing becomes, as the control flow becomes more complicated. A value between 0 and 4 is generally considered to indicate an adequate balance between complexity and the use of inheritance.

Coupling Between Objects (CBO) calculates the complexity of a class through its dependencies: a class is considered well designed when it is loosely coupled. Classes with a large number of dependencies are more difficult to maintain and test. The reusability of classes is limited by high levels of coupling because if a class depends on other classes, it is difficult to reuse it in another system. A value of CBO greater than 4 is generally considered undesirable because it indicates a high number of dependencies.

Improvement of Lack of Cohesion in Methods (ILCOM) provides a measure of class cohesion, by calculating the number of connected components in a class. High cohesion is a desirable characteristic within a class in object oriented languages, as it is usually harder to test classes that do not have cohesion between their components. A value of zero in the ILCOM metric indicates a lack of methods in the class, while a value of one represents a high level of cohesion. A value greater than one indicates cohesion is low, and the class may benefit from being divided into separate classes.

Lack of Documentation (LOD) was chosen as an interesting metric that considers comments in the code, with at least one comment per method and one per class as a minimum target. Comments often make the purpose of methods and classes clearer, increasing maintainability and facilitating the reuse of the code. Comments in Java code can also be used to automatically build API documentation for a project, so one might expect well maintained code to include at least one comment per method and per class for this purpose. A caveat is that the content of the comments is not considered.

V. LINKING CODE QUALITY TO SUSTAINABILITY

We have proposed a simple metric with which to measure sustainment, and suggested a static analytic approach to quantifying code quality. Although these measures provide values that can be compared quantitatively, there remain considerable challenges in determining the relationships between them:

• Refining the sustainment metric. At present S only accounts for the time that activity on the project occurs in GitHub. Should we filter projects further, to ensure S is a true representation of the lifetime of the project?
• Reconciling units of assessment. How should we compare class-level software quality metrics with a project-level sustainment metric? For example, should we use median/mean values as input to the analysis, or look at the proportion of classes that meet a certain criterion?
• Determining the appropriate point for assessment. Is the final/current state of the code enough to draw conclusions? If not, should we combine metrics from various points in the project's history or monitor changes? How do we select those points in the project history?
• Determining the appropriate statistical procedures for assessment. Several of the code quality metrics are non-linear in terms of their optimal values, so a simple correlation may not be the best way to assess their relationship with sustainability. What is the best approach to take in these cases?

REFERENCES

[1] M. R. de Souza et al., "Defining Sustainability through Developers' Eyes: Recommendations from an Interview Study," in WSSSPE 2, 2014.
[2] C. Venters et al., "The blind men and the elephant: Towards an empirical evaluation framework for software sustainability," JORS, vol. 2, no. 1, 2014.
[3] R. Chitchyan et al., "Sustainability design in requirements engineering," in ICSE 2016, 2016.
[4] C. Venters et al., "Software sustainability: The modern Tower of Babel," in RE4SuSy, 2014.
[5] R. Lincke and W. Löwe, "Compendium of Software Quality Standards and Metrics," 2005. [Online]. Available: http://www.arisa.se/compendium/
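
As a minimal sketch of equation 1 and of the kind of summary reported in Section III, the Python snippet below computes S from a first and last commit timestamp and then the share of projects whose S falls below a threshold. The function names (`sustainment_days`, `share_below`), the sample dates, and the choice to truncate S to whole days are assumptions made for illustration; this is not the authors' published analysis code.

```python
from datetime import datetime, timezone


def sustainment_days(first_commit: datetime, last_commit: datetime) -> int:
    """Return S = t_last_commit - t_initial_commit in whole days (equation 1).

    Truncating to whole days is an assumption; the paper only states
    that S is measured in days.
    """
    if last_commit < first_commit:
        raise ValueError("last commit precedes first commit")
    return (last_commit - first_commit).days


def share_below(values: list[int], threshold: int) -> float:
    """Fraction of projects whose S is strictly below the threshold."""
    if not values:
        raise ValueError("no projects given")
    return sum(1 for s in values if s < threshold) / len(values)


def utc(year: int, month: int, day: int) -> datetime:
    """Helper for building timezone-aware sample timestamps."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical (first commit, last commit) pairs taken from each
# project's default branch; these dates are invented sample data.
projects = [
    (utc(2009, 3, 1), utc(2009, 3, 1)),    # single-day project
    (utc(2009, 5, 10), utc(2009, 5, 14)),  # active for under a week
    (utc(2009, 1, 2), utc(2016, 7, 26)),   # active until retrieval date
]

s_values = [sustainment_days(first, last) for first, last in projects]
print("S per project (days):", s_values)
print("Share with S == 0:", share_below(s_values, 1))
print("Share with S < 7:", share_below(s_values, 7))
```

Applied to the full set of mined repositories, the same two functions would give the counts behind statements such as "35% had an S value < 7"; the threshold is kept as a parameter so that other cut-offs can be explored.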