<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>The Relation between Software Maintainability and Issue Resolution Time: A Replication Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dennis Bijlsma</string-name>
          <email>d.bijlsma@sig.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ana-Maria Oprescu University of Amsterdam Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Cuiting Chen Software Improvement Group Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Joren Wijnmaalen University of Amsterdam Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>201</volume>
      <fpage>8</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>Higher software maintainability comes with certain benefits. For example, software can be updated more easily to embrace new features or to fix bugs. Previous research has shown a positive correlation between the maintainability score measured by the SIG maintainability model and shorter issue resolution time. This study, however, dates back to 2010. Eight years later, both the software industry and the SIG maintainability model have evolved at a fast pace. In this paper we rerun the experiment to test whether the previously found relations are still valid.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title></title>
      <p>When remeasuring the maintainability of the
systems with the new version of the SIG
maintainability model (2018), we find that the
majority of the systems receive lower maintainability
ratings. The correlation between overall maintainability
and defect resolution time decreased
significantly, while the original system properties
correlate with defect resolution time similarly
to the original study.</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>The definition of software quality has been
standardized by the International Organization for
Standardization (ISO) since 2001 in ISO 9126
[ISO11b]. Since then, the definition has undergone a
variety of changes, as it was revised into ISO
25010 in 2011 [ISO11a]. The standard decomposes
software quality into a set of characteristics, one of
which is software maintainability.</p>
      <p>Research has shown the importance of high software
maintainability. Bakota et al. found an exponential
relationship between maintainability and cost [BHL+12].
Bijlsma and Luijten showed a strong positive
correlation between software maintainability and issue
resolution speed, i.e., more maintainable systems tend to
have shorter issue resolution times [BFLV12]. Maintenance
activities largely involve solving issues that arise during
development or when the product is in use. A more maintainable
code base decreases the time needed to
resolve such issues.</p>
      <p>However, the study by Bijlsma and Luijten dates
back to 2012. To assess the maintainability of
systems, they used the maintainability model
developed by the Software Improvement Group (SIG)
as it stood in 2010. This model refers to ISO 9126
for its definition of software quality, more
specifically, maintainability. Over the years the SIG
maintainability model has evolved (a new model was
announced in 2018 [sig]), implementing a
variety of smaller changes in line with the new software quality
definition documented in ISO 25010. Furthermore,
the software industry has been evolving at a fast pace.
The oldest systems Bijlsma and Luijten assessed for
their empirical results date back to the beginning of
the 2000s, and a lot has changed in the software
industry since then: both the technology landscape and
the processes around software have changed. For
example, DevOps has emerged since the mid 2010s,
introducing concepts such as continuous integration and
delivery. These concepts may change the way
issues are resolved, since integration is
largely automated rather than performed manually.</p>
      <p>Within the broader question "What is the relation
between software maintainability and issue resolution
time?", we propose the following research question:</p>
      <p><bold>RQ1.1</bold> Does the previously found strong
correlation between maintainability and issue resolution
time still hold given the latest (2018) SIG
maintainability model?</p>
    </sec>
    <sec id="sec-3">
      <title>Background</title>
      <sec id="sec-3-1">
        <title>The SIG Maintainability Model</title>
        <p>The ISO 25010 standard defines software quality
through a range of quality characteristics. Each of
these characteristics is further subdivided into a set
of sub-characteristics. Software maintainability is one
such characteristic and is subdivided into
the following sub-characteristics: analyzability,
modifiability, testability, modularity and reusability. The
standard, however, does not prescribe how to directly
measure the various quality characteristics and
sub-characteristics. To fill this gap, the Software Improvement
Group (SIG) provides a pragmatic model to directly
assess maintainability through static analysis of
source code [HKV07]. The SIG maintainability model
lists a set of source code metrics, also called
software product properties. The following software
product properties are measured: volume, duplication,
unit size, unit complexity, unit interfacing, module
coupling, component balance and component
independence. These product properties are then mapped to
the sub-characteristics as defined in the ISO 25010
standard. These mappings, i.e., which product properties
influence which sub-characteristics, are based on expert
opinion. Table 1 illustrates these mappings.</p>
        <p>To calculate the maintainability rating of a
system, the model first measures the product
properties. These raw measures are converted to a star
rating using a benchmark internal to SIG
(1 to 5 stars, where 3 stars is the market average).
Note that the stars do not divide the distribution of
systems into even buckets; instead, 5% of systems are
assigned one star, 30% two, 30% three, 30% four and 5%
five stars. Second, the product property ratings are
aggregated into the maintainability sub-characteristic
ratings based on the relations defined in Table 1.
Finally, the sub-characteristic ratings are aggregated
into a single final maintainability rating.</p>
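        <p>The rating procedure above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the implementation in SIG's toolkit: it assumes that lower raw values are better for the measured property, derives four cut-off values from a benchmark using the 5/30/30/30/5 distribution, and aggregates ratings with an unweighted mean. The function names and the mapping passed to <monospace>aggregate</monospace> are hypothetical stand-ins for the relations of Table 1.</p>
        <preformat preformat-type="code">
```python
import bisect
import statistics

# Cumulative share of benchmark systems at the upper edge of each star band,
# best systems first: 5% get 5 stars, then 30% four, 30% three, 30% two, 5% one.
CUMULATIVE_SHARES = [0.05, 0.35, 0.65, 0.95]


def calibrate_thresholds(benchmark_values):
    """Derive four cut-off values from raw benchmark measurements.

    Assumes lower raw values are better (e.g. the percentage of code in
    overly complex units).
    """
    ordered = sorted(benchmark_values)
    n = len(ordered)
    return [ordered[min(n - 1, int(share * n))] for share in CUMULATIVE_SHARES]


def star_rating(value, thresholds):
    """Map a raw measurement to 1..5 stars; values at or below the first cut-off get 5."""
    return 5 - bisect.bisect_left(thresholds, value)


def aggregate(property_ratings, mapping):
    """Average property ratings per sub-characteristic, then average those.

    The unweighted mean and the mapping passed in are assumptions; the
    real relations are the ones given in Table 1.
    """
    sub_ratings = {
        sub: statistics.mean(property_ratings[p] for p in props)
        for sub, props in mapping.items()
    }
    return sub_ratings, statistics.mean(sub_ratings.values())
```
        </preformat>
        <p>For example, with a benchmark of the raw values 0 to 99, the derived cut-offs are 5, 35, 65 and 95, so a raw value of 6 maps to four stars. The real model additionally interpolates ratings to a continuous scale, which this sketch leaves out.</p>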
      </sec>
      <sec id="sec-3-2">
        <title>Evolution of the SIG Maintainability Model</title>
        <p>Both Bijlsma and Luijten assessed the
maintainability characteristic of software quality as described by
the ISO 9126 standard using the SIG maintainability
model. Since the ISO 9126 standard has been revised
into the ISO 25010 standard, the SIG maintainability
model has evolved accordingly, which is part of the
motivation for this replication study. In order to reason about
the results of this replication study, the differences
between the 'modern' SIG maintainability model
(hereinafter referred to as the new model) and the model
used by Bijlsma and Luijten (hereinafter referred to as
the old model) need to be highlighted.</p>
        <p>Compared to ISO 9126, ISO 25010 adds the
sub-characteristic modularity to maintainability.
Modularity is defined as "The degree to which a
system or computer program is composed of discrete
components such that a change to one component
has minimal impact on other components." [ISO11a].
To account for this new sub-characteristic,
two new system properties were introduced in the
new model: component balance and component
independence. Apart from accounting for the new
sub-characteristic, these properties were expected to
stimulate discussions about the architecture of systems and
to incorporate a common viewpoint in the assessment
of implemented architectures, as mentioned by
Bouwers et al. in their evaluation of the SIG maintainability
model metrics [BvDV13].</p>
        <p>The introduction of these properties also raises
questions about the definition of a component. Visser defines
the term component as follows in his technical
report: "A component is a subdivision of a system in
which source code modules are grouped together based
on a common trait. Often components consist of
modules grouped together based on a shared technical or
functional aspect" [Vis18]. In practice, this definition
is still too vague. It requires an
external evaluator to point out the core components
of a specific system, based on their perception of
how functionality is grouped and at what granularity.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Issue Resolution Time</title>
        <p>Both Luijten and Bijlsma look at issue resolution time
in their studies. Bijlsma defines issue resolution time
as "the total time an issue is in open state. [...]
Resolution time is not simply the time between the issue
being reported and the issue being resolved." [BFLV12]
Instead, Bijlsma illustrates the life cycle of an issue
using Figure 1 and measured the highlighted
period of time in that figure for his study. Even though
it would seem better to start measuring when the
status of an issue is set to assigned, this was not
feasible for the data Bijlsma obtained. Many
projects Bijlsma analyzed used the assigned property
inconsistently in their Issue Tracking Systems
(ITS), making it impossible to determine accurately
when a developer started working on an issue.</p>
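        <p>To make the distinction concrete, the following sketch computes a resolution time as the total time spent in open states, summed over all intervals of an issue's status history. The status names, the set treated as open, and the function name are hypothetical assumptions; the actual life cycle is the one in Figure 1, and the original tooling normalized the statuses of each ITS itself.</p>
        <preformat preformat-type="code">
```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical normalized status names counted as "open"; the real life
# cycle is the one shown in Figure 1.
OPEN_STATES = {"new", "open", "assigned", "reopened"}


def resolution_time(history: list[tuple[datetime, str]]) -> Optional[timedelta]:
    """Total time an issue spends in an open state.

    history: chronological (timestamp, status) transitions of one issue.
    Periods in a closed or resolved state (e.g. before a reopen) are not
    counted. Returns None while the issue is still open.
    """
    total = timedelta(0)
    for (start, status), (end, _) in zip(history, history[1:]):
        if status in OPEN_STATES:
            total += end - start
    if history[-1][1] in OPEN_STATES:
        return None  # unresolved issues have no resolution time yet
    return total
```
        </preformat>
        <p>For an issue reported on 1 January, resolved on 3 January, reopened on 10 January and closed on 11 January, the sketch yields three days, whereas the naive time between reporting and final closure would be ten days.</p>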
        <p>Besides the issue life cycle, there
is also the notion of issue types. Various issue
tracking systems use different terms to denote the variety
in issues. Bijlsma defined the following types: defect,
enhancement, patch and task. A defect, according to
Bijlsma, is a "problem in the system" [BFLV12]. An
enhancement can be "the addition of a new feature,
or an improvement of an existing feature". Tasks and
patches are "usually one time activities" and unify
various other issue types with a range of urgencies. The
tools Bijlsma and Luijten used in their experiment
normalized all issues obtained from the various ITSs
to these four types only. Luijten originally focused
on issues of type defect, whereas Bijlsma additionally included
issues of type enhancement.</p>
        <p>Given the definition of the issue resolution time
metric, both Luijten and Bijlsma collected measurements
from various projects. In order to compare these
resolution times at the project level, the resolution times
per issue need to be aggregated. Intuitively, statistical
properties such as the mean come to mind. However,
as Luijten points out, the collected resolution times
are not normally distributed [Lui10]. Therefore,
Luijten created various risk categories, dividing the issue
resolution distribution into buckets and
choosing thresholds such that the buckets are filled equally.
Table 2 illustrates the risk categories with their
thresholds as defined by Luijten for issues of type defect.
Similar thresholds are defined for issues of type
enhancement.</p>
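        <p>Because resolution times are heavily skewed, a mean would be dominated by a few long-lived issues; summarizing a snapshot as a risk profile, i.e. the share of its issues per risk category, avoids this. The sketch below assumes hypothetical thresholds of 28, 70 and 182 days and hypothetical category names; the actual defect thresholds are those in Table 2.</p>
        <preformat preformat-type="code">
```python
import bisect
from collections import Counter

# Hypothetical upper bounds in days for the first three risk categories;
# the actual defect thresholds are those defined by Luijten (Table 2).
THRESHOLDS_DAYS = [28, 70, 182]
CATEGORIES = ["low", "moderate", "high", "very high"]


def risk_category(days: float, thresholds=THRESHOLDS_DAYS) -> str:
    """First category whose upper bound is not exceeded by `days`."""
    return CATEGORIES[bisect.bisect_left(thresholds, days)]


def risk_profile(resolution_days, thresholds=THRESHOLDS_DAYS) -> dict:
    """Share of a snapshot's issues that falls into each risk category."""
    counts = Counter(risk_category(d, thresholds) for d in resolution_days)
    total = len(resolution_days)
    return {category: counts[category] / total for category in CATEGORIES}
```
        </preformat>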
        <p>Based on these categories, Luijten goes on to
define quality ratings, to further align with the rating
system the SIG maintainability model implements. Table
3 shows the mapping between risk categories and
quality ratings. The thresholds are chosen such that 5% of
the systems receive a 5-star rating, 30% four stars,
30% three, 30% two, and 5% one star (the same
distribution as the SIG maintainability model uses, Section
2.1). For measurement purposes, these star ratings
are interpolated over the interval [0.5, 5.5], as is
standard in the SIG maintainability model as well.</p>
        <p>This replication tests whether the previously found correlations between maintainability and issue resolution
times still hold when the new SIG maintainability
model is used to assess maintainability. Therefore, the same
systems and issue tracking data are used as in
Bijlsma's experiment. Bijlsma assessed 10 open source
systems, looking at multiple snapshots taken at
various points in each system's lifespan. Given the
definition of issue resolution time and the concept of
issue resolution time quality ratings described in
Section 2.3, ratings are calculated per snapshot. For
each snapshot, Bijlsma and Luijten consider all issues
that are closed and/or resolved between that
snapshot and the next as relevant for that snapshot [Lui10].
These ratings, as calculated by Bijlsma, are re-used directly
in this experiment, since they were archived and
ready for use.</p>
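        <p>The snapshot windows described above can be sketched as follows: each closed issue is assigned to the latest snapshot taken on or before its close date. How the windows are bounded (inclusive at the snapshot, exclusive at the next one, open-ended after the last snapshot) is our own assumption, the original tooling may handle the boundaries differently, and the function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
import bisect
from collections import defaultdict
from datetime import date


def issues_per_snapshot(snapshot_dates: list[date], closed_issues) -> dict:
    """Group closed issues by the snapshot window in which they were closed.

    snapshot_dates must be sorted ascending. closed_issues is an iterable of
    (issue_id, close_date) pairs. An issue closed on or after snapshot i and
    before snapshot i + 1 belongs to snapshot i; issues closed before the first
    snapshot are dropped, and the window after the last snapshot is treated as
    open-ended (an assumption).
    """
    groups = defaultdict(list)
    for issue_id, closed in closed_issues:
        # Index of the latest snapshot taken on or before the close date.
        index = bisect.bisect_right(snapshot_dates, closed) - 1
        if index >= 0:
            groups[snapshot_dates[index]].append(issue_id)
    return dict(groups)
```
        </preformat>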
        <p>Table 4 shows the systems assessed by Bijlsma.
Since the original snapshots Bijlsma used in his
study were not archived, the snapshots had to be
re-obtained. For every snapshot Bijlsma listed a version
and a date. Using this data we were able to retrieve
all snapshots using the following two retrieval
methods:</p>
        <p><bold>Official System Archives.</bold> Some systems
maintain an official archive. Snapshots matching date
and version number are retrieved directly from
these archives. For a small number of snapshots,
the date Bijlsma listed deviates by a couple of days
from the date associated with the version number in
the archive. These snapshots were still retrieved,
as we assumed that the deviation in dates was
caused by human error.</p>
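        <p>The date check for archived snapshots can be sketched as a lookup by version followed by a tolerance test on the date. The seven-day tolerance, the dictionary layout of the archive and the function name are hypothetical; the text above only states that deviations of a couple of days were accepted.</p>
        <preformat preformat-type="code">
```python
from datetime import date
from typing import Optional

# Hypothetical tolerance for the "couple of days" of deviation we accepted.
MAX_DEVIATION_DAYS = 7


def match_snapshot(archive: dict[str, date], version: str, listed: date,
                   max_days: int = MAX_DEVIATION_DAYS) -> Optional[tuple[str, date, int]]:
    """Look up `version` in an archive and accept it if its date is close to `listed`.

    Returns (version, archived_date, deviation_in_days), or None if the version
    is missing or its date deviates by more than `max_days`.
    """
    archived = archive.get(version)
    if archived is None:
        return None
    deviation = abs((archived - listed).days)
    if deviation > max_days:
        return None
    return version, archived, deviation
```
        </preformat>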
      </sec>
      <sec id="sec-3-4">
        <title>Version Control Systems (VCS)</title>
        <p>If a system's organization stopped hosting, or does not have,
an archive containing older versions, the snapshot
was retrieved by traversing the system's
VCS. The majority of the systems assessed
by Bijlsma use Subversion as their main
VCS. By convention, a Subversion repository contains the root
folders '/trunk', '/branches' and '/tags'. Trunk
is the directory where the main
development takes place, and branches contain
features developed in parallel to the main development; the tags
directory is particularly interesting as it contains
read-only copies of the source code at specific
points in time.</p>
        <p>Note that Subversion repositories conventionally use
the trunk, branches and tags directories but are
not limited to this structure. For example,
Webkit adds a '/releases' directory to the
repository root, containing all major releases (whereas tags
are of a finer granularity). In this case, both the
tags and the releases are searched to find the right
snapshot.</p>
        <p>In a small number of cases the matching
snapshots were not listed in the tags, releases or other
archiving Subversion root directories. In those
cases, the Subversion trunk directory was checked
out at the revision closest to the date listed by
Bijlsma.</p>
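        <p>Picking the trunk revision closest to a listed date is a nearest-neighbour search over the commit log. The sketch below assumes the log has already been parsed into (revision, commit time) pairs, and the function name is hypothetical. Note that Subversion's own {DATE} revision syntax resolves to the most recent revision as of that date, which is not necessarily the closest one, so an explicit search like this may select a later revision.</p>
        <preformat preformat-type="code">
```python
from datetime import datetime


def closest_revision(log, target: datetime) -> int:
    """Return the revision number whose commit time is closest to `target`.

    log: iterable of (revision_number, commit_datetime) pairs, parsed
    beforehand from the repository log. Ties go to the lower revision.
    """
    def distance(entry):
        revision, committed = entry
        return abs((committed - target).total_seconds()), revision

    return min(log, key=distance)[0]
```
        </preformat>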
        <p>For every snapshot, new maintainability ratings are
calculated. These ratings are obtained using the SIG
Software Analysis Toolkit (SAT), which implements
the latest (2018) version of the SIG maintainability
model. It provides the final maintainability rating,
along with its sub-characteristics as described by the
ISO 25010 standard.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Data Acquisition</title>
        <p>In order to replicate Bijlsma's original study, the same
data is needed. Since the snapshots were retrieved from the
official system archives or version control systems, we
assume that their contents are the same as those used in the
original study. The only other metric provided
by Bijlsma to verify this assumption is the size of each
snapshot in LOC. Figure 2 compares the snapshot sizes
found by Bijlsma against the snapshot sizes found
by us. The blue and red graphs clearly do not align.
This behaviour is expected, however, as these numbers are
obtained after the SAT analysis and therefore depend on how
each system is scoped.</p>
        <p>The SAT requires a scoping definition per system
in order to function. Scoping is the process of
determining which parts and files of the system should be
included in the calculation. For example, source files
in a '/libs/' folder should not be included in the
calculation, as they are external dependencies and are not
maintained by the development team directly. Ideally,
the scoping per system should be exactly the same
as Bijlsma's original scoping when rerunning the SAT.
However, scoping files were not documented in
Bijlsma's study, so we had to do our own scoping. Fortunately,
Bijlsma provided feedback in person to check whether the
scoping files were roughly the same.</p>
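        <p>A scoping definition essentially acts as a filter on file paths. The sketch below expresses such a filter with shell-style glob patterns via Python's <monospace>fnmatchcase</monospace>, assuming repository paths with a leading slash; the patterns and function names are hypothetical and do not reflect the SAT's actual scoping format.</p>
        <preformat preformat-type="code">
```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns; each system's real scoping was decided
# case by case, and the SAT's scoping file format is not reproduced here.
EXCLUDE_PATTERNS = ["*/libs/*", "*/third_party/*", "*/generated/*"]


def in_scope(path: str, exclude=EXCLUDE_PATTERNS) -> bool:
    """True if the file (path with a leading slash) counts as maintained code."""
    return not any(fnmatchcase(path, pattern) for pattern in exclude)


def scoped_files(paths, exclude=EXCLUDE_PATTERNS) -> list:
    """Keep only the paths that fall inside the scope."""
    return [path for path in paths if in_scope(path, exclude)]
```
        </preformat>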
        <p>Given the scoping differences, a slight deviation in
the lines of code considered by the SAT can be
explained. We do expect, however, that the newly
acquired snapshots follow the same trend line as the old
snapshots. For the majority of the systems listed in
Figure 2 this is the case (e.g. abiword, ant, argouml,
checkstyle), but some other systems stand out.
Webkit, for example, is only a third of the original size
(in KLOC). The cause of this large difference remains
unknown to us, as running the SAT with all junk files
included does not come near the originally reported size.
As such, the newly measured Webkit
maintainability numbers cannot be compared against the
old ones, and will affect the correlation results.</p>
        <p>To summarize, some elements of the original study's
method have been kept exactly the same while other
elements have been changed.</p>
      </sec>
      <sec id="sec-3-6">
        <title>Differences</title>
        <p>Since the original snapshots were not archived, the
snapshots had to be reacquired. This results in small
data inconsistencies for most systems and large
inconsistencies for one system in particular (Webkit).
Additionally, as is the purpose of this study, the new
SIG maintainability model is used to measure
maintainability instead of the original SIG
maintainability model dating from 2010.</p>
      </sec>
      <sec id="sec-3-7">
        <title>Equalities</title>
        <p>The issue resolution time quality ratings for
all snapshots and systems have been re-used directly;
the respective issue tracking systems were not mined
again. Furthermore, the correlations are calculated in
the same manner.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <sec id="sec-4-1">
        <title>Comparing Maintainability</title>
        <p>Maintainability ratings are compared directly between
the old and the new model to gain a better
understanding of how the SIG maintainability model evolved. This
also serves as a validation step to see whether the new
maintainability ratings are within reasonable expectation
compared to the old ones. Figure 3 gives a high-level
overview of how the maintainability ratings per
system (distributed over all snapshots) compare between
the old and the new SIG maintainability model. We
observe that for the majority of the systems (ant through
tomcat) the new maintainability ratings are lower
than those of the old model. This behaviour is expected,
because the SAT determines the thresholds of the rating
buckets from benchmark data, and these thresholds have been rising
over the years (see Section 5 for further elaboration).
Note that, even though it is plotted, maintainability
cannot be compared for webkit, as the reacquired snapshot
has over 500 KLOC less than the original.</p>
        <p>The maintainability rating calculated by the SIG
maintainability model is the result of a two-level
aggregation. Figure 3 adds another aggregation on top of
this, combining multiple maintainability ratings per
system into a single boxplot. The figure provides a
high-level overview, but in order to discover other
factors that cause the deviation in maintainability
ratings, we need to zoom in on the low-level metrics.
Figure 4 shows all unit complexity ratings of all
systems ordered by snapshot date. The figure shows that
in general the new ratings follow the same trend as
the old ratings, but at a slightly lower level
altogether. The systems ant, jedit, tomcat and
springframework in particular show this behaviour well. The lower
ratings can again be explained by the rising benchmark
thresholds. Webkit consistently rates higher for all
system metrics, but these differences are not meaningful due to the large
deviation in reacquired snapshot size. ArgoUML
consistently shows lower ratings for the original system
properties (without the modularity system properties),
but a higher overall maintainability rating
(Figure 3).</p>
      </sec>
      <sec id="sec-4-2">
        <title>Comparing Correlations</title>
        <p>Bijlsma and Luijten classified four types of issues:
defect, enhancement, patch and task, and investigated
issues of type defect and
enhancement. Tables 5 and 6 show the new correlations
found for these two types of issues. Every correlation
is tested for significance, given the hypotheses
<inline-formula><tex-math>H_0\colon \rho = 0</tex-math></inline-formula> and
<inline-formula><tex-math>H_A\colon \rho > 0</tex-math></inline-formula>. The null
hypothesis is rejected at a significance level of
5%.</p>
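        <p>The significance test described above can be sketched as follows. We assume Spearman's rank correlation between maintainability ratings and issue resolution ratings, and swap in a one-sided permutation test for H0: ρ = 0 against HA: ρ &gt; 0; the original study may have used an analytical test instead, so p-values from this sketch are illustrative only. All function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
import random


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    i = 0
    while n > i:
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while n > j + 1 and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


def one_sided_p(x, y, permutations=10_000, seed=0):
    """Return (rho, p) for H0: rho = 0 against HA: rho > 0 via permutations."""
    rx, ry = ranks(x), ranks(y)
    observed = pearson(rx, ry)
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    shuffled = list(ry)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pearson(rx, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (permutations + 1)
```
        </preformat>
        <p>A correlation is then considered significant when the returned p-value is at most 0.05.</p>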
        <p>Given the new defect correlations, the correlations
of the original system properties are comparable,
except for module coupling, which shows a notable
drop from 0.55 to 0.36. The other surprising result
is the large drop of the maintainability correlation from 0.64 to 0.33.
The negative correlation of modularity is also surprising,
as it goes against our intuition: modular
systems should be easier to modify than systems with
huge, monolithic components. Furthermore, the correlation of unit
interfacing is no longer significant, as its p-value increased from 0.042
to 0.640.</p>
        <p>Table 6 shows the same comparison of correlations
as Table 5, but for enhancement resolution speed.
The difference in maintainability correlations is much
smaller than the difference found in the defect
correlations. The modularity correlations also stand
out, since the coefficients of both modularity and
component balance are not significant, with p-values
larger than 0.05. This loss of significance is
particularly interesting compared to the defect
correlations in Table 5.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <sec id="sec-5-1">
        <title>Comparing Maintainability</title>
        <p>One explanation for the lower maintainability is the
calibration of the benchmark thresholds that determine the
ratings. Over the years, newly measured systems are
added to the SAT benchmark. The
observation is that the distribution of quality ratings in the
benchmark shifts towards a higher average over time.
To compensate for this phenomenon, SIG
recalibrates the thresholds for all ratings (both
system properties and characteristics) yearly. This means
that the thresholds (for most of the characteristics)
have become stricter. This is also documented in the
'Guidance for Producers' documents, which SIG
releases yearly. For example, their 2018 document
states for unit complexity that "To be
eligible for certification at the level of 4 stars, for each
programming language used the percentage of lines of
code residing in units with McCabe complexity
number higher than 5 should not exceed 21.1%" [Vis18],
while their 2017 document states the same with
a threshold of 24.3% [Vis17]. Remeasuring the same
systems with the stricter benchmark thresholds therefore
results in overall lower maintainability scores.</p>
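        <p>The effect of the stricter threshold can be illustrated with the unit complexity criterion quoted above. The sketch computes the percentage of lines of code in units with a McCabe number above 5 and checks it against the 2017 and 2018 limits. It covers a single programming language and only this one criterion, and the function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
# 4-star limits quoted from SIG's "Guidance for Producers" documents
# [Vis17, Vis18]: maximum percentage of LOC in units with McCabe number > 5.
MAX_COMPLEX_SHARE_4_STARS = {2017: 24.3, 2018: 21.1}


def complex_code_share(units, mccabe_limit=5):
    """Percentage of lines of code residing in units with McCabe number above the limit.

    units: list of (lines_of_code, mccabe_number) pairs for one language.
    """
    total = sum(loc for loc, _ in units)
    complex_loc = sum(loc for loc, mccabe in units if mccabe > mccabe_limit)
    return 100.0 * complex_loc / total


def meets_4_star_limit(units, year):
    """True if the share does not exceed that year's limit (this criterion only)."""
    return MAX_COMPLEX_SHARE_4_STARS[year] >= complex_code_share(units)
```
        </preformat>
        <p>For instance, a system with 23% of its code in such units meets the 2017 limit of 24.3% but not the 2018 limit of 21.1%, even though its code is unchanged.</p>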
        <p>This expected behaviour of lower maintainability
ratings is consistent for eight out of ten systems. The
systems Abiword and Webkit stand out as they both
score higher compared to the original rating.</p>
        <p>Webkit can be considered an outlier. The system
consists of a single snapshot that consistently scores
higher for all system properties and aggregated
ratings. This may be the result of the re-acquisition of the
snapshot, as the newly obtained snapshot has roughly
500 KLOC less than documented by Bijlsma.</p>
        <p>Abiword, however, does follow the expectation of
lower ratings for the system metrics. We speculate that its higher
overall maintainability score is caused by the new
properties introduced in the new model (component
balance and component independence), in particular
because the component independence scores of the
Abiword snapshots are 5.23, 5.23 and 2.50, in
chronological order.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Comparing Correlations</title>
        <p>Since the correlations of the original system properties are similar,
the added maintainability sub-characteristic
modularity, with its system properties component
balance and component independence, appears to be the
main cause of the drop in the maintainability correlation
from 0.64 to 0.33. The negative correlation for
modularity and component balance is surprising, as it
goes against our intuition: one would expect a
modular program to shorten, rather than lengthen, defect and
enhancement resolution times. However,
perhaps the results make an argument about the way
modularity is currently assessed. The performance of
component balance, for example, has been debated
before [BvDV13] (specifically, the discussion around the
optimal number of components and the performance
on smaller systems).</p>
      </sec>
      <sec id="sec-5-3">
        <title>Threats to Validity</title>
        <p>One of the main threats to validity is the variation in
SAT scoping. To obtain accurate replication
results, the scoping per system should ideally be exactly
the same as Bijlsma's original scoping when rerunning
the SAT. Because the original scoping was not documented and
had to be redone, the results obtained may
deviate slightly. However, given that the SIG
maintainability model uses a two-level aggregation to compute
the final maintainability score, small deviations in the
underlying measurements should not affect the final maintainability score
by a large margin.</p>
        <p>An additional difference in scoping is the
component depth property, which was introduced when
the model evolved according to the new ISO 25010 standard (as
described in Section 2.1). This property needs to be
set to indicate where the highest-level components reside in the
directory structure of a system, which is needed to
calculate the modularity system properties. The
ambiguity of the component definition requires an external
validator to check for correctness. In our case, given
the age of the systems, no external validator was
approached to check whether we defined the right highest-level
components. The component depth property was set
according to our own interpretation of each
system.</p>
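        <p>Setting the component depth amounts to choosing which directory level counts as a component. The sketch below groups files by their first directories up to the given depth and sums their lines of code, illustrating how the same tree yields different components at different depths. It is our own simplification, not the SAT's algorithm, and the function name and input layout are hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import Counter
from pathlib import PurePosixPath


def component_sizes(loc_per_file, depth):
    """Sum lines of code per component at the given component depth.

    loc_per_file: dict mapping a repository-relative POSIX path to its LOC.
    A file's component is the path of its first `depth` directories; files
    located above that depth are collected under None.
    """
    sizes = Counter()
    for path, loc in loc_per_file.items():
        directories = PurePosixPath(path).parts[:-1]  # drop the file name
        key = "/".join(directories[:depth]) if len(directories) >= depth else None
        sizes[key] += loc
    return dict(sizes)
```
        </preformat>
        <p>For a tree containing src/core and src/ui, depth 1 yields a single component src, whereas depth 2 yields two components, which directly changes measures such as component balance.</p>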
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>To answer the research question "What is
the relation between software maintainability and issue
resolution time?", in this paper we address
the sub-question: "Does the previously found strong
correlation between maintainability and issue
resolution time still hold given the latest (2018) SIG
maintainability model?". The experiment to find
correlations between maintainability (as assessed by the SIG
maintainability model) and issue resolution time, as
originally defined and executed by Bijlsma and Luijten
in 2012 [BFLV12], has been replicated. The experiment
was run on the same (reacquired, with small
deviations) snapshots of systems as in the original study,
using the new (2018) version of the SIG
maintainability model.</p>
      <p>Many similar correlations are observed between the
2010 and 2018 maintainability ratings and the
resolution time of defects and enhancements. However,
regarding the two new metrics in the 2018 model, (1)
component balance does not correlate as expected, and (2)
component independence correlates only when
enhancements are considered.</p>
      <p>Our next steps are to investigate the cause of the
observed differences and to further validate the
underlying data. Additionally, we would like to extend the
data set to modern software systems.</p>
    </sec>
    <sec id="sec-7">
      <title>Future Work</title>
      <p>The system property component balance and its
associated quality characteristic modularity may be
a reason why the correlation between overall maintainability
and defect resolution time is much lower than in the original study.
Future work can expand in this direction,
researching the effect of modularity on issue resolution time.
Specifically, does the modularity coefficient look any
different when the enhancement results are significant?</p>
      <p>Besides modularity,
more questions need to be answered in order to fully
characterize the relation between maintainability and issue
resolution time. Does the previously found relation still
hold when tested against modern systems?
Furthermore, Bijlsma analyzed mainly Java systems; how
does this extend to other languages? In this
paper we tested against maintainability as assessed
by the SIG maintainability model. To make the
concept of maintainability more
generalizable, do the correlations still hold when tested
against other maintainability implementations (e.g.
the maintainability index as proposed by Oman et al.
[CALO94])?</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[BFLV12] <string-name><given-names>Dennis</given-names> <surname>Bijlsma</surname></string-name>, <string-name><given-names>Miguel Alexandre</given-names> <surname>Ferreira</surname></string-name>, <string-name><given-names>Bart</given-names> <surname>Luijten</surname></string-name>, and <string-name><given-names>Joost</given-names> <surname>Visser</surname></string-name>. <article-title>Faster issue resolution with higher technical quality of software</article-title>. <source>Software Quality Journal</source>, <volume>20</volume>(<issue>2</issue>):<fpage>265</fpage>–<lpage>285</lpage>, <year>2012</year>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[BHL+12] <string-name><given-names>Tibor</given-names> <surname>Bakota</surname></string-name>, <string-name><given-names>Peter</given-names> <surname>Hegedus</surname></string-name>, <string-name><given-names>Gergely</given-names> <surname>Ladanyi</surname></string-name>, <string-name><given-names>Peter</given-names> <surname>Kortvelyesi</surname></string-name>, <string-name><given-names>Rudolf</given-names> <surname>Ferenc</surname></string-name>, and <string-name><given-names>Tibor</given-names> <surname>Gyimothy</surname></string-name>. <article-title>A cost model based on software maintainability</article-title>. In <source>2012 28th IEEE International Conference on Software Maintenance (ICSM)</source>, pages <fpage>316</fpage>–<lpage>325</lpage>. IEEE, <year>2012</year>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[BvDV13] <string-name><given-names>Eric</given-names> <surname>Bouwers</surname></string-name>, <string-name><given-names>Arie</given-names> <surname>van Deursen</surname></string-name>, and <string-name><given-names>Joost</given-names> <surname>Visser</surname></string-name>. <article-title>Evaluating usefulness of software metrics: an industrial experience report</article-title>. In <source>2013 35th International Conference on Software Engineering (ICSE)</source>, pages <fpage>921</fpage>–<lpage>930</lpage>. IEEE, <year>2013</year>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[CALO94] <string-name><given-names>Don</given-names> <surname>Coleman</surname></string-name>, <string-name><given-names>Dan</given-names> <surname>Ash</surname></string-name>, <string-name><given-names>Bruce</given-names> <surname>Lowther</surname></string-name>, and <string-name><given-names>Paul</given-names> <surname>Oman</surname></string-name>. <article-title>Using metrics to evaluate software system maintainability</article-title>. <source>Computer</source>, <volume>27</volume>(<issue>8</issue>):<fpage>44</fpage>–<lpage>49</lpage>, <year>1994</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[HKV07] <string-name><given-names>Ilja</given-names> <surname>Heitlager</surname></string-name>, <string-name><given-names>Tobias</given-names> <surname>Kuipers</surname></string-name>, and <string-name><given-names>Joost</given-names> <surname>Visser</surname></string-name>. <article-title>A practical model for measuring maintainability</article-title>. In <source>6th International Conference on the Quality of Information and Communications Technology (QUATIC 2007)</source>, pages <fpage>30</fpage>–<lpage>39</lpage>. IEEE, <year>2007</year>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[ISO11a] ISO/IEC 25010:2011, <article-title>Systems and software engineering – Systems and software Quality Requirements and Evaluation (SQuaRE) – System and software quality models</article-title>. Standard, <publisher-name>International Organization for Standardization</publisher-name>, <publisher-loc>Geneva, CH</publisher-loc>, March <year>2011</year>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[ISO11b] ISO/IEC 9126-1:2001, <article-title>Software engineering – Product quality – Part 1: Quality model</article-title>. Standard, <publisher-name>International Organization for Standardization</publisher-name>, <publisher-loc>Geneva, CH</publisher-loc>, <year>2001</year>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[Lui10] <string-name><given-names>Bart</given-names> <surname>Luijten</surname></string-name>. <article-title>Faster defect resolution with higher technical quality of software</article-title>. <year>2010</year>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[Vis17] <string-name><given-names>Joost</given-names> <surname>Visser</surname></string-name>. <article-title>SIG/TÜViT evaluation criteria trusted product maintainability: Guidance for producers</article-title>. Technical report, Software Improvement Group, page 7, <year>2017</year>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[Vis18] <string-name><given-names>Joost</given-names> <surname>Visser</surname></string-name>. <article-title>SIG/TÜViT evaluation criteria trusted product maintainability: Guidance for producers</article-title>. Technical report, Software Improvement Group, page 7, <year>2018</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>