<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Karine Mordal, Nicolas Anquetil, Jannik Laval, Alexander Serebrenik, Bogdan Vasilescu, and Stephane Ducasse. Software quality metrics aggregation in industry. Journal of Software: Evolution and Process</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards Meaningful Software Metrics Aggregation</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Maria Ulan, Welf Lowe, Morgan Ericsson, Anna Wingkvist Department of Computer Science and Media Technology Linnaeus University</institution>
          ,
          <addr-line>Vaxjo</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <volume>25</volume>
      <issue>10</issue>
      <abstract>
        <p>Aggregation of software metrics is a challenging task, and it becomes even more complex when weights are considered to indicate the relative importance of software metrics. These weights are mostly determined manually, which results in subjective quality models that are hard to interpret. To address this challenge, we propose an automated aggregation approach based on the joint distribution of software metrics. To evaluate the effectiveness of our approach, we conduct an empirical study on maintainability assessment for around 5 000 classes from open source software systems written in Java and compare our approach with a classical weighted linear combination approach in the context of maintainability scoring and anomaly detection. The results show that the approaches assign similar scores, while our approach is more interpretable, sensitive, and actionable.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Index terms: Software metrics, Aggregation, Weights, Copula</p>
      <p>Quality models provide a basic understanding of what data to collect and what software metrics to use. However, they do not specify how software (sub-)characteristics should be quantified or how metrics should be aggregated.</p>
      <p>The problem of metrics aggregation has been addressed by the research community. Metrics are often defined at a method or class level, but quality assessment sometimes requires insights at the system level. One bad metric value can be evened out by other good metric values when summing them up or computing their mean [1]. Some effort has been directed into metrics aggregation based on inequality indices [2, 3] and on thresholds [4-8] to map source code level measurements to software system ratings.</p>
      <p>In this research, we do not consider aggregation along the structure of software artifacts, e.g., from classes to the system. We focus on another type of metrics aggregation, from low-level to higher-level quality properties; Mordal-Manet et al. call this type of aggregation metrics composition [9].</p>
      <p>Different software quality models that use weighted metrics aggregation have been proposed, such as QMOOD [10], QUAMOCO [11], SIG [12], SQALE [13], and SQUALE [14]. The weights in these models are defined based on experts' opinions or surveys. It is questionable whether manual weighting and combination of the values with an arbitrary (not necessarily linear) function are acceptable operations for metrics of different scales and distributions.</p>
      <p>As a countermeasure, we propose a probabilistic approach for metrics aggregation. In previous research, we considered software metrics to be equally important and developed a software metrics visualization tool. This tool allowed the user to define and manipulate quality models to reason about where quality problems were located, and to detect patterns, correlations, and anomalies [15].</p>
      <p>Here, we define metrics scores by probability as a complementary Cumulative Distribution Function and link them with a joint probability by the so-called copula function. We determine weights from the joint distribution and aggregate software metrics by a weighted product of the scores. We formalize quality models to express quality as the probability of observing a software artifact with equal or better quality. This approach is objective since it relies solely on data. It allows us to modify quality models on the fly, and it creates a realistic scale since the distribution represents quality scores for a set of software artifacts.</p>
    </sec>
    <sec id="sec-2">
      <title>Approach Overview</title>
      <p>We consider a joint distribution of software metrics values, and for each software artifact, we assign a probabilistic score. W.l.o.g., we assume that all software metrics are defined such that larger values indicate lower quality. The joint distribution of software metrics provides the means for an objective comparison of software artifacts in terms of their quality scores, which represent the relative rank of a software artifact within the set of all software artifacts observed so far, i.e., how good or bad a quality score is compared to other quality scores.</p>
      <p>Let A = {a1, …, ak} be a set of k software artifacts, and M = {m1, …, mn} be a set of n software metrics. Each software artifact is assessed by the metrics from M, and the result of this assessment is represented as a k × n performance matrix of metrics values.</p>
      <p>We denote by ej(ai), for all i ∈ {1, …, k} and j ∈ {1, …, n}, the (i, j)-entry, which shows the degree of performance for a software artifact ai measured for metric mj. We denote by Ej = [ej(a1), …, ej(ak)]^T ∈ Ej^k the j-th column of the performance matrix, which represents the metrics values of all software artifacts with respect to metric mj, where Ej is the domain of these values.</p>
      <p>For each software artifact ai ∈ A and metric mj ∈ M, we define a score sj(ai), which indicates the degree to which this software artifact meets the requirements for the metric. Formally, for each metric mj we define a score function sj:
sj : Ej → [0, 1] (1)</p>
      <p>Based on the score functions sj for each metric, our
goal is to de ne an overall score function such that, for
any software artifact, it indicates the degree to which
this software artifact satis es all metrics. Formally, we
are looking for a function:</p>
      <p>F(s1, …, sn) : [0, 1]^n → [0, 1] (2)</p>
      <p>Such an aggregation function takes an n-tuple of
metrics scores and returns a single overall score. We
require the following properties:
1. If a software artifact does not meet the
requirements for one of the metrics, the overall score
should be close to zero.</p>
      <p>F(s1, …, sn) → 0 as sj → 0 (3)</p>
      <p>2. If one software artifact scores lower than or equal to another on every metric, its overall score should not exceed the other's overall score:</p>
      <p>si1 ≤ sl1 ∧ … ∧ sin ≤ sln ⇒ F(si1, …, sin) ≤ F(sl1, …, sln), where sij = sj(ej(ai)), slj = sj(ej(al)) (4)</p>
      <p>3. If the software artifact perfectly meets all but one metric, the overall score is equal to that metric's score:</p>
      <p>F(1, …, 1, sj, 1, …, 1) = sj (5)</p>
      <p>We propose to express the degree of satisfaction with respect to a metric using probability. We define the score function of Equation (1) as follows:
sj(ej(a)) = Pr(Ej &gt; ej(a)) = CCDF(ej(a)) (6)</p>
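      <p>In practice, the score of Equation (6) can be estimated as the fraction of observed metric values strictly larger (i.e., worse) than the given one. A minimal Python sketch, with made-up WMC values for illustration:

```python
def ccdf_score(values, v):
    """Empirical complementary CDF: fraction of observed metric
    values strictly greater (i.e., worse) than v."""
    return sum(1 for x in values if x > v) / len(values)

# Hypothetical WMC values for five classes; larger means worse.
wmc = [3, 8, 8, 21, 50]
scores = [ccdf_score(wmc, v) for v in wmc]
# The worst class (WMC 50) gets score 0.0; the best (WMC 3) gets 0.8.
```

A high score thus means that many artifacts are worse, i.e., the artifact is of relatively good quality.</p>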
      <p>We calculate the Complementary Cumulative Distribution Function (CCDF). This score represents the probability of finding another software artifact with an evaluation value greater than the given value. For the multi-criteria case, we can specify a joint distribution in terms of n marginal distributions and a so-called copula function [16]:</p>
      <p>Cop(CCDF(e1(a)), …, CCDF(en(a))) = Pr(E1 &gt; e1(a), …, En &gt; en(a)) (7)</p>
      <p>The copula representation of a joint probability distribution allows us to model both marginal distributions and dependencies. The copula function Cop satisfies the signature (2) and fulfills the required properties (3), (4), and (5).</p>
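      <p>Empirically, the copula value of Equation (7) is simply the fraction of artifacts that are worse on every metric simultaneously. A sketch with made-up (WMC, CBO) pairs:

```python
def joint_survival(rows, point):
    """Empirical Pr(E1 > e1, ..., En > en): fraction of artifacts
    whose metric values exceed the given point on every metric."""
    worse = sum(1 for r in rows if all(x > p for x, p in zip(r, point)))
    return worse / len(rows)

# Hypothetical (WMC, CBO) pairs for four classes.
data = [(3, 2), (8, 5), (21, 9), (50, 12)]
print(joint_survival(data, (3, 2)))  # 0.75: three of four classes are worse on both metrics
```
</p>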
      <p>We consider a weight vector, where each wi represents the relative importance of metric mi compared to the others:
w = [w1, …, wn]^T, where Σ_{i=1}^{n} wi = 1 (8)</p>
      <p>We compute the weights using a non-linear exponential regression model for a sample of software artifacts, mapping the metrics scores of Equation (6) to the copula value of Equation (7). Note that these weights account for dependencies between software metrics. Finally, we define software metrics aggregation as a weighted composition of the metrics score functions:</p>
      <p>F(s1, …, sn) = Π_{j=1}^{n} sj^wj (9)</p>
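      <p>Taking logarithms turns the weighted product of Equation (9) into a linear model, so weights can be fitted by regressing log copula values on log metric scores. The following Python sketch uses numpy least squares on synthetic data; the exact regression setup of the approach is not spelled out here, so treat this as an illustration:

```python
import numpy as np

def fit_weights(scores, copula_vals, eps=1e-9):
    """Fit w in  log Cop = sum_j wj log sj  by least squares,
    then normalize the weights to sum to one."""
    X = np.log(np.asarray(scores) + eps)       # shape (k, n)
    y = np.log(np.asarray(copula_vals) + eps)  # shape (k,)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = np.clip(w, 0.0, None)                  # keep weights non-negative
    return w / w.sum()

def aggregate(score_row, w):
    """Weighted product of metric scores, Equation (9)."""
    return float(np.prod(np.asarray(score_row) ** w))

# Synthetic check: if the copula value really is s1^0.7 * s2^0.3,
# the fitted weights come out close to [0.7, 0.3].
rng = np.random.default_rng(0)
s = rng.uniform(0.1, 1.0, size=(100, 2))
cop = s[:, 0] ** 0.7 * s[:, 1] ** 0.3
w = fit_weights(s, cop)
```
</p>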
      <p>We consider a software artifact al to be better than or equally good as another software artifact ai if the total score according to Equation (2) of al is greater than or equal to the total score of ai:
al ≽ ai ⇔ F(al) ≥ F(ai) (10)</p>
      <p>Aggregation is defined as a composition of the product, exponential, and CCDF functions, which are monotonic functions. Hence, the score obtained by aggregation allows us to rank the set A of software artifacts with respect to the metrics set M:</p>
      <sec id="sec-2-1">
        <title>Ranking</title>
        <p>Rank(al) ≤ Rank(ai) ⇔ F(al) ≥ F(ai) (11)</p>
        <p>From a practical point of view, the probabilities can be calculated empirically: each score can be obtained as the ratio of the number of software artifacts with a metric value greater than the given value to the number |A| of software artifacts.</p>
        <p>The proposed aggregation approach makes it possible to express the score for a software artifact as the probability of observing something with equal or worse metrics values, based on all software artifacts observed. Once the quality scores are computed, the software artifacts can trivially be ranked by ordering the scores. We assign the same rank to software artifacts whose total scores are equal. Low (high) ranks correspond to high (low) probabilities. This interpretation is the same on all levels of aggregation, from metrics scores to the total quality scores.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Preliminary Evaluation</title>
      <p>We apply our approach to assess Maintainability and compare the results with an aggregation approach based on a weighted linear combination of software metrics. We measure the difference between the rankings obtained by these approaches and study the agreement between the aggregated scores. Finally, we compare the approaches by means of sensitivity and the ability to detect extreme values and Pareto optimal solutions.</p>
      <p>In the following subsections, we investigate Java classes and their quality assessment using two research questions:
RQ1 How effective is our approach for quality assessment?
RQ2 How actionable is our approach by means of sensitivity and anomaly detection?</p>
      <sec id="sec-3-1">
        <title>Quality Model Description</title>
        <p>We consider a quality model for maintainability assessment of classes, which relies on well-known software metrics from the Chidamber &amp; Kemerer [17] metrics suite:</p>
        <p>CBO, Coupling Between Objects
DIT, Depth of Inheritance Tree
LCOM, Lack of Cohesion in Methods
NOC, Number Of Children
RFC, Response For a Class
WMC, Weighted Method Count
(using Cyclomatic Complexity as method weight)</p>
      </sec>
      <sec id="sec-3-2">
        <title>Data Set Description</title>
        <p>We chose to investigate three open-source software systems. The systems were chosen according to the following criteria: (i) they are written in Java, (ii) they are available on GitHub, (iii) they were forked at least once, (iv) they are sufficiently large (several tens of thousands of lines of code and several hundreds of classes), and (v) they have been under active development for several years. The projects we selected are three well-known and frequently used systems: JabRef,1 JUnit,2 and RxJava.3 Table 1 shows descriptive statistics for these systems.</p>
        <p>The result of the aggregation is a maintainability score and a ranked list of software artifacts according to their maintainability scores. To evaluate our approach, we compare it to a well-known approach considering the following measures:</p>
        <p>Correlation We study Spearman's correlation [18] between the maintainability scores to assess the ordering, relative spacing, and possible functional dependency.</p>
        <p>Ranking distance We measure the distance between the two rankings based on the Kendall tau distance, which counts the number of pairwise disagreements between two lists [19].</p>
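        <p>The Kendall tau distance just counts discordant pairs; a stdlib-only Python sketch (the score lists are made up, and ties are assumed away):

```python
from itertools import combinations

def kendall_tau_distance(a, b):
    """Number of item pairs ranked in opposite order by the two
    score lists (assumes no tied scores)."""
    return sum(1 for i, j in combinations(range(len(a)), 2)
               if (a[i] > a[j]) != (b[i] > b[j]))

print(kendall_tau_distance([0.9, 0.7, 0.4], [0.8, 0.6, 0.5]))  # 0: identical orderings
print(kendall_tau_distance([1, 2, 3], [3, 2, 1]))              # 3: all pairs disagree
```
</p>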
        <p>1JabRef, a graphical Java application for managing BibTeX and biblatex databases, https://github.com/JabRef/jabref
2JUnit, a framework to write repeatable tests for the Java programming language, https://github.com/junit-team/junit5</p>
        <p>3RxJava, Reactive Extensions for the JVM, a library for composing asynchronous and event-based programs using observable sequences for the Java VM, https://github.com/ReactiveX/RxJava</p>
        <p>Agreement We measure agreement between
maintainability scores using Bland-Altman statistics [20].</p>
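        <p>The Bland-Altman statistics reduce to the mean of the pairwise score differences and its 95% limits of agreement; a small sketch with made-up paired scores:

```python
import statistics

def bland_altman(x, y):
    """Mean difference and 95% limits of agreement (mean ± 1.96 SD)
    for two paired lists of scores."""
    diffs = [a - b for a, b in zip(x, y)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return mean, (mean - 1.96 * sd, mean + 1.96 * sd)

mean_diff, limits = bland_altman([0.5, 0.6, 0.7], [0.4, 0.6, 0.8])
```

If most differences fall inside the limits, the two scoring approaches are considered to agree.</p>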
        <p>To evaluate if the aggregated scores can be used to
detect extreme values and Pareto optimal solutions,
we consider the following measures:</p>
        <p>Sensitivity We study the variety of values to understand the percentage of software artifacts that have the same maintainability score. The overall sensitivity is the ratio of the number of unique scores to the number of software artifacts.</p>
        <p>Anomaly detection We compare the approaches in terms of their ability to detect anomalies (extreme values and Pareto optimal solutions) using the ratio of the number of detected anomalies to the total number of anomalies in a sample data set.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Preliminary Results and Analysis</title>
        <p>We implemented all algorithms and statistical analyses in R.4 The metrics data for analysis was collected with VizzMaintenance.5 We collected the metrics values for the classes of the JabRef, JUnit, and RxJava software systems (5 317 classes in total). We considered their package structure to group classes and applied the Kolmogorov-Smirnov statistical test [21] to select a subset for further statistical analysis, which comprised 5 101 classes. Moreover, we consider the quality assessment of each system separately to study potential differences between software systems. We apply our aggregation approach (see Equation (12)) and compare the results with a weighted linear sum of metrics (see Equation (13)), which we normalized by the min-max transformation.
F = sCBO^w1 · sDIT^w2 · sLCOM^w3 · sNOC^w4 · sRFC^w5 · sWMC^w6 (12)</p>
        <p>w1·CBO + w2·DIT + w3·LCOM + w4·NOC + w5·RFC + w6·WMC (13)</p>
        <sec id="sec-3-3-1">
          <title>Effectiveness and Actionability</title>
          <p>RQ1 (effectiveness): We compare the approaches within each single software system and on the merged data set. First, we study the correlation between the aggregation results. Second, we rank software classes based on the maintainability scores obtained by the two approaches. Table 2 shows the Kendall tau distance and Spearman's rho correlation. We observe a strong correlation between maintainability scores and a low distance between the rankings.</p>
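          <p>The baseline of Equation (13) can be sketched as follows, with min-max normalization bringing each metric column to [0, 1] first (the metric rows and weights are made up):

```python
def min_max(values):
    """Min-max normalization of a metric column to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def weighted_linear(rows, weights):
    """Weighted linear combination of min-max normalized metric columns."""
    cols = [min_max(col) for col in zip(*rows)]
    return [sum(w * c[i] for w, c in zip(weights, cols))
            for i in range(len(rows))]

# Two hypothetical metrics for three classes, equal weights.
scores = weighted_linear([(3, 2), (8, 5), (50, 12)], [0.5, 0.5])
# The first class gets 0.0 (best on both metrics), the last 1.0 (worst on both).
```
</p>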
          <p>4The R Project for Statistical Computing, https://www.r-project.org</p>
          <p>5VizzMaintenance, Eclipse plug-in, http://www.arisa.se/products.php</p>
          <p>Third, we study the agreement. In a Bland-Altman plot, each class is represented by a point with the average of the maintainability scores obtained by the two approaches as the x-value and the difference between these two scores as the y-value. The blue line represents the mean difference between the scores, and the red lines the 95% confidence interval (mean ± 1.96 SD). We can observe that the plots for JabRef and RxJava have a similar shape (cf. Figure 1, Figure 3) compared to JUnit (cf. Figure 2). We can observe a similar shape for the merged data set (cf. Figure 4), since in total JabRef and RxJava have almost four times more classes than JUnit. We can observe that in all plots the measurements are mostly concentrated near the blue line and only a few of them fall outside the red lines. The difference for JUnit is slightly smaller than for JabRef and RxJava. In sum, we conclude that the approaches agree, i.e., the aggregation results do not differ statistically and may be used interchangeably for the ranking of software classes.</p>
          <p>RQ2 (actionability): First, we study the variety of values for each metric and the number of extreme values, which we define by means of outliers. We detected 19 extreme values in total. In Table 3 we can observe that the metrics have quite low sensitivity; for each metric, 40 values on average are unique.</p>
          <p>We consider a multi-objective optimization problem based on the metrics, and we detect five possible Pareto optimal solutions, i.e., solutions where none of the metrics values can be improved without degrading some of the other metrics values. Second, we study the sensitivity and the ability to detect anomalies (extreme values and Pareto optimal solutions) for both approaches. In Table 4 we can observe that our approach is more sensitive and more suitable for anomaly detection.</p>
          <p>We define metric scores by means of probability, as it provides a simple interpretation of a quality score by means of the joint distribution. In contrast, quality scores obtained by a weighted linear combination of metrics do not provide a clear interpretation, especially when metrics are incomparable. We assume that larger metrics values indicate worse quality; however, both too small and too large values can be problematic for some software metrics. Note that this is not a limitation since we could transform the metrics to have this property. We extracted the weights from the joint distribution, which we consider as a ground truth. This might be a threat to internal validity. We compare our approach with a weighted linear combination of metrics; this might be a threat as well, since we do not compare it with other approaches. In this preliminary evaluation, we consider six metrics and three software systems written in Java, and focus on maintainability. This might be a threat to external validity.</p>
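          <p>Pareto optimal classes can be found with a quadratic scan over the metric rows; a sketch with made-up (WMC, CBO) rows, assuming larger metric values mean worse quality:

```python
def dominates(a, b):
    """a dominates b when b is at least as bad on every metric and
    strictly worse on at least one (larger values mean worse)."""
    return all(y >= x for x, y in zip(a, b)) and any(y > x for x, y in zip(a, b))

def pareto_front(rows):
    """Indices of metric rows not dominated by any other row."""
    return [i for i, r in enumerate(rows)
            if not any(dominates(o, r) for j, o in enumerate(rows) if j != i)]

# The first two rows trade off against each other; the third is dominated.
front = pareto_front([(3, 9), (8, 2), (10, 10)])  # [0, 1]
```
</p>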
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>In conclusion, we defined an automated aggregation approach for software quality assessment. We defined probabilistic scores based on software metrics distributions and aggregated them using a weighted product, with the weights obtained from the joint distribution. To evaluate the effectiveness and actionability of our approach, we conducted an empirical study on maintainability assessment. We collected the CBO, DIT, LCOM, NOC, RFC, and WMC metrics from the Chidamber &amp; Kemerer metrics suite for the classes of the JabRef, JUnit, and RxJava software systems, and compared our approach with a weighted linear combination of metrics. The results showed that the approaches agree and can be used interchangeably for ranking software artifacts. However, our approach is more effective and actionable, i.e., it has a clear interpretation, higher sensitivity, and is better at detecting extreme values and Pareto optimal solutions.</p>
      <p>Our approach is mathematically well-defined and can be theoretically validated. For example, we can conduct simulation experiments to study the deviation between our approach and others depending on the number of classes, the number of metrics, the levels of aggregation, etc. However, there is still a need for empirical validation of our approach. In the future, we plan to evaluate our approach on other data sets, such as the GitHub Java corpus, which contains around 15 000 software systems [22]. We also plan to compare our approach with Bakota et al.'s probabilistic approach [23].</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We thank the anonymous reviewers whose comments and suggestions helped us improve and clarify this paper.</p>
      <p>[10] Jagdish Bansiya and Carl G Davis. A hierarchical model for object-oriented design quality assessment. IEEE Transactions on Software Engineering, 28(1):4-17, 2002.
[11] Stefan Wagner, Andreas Goeb, Lars Heinemann, Michael Klas, Constanza Lampasona, Klaus Lochmann, Alois Mayr, Reinhold Plosch, Andreas Seidl, Jonathan Streit, et al. Operationalised product quality models and assessment: The Quamoco approach. Information and Software Technology, 62:101-123, 2015.
[12] Robert Baggen, Jose Pedro Correia, Katrin Schill, and Joost Visser. Standardized code quality benchmarking for improving software maintainability. Software Quality Journal, 20(2):287-307, 2012.
[13] Jean-Louis Letouzey and Thierry Coq. The SQALE analysis model: An analysis model compliant with the representation condition for assessing the quality of software source code. In Advances in System Testing and Validation Lifecycle (VALID), 2010 Second International Conference on, pages 43-48. IEEE, 2010.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Bogdan</given-names>
            <surname>Vasilescu</surname>
          </string-name>
          , Alexander Serebrenik, and Mark Van den Brand.
          <article-title>By no means: A study on aggregating software metrics</article-title>
          .
          <source>In Proceedings of the 2nd International Workshop on Emerging Trends in Software Metrics</source>
          , pages
          <volume>23</volume>
          {
          <fpage>26</fpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Rajesh</given-names>
            <surname>Vasa</surname>
          </string-name>
          , Markus Lumpe, Philip Branch, and
          <string-name>
            <given-names>Oscar</given-names>
            <surname>Nierstrasz</surname>
          </string-name>
          .
          <article-title>Comparative analysis of evolving software systems using the Gini coefficient</article-title>
          .
          <source>In 2009 IEEE International Conference on Software Maintenance</source>
          , pages
          <volume>179</volume>
          {
          <fpage>188</fpage>
          . IEEE,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Serebrenik</surname>
          </string-name>
          and Mark van den Brand.
          <article-title>Theil index for aggregation of software metrics values</article-title>
          .
          <source>In 2010 IEEE International Conference on Software Maintenance</source>
          , pages
          <fpage>1</fpage>
          {9
          . IEEE,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Ilja</given-names>
            <surname>Heitlager</surname>
          </string-name>
          , Tobias Kuipers, and
          <string-name>
            <given-names>Joost</given-names>
            <surname>Visser</surname>
          </string-name>
          .
          <article-title>A practical model for measuring maintainability</article-title>
          . In null, pages
          <volume>30</volume>
          {
          <fpage>39</fpage>
          . IEEE,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Jose Pedro</given-names>
            <surname>Correia</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joost</given-names>
            <surname>Visser</surname>
          </string-name>
          .
          <article-title>Certification of technical quality of software products</article-title>
          .
          <source>In Proc. of the Int'l Workshop on Foundations and Techniques for Open Source Software Certification</source>
          , pages
          <volume>35</volume>
          {
          <fpage>51</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Tiago L</given-names>
            <surname>Alves</surname>
          </string-name>
          , Jose Pedro Correia, and
          <string-name>
            <given-names>Joost</given-names>
            <surname>Visser</surname>
          </string-name>
          .
          <article-title>Benchmark-based aggregation of metrics to ratings</article-title>
          .
          <source>In 2011 Joint Conference of the 21st International Workshop on Software Measurement and the 6th International Conference on Software Process and Product Measurement</source>
          , pages
          <volume>20</volume>
          {
          <fpage>29</fpage>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Paloma</given-names>
            <surname>Oliveira</surname>
          </string-name>
          , Fernando P Lima, Marco Tulio Valente, and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Serebrenik</surname>
          </string-name>
          .
          <article-title>Rttool: A tool for extracting relative thresholds for source code metrics</article-title>
          .
          <source>In 2014 IEEE International Conference on Software Maintenance and Evolution</source>
          , pages
          <volume>629</volume>
          {
          <fpage>632</fpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Kazuhiro</given-names>
            <surname>Yamashita</surname>
          </string-name>
          , Changyun Huang, Meiyappan Nagappan, Yasutaka Kamei, Audris Mockus, Ahmed E Hassan, and
          <string-name>
            <given-names>Naoyasu</given-names>
            <surname>Ubayashi</surname>
          </string-name>
          .
          <article-title>Thresholds for size and complexity metrics: A case study from the perspective of defect density</article-title>
          .
          <source>In 2016 IEEE international conference on software quality, reliability and security (QRS)</source>
          , pages
          <fpage>191</fpage>
          {
          <fpage>201</fpage>
          . IEEE,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K.</given-names>
            <surname>Mordal-Manet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Balmas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Denier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ducasse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wertz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Laval</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bellingard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Vaillergues</surname>
          </string-name>
          .
          <article-title>The squale model; a practice-based industrial quality model</article-title>
          .
          <source>In 2009 IEEE Int. Conf. on Software Maintenance (ICSM)</source>
          , pages
          <fpage>531</fpage>
          {
          <fpage>534</fpage>
          ,
          <string-name>
            <surname>Sept</surname>
          </string-name>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Maria</given-names>
            <surname>Ulan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Honel</surname>
          </string-name>
          , Rafael M Martins, Morgan Ericsson, Welf Lowe, Anna Wingkvist, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Kerren</surname>
          </string-name>
          .
          <article-title>Quality models inside out: Interactive visualization of software metrics by means of joint probabilities</article-title>
          .
          <source>In 2018 IEEE Working Conference on Software Visualization (VISSOFT)</source>
          , pages
          <fpage>65</fpage>
          {
          <fpage>75</fpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Roger B</given-names>
            <surname>Nelsen</surname>
          </string-name>
          .
          <article-title>An introduction to copulas</article-title>
          . Springer Science &amp; Business Media,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Shyam R</given-names>
            <surname>Chidamber</surname>
          </string-name>
          and Chris F Kemerer
          .
          <article-title>A metrics suite for object oriented design</article-title>
          .
          <source>IEEE Transactions on software engineering</source>
          ,
          <volume>20</volume>
          (
          <issue>6</issue>
          ):
          <volume>476</volume>
          {
          <fpage>493</fpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>C.</given-names>
            <surname>Spearman</surname>
          </string-name>
          .
          <article-title>General intelligence, objectively determined and measured</article-title>
          .
          <source>The American Journal of Psychology</source>
          ,
          <volume>15</volume>
          (
          <issue>2</issue>
          ):
          <fpage>201</fpage>
          –
          <lpage>292</lpage>
          ,
          <year>1904</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Maurice</given-names>
            <surname>Kendall</surname>
          </string-name>
          .
          <article-title>Rank correlation methods</article-title>
          .
          <source>Griffin</source>
          ,
          <year>1948</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J Martin</given-names>
            <surname>Bland</surname>
          </string-name>
          and
          <string-name>
            <given-names>Douglas</given-names>
            <surname>Altman</surname>
          </string-name>
          .
          <article-title>Measuring agreement in method comparison studies</article-title>
          .
          <source>Statistical Methods in Medical Research</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ):
          <fpage>135</fpage>
          –
          <lpage>160</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Myles</given-names>
            <surname>Hollander</surname>
          </string-name>
          and
          <string-name>
            <given-names>Douglas A.</given-names>
            <surname>Wolfe</surname>
          </string-name>
          .
          <source>Nonparametric statistical methods</source>
          . Wiley-Interscience,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Miltiadis</given-names>
            <surname>Allamanis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Charles</given-names>
            <surname>Sutton</surname>
          </string-name>
          .
          <article-title>Mining source code repositories at massive scale using language modeling</article-title>
          .
          <source>In Proceedings of the 10th Working Conference on Mining Software Repositories</source>
          , pages
          <fpage>207</fpage>
          –
          <lpage>216</lpage>
          . IEEE Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Tibor</given-names>
            <surname>Bakota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Péter</given-names>
            <surname>Hegedűs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Péter</given-names>
            <surname>Körtvélyesi</surname>
          </string-name>
          , Rudolf Ferenc, and
          <string-name>
            <given-names>Tibor</given-names>
            <surname>Gyimóthy</surname>
          </string-name>
          .
          <article-title>A probabilistic software quality model</article-title>
          .
          <source>In 2011 27th IEEE International Conference on Software Maintenance (ICSM)</source>
          , pages
          <fpage>243</fpage>
          –
          <lpage>252</lpage>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>