Approaches for Software Metrics Threshold Derivation: A Preliminary Review

TINA BERANIČ and MARJAN HERIČKO, University of Maribor

Knowing reliable software metric thresholds can contribute to product quality evaluation and, consequently, increase the usefulness of software metrics in practice. How to derive software metric thresholds is a topic addressed by many researchers, who either propose new approaches or verify existing methods on different practical projects. A literature review was conducted to summarise the available knowledge in this area. The aim was to find approaches for software metric threshold derivation and to extract relevant information, such as the programming language for which an approach was proposed and tested, which metrics were derived, and whether the approaches are automated with supporting tools. Data extracted from 14 primary studies reveal that the majority of studies present new approaches and that the approaches usually apply statistical analysis for threshold derivation. The primary programming language for which metrics are derived is Java, followed by C, C++ and C#. We also came across some tools used for threshold derivation, but their use in practice was not detected. The derived metrics, the purpose of metric derivation and other information are also summarised in the tables below.

Categories and Subject Descriptors: D.2.8 [Software Engineering]: Metrics; D.2.9 [Software Engineering]: Management: Software quality assurance (SQA)

General Terms: Software Metrics Threshold Derivation

Additional Key Words and Phrases: Software Metrics, Thresholds, Reference Values, Object-oriented, Literature Review

Author's address: T. Beranič, Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor, Slovenia; email: tina.beranic@um.si; M. Heričko, Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, 2000 Maribor, Slovenia; email: marjan.hericko@um.si.
Copyright © by the paper's authors. Copying permitted only for private and academic purposes. In: Z. Budimac (ed.): Proceedings of the SQAMIA 2017: 6th Workshop of Software Quality, Analysis, Monitoring, Improvement, and Applications, Belgrade, Serbia, 11-13.9.2017. Also published online by CEUR Workshop Proceedings (http://ceur-ws.org, ISSN 1613-0073).

1. INTRODUCTION

To deliver a software product of the required quality, it is crucial to address and manage the quality assurance domain properly. Software metrics can be used to reflect the qualitative characteristics of software modules in a quantitative way [Arar and Ayan 2016], presenting a control instrument in the software development and maintenance process [Alves et al. 2010]. Software metrics assess software from different views and, therefore, belong to different metric groups, but overall they reflect the internal quality of software systems [Arar and Ayan 2016]. The usefulness of metrics without knowing their reference values is very limited, due mainly to interpretation difficulties [Ferreira et al. 2012]. To overcome these difficulties, it is important that reliable reference values of software metrics are available.

Thresholds are heuristic values that are used to set ranges of desirable and undesirable metric values for measured software and, furthermore, to identify anomalies which may indicate an actual problem [Lorenz and Kidd 1994]. Threshold values explain whether a metric value is in the normal range [Ronchieri and Canaparo 2016] and, consequently, provide a scale for paying attention to threshold-exceeding components [Arar and Ayan 2016]. There are many approaches available to compute threshold values.
In any case, as summarised by [Arar and Ayan 2016], a threshold definition methodology should meet the following requirements: (1) it is built on representative benchmark data and not on opinions; (2) the data should be analysed statistically; (3) it is repeatable and clear [Alves et al. 2010; Sánchez-González et al. 2012].

The aim of the presented paper is to gather the available knowledge regarding software metrics threshold derivation techniques. The first step in a literature review is to find already available related studies. [Ronchieri and Canaparo 2016] present a preliminary mapping study in the area of software metric thresholds, in which they aim to present the current state of the area. Two main research questions were formed, dealing with the identification of the currently most important papers in the software metrics threshold research community and with the meaningful aggregation of those papers. During their review, they observed three main domain problems: (1) an unclear explanation of the method for selecting the technique that calculates thresholds; (2) a direct application of metric threshold values to a different code context; (3) a lack of objective analysis of the calculated thresholds [Ronchieri and Canaparo 2016]. The extracted domain-specific data comprise the study topic and categorisation, the number and type of analysed projects, the types of derived metrics, the programming language, and the computed threshold values. To our knowledge, this is the only available review study dealing with software metrics threshold values, which was also confirmed by [Ronchieri and Canaparo 2016].

The previously mentioned mapping study presented the starting point of our literature review. We addressed different research questions, so different extraction categories were defined; however, since our search provided some additional articles that [Ronchieri and Canaparo 2016] did not examine, some extraction categories remain the same. The search resources were also expanded in our review.

The article is organised as follows. After the Introduction, which presents an insight into the theoretical background and related work, Section 2 presents the performed literature review. Basic data extracted from the primary studies are presented in this section. Section 3 explains and describes the available approaches for software metrics threshold derivation, together with their connections and expansions. The software metrics used in the primary studies and details about the derived threshold values are presented in Section 4. The last section concludes the article, providing a discussion and future work opportunities.

2. A PRELIMINARY LITERATURE REVIEW OF THE RESEARCH AREA

Without knowing the threshold values of software metrics, their usefulness is limited. As stated by [Ferreira et al. 2012; Alves et al. 2010], threshold values for most metrics are still unknown. Nevertheless, although numerous approaches for threshold derivation exist, there are many research opportunities that can be addressed with the aim of improving domain knowledge. The performed literature review presents an insight into previous research in the area. Based on our research purpose, four main research questions were formed:
RQ1: What approaches and methods are used for software metrics threshold derivation?
RQ2: What is the purpose of threshold derivation?
RQ3: For which software metrics are thresholds derived?
RQ4: What tools are available for metrics collection and threshold derivation?

Based on the research questions, the following search string was formed: ("software metrics" OR "source code metrics") AND ("threshold" OR "reference values"). With it, the search was conducted in the selected digital libraries. The search results by data source and the final number of selected studies after applying the inclusion and exclusion criteria are presented in Table I.

Table I. Digital Libraries, Search Results and Selected Studies
Electronic Base        URL                             Number of Results   Selected Studies
ScienceDirect          https://www.sciencedirect.com   8                   2
ACM Digital Library    https://dl.acm.org              8                   4
IEEE Xplore            https://ieeexplore.ieee.org     119                 10
Scopus                 https://www.scopus.com          76                  12
Web of Science         https://webofknowledge.com      32                  5

The applied inclusion criteria were that the study is available in the selected digital libraries and is available in English, that the study is from the software engineering domain, and that an approach for threshold derivation is presented or, if an approach is only validated, that the metrics used are explained. Where available and appropriate, the search was limited to the abstract, title and keywords. Since some search tools look across different digital libraries, some search results are listed in multiple categories. In the end, 18 primary studies were selected for detailed analysis. Among them, four articles were excluded from the data extraction process: [Veado et al. 2016; Foucault et al. 2014; Sousa et al. 2017] present tools developed for threshold derivation, and [Vale and Figueiredo 2015] present an approach tailored to threshold derivation in the context of software product lines. Finally, the data were extracted from 14 primary studies.

Table II presents basic information about the selected primary studies. The first item is the programming language for which software metric thresholds were derived. As can be seen, the majority of studies derive threshold values for the Java programming language, while reference values for other programming languages, specifically C, C++ and C#, are presented in 4 studies. Threshold values for other programming languages were not detected. The second item is the purpose of metric derivation. The majority of studies derive thresholds to find or predict defects and errors in programs, while some of them compute reference values for code smell detection purposes.

Table II. Basic Information About Primary Studies
Study                           Programming Language   Purpose of metrics derivation
[Alves et al. 2010]             Java, C#               Maintainability level
[Arar and Ayan 2016]            Java                   Fault prediction
[Benlarbi et al. 2000]          C++                    Fault prediction
[Boucher and Badri 2016]        Java                   Fault prediction
[Ferreira et al. 2012]          Java                   Detecting design flaws
[Fontana et al. 2015]           Java                   Code smell detection
[Foucault et al. 2014]          Java                   Risky classes
[Herbold et al. 2011]           Java, C#, C++, C       Evaluating quality attributes
[Hussain et al. 2016]           Java                   Fault prediction
[Mihancea and Marinescu 2005]   Java, C++              Detecting design flaws
[Oliveira et al. 2014b]         Java                   Internal quality
[Shatnawi 2010]                 Java                   Error risk level
[Shatnawi et al. 2010]          Java                   Error risk level
[Yamashita et al. 2016]         Java                   Defect risk
The process of metric computation or threshold derivation is sometimes automated with tool support. As already mentioned, [Veado et al. 2016] present TDTool for threshold derivation, and [Oliveira et al. 2014a] present RTTool for the same purpose. While TDTool supports any kind of metric, RTTool computes thresholds only for class-level metrics that have a heavy-tailed distribution. [Sousa et al. 2017] propose FindSmells, a tool for smell detection in software systems based on software metrics and their thresholds. In connection with threshold computation, tools used for metric collection were also listed in some articles. The tools that were used in the selected primary studies are Conecta [Ferreira et al. 2012], the ckjm tool [Arar and Ayan 2016], Borland Together [Shatnawi 2010], the Moose platform and VerveineJ [Oliveira et al. 2014b], and the SIG Software Analysis Toolkit [Alves et al. 2010].

3. APPROACHES FOR THRESHOLD DERIVATION OF SOFTWARE METRICS

3.1 Novel Approaches

Different derivation approaches are proposed in the articles about metric thresholds. According to [Fontana et al. 2015], approaches can be based on observation, statistical analysis, metrics analysis and machine learning. The approaches detected in the primary studies are summarised below in chronological order of publication.

[Erni and Lewerentz 1996] present a statistical method based on the statement by [DeMarco 1986] that metric values should be judged by comparison with similar projects. They compute the average (µ) and standard deviation (σ) of the metric values of one software system, on the assumption that the metrics are normally distributed. The lower (Tmin = µ − σ) and the higher (Tmax = µ + σ) thresholds can be produced based on the calculated values. Values that are outside the interval µ ± σ are considered outliers. A code sketch of this computation is given below, after the description of the ROC-based approach.

A method called Tuning Machine is presented by [Mihancea and Marinescu 2005]. The method proposes a solution to the problem of which threshold values should be used by a detection strategy. The new method establishes proper threshold values with the help of a genetic algorithm.

Identifying threshold values using Receiver Operating Characteristic (ROC) curves is presented by [Shatnawi et al. 2010]. They claim that meaningful and useful threshold values must be associated with design factors of interest. Therefore, their identification of metric thresholds is associated with class error probability, since their goal is to identify faults. ROC, a diagnostic accuracy test [Zweig and Campbell 1993], was previously used to make decisions in the diagnostic area in radiology [Hanley and McNeil 1982] and clinical medicine [Zweig and Campbell 1993]. Since medicine represents a field in which metrics and their thresholds represent crucial information [Ferreira et al. 2012], many approaches now used in software engineering have roots in the medical domain. Although several ways can be followed when calculating reference values of object-oriented metrics, [Shatnawi et al. 2010] made the first attempt to use ROC curves for identifying object-oriented metric threshold values. They analysed three versions of the Eclipse software, also using the error data from Bugzilla. The errors were divided into three categories (Low, Medium, High), and statistical analyses were conducted to identify metric values that could classify Eclipse classes into the different error categories (binary and ordinal).
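To make the ROC-based identification more concrete, the following is a minimal sketch of how a cut-off can be read from a ROC curve. It is only an illustration, not the exact procedure of [Shatnawi et al. 2010]: the metric values and fault labels are hypothetical, and Youden's J statistic is just one standard rule for selecting the cut-off.

```python
# Minimal sketch of ROC-based threshold selection. The data are made up;
# classes with metric value >= threshold are predicted as faulty.
import numpy as np
from sklearn.metrics import roc_curve

wmc = np.array([5, 12, 30, 8, 45, 22, 3, 17, 60, 9])  # hypothetical WMC values
faulty = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])     # 1 = class contained an error

# Every distinct metric value is a candidate cut-off on the ROC curve.
fpr, tpr, thresholds = roc_curve(faulty, wmc)

# Youden's J = TPR - FPR picks the point farthest from the diagonal.
best = np.argmax(tpr - fpr)
print(f"candidate threshold: WMC >= {thresholds[best]}")
```

The primary study evaluates candidate cut-offs against class error probability and also considers ordinal error categories; the J statistic above merely illustrates how a single point on the curve can be turned into a threshold.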
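The µ ± σ method of [Erni and Lewerentz 1996], described at the beginning of this subsection, can be sketched even more simply. The metric values below are again hypothetical, and the sketch assumes, as the method does, an approximately normal distribution.

```python
# Minimal sketch of the mu +/- sigma thresholds of Erni and Lewerentz (1996).
import numpy as np

values = np.array([4.0, 7.0, 5.0, 9.0, 6.0, 12.0, 5.0, 8.0])  # one metric, one system

mu, sigma = values.mean(), values.std()
t_min, t_max = mu - sigma, mu + sigma  # Tmin = mu - sigma, Tmax = mu + sigma

# Values outside the interval [Tmin, Tmax] are considered outliers.
outliers = values[(values < t_min) | (values > t_max)]
print(f"Tmin = {t_min:.2f}, Tmax = {t_max:.2f}, outliers: {outliers}")
```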
Existing approaches that systematically derive metric threshold values do not take into account the different statistical properties of software metrics [Alves et al. 2010]. For that reason, a method that derives metric threshold values based on empirical measurement is presented by [Alves et al. 2010]. It considers the different data distributions and scales of source code metrics, and it is also resilient against outliers. For the presented method, which contains six steps, a benchmark of 100 object-oriented Java and C# systems was used for defining the metric reference values.

A statistical approach for threshold derivation with the help of a logistic regression model is presented by [Shatnawi 2010]. Logistic regression validates the metrics and constructs a threshold model. The connection between a single metric and the fault probability was checked. For significant metrics, threshold values were investigated with the use of the Bender method [Bender 1999], which has its source in the medical domain. Bender [Bender 1999] proposed two methods for finding possible thresholds: the Value of an Acceptable Risk Level (VARL) and the lower Value of an Acceptable Risk Gradient (VARG). For a logistic model with coefficients β0 and β1, VARL is the metric value at which the predicted risk equals a chosen acceptable probability p0, that is, VARL = (ln(p0/(1 − p0)) − β0)/β1; a code sketch is given at the end of this subsection.

Machine learning techniques were used in the approach by [Herbold et al. 2011]. They presented a programming language independent, data-driven method for threshold value calculation. The methodology can also be used for the efficiency optimisation of existing metric sets and for the simplification of complex classification models.

[Ferreira et al. 2012] aimed at defining threshold values for object-oriented software metrics. Thresholds were derived by analysing the statistical properties of data obtained from a large collection of projects of different sizes and domains. After the data were gathered, the values used most commonly in practice were identified. 40 projects were analysed, and reference values for six metrics were calculated, taking the relevant data distribution into account.

Since it was shown that thresholds are connected to the context of a project, the approach by [Foucault et al. 2014] aims to overcome this drawback by presenting a method that can compute the threshold for any given context. They select software projects and entities randomly and follow the principle of [Chidamber et al. 1998] that a metric threshold can be derived from a quantile of the distribution of metric values. The input is a quantile, and the estimation of the corresponding threshold is computed using a process that relies on the bootstrap statistical approach; a sketch of this estimation is also given at the end of this subsection.

The Bender approach [Bender 1999] was used as a basis by [Hussain et al. 2016]. They proposed a model for metric threshold derivation with the use of non-linear functions, described by logistic regression coefficients.
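Since both [Shatnawi 2010] and [Hussain et al. 2016] build on the Bender method, a minimal sketch of the VARL computation is given below. It is an illustration under simplifying assumptions: the logistic regression coefficients and the acceptable risk level p0 are hypothetical, and the primary studies fit and validate their models much more carefully.

```python
# Minimal sketch of Bender's VARL (Value of an Acceptable Risk Level).
# For a fitted logistic model P(fault | x) = 1 / (1 + exp(-(b0 + b1 * x))),
# VARL is the metric value x at which the predicted risk equals p0:
#   VARL = (ln(p0 / (1 - p0)) - b0) / b1
import math

def varl(b0: float, b1: float, p0: float) -> float:
    """Metric value at which the modelled fault probability equals p0."""
    return (math.log(p0 / (1.0 - p0)) - b0) / b1

# Hypothetical coefficients for a single metric and an acceptable risk of 5%.
print(varl(b0=-3.2, b1=0.15, p0=0.05))  # candidate threshold for the metric
```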
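The quantile-based, bootstrapped estimation of [Foucault et al. 2014] can be sketched in a similar spirit. Again, this only illustrates the general idea; the data, the chosen quantile, the number of resamples and the aggregation of the resampled estimates are our assumptions, not values taken from the primary study.

```python
# Minimal sketch of a bootstrap estimate of a quantile-based threshold.
import numpy as np

rng = np.random.default_rng(42)
metric_values = rng.lognormal(mean=2.0, sigma=0.7, size=500)  # hypothetical metric data

QUANTILE = 90       # threshold = 90th percentile of the metric distribution
N_RESAMPLES = 1000

estimates = [
    np.percentile(rng.choice(metric_values, size=metric_values.size, replace=True),
                  QUANTILE)
    for _ in range(N_RESAMPLES)
]

# The bootstrap yields a point estimate and a confidence interval for it.
print(f"threshold ~ {np.mean(estimates):.1f}, "
      f"95% CI [{np.percentile(estimates, 2.5):.1f}, {np.percentile(estimates, 97.5):.1f}]")
```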
3.2 Extensions and Validations of Existing Approaches and Replicated Studies

An empirical method for extracting relative thresholds for the software metrics of real systems, based on benchmark data, is presented by [Oliveira et al. 2014b]. Absolute metric thresholds can thus be complemented, and the definition of a relative threshold is introduced: p% of the entities should have M ≤ k, where M is the calculated source code metric and p is the percentage of entities to which the upper limit k should be applied. The values of p and k are calculated. A code sketch of this idea is given in Section 4, after Table III.

[Fontana et al. 2015] also present a benchmark-based, data-driven approach for deriving metric reference values. Similar to [Oliveira et al. 2014b], the presented approach is inspired by [Alves et al. 2010]. The main goal is to design a method that can be applied to metrics that do not represent ratio values, which typically range in the interval [0, 1]. The method takes into account the statistical properties of the data obtained from 74 open source projects.

Logistic regression was used by [Benlarbi et al. 2000] to investigate the connection between faults and software metrics. Two models were constructed, one with threshold values and the other without, and the models were then compared to look for a connection.

[Arar and Ayan 2016] replicated the study by [Shatnawi 2010] and used a broader range of datasets and metrics, trying to resolve the problem of project-dependent threshold values so that other projects could benefit from the derived thresholds. Another replication study was published by [Yamashita et al. 2016], in which the method by [Alves et al. 2010] is used.

The techniques based on the benchmark-based method [Alves et al. 2010] and on ROC curves [Shatnawi et al. 2010] were compared and investigated by [Boucher and Badri 2016]. Both methods were assessed as fault predictors and compared with the help of machine learning algorithms, which are often used for fault-proneness prediction.

4. SOFTWARE METRIC THRESHOLDS

A number of different metrics were used in the selected primary studies, and different threshold values were derived. The specific software metrics, used either in the derivation or in the validation process, are presented in Table III. Most commonly, the primary studies use the CK (Chidamber & Kemerer) object-oriented metrics [Chidamber and Kemerer 1994], consisting of six metrics: WMC (Weighted Methods Per Class), DIT (Depth of Inheritance Tree), NOC (Number of Children), CBO (Coupling Between Object Classes), RFC (Response For a Class) and LCOM (Lack of Cohesion of Methods). Together with them, some other metrics are also used. Table III also gives information about the form in which the derived threshold values are presented. Usually, reference values are presented as a single numerical value, sometimes in the form of intervals. Rarely, primary studies present just approaches and do not provide any concrete threshold values.

Table III. Software Metrics Addressed in Primary Studies and Corresponding Thresholds
Study                           Metrics                                             Presented Thresholds
[Arar and Ayan 2016]            WMC, CBO, RFC, LCOM, Ca, Ce, NPM, AMC,              Yes (numeric)
                                MAX CC, AVG CC, LOC
[Alves et al. 2010]             SIG quality model metrics (e.g. LOC, McCabe,        No (risk intervals)
                                fan-in, number of methods, number of parameters)
[Benlarbi et al. 2000]          CK metrics (without LCOM)                           Yes (numeric)
[Boucher and Badri 2016]        CK metrics and SLOC                                 Yes (numeric)
[Ferreira et al. 2012]          LCOM, DIT, coupling factor, afferent couplings,     Yes (intervals:
                                number of public methods, number of public fields   good, regular, bad)
[Fontana et al. 2015]           ATFD, WMC, NOAP + NOAM, LOC, CYCLO,                 Yes (intervals:
                                MAXNESTING, NOAV, CINT, CM, CC and some others      low, medium, high)
[Foucault et al. 2014]          NOA + NOM, CBO                                      No
[Herbold et al. 2011]           VG, NBD, NFC, NST, WMC, CBO, RFC, NORM,             Yes (numeric)
                                NOM, LOC, NSM
[Hussain et al. 2016]           DIT, CA, LCOM, NPM                                  Yes (numeric)
[Mihancea and Marinescu 2005]   ATFD, WMC, TCC, WOC, NOPA, NOAM                     No
[Oliveira et al. 2014b]         NOM, LOC, FAN-OUT, RFC, WMC, LCOM,                  Yes (numeric)
                                PUBA/NOA
[Shatnawi 2010]                 CK metrics                                          Yes (numeric)
[Shatnawi et al. 2010]          CK metrics                                          Yes (numeric)
[Yamashita et al. 2016]         LOC, number of methods, fan-in, McCabe              Yes (interval)
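Returning to the relative thresholds of [Oliveira et al. 2014b] introduced in Section 3.2, the core of the idea can be sketched as follows. This is a minimal illustration only: the benchmark data and the percentage p are hypothetical, and the primary study derives p and k jointly from the benchmark with specific compliance criteria.

```python
# Minimal sketch of a relative threshold: "p% of the entities should
# have M <= k". For a fixed p, the smallest k a system satisfies is the
# p-th percentile of its metric values. All data below are made up.
import numpy as np

p = 90  # percentage of classes that should comply

# Metric values (e.g. NOM) per class for three hypothetical benchmark systems.
systems = [
    np.array([3, 7, 2, 11, 5, 4, 9, 6]),
    np.array([2, 2, 8, 14, 3, 5]),
    np.array([4, 6, 1, 3, 10, 7, 12, 2, 5]),
]

# Per system: the smallest k such that p% of its classes have M <= k.
ks = [float(np.percentile(s, p)) for s in systems]

# One simple aggregation over the benchmark is the median of these values.
k = np.median(ks)
print(f"relative threshold: {p}% of classes should have M <= {k:.0f}")
```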
Another point of interest was the area of software metrics' correlation and combination. The correlation of metrics, in the sense of combining semantically related metrics, was not detected. On the other hand, metric combination was presented, but only for the purpose of composing rules for code smell detection [Mihancea and Marinescu 2005; Fontana et al. 2015].

5. CONCLUSION

As presented in the literature, thresholds are crucial for the interpretation of software metrics and, consequently, for increasing their use in practice [Ferreira et al. 2012]. Many derivation approaches can be found in the literature, each having its own characteristics. Deriving software metric thresholds is an important research topic, which was confirmed by the performed literature review and the selected primary studies. The findings of the review are summarised in Tables I, II and III, presenting answers to the proposed research questions.

In the context of RQ1, we searched for available software metric threshold derivation approaches, which are presented in Section 3. First, the approaches that were proposed for the first time are presented, such as [Alves et al. 2010; Erni and Lewerentz 1996; Ferreira et al. 2012; Shatnawi 2010]; after them, where available, the validations or extensions of those approaches can be seen. Regarding the answer to RQ2, the majority of primary studies derive thresholds for error-finding purposes, and some of them also for code smell detection. With RQ3, we were looking for the software metrics for which thresholds were derived. As can be seen in Table III, authors often use the CK object-oriented metrics, adding other software metrics. The last research question, RQ4, was aimed at identifying tools, either for metric calculation or for threshold derivation. Among the primary studies, tools such as TDTool [Veado et al. 2016], RTTool [Oliveira et al. 2014a] and FindSmells [Sousa et al. 2017] are proposed.

According to our findings, possible research directions can be highlighted. The presented approaches for software metric threshold derivation, which are nowadays usually used to find and predict errors, defects or deficiencies in software modules, can also be used in the context of threshold derivation for code smell detection. Furthermore, thresholds derived for code smell detection can be compared to those derived for fault prediction. Since programming languages other than Java are not widely represented among the primary studies, thresholds can also be derived for other languages and compared to the thresholds already derived for Java. The presented threshold derivation tools can be used and compared, assessing and comparing their derived threshold values. Finally, the tools can also be used in the context of code smell detection.

Acknowledgements

The authors acknowledge the financial support from the Slovenian Research Agency under The Young Researchers Programme (SICRIS/SRA code 35512, RO 0796, Programme P2-0057).

REFERENCES

T. L. Alves, C. Ypma, and J. Visser. 2010. Deriving metric thresholds from benchmark data. In 2010 IEEE International Conference on Software Maintenance. 1–10.
Ömer Faruk Arar and Kürşat Ayan. 2016. Deriving thresholds of software metrics to predict faults on open source software: Replicated case studies. Expert Systems with Applications 61 (2016), 106–121.
Ralf Bender. 1999. Quantitative Risk Assessment in Epidemiological Studies Investigating Threshold Effects. Biometrical Journal 41, 3 (1999), 305–319.
S. Benlarbi, K. El Emam, N. Goel, and S. Rai. 2000. Thresholds for object-oriented measures. In Proceedings 11th International Symposium on Software Reliability Engineering (ISSRE 2000). 24–38.
A. Boucher and M. Badri. 2016. Using Software Metrics Thresholds to Predict Fault-Prone Classes in Object-Oriented Software. In 2016 4th Intl Conf on Applied Computing and Information Technology/3rd Intl Conf on Computational Science/Intelligence and Applied Informatics/1st Intl Conf on Big Data, Cloud Computing, Data Science Engineering (ACIT-CSII-BCD). 169–176.
S. R. Chidamber, D. P. Darcy, and C. F. Kemerer. 1998. Managerial use of metrics for object-oriented software: an exploratory analysis. IEEE Transactions on Software Engineering 24, 8 (Aug 1998), 629–639.
S. R. Chidamber and C. F. Kemerer. 1994. A metrics suite for object oriented design. IEEE Transactions on Software Engineering 20, 6 (Jun 1994), 476–493.
T. DeMarco. 1986. Controlling Software Projects: Management, Measurement, and Estimates. Prentice Hall PTR, Upper Saddle River, NJ, USA.
K. Erni and C. Lewerentz. 1996. Applying design-metrics to object-oriented frameworks. In Proceedings of the 3rd International Software Metrics Symposium. 64–74.
Kecia A.M. Ferreira, Mariza A.S. Bigonha, Roberto S. Bigonha, Luiz F.O. Mendes, and Heitor C. Almeida. 2012. Identifying thresholds for object-oriented software metrics. Journal of Systems and Software 85, 2 (2012), 244–257. Special issue with selected papers from the 23rd Brazilian Symposium on Software Engineering.
F. Arcelli Fontana, V. Ferme, M. Zanoni, and A. Yamashita. 2015. Automatic Metric Thresholds Derivation for Code Smell Detection. In 2015 IEEE/ACM 6th International Workshop on Emerging Trends in Software Metrics.
Matthieu Foucault, Marc Palyart, Jean-Rémy Falleri, and Xavier Blanc. 2014. Computing Contextual Metric Thresholds. In Proceedings of the 29th Annual ACM Symposium on Applied Computing (SAC '14). ACM, New York, NY, USA, 1120–1125.
J. A. Hanley and B. J. McNeil. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 1 (1982), 29–36.
Steffen Herbold, Jens Grabowski, and Stephan Waack. 2011. Calculation and optimization of thresholds for sets of software metrics. Empirical Software Engineering 16, 6 (Dec 2011), 812–841.
S. Hussain, J. Keung, A. A. Khan, and K. E. Bennin. 2016. Detection of Fault-Prone Classes Using Logistic Regression Based Object-Oriented Metrics Thresholds. In 2016 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C). 93–100.
Mark Lorenz and Jeff Kidd. 1994. Object-oriented Software Metrics: A Practical Guide. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
P. F. Mihancea and R. Marinescu. 2005. Towards the Optimization of Automatic Detection of Design Flaws in Object-Oriented Software Systems. In Ninth European Conference on Software Maintenance and Reengineering. 92–101.
Paloma Oliveira, Fernando P. Lima, Marco Tulio Valente, and Alexander Serebrenik. 2014a. RTTool: A Tool for Extracting Relative Thresholds for Source Code Metrics. In Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution (ICSME '14). IEEE Computer Society, Washington, DC, USA, 629–632.
P. Oliveira, M. T. Valente, and F. P. Lima. 2014b. Extracting relative thresholds for source code metrics. In 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE). 254–263.
E. Ronchieri and M. Canaparo. 2016. A preliminary mapping study of software metrics thresholds. In ICSOFT 2016 - Proceedings of the 11th International Joint Conference on Software Technologies, Maciaszek L., Cardoso J., Cabello E., van Sinderen M., Maciaszek L., Ludwig A., and Cardoso J. (Eds.), Vol. 1. SciTePress, 232–240.
R. Shatnawi. 2010. A Quantitative Investigation of the Acceptable Risk Levels of Object-Oriented Metrics in Open-Source Systems. IEEE Transactions on Software Engineering 36, 2 (March 2010), 216–225.
Raed Shatnawi, Wei Li, James Swain, and Tim Newman. 2010. Finding Software Metrics Threshold Values Using ROC Curves. Journal of Software Maintenance and Evolution: Research and Practice 22, 1 (Jan 2010), 1–16.
Bruno L. Sousa, Priscila P. Souza, Eduardo Fernandes, Kecia A. M. Ferreira, and Mariza A. S. Bigonha. 2017. FindSmells: Flexible Composition of Bad Smell Detection Strategies. In Proceedings of the 25th International Conference on Program Comprehension (ICPC '17). IEEE Press, Piscataway, NJ, USA, 360–363.
L. Sánchez-González, F. García, F. Ruiz, and J. Mendling. 2012. A study of the effectiveness of two threshold definition techniques. In 16th International Conference on Evaluation Assessment in Software Engineering (EASE 2012). 197–205.
G. A. D. Vale and E. M. L. Figueiredo. 2015. A Method to Derive Metric Thresholds for Software Product Lines. In 2015 29th Brazilian Symposium on Software Engineering. 110–119.
Lucas Veado, Gustavo Vale, Eduardo Fernandes, and Eduardo Figueiredo. 2016. TDTool: Threshold Derivation Tool. In Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering (EASE '16). ACM, New York, NY, USA, 24:1–24:5.
K. Yamashita, C. Huang, M. Nagappan, Y. Kamei, A. Mockus, A. E. Hassan, and N. Ubayashi. 2016. Thresholds for Size and Complexity Metrics: A Case Study from the Perspective of Defect Density. In 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS). 191–201.
M. H. Zweig and G. Campbell. 1993. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry 39, 4 (1993), 561–577.