=Paper=
{{Paper
|id=Vol-1771/paper11
|storemode=property
|title=Using Analytics to Quantify Interest of Self-Admitted Technical Debt
|pdfUrl=https://ceur-ws.org/Vol-1771/paper11.pdf
|volume=Vol-1771
|authors=Yasutaka Kamei,Everton Maldonado,Emad Shihab,Naoyasu Ubayashi
|dblpUrl=https://dblp.org/rec/conf/apsec/KameiMSU16
}}
==Using Analytics to Quantify Interest of Self-Admitted Technical Debt==
<pdf width="1500px">https://ceur-ws.org/Vol-1771/paper11.pdf</pdf>
<pre>
                             1st International Workshop on Technical Debt Analytics (TDA 2016)


            Using Analytics to Quantify the Interest of
                  Self-Admitted Technical Debt
            Yasutaka Kamei†, Everton Maldonado††, Emad Shihab††, and Naoyasu Ubayashi†
            † Principles of Software Languages Group (POSL), Kyushu University, Fukuoka, Japan
            †† Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada
            Email: † {kamei, ubayashi}@ait.kyushu-u.ac.jp, †† {e_silvam, eshihab}@encs.concordia.ca


   Abstract—Technical debt refers to the phenomenon of taking a shortcut
to achieve short-term development gain at the cost of increased
maintenance effort in the future. The concept of debt, in particular the
cost of debt, has not been widely studied. Therefore, the goal of this
paper is to determine ways to measure the ‘interest’ on the debt and to
use these measures to see how much of the technical debt incurs positive
interest, i.e., debt that indeed costs more to pay off in the future. To
measure interest, we use the LOC and Fan-In measures. We perform a case
study on the Apache JMeter project and find that approximately 42–44% of
the technical debt incurs positive interest.

                              I. INTRODUCTION

   The term technical debt was first coined by Cunningham in 1992 to refer
to the phenomenon of taking a shortcut to achieve short-term development
gain at the cost of increased maintenance effort in the future [3]. The
technical debt community, organized through the Managing Technical Debt
workshop [1], has studied many aspects of technical debt, including its
detection [12], its impact [11] and its appearance in the form of code
smells [4]. Most recently, we developed an approach to identify technical
debt from code comments, referred to as self-admitted technical debt
(SATD). SATD refers to the situation where developers know that the
current implementation is not optimal and write comments alerting others
to the inadequacy of the solution.
   In the last few years, an increasing amount of work has focused on
SATD. In particular, our prior work focused on the detection of SATD [6],
the classification of different types of SATD, and the development of
datasets to enable future studies on SATD [5]. Other work by Bavota and
Russo [2] performed an empirical study of SATD on a large number of
Apache projects and showed that SATD is prevalent in open source
projects, is long-lived and is increasing over time. A study by Wehaibi
et al. [10] examined the impact of SATD on quality and found that SATD
does not necessarily relate to more defects; however, it does make the
software system more complex.
   Although the metaphor of technical debt has been well studied, to the
best of our knowledge, the cost of debt, i.e., its interest, has not been
extensively studied. Measuring the interest of technical debt is one of
the challenges in the field, since it requires detecting the technical
debt, tracking the debt over time and developing measures that
accurately quantify this debt. Given that SATD allows us to know the
exact method in which the technical debt exists, we are able to perform
fine-grained analysis of the code, which enables us to quantify the
interest of the debt. In this paper, interest refers to the additional
difficulty in repaying the debt.
   We first propose the use of code metrics, in particular the well-known
Lines of Code (LOC) and Fan-In, to measure interest. We use LOC since it
correlates highly with most code complexity metrics, and Fan-In¹ since it
allows us to measure how much a piece of code is depended on by other
code.

   ¹ Understand calculates Fan-In as the number of inputs a function uses
     plus the number of unique subprograms calling the function [7].

   Then, we use the developed measures to determine how much of the SATD
incurs positive interest. In a case study on the Apache JMeter project,
we find that 44.2% of the SATD in JMeter incurs positive interest when
measured using LOC, and 42.2% when measured using Fan-In.
   The rest of the paper is organized as follows: Section II introduces
our approach to quantifying the interest of SATD. Section III describes
a preliminary study using the developed measures. Finally, Section IV
draws conclusions and outlines our future work.

                                II. APPROACH

   To perform our study, we need to determine the SATD in the codebase,
and locate when the SATD was introduced in the project and when it was
later removed. Then, we use our measures of interest, i.e., LOC and
Fan-In, to compare the size of the code and the amount of dependence
other code has on the technical debt (TD) code in order to quantify
interest (Figure 1).
   1. SATD Extraction. In order to measure the interest of the TD, our
first step is to identify where it exists. Since we focus particularly on
SATD, we use code comments found in the source code. We extract and
parse the source code of JMeter version 2.10. To perform the parsing, we
use the JDeodorant tool [9], which allows us to extract a comment and map
it to its corresponding method. Then, we apply a series of filters to
remove irrelevant comments, e.g., copyright-related comments. Finally,
the 2nd author² manually classified all comments to determine whether
they are SATD comments and mapped these comments to their respective
methods.

   ² The 2nd author, who made the classification, has more than 8 years of
     experience working in industry as a software engineer; during this
     time he designed, implemented and maintained several programs, in
     particular using the Java programming language.

   In this study, we assume that SATD exists in the method where
the comment is identified. Details regarding the dataset and the
filtering applied can be found in our earlier work [5].

   [Figure 1 diagram, left to right: Source Code Repository → SATD
   Extraction (Details in [7]) → Identifying SATD Introduction and Removal
   → Determining Metrics that Measure Interest → Calculating Interest,
   followed by an example table of interest per file:
       File                   Interest (%)
       JsseSSLManager.java            10.5
       MonitorGraph.java               2.1
       ProxyControl.java               8.3
       SmtpPanel.java                  2.1 ]
                   Fig. 1. The overview of our approach.

   2. Identifying SATD Introduction and Removal. Since we are interested
in measuring the interest, we need to determine the ‘change’ over time in
these SATD methods. For each of the SATD comments we identified, we use
several git commands (e.g., git log -- <PATH_TO_FILE> and
git cat-file -p <SHA1>:<PATH_TO_FILE>) to trace the comment back to the
commit where it was introduced. We perform this task by replaying the
history commit by commit. Using the same technique, we are also able to
detect the removal of SATD: we consider the SATD removed when we find
that its comment has been removed or changed.
   3. Determining Metrics that Measure Interest. Once we have determined
the SATD comments and their associated methods, we would like to
calculate the interest that is incurred over time (i.e., from the
introduction of the technical debt to its removal). To do so, we
extracted 16 code metrics using the Understand tool [8]. In particular,
we selected all method-level complexity and size metrics that Understand
is able to provide.
   The reasons that we focused on complexity and size metrics are: 1) our
intuition tells us that if a piece of code is introduced and then becomes
more complex, then that is a good proxy for it being more difficult to
deal with in the future, i.e., it incurred interest; and 2) prior work
has shown that size metrics are typically highly correlated with
complexity metrics; hence, we figured that using size metrics (if they
are highly correlated with complexity metrics in our case) would be an
easier alternative to using complexity metrics.
   We measured the Spearman correlation between the complexity and size
metrics and found that, indeed, all metrics except Fan-In are highly
correlated with LOC. Therefore, we decided to use the LOC metric as a
measure of interest. In addition, since Fan-In is an indicator of how
much a method is depended on, we decided to also include the Fan-In
metric when calculating interest. The intuition is that if a method is
depended on lightly when the SATD is introduced and then has many more
dependents in the future, then dealing with this SATD is much more
difficult (since many dependents may be affected). In the end, we
settled on using two metrics, LOC and Fan-In, as measures of interest.
   4. Calculating Interest. Using our metrics, we take the relative change
in the LOC and Fan-In values between the introduced and removed versions
as the interest, calculated per SATD instance. For example, if the value
of a metric for the method where the SATD exists is 10 in the introduced
version and 20 in the removed version, the relative size, and hence the
interest, is 100 (i.e., 100 × (20 − 10) / 10). In cases where the SATD
has not yet been removed, we use the numbers from the latest version of
JMeter. Our assumption here is that if the SATD incurs positive interest,
then it will be more difficult to remove in the future, e.g., if the code
becomes more complex compared to when the debt was taken, then it will
be more difficult to deal with.
   While this paper tackles a research topic that opens up a new research
direction (i.e., quantifying the interest of SATD), our current approach
also has weaknesses, which we elaborate on in Section IV.

                          III. INITIAL CASE STUDY

   Motivation. There exist several previous studies that focused on
understanding SATD (e.g., the detection of technical debt [6, 12] and the
impact of SATD on software quality [10]). However, to the best of our
knowledge, there are no studies that help in quantifying SATD interest.
Therefore, we would like to know how we can measure interest and whether
SATD actually incurs positive interest.
   Datasets. To conduct our initial case study, we use data from the
Apache JMeter open source project. We use JMeter since we have used this
dataset in the past [5, 6] and know that it contains instances of SATD
and uses Git as its version control system, which many of our tools are
designed to work with. In particular, we use release v2.10 of JMeter,
which contains 81,307 SLOC in 1,181 classes and 20,084 comments, and has
33 unique contributors.
   Approach. To calculate the interest of SATD, we follow the approach
explained in Section II. We report the number of SATD instances, the
percentage of the technical debt that has positive interest, and the
distribution of interest for the technical debt that incurs a positive
interest rate.
   Results. We find that there is a high correlation between LOC and the
other product metrics, except Fan-In. From the highly correlated metrics,
we selected LOC as the metric to calculate interest, since it is
intuitively easier to measure and comprehend. Therefore, we settled on
using two product metrics (i.e., LOC and Fan-In) to measure interest.
   Table I shows the number of SATD instances and the percentages of the
technical debt with positive, negative and no change in interest. The
table shows that 44.2% of the technical debt incurs a positive interest
rate in terms of LOC and 42.2% of the SATD does so in terms of Fan-In. We
can see that in some cases there can be negative interest (13.8% using
LOC and 8.1% using Fan-In), where the SATD method gets smaller or has
less Fan-In after the introduction of the SATD. There are also cases
where nothing changes in terms of LOC and Fan-In between the SATD
introduction and removal. Lastly, it is important to note that there is
no large difference between LOC and Fan-In in the proportions of SATD
with positive and with no-change interest.

                                 TABLE I
      THE PERCENTAGE OF SATD THAT HAS POSITIVE, NEGATIVE AND NO CHANGE
                               IN INTEREST

                # SATD instances   Positive   Negative   No Change
   LOC                       181      44.2%      13.8%       42.0%
   Fan-In                    161      42.2%       8.1%       49.7%

   Next, we would like to know how high the positive interest rate is.
This analysis provides more insight into the SATD that incurs a positive
interest rate. Table II and Figure 2 show the distribution of interest
for the SATD that incurs a positive rate. Note that we limit the x-axis
of Figure 2 to 100% for readability. Figure 2 shows that the
distributions are right-skewed, i.e., most of the SATD has low interest
and a few instances have very high interest (up to 6667.0% for LOC in
Table II). The middle half of the SATD with positive interest lies
between 6.9% and 50.0% in terms of LOC and between 12.4% and 50.0% in
terms of Fan-In. Our findings clearly indicate that there is SATD that
incurs a positive interest rate and that different SATD instances have
different amounts of interest, which shows that we should prioritize
SATD based on its interest, i.e., not all SATD is equal.

                                 TABLE II
       STATISTICS OF THE SATD THAT INCURS A POSITIVE RATE (INTEREST, %)

              Min.   1st Qu.   Median   3rd Qu.     Max.
   LOC         1.6       6.9     18.0      50.0   6667.0
   Fan-In      5.6      12.4     25.0      50.0    900.0

   [Figure 2: density plots of SATD interest in JMeter; panel (a) JMeter
   (LOC), panel (b) JMeter (Fan-In); x-axis: Interest (%), shown from 0
   to 100; y-axis: density, 0.00 to 0.04.]
   Fig. 2. Distribution of SATD Interest in JMeter. Interest is Measured
   Using LOC and Fan-In.

   44.2% of technical debt incurs a positive rate in terms of LOC and
   42.2% of technical debt incurs it in terms of Fan-In.

                              IV. CONCLUSION

   In this paper, we introduced an approach to quantify the interest of
SATD. Our proposed approach uses software product metrics to measure the
interest of SATD in software projects. The results of our initial case
study using the Apache JMeter project show that 44.2% of the technical
debt has a positive interest rate in terms of LOC and 42.2% in terms of
Fan-In.
   Future work. This paper only presents an early idea for quantifying
the interest of SATD. Therefore, many challenges remain to be addressed
in the future.
   • To calculate interest, we use the relative size of metric values
     between two versions: SATD introduction and removal. However, the
     time period between the two is not considered when calculating the
     interest. Therefore, in the future we would like to take the period
     into account when calculating the interest.
   • There are several types of SATD, such as defect and design SATD. The
     previous study [5] shows that the percentage of SATD varies
     depending on the type of technical debt and the studied systems. For
     example, projects that have limited time to develop features are
     likely to leave comments about features that need to be implemented
     in the future. To better understand interest, in the future we would
     like to analyze the interest per type of SATD.
   • The interest varies among technical debt. If we can understand why
     some SATD has larger interest, we can make use of such insights for
     future development. Therefore, we would like to manually investigate
     why some of the SATD has larger interest.
   • Generally speaking, software systems are always evolving over time
     to implement new functionality and fix defects. Therefore, even if
     the size of an SATD method increases, it is not clear how best to
     evaluate the effect on the interest of SATD. We would like to compare
     the impact of software evolution on methods in two groups, SATD vs.
     non-SATD, to draw a relative comparison that controls for general
     evolution.

                              ACKNOWLEDGMENT

   This research was partially supported by JSPS KAKENHI Grant Number
15H05306.


                               REFERENCES

[1]  International Workshop on Managing Technical Debt (MTD).
     https://www.sei.cmu.edu/community/td2016/. Accessed: 2016-10-16.
[2]  G. Bavota and B. Russo. A large-scale empirical study on
     self-admitted technical debt. In Proc. of the Int’l Conf. on Mining
     Software Repositories (MSR), pages 315–326, 2016.
[3]  W. Cunningham. The WyCash portfolio management system. In Addendum
     to the Proc. on Object-oriented Programming Systems, Languages, and
     Applications, pages 29–30, 1992.
[4]  F. Fontana, V. Ferme, and S. Spinelli. Investigating the impact of
     code smells debt on quality code evaluation. In Proc. of the Int’l
     Workshop on Managing Technical Debt (MTD), pages 15–22, 2012.
[5]  E. D. S. Maldonado and E. Shihab. Detecting and quantifying different
     types of self-admitted technical debt. In Proc. of the Int’l Workshop
     on Managing Technical Debt (MTD), pages 9–15, 2015.
[6]  A. Potdar and E. Shihab. An exploratory study on self-admitted
     technical debt. In Proc. of the Int’l Conf. on Software Maintenance
     and Evolution (ICSME), pages 91–100, 2014.
[7]  Scientific Toolworks, Inc. FANIN - Understand 2.6.
     https://scitools.com/support/metrics_list/?metricGroup=count.
[8]  Scientific Toolworks, Inc. Understand 2.6. http://www.scitools.com/.
[9]  N. Tsantalis, T. Chaikalis, and A. Chatzigeorgiou. JDeodorant:
     Identification and removal of type-checking bad smells. In Proc. of
     the European Conf. on Software Maintenance and Reengineering (CSMR),
     pages 329–331, 2008.
[10] S. Wehaibi, E. Shihab, and L. Guerrouj. Examining the impact of
     self-admitted technical debt on software quality. In Proc. of the
     Int’l Conf. on Software Analysis, Evolution, and Reengineering
     (SANER), pages 179–188, 2016.
[11] N. Zazworka, M. A. Shaw, F. Shull, and C. Seaman. Investigating the
     impact of design debt on software quality. In Proc. of the Int’l
     Workshop on Managing Technical Debt (MTD), pages 17–23, 2011.
[12] N. Zazworka, R. O. Spínola, A. Vetro’, F. Shull, and C. Seaman. A case
     study on effectively identifying technical debt. In Proc. of the
     Int’l Conf. on Evaluation and Assessment in Software Engineering
     (EASE), pages 42–47, 2013.
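Footnote 1 defines Understand's Fan-In as the number of inputs a function uses plus the number of unique subprograms calling it. The Python sketch below restates that definition over a toy call graph. It is an illustration under assumptions, not Understand's implementation: the caller-to-callees mapping, the pre-computed input counts, and the function name fan_in are all hypothetical, and how Understand actually counts "inputs" is not reproduced here.

```python
from __future__ import annotations


def fan_in(function: str,
           inputs_used: dict[str, int],
           call_graph: dict[str, set[str]]) -> int:
    """Fan-In as footnote 1 describes it: inputs used + unique callers.

    inputs_used: hypothetical, pre-computed number of inputs per function.
    call_graph:  hypothetical mapping from each caller to the set of
                 functions it calls.
    """
    # Collect callers into a set so each one is counted only once.
    callers = {caller for caller, callees in call_graph.items()
               if function in callees}
    return inputs_used.get(function, 0) + len(callers)


# Hypothetical example: parse() uses 2 inputs and is called by a and b.
graph = {"a": {"parse"}, "b": {"parse", "emit"}, "c": {"emit"}}
inputs = {"parse": 2, "emit": 1}
print(fan_in("parse", inputs, graph))  # prints 4
```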
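Step 2 of Section II traces each SATD comment to the commit that introduced it and the commit that removed or changed it, by replaying history commit by commit. The Python sketch below shows one way to do this, assuming the repository root as working directory and a path relative to it. It is not the authors' tooling: file_history and trace_comment are hypothetical names, only commits that touch the file are replayed (git log --reverse --format=%H -- <path>), renames are not followed, and any edit to the exact comment text counts as removal.

```python
from __future__ import annotations

import subprocess
from collections.abc import Iterable


def file_history(repo: str, path: str) -> list[tuple[str, str | None]]:
    """Replay one file's history oldest-first as (sha, content) pairs.

    Only commits that touch `path` are listed, renames are not followed,
    and `path` is assumed to be relative to the repository root.
    Content is None where the file does not exist (e.g. its deletion).
    """
    shas = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--format=%H", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    history: list[tuple[str, str | None]] = []
    for sha in shas:
        blob = subprocess.run(
            ["git", "-C", repo, "cat-file", "-p", f"{sha}:{path}"],
            capture_output=True, text=True,
        )
        history.append((sha, blob.stdout if blob.returncode == 0 else None))
    return history


def trace_comment(history: Iterable[tuple[str, str | None]],
                  comment: str) -> tuple[str | None, str | None]:
    """Return (introducing_sha, removing_sha) for an exact comment text.

    The SATD counts as removed at the first later commit where the exact
    comment text is gone (deleted or edited). removing_sha is None when
    the comment survives to the end of the history.
    """
    introduced = None
    for sha, content in history:
        present = content is not None and comment in content
        if introduced is None and present:
            introduced = sha
        elif introduced is not None and not present:
            return introduced, sha
    return introduced, None


# Toy history (no repository needed): the TODO appears in c2, is deleted in c4.
toy = [
    ("c1", "int x;\n"),
    ("c2", "// TODO: hack\nint x;\n"),
    ("c3", "// TODO: hack\nint x = 1;\n"),
    ("c4", "int x = 1;\n"),
]
print(trace_comment(toy, "// TODO: hack"))  # ('c2', 'c4')
```

On a real checkout, trace_comment(file_history(repo, path), comment) would combine the two steps; keeping the git I/O separate lets the detection rule be checked without a repository.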
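Step 3 of Section II keeps LOC because the other complexity and size metrics are highly Spearman-correlated with it, while Fan-In is not. A minimal Python sketch of that check follows. It computes Spearman's rho as the Pearson correlation of average ranks, so tied values share the mean rank. The paper does not state what it counts as "highly correlated", so the 0.7 cut-off on |rho| is purely an assumption, and the metric names and values in the example are hypothetical.

```python
from __future__ import annotations

from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def correlated_with_loc(metrics: dict[str, list[float]],
                        threshold: float = 0.7) -> dict[str, bool]:
    """Flag metrics whose |rho| with LOC reaches the (assumed) threshold."""
    loc = metrics["LOC"]
    return {name: abs(spearman(loc, values)) >= threshold
            for name, values in metrics.items() if name != "LOC"}


# Hypothetical per-method values, one list entry per method.
sample = {
    "LOC": [1, 2, 3, 4, 5],
    "Cyclomatic": [2, 4, 6, 8, 11],
    "FanIn": [3, 1, 4, 1, 5],
}
print(correlated_with_loc(sample))  # {'Cyclomatic': True, 'FanIn': False}
```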
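The interest calculation of step 4 in Section II, and the positive / negative / no-change split reported in Table I, can be sketched in Python as below. The formula generalizes the worked example in the text: values of 10 at introduction and 20 at removal give 100 × (20 − 10) / 10 = 100. For SATD that has not been removed yet, the second value would come from the latest version. Returning None for a zero baseline (possible for Fan-In) is our assumption, since the paper does not say how such cases are handled, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def interest(introduced: float, removed: float) -> float | None:
    """Relative change (%) of a metric between SATD introduction and removal.

    `removed` is the value at removal, or at the latest version when the
    SATD is still present. None means the change is undefined (baseline 0).
    """
    if introduced == 0:
        return None
    return 100 * (removed - introduced) / introduced


def classify(value: float) -> str:
    """Map an interest value to the Table I categories."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "no change"


def share_by_sign(pairs: list[tuple[float, float]]) -> dict[str, float]:
    """Percentage of SATD instances in each Table I category."""
    values = [interest(before, after) for before, after in pairs]
    values = [v for v in values if v is not None]  # drop zero baselines
    if not values:
        raise ValueError("no SATD instance with a non-zero baseline")
    counts = Counter(classify(v) for v in values)
    return {cat: 100 * counts[cat] / len(values)
            for cat in ("positive", "negative", "no change")}


print(interest(10, 20))  # 100.0, the worked example in step 4
print(share_by_sign([(10, 20), (10, 10), (20, 10), (5, 6)]))
# {'positive': 50.0, 'negative': 25.0, 'no change': 25.0}
```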
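Table II summarizes the positive interest values by their minimum, quartiles, median and maximum. The sketch below produces the same five numbers with Python's statistics.quantiles. The paper does not say how its quartiles were computed; we assume linear interpolation between order statistics (method="inclusive", which as far as we know corresponds to R's default quantile definition). The function name summarize is hypothetical and the input list is a toy example, not JMeter data.

```python
from statistics import quantiles


def summarize(interests: list[float]) -> dict[str, float]:
    """Five-number summary with the column labels of Table II.

    Quartiles use linear interpolation between order statistics
    (method="inclusive"); Python versions before 3.13 need >= 2 values.
    """
    q1, q2, q3 = quantiles(interests, n=4, method="inclusive")
    return {"Min.": min(interests), "1st Qu.": q1, "Median": q2,
            "3rd Qu.": q3, "Max.": max(interests)}


# Toy positive-interest values (%), not JMeter data.
print(summarize([1, 2, 3, 4, 100]))
# {'Min.': 1, '1st Qu.': 2.0, 'Median': 3.0, '3rd Qu.': 4.0, 'Max.': 100}
```

A single extreme value (100 here) moves only the maximum, which is why Table II reports quartiles rather than a mean for such skewed data.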



</pre>