=Paper=
{{Paper
|id=Vol-1771/paper11
|storemode=property
|title=Using Analytics to Quantify Interest of Self-Admitted Technical Debt
|pdfUrl=https://ceur-ws.org/Vol-1771/paper11.pdf
|volume=Vol-1771
|authors=Yasutaka Kamei,Everton Maldonado,Emad Shihab,Naoyasu Ubayashi
|dblpUrl=https://dblp.org/rec/conf/apsec/KameiMSU16
}}
==Using Analytics to Quantify Interest of Self-Admitted Technical Debt==
1st International Workshop on Technical Debt Analytics (TDA 2016)

Using Analytics to Quantify the Interest of Self-Admitted Technical Debt

Yasutaka Kamei†, Everton Maldonado††, Emad Shihab††, and Naoyasu Ubayashi†

† Principles of Software Languages Group (POSL), Kyushu University, Fukuoka, Japan

†† Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada

Email: † {kamei, ubayashi}@ait.kyushu-u.ac.jp, †† {e_silvam, eshihab}@encs.concordia.ca

Abstract: Technical debt refers to the phenomenon of taking a shortcut to achieve short-term development gain at the cost of increased maintenance effort in the future. The cost of this debt, in particular, has not been widely studied. Therefore, the goal of this paper is to determine ways to measure the 'interest' on the debt and use these measures to see how much of the technical debt incurs positive interest, i.e., debt that indeed costs more to pay off in the future. To measure interest, we use the LOC and Fan-In measures. We perform a case study on the Apache JMeter project and find that approximately 42-44% of the technical debt incurs positive interest.

===I. Introduction===

The term technical debt was coined by Cunningham in 1992 to refer to the phenomenon of taking a shortcut to achieve short-term development gain at the cost of increased maintenance effort in the future [3]. The technical debt community, organized through the Managing Technical Debt workshop [1], has studied many aspects of technical debt, including its detection [12], its impact [11], and its appearance in the form of code smells [4]. Most recently, we developed an approach to identify technical debt from code comments, referred to as self-admitted technical debt (SATD). SATD refers to the situation where developers know that the current implementation is not optimal and write comments alerting readers to the inadequacy of the solution.

In the last few years, an increasing amount of work has focused on SATD. In particular, our prior work focused on the detection of SATD [6], the classification of different types of SATD, and the development of datasets to enable future studies on SATD [5]. Bavota and Russo [2] performed an empirical study of SATD on a large number of Apache projects and showed that SATD is prevalent in open source projects, is long lived, and is increasing over time. A study by Wehaibi et al. [10] examined the impact of SATD on quality and found that SATD does not necessarily relate to more defects; however, it does make the software system more complex.

Although the metaphor of technical debt has been well studied, to the best of our knowledge, the cost of debt (its interest) has not been extensively studied. Measuring the interest of technical debt is one of the challenges in the field, since it requires the detection of the technical debt, the tracking of the debt over time, and the development of measures to accurately quantify this debt. Given that SATD allows us to know the exact method the technical debt exists in, we are able to perform fine-grained analysis of the code, which enables us to quantify the interest of the debt. In this paper, interest refers to the additional difficulty in repaying the debt.

We first propose the use of code metrics, in particular the well-known Lines of Code (LOC) and Fan-In, to measure interest. We use LOC since it highly correlates with most code complexity metrics, and Fan-In since it allows us to measure how much a piece of code is depended on by other code. (Understand calculates Fan-In as the number of inputs a function uses plus the number of unique subprograms calling the function [7].) Then, we use the developed measure to determine how much of the SATD incurs positive interest. In a case study on the Apache JMeter project, we find that 44.2% (using LOC) and 42.2% (using Fan-In) of the SATD in JMeter incurs positive interest.

The rest of the paper is organized as follows: Section II introduces our approach to quantify the interest of SATD. Section III describes a preliminary study using the developed measure. Finally, Section IV draws conclusions and outlines our future work.

===II. Approach===

To perform our study, we need to determine the SATD in the codebase and locate when the SATD was introduced in the project and when it was later removed. Then, we use our measures of interest, i.e., LOC and Fan-In, to compare the size of the code and the amount of dependence other code had on the TD code in order to quantify interest (Figure 1).

Fig. 1. The overview of our approach. (Figure not reproduced. Pipeline: Source Code Repository → SATD Extraction (Details in [7]) → Identifying SATD Introduction and Removal → Determining Metrics that Measure Interest → Calculating Interest. Example output, File / Interest (%): JsseSSLManager.java 10.5, MonitorGraph.java 2.1, ProxyControl.java 8.3, SmtpPanel.java 2.1.)

1. SATD Extraction. In order to measure the interest of the TD, our first step is to identify where it exists. Since we focus particularly on SATD, we use code comments found in the source code. We extract and parse the source code of JMeter version 2.10. To perform the parsing, we use the JDeodorant tool [9], which allows us to extract a comment and map it to its corresponding method. Then, we apply a series of filters to remove irrelevant comments, e.g., copyright-related comments. Finally, the second author (who has more than 8 years of industry experience as a software engineer, during which he designed, implemented, and maintained several programs, in particular in the Java programming language) manually classified all comments to determine whether they are SATD comments and mapped these comments to their respective methods. In this study, we assume that SATD exists in the method where the comment is identified. Details regarding the dataset and the filtering applied can be found in our earlier work [5].

2. Identifying SATD Introduction and Removal. Since we are interested in measuring the interest, we need to determine the 'change' over time in these SATD methods. For each of the SATD comments we identified, we use several git commands (e.g., git log and git cat-file) to trace a comment back to the commit where it was introduced. We perform this task by replaying the history commit by commit. Using the same technique, we are also able to detect the removal of SATD. We consider the SATD removed when we find that the comment is removed or changed.

3. Determining Metrics that Measure Interest. Once we are able to determine the SATD comments and their associated methods, we would like to calculate the interest that is incurred over time (i.e., from the introduction of the technical debt to its removal). To do so, we extracted 16 code metrics using the Understand tool [8]. In particular, we selected all method-level complexity and size metrics that Understand is able to provide.

We focused on complexity and size metrics for two reasons: 1) our intuition tells us that if a piece of code is introduced and then becomes more complex, that is a good proxy for it being more difficult to deal with in the future, i.e., it incurred interest; and 2) prior work has shown that size metrics are typically highly correlated with complexity metrics; hence, using size metrics (if they are highly correlated with complexity metrics in our case) would be an easier alternative to using complexity metrics.

We measured the Spearman correlation between the complexity and size metrics and found that indeed all metrics except Fan-In are highly correlated with LOC. Therefore, we decided to use the LOC metric as a measure of interest. In addition, since Fan-In is an indicator of how much a method is depended on, we decided to also include the Fan-In metric when calculating interest. The intuition is that if a method is depended on lightly when the SATD is introduced and then has many more dependencies in the future, then dealing with this SATD is much more difficult (since many dependencies may be affected). In the end, we settled on using the two metrics, LOC and Fan-In, as measures of interest.

4. Calculating Interest. Using our metrics, we consider the relative change in the LOC and Fan-In values between the introduced and removed versions as interest. We calculate the interest per SATD instance. For example, if an arbitrary metric has the values 10 and 20 in the introduced and removed versions of the method where the SATD exists, the relative size is 100 (i.e., 100 × (20 - 10) / 10). In cases where the SATD is not yet removed, we use the numbers from the latest version of JMeter. Our assumption here is that if the SATD incurs positive interest, then it will be more difficult to remove in the future, e.g., if the code becomes more complex compared to when the debt was taken, then it will be more difficult to deal with.

While this paper opens a new research direction (i.e., quantifying the interest of SATD), our current approach also has weaknesses, which we elaborate on in Section IV.

===III. Initial Case Study===

Motivation. Several previous studies focused on understanding SATD (e.g., the detection of technical debt [6, 12] and the impact of SATD on software quality [10]). However, to the best of our knowledge, there are no studies that help in the quantification of SATD interest. Therefore, we would like to know how we can measure interest and whether SATD actually incurs positive interest.

Datasets. To conduct our initial case study, we use data from the Apache JMeter open source project. We use JMeter since we have used this dataset in the past [5, 6], and know that it contains instances of SATD and uses Git as the version control system, which many of our tools are designed to work on. In particular, we use release v2.10 of JMeter, which contains 81,307 SLOC in 1,181 classes, contains 20,084 comments, and has 33 unique contributors.

Approach. To calculate the interest of SATD, we follow the approach explained in Section II. We show the number of SATD instances, the percentage of the technical debt that has positive interest, and the distribution of interest for technical debt that incurs a positive interest rate.

Results. We find that there is a high correlation between LOC and the other product metrics, except Fan-In. From the highly correlated metrics, we selected LOC as the metric to calculate interest, since intuitively it is easier to measure and comprehend. Therefore, we settled on using two product metrics (i.e., LOC and Fan-In) to measure interest.

Table I shows the number of SATD instances and the percentage of the technical debt that has positive interest. The table shows that 44.2% of the SATD incurs a positive interest rate in terms of LOC and 42.2% in terms of Fan-In. We can see that in some cases there can be negative interest (13.8% using LOC and 8.1% using Fan-In), where the SATD method gets smaller or has a lower Fan-In after the introduction of the SATD. There are also cases where nothing changes in terms of LOC and Fan-In between the SATD introduction and removal. Lastly, it is important to note that there is no large difference between the proportions of SATD with positive interest and with no change, using either LOC or Fan-In.

{| class="wikitable"
|+ Table I. The percentage of SATD that has positive, negative, and no change in interest
! Metric !! # SATD instances !! Positive !! Negative !! No change
|-
| LOC || 181 || 44.2% || 13.8% || 42.0%
|-
| Fan-In || 161 || 42.2% || 8.1% || 49.7%
|}

Next, we would like to know how high the positive interest rate is. This analysis provides us with more insight about the SATD that incurs a positive interest rate. Table II and Figure 2 show the distribution of interest for the SATD that incurs a positive rate. We note that we limit the x-axis of Figure 2 to 100% for readability. We see from Figure 2 that the distributions are right-skewed (most of the mass lies at low interest values, with a long tail of large values): the middle half of the SATD, from the first to the third quartile, ranges from 6.9% to 50.0% in terms of LOC and from 12.4% to 50.0% in terms of Fan-In. Our findings indicate that there is SATD that incurs a positive interest rate and that different SATD instances incur different amounts of interest, which suggests that we should prioritize SATD based on its interest, i.e., not all SATD is equal.

{| class="wikitable"
|+ Table II. Statistics of the interest (%) for the SATD that incurs a positive rate
! Metric !! Min. !! 1st Qu. !! Median !! 3rd Qu. !! Max.
|-
| LOC || 1.6 || 6.9 || 18.0 || 50.0 || 6667.0
|-
| Fan-In || 5.6 || 12.4 || 25.0 || 50.0 || 900.0
|}

Fig. 2. Distribution of SATD interest in JMeter. Interest is measured using LOC and Fan-In. (Figure not reproduced. Density plots with x-axis Interest (%), limited to 0-100, and y-axis density; panels: (a) JMeter (LOC), (b) JMeter (Fan-In).)

44.2% of technical debt incurs a positive rate in terms of LOC and 42.2% of technical debt incurs it in terms of Fan-In.

===IV. Conclusion===

In this paper, we introduced an approach to quantify the interest of SATD. Our proposed approach uses software product metrics to measure the interest in software projects. The results of our initial case study using the Apache JMeter project show that 44.2% of technical debt has a positive rate in terms of LOC and 42.2% of technical debt has it in terms of Fan-In.

Future work. This paper only shows an early idea for quantifying the interest of SATD. Therefore, there remain many challenges to address in the future.

* To calculate interest, we use the relative change in metric values between two versions, SATD introduction and SATD removal. However, the length of the period between them is not considered. Therefore, in the future we would like to take the period into account when calculating the interest.
* There are several types of SATD, such as defect and design SATD. The previous study [5] shows that the percentage of SATD varies depending on the type of technical debt and the studied systems. For example, projects that have limited time to develop features are likely to leave comments about features that need to be implemented in the future. To better understand the interest, in the future we would like to analyze the interest per type of SATD.
* The interest varies among technical debt. If we can understand why some SATD has larger interest, we can make use of such insights for future development. Therefore, we would like to manually investigate why some of the SATD has larger interest.
* Generally speaking, software systems are always evolving over time to implement new functionality and fix defects. Therefore, even if the size of the SATD method increases, it is not clear how best to evaluate the effects on the interest of SATD. We would like to compare the impact of software evolution on methods in two groups, SATD vs. non-SATD, to draw a relative comparison that controls for general evolution.

===Acknowledgment===

This research was partially supported by JSPS KAKENHI Grant Number 15H05306.
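As an illustration of the interest calculation described in Section II (step 4), the following minimal Python sketch computes the relative change of a metric between the SATD-introducing and SATD-removing versions and labels it as positive, negative, or no change, mirroring the categories of Table I. The function names, the sample values, and the handling of a zero baseline are assumptions made for this sketch; the paper does not specify them.

```python
from collections import Counter
from typing import Optional


def interest_percent(introduced: float, removed: float) -> Optional[float]:
    """Relative change (%) of a metric between SATD introduction and removal.

    Returns None when the baseline is zero, because the relative change is
    undefined there (assumption: the paper does not say how this is handled).
    """
    if introduced == 0:
        return None
    return 100.0 * (removed - introduced) / introduced


def classify(interest: float) -> str:
    """Label an interest value with the categories used in Table I."""
    if interest > 0:
        return "positive"
    if interest < 0:
        return "negative"
    return "no change"


# Hypothetical (introduced, removed) LOC values for three SATD methods.
samples = {
    "methodA": (10, 20),  # grows: the paper's worked example
    "methodB": (40, 30),  # shrinks
    "methodC": (15, 15),  # unchanged
}

# Compute the interest per SATD instance, skipping undefined values.
interests = {name: interest_percent(a, b) for name, (a, b) in samples.items()}
summary = Counter(classify(v) for v in interests.values() if v is not None)

for name, value in interests.items():
    print(f"{name}: {value:.1f}% ({classify(value)})")
print(dict(summary))
```

For SATD that has not yet been removed, the paper uses the metric values from the latest version; in this sketch that would simply mean passing the latest value as the second argument.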
===References===

[1] International workshop on managing technical debt (MTD). https://www.sei.cmu.edu/community/td2016/. Accessed: 2016-10-16.

[2] G. Bavota and B. Russo. A large-scale empirical study on self-admitted technical debt. In Proc. Int'l Conf. on Mining Software Repositories (MSR), pages 315–326, 2016.

[3] W. Cunningham. The WyCash portfolio management system. In Addendum to the Proc. on Object-oriented Programming Systems, Languages, and Applications, pages 29–30, 1992.

[4] F. Fontana, V. Ferme, and S. Spinelli. Investigating the impact of code smells debt on quality code evaluation. In Proc. of the Int'l Workshop on Managing Technical Debt (MTD), pages 15–22, 2012.

[5] E. D. S. Maldonado and E. Shihab. Detecting and quantifying different types of self-admitted technical debt. In Proc. of the Int'l Workshop on Managing Technical Debt (MTD), pages 9–15, 2015.

[6] A. Potdar and E. Shihab. An exploratory study on self-admitted technical debt. In Proc. of the Int'l Conf. on Software Maintenance and Evolution (ICSME), pages 91–100, 2014.

[7] Scientific Toolworks, Inc. FANIN - Understand 2.6. https://scitools.com/support/metrics_list/?metricGroup=count.

[8] Scientific Toolworks, Inc. Understand 2.6. http://www.scitools.com/.

[9] N. Tsantalis, T. Chaikalis, and A. Chatzigeorgiou. JDeodorant: Identification and removal of type-checking bad smells. In Proc. European Conf. on Software Maintenance and Reengineering (CSMR), pages 329–331, 2008.

[10] S. Wehaibi, E. Shihab, and L. Guerrouj. Examining the impact of self-admitted technical debt on software quality. In Proc. of the Int'l Conf. on Software Analysis, Evolution, and Reengineering (SANER), pages 179–188, 2016.

[11] N. Zazworka, M. A. Shaw, F. Shull, and C. Seaman. Investigating the impact of design debt on software quality. In Proc. of the Int'l Workshop on Managing Technical Debt (MTD), pages 17–23, 2011.

[12] N. Zazworka, R. O. Spínola, A. Vetro', F. Shull, and C. Seaman. A case study on effectively identifying technical debt. In Proc. of the Int'l Conf. on Evaluation and Assessment in Software Engineering (EASE), pages 42–47, 2013.
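The metric-selection step in Section II (step 3) relies on the Spearman correlation between LOC and the other method-level metrics. The sketch below shows one way such a check could be done in plain Python: each metric is converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is computed. The metric values are invented for illustration and are not taken from JMeter, and the helper names are hypothetical; the paper does not state which tool or threshold was used for this analysis.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-method metric values (not real JMeter data).
loc = [12, 30, 45, 8, 60]
cyclomatic = [2, 5, 9, 1, 11]  # rises together with LOC
fan_in = [7, 1, 2, 9, 3]       # unrelated to LOC in this toy data

print(f"LOC vs. cyclomatic: {spearman(loc, cyclomatic):.2f}")
print(f"LOC vs. Fan-In:     {spearman(loc, fan_in):.2f}")
```

In this toy data the complexity metric moves in lockstep with LOC while Fan-In does not, which is the kind of pattern that would motivate keeping Fan-In as a separate measure alongside LOC.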