=Paper=
{{Paper
|id=Vol-2017/paper10
|storemode=property
|title=Detecting Technical Debt through Issue Trackers
|pdfUrl=https://ceur-ws.org/Vol-2017/paper10.pdf
|volume=Vol-2017
|authors=Ke Dai,Philippe Kruchten
|dblpUrl=https://dblp.org/rec/conf/apsec/DaiK17
}}
==Detecting Technical Debt through Issue Trackers==
5th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2017)
Ke Dai and Philippe Kruchten
Department of Electrical and Computer Engineering
University of British Columbia
Vancouver, BC, Canada
{kedai, pbk}@ece.ubc.ca
Abstract—Managing technical debt effectively, to prevent it from accumulating too quickly, is of great concern to software stakeholders. To pay off technical debt regularly, software developers must be conscious of the existence of technical debt items. The first step is to make technical debt explicit, that is, to identify it. Although many static source code analysis tools exist to identify code-level technical debt, identifying non-code-level technical debt is very challenging and needs deeper exploration. This paper proposes an approach to identifying non-code-level technical debt in issue tracking data sets using natural language processing and machine learning techniques, and validates the feasibility and performance of the approach on an issue tracking data set, recorded in Chinese, from a commercial software project. We found that there are indeed some common words that can be used as indicators of technical debt. Based on these key words, we achieved a precision of 0.72 and a recall of 0.81 for identifying technical debt items using machine learning techniques.

Keywords—technical debt; identification; issue tracking data sets; natural language processing; machine learning

I. INTRODUCTION

Technical debt refers to delayed tasks and immature artifacts that constitute a "debt" because they incur extra costs in the future, in the form of an increased cost of change during evolution and maintenance [1]. An appropriate amount of technical debt can accelerate software development; too much of it, however, can impede progress and even abort the project [2]. Typically, some startup software companies incur technical debt strategically to speed up development in the early stages and capture the market. But as the size and complexity of the software grow, it may become increasingly difficult to maintain and evolve the product, due to intertwined dependencies between modules or components, unless technical debt is paid off regularly. As a result, software stakeholders need to pay off technical debt regularly to prevent it from accumulating too quickly. Unlike bugs or defects in a software system, technical debt is invisible: the software often works well from the users' perspective, and even developers are often unaware that technical debt exists. This invisibility significantly increases the risk of rigid software design and of large maintenance costs in the future. It is therefore critical for development teams to be able to identify the technical debt items in the current software system at any point in time, since identification is the prerequisite for all other technical debt management activities, including measurement of technical debt, estimation of the effort to be expended, payment of technical debt, risk evaluation, etc. Once technical debt can be identified systematically, software development teams can estimate future budgets, prioritize future tasks, allocate limited resources and evaluate potential risks. They can also make informed decisions about when technical debt should be paid off to maximize their profits.

Given the importance of identifying technical debt, a number of studies have empirically explored approaches to detecting it. Some of this research focused on source code analysis techniques. Code smells and automatic static analysis (ASA) are the two most widely used source code analysis techniques for identifying technical debt. Code smells were first introduced by Fowler et al. to describe violations of object-oriented design principles (e.g., abstraction, encapsulation and inheritance) [3], whereas ASA techniques aim to identify violations of recommended programming practices that might degrade software quality attributes (e.g., maintainability, efficiency).

Other studies aimed to identify technical debt of larger granularity that is undetectable by source code analysis, such as architecture and requirement technical debt [10] [11] [12] [13]. Compared to code-level technical debt, the identification of non-code-level technical debt has not been studied sufficiently, and the available approaches are limited. To our knowledge, none of the existing approaches can identify all types of technical debt.

As a complement to existing approaches, we try to identify non-code-level technical debt through issue trackers. We hope to acquire developers' points of view on technical debt and to understand how they communicate it in issue trackers, since they use issue trackers to record, track and prioritize various kinds of issues in software projects. Further, developers' standpoints on technical debt will in turn help refine our understanding of it and should be taken into consideration for an improved definition of technical debt. However, identifying technical debt manually through issue trackers is difficult and impractical because of the substantial effort involved, especially when a large project comprises a large number of issues. In this context, we exploited natural language processing (NLP) and machine learning techniques to automate the process. NLP techniques were applied to extract features from the unstructured text data, and machine learning techniques were used to decide whether a certain issue is an instance of technical debt. We performed an exploratory study on a commercial software project to validate the efficacy of our
approach to the identification of technical debt through issue trackers. Experimental results demonstrate that our approach is effective in identifying non-code-level technical debt, especially requirement debt, design debt, and UI debt, which cannot be detected by source code analysis techniques.

We address the following questions through this research:

• RQ1: How do software practitioners communicate technical debt issues in issue trackers?

• RQ2: Are there text patterns indicating the existence of technical debt that can be used to identify potential technical debt automatically with NLP and machine learning techniques?

The rest of this paper is organized as follows: Section II discusses related work. Section III describes our approach. Section IV reports and analyzes the experimental results of our exploratory study. Section V presents the threats to validity. Finally, we conclude and envision future work in Section VI.

II. RELATED WORK

A. Identification of Technical Debt

Much research has been done on identifying code-level technical debt. This kind of technical debt can be detected using static program analysis tools based on the measurement of various source code metrics. Marinescu proposed metric-based detection strategies to help engineers directly localize classes or methods affected by violations of object-oriented design principles, and validated the approach on multiple large industrial case studies [4]. Munro et al. refined Marinescu's detection strategies by introducing new metrics, with a justification for choosing them, and evaluated the performance of the approach in identifying two kinds of code smells (lazy class and temporary field) in two case studies [5]. Olbrich et al. investigated the relationship between two kinds of code smells (god class and shotgun surgery) and maintenance cost by analyzing the historical data of two major open source projects, Apache Lucene and Apache Xerces2 Java [6]. Wong et al. proposed a strategy to detect modularity violations and evaluated the approach using Hadoop and Eclipse [7]. In addition, some researchers explored identifying technical debt through comments in source code [8] [9].

Other research explored approaches to identifying other types of technical debt, such as architecture technical debt. Brondum et al. proposed a modelling approach to visualizing architecture technical debt based on analysis of the structural code [10]. Li et al. proposed two modularity metrics, the Index of Package Changing Impact (IPCI) and the Index of Package Goal Focus (IPGF), as indicators of architecture technical debt [11]. Further, they proposed an architecture technical debt identification approach based on architecture decisions and change scenarios [12]. The work closest to ours is that of Bellomo et al., who manually examined 1,264 issues in four issue trackers from open source and government projects and identified 109 examples of technical debt using a categorization method they developed [13]. The major difference is that we partially automated the identification process, whereas they identified technical debt items manually. To our knowledge, our study is the first to apply NLP and machine learning techniques to detect technical debt through issue trackers.

B. Mining Issue Tracking Databases

Issue tracking systems are widely used in open source projects as well as in the software industry to record, triage and track the different kinds of issues that occur during the software lifecycle: bug reports, defect fixes, new features, future tasks, requirement updates, etc. They play an important role in helping software development teams manage development and maintenance activities, thus promoting the success of software projects. Several studies have focused on mining issue tracking databases to retrieve valuable information for development management, quality evaluation, predictive models, etc.

Antoniol et al. applied NLP and machine learning techniques (alternating decision trees, naïve Bayes classifiers, and logistic regression) to automate the process of distinguishing bugs from other kinds of issues, compared the performance of this approach with that of regular expression matching, and concluded that machine learning techniques outperform regular expression matching in terms of predictive accuracy [14]. Runeson et al. developed a prototype tool that detects duplicate defect reports in issue tracking systems using NLP techniques, evaluated its identification capabilities in a case study, and concluded that about two thirds of the duplicates could be found with this approach [15]. Wang et al., Jalbert and Weimer, Sun et al., and Sureka and Jalote performed similar research addressing the same problem [16] [17] [18] [19].

Other work focused on specific aspects of software quality, such as security. Cois et al. proposed an approach to detecting security-related text snippets in issue tracking systems using NLP and machine learning techniques [20].

III. METHOD

For this research, we cooperated with a local software company to access the issue data set of a commercial software product that had been in development for more than two years, rather than just using issue data sets from open source projects, in order to make the classifier we developed more adaptable to the style of issue data from commercial software products. The issues are recorded mainly in Mandarin with a few English words, as the developers of the product are Chinese.

A. Phase 0: Exporting issue data

We first exported the issue data set and saved it in a spreadsheet, which makes it easier to read the issues and to process the data. The fields of the data set we used are id, type, priority, state, summary, description and label. We also adjusted the character encoding so that Chinese characters display correctly, and removed issues with garbled characters to render the data set clean and tidy. In the end, we obtained 8,149 issues in total. Figure 1 shows an overview of our approach.
Fig. 1. Approach Overview (pipeline: issue tracking database → export issue data → manual analysis and tagging → extract key words → extract features → Naïve Bayes classification)
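As a concrete illustration of Phase 0, the following is a minimal Python sketch of the export-and-clean step using pandas. The file name issues.xlsx and the garbled-text heuristic are our own assumptions; only the field names come from the study.

import pandas as pd

# Hypothetical sketch: load the exported spreadsheet and keep the fields
# used in the study (file name and cleaning heuristic are assumed).
FIELDS = ["id", "type", "priority", "state", "summary", "description", "label"]
df = pd.read_excel("issues.xlsx", usecols=FIELDS)

def looks_garbled(text) -> bool:
    # Treat rows containing the Unicode replacement character as mis-decoded.
    return "\ufffd" in str(text)

bad = df["summary"].map(looks_garbled) | df["description"].map(looks_garbled)
df = df[~bad].reset_index(drop=True)
print(len(df))  # the authors ended up with 8,149 issues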
TABLE I. THE CLASSIFICATION CRITERIA OF ISSUES

Not Technical Debt:
- Requirement Change: a request for a requirement change from the client
- New Features: tasks to add new functions or introduce new features
- Insufficient Description: the description is insufficient to make a decision
- Critical Defects: critical functions or features are not implemented correctly

Technical Debt:
- Defect Debt: temporarily tolerable defects that will be fixed in the future
- Requirement Debt: requirements that are not implemented accurately or only partially
- Design Debt: violations of good object-oriented design principles, such as god class and long method
- Code Debt: bad coding practices, such as dead code or missing comments
- UI Debt: UI-related issues, such as an inconsistent UI style or ugly UI elements
- Architecture Debt: design limitations at the architecture level, such as violations of modularity
B. Phase 1: Tagging issues manually

We manually tagged each issue or task in the data set as technical debt or not technical debt by reading its summary and description, based on the following classification criteria (a schematic encoding of this decision ladder is sketched after the list):

1. Is it a request for a requirement change from the client? If yes, we tag the issue as not technical debt.

2. Is it a task to add new functions or introduce new features to the product? If yes, we also tag the issue as not technical debt.

3. Is the description of the issue too short or insufficient to decide whether the issue is a technical debt item? In this case, we tag the issue as not technical debt.

4. Is it a defect in which important, critical functions or features are not implemented correctly? If yes, we tag the issue as not technical debt.

5. Is it a defect that is not critical from the client's perspective but weakens the performance and capabilities of the system and will be fixed in the future? If yes, we tag the issue as defect debt.

6. Is it a task to redesign some function or feature because the current design does not meet, or only partially meets, the requirement? If yes, we tag the issue as requirement debt.

7. Is it a design limitation that may pose a threat to the performance of the system or to its evolution and maintenance? If yes, we tag the issue as design debt.

8. Is it an issue related to bad coding practices, such as dead code or missing comments? If yes, we tag the issue as code debt.

9. Is it a UI-related issue, such as an inconsistent UI style or ugly UI elements, that degrades the user experience? If yes, we tag the issue as UI debt.

10. Is it a design limitation at the architecture level, such as a violation of modularity, that may exert a negative impact on the performance of the system or on its evolution and maintenance? If yes, we tag the issue as architecture debt.
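The sketch below encodes the decision ladder as data. It is purely illustrative: in the study every judgment was made by a human reading the issue, so the boolean answers are supplied by the reader, not computed.

# Illustrative encoding of the manual tagging procedure; the first
# criterion answered "yes" determines the label.
CRITERIA = [
    ("request for a requirement change from the client", "not technical debt"),
    ("task to add new functions or introduce new features", "not technical debt"),
    ("description too short or insufficient to decide", "not technical debt"),
    ("critical defect: important functions implemented incorrectly", "not technical debt"),
    ("tolerable defect that will be fixed in the future", "defect debt"),
    ("redesign task: current design does not (fully) meet the requirement", "requirement debt"),
    ("design limitation threatening performance, evolution or maintenance", "design debt"),
    ("bad coding practice such as dead code or missing comments", "code debt"),
    ("UI issue such as inconsistent style or ugly elements", "UI debt"),
    ("architecture-level limitation such as a modularity violation", "architecture debt"),
]

def tag_issue(answers: list) -> str:
    """answers[i] is the human judgment (True/False) for criterion i."""
    for (criterion, label), yes in zip(CRITERIA, answers):
        if yes:
            return label
    return "other"  # Table II also contains an "other" category

# Example: an issue judged to be a tolerable defect (criterion 5).
print(tag_issue([False, False, False, False, True]))  # -> "defect debt"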
The 10 cases listed above are the typical cases we encountered when tagging the issues, but they do not cover all the types of issues in the issue tracker. In fact, some issues can be tagged as either technical debt or not technical debt, depending to a large extent on one's personal understanding of technical debt. Typically, there are wide discrepancies among researchers and developers regarding whether defects should be viewed as a type of technical debt. In this study, we divided defects into two categories: (1) critical defects that may cause fatal errors when using the software, and (2) tolerable defects that exert only a marginal negative impact on the use of the software and are not fixed immediately after being detected. We tagged the first type of defect as not technical debt and the second type as technical debt.

After we finished tagging all the issues, we asked a known expert in software engineering and technical debt, external to our research team, to validate the results of our manual classification. The expert classified a random subset of the issues independently. Where our classifications of an issue diverged, we exchanged our respective points of view on why we had classified it as we did in order to resolve the discrepancy. If we could not reach agreement on the classification of a certain issue, we discussed the issue with the developers to gain insight into the issue itself and their opinions on its classification.

In the end, we found 331 technical debt issues in total; their distribution is shown in Table II. Requirement debt and design debt are the main technical debt types, with 105 and 141 instances respectively.

TABLE II. THE NUMBER OF DIFFERENT TYPES OF TECHNICAL DEBT ISSUES

- Requirement Debt: 105
- Architecture Debt: 6
- Design Debt: 141
- Defect Debt: 15
- UI Debt: 35
- Code Debt: 20
- Other: 9

C. Phase 2: Extracting key words and phrases

Unlike English, Chinese is written without spaces between words, so before extracting key words from the Chinese texts we have to convert each text into a word sequence using a Chinese text segmentation tool. For this research, we used Jieba (https://github.com/fxsjy/jieba/) [21], a popular open source Chinese text segmentation tool, to split the Chinese texts into sequences of words.

After segmentation, we extracted key words using Jieba, which integrates two key word extraction algorithms: TF-IDF and TextRank. We used both of them to extract key words for detecting technical debt. We took the union of the two sets of key words extracted by the two algorithms, removed from the union the key words referring to domain knowledge, and finally added some key words based on our intuition. To make this paper more readable, we list below only the English meanings of the key words instead of the original Chinese characters:

"at present", "now", "current", "previously", "in the past", "in the future", "time", "actually", "in reality", "users", "clients", "strengthen", "change", "modify", "replace", "update", "delete", "cancel", "suggest", "optimize", "simplify", "perfect", "improve", "refactor", "decouple", "again", "re-", "replant", "tidy", "integrate", "merge", "adjust", "extend", "expect", "plan", "management", "maintenance", "function", "requirement", "design", "rule", "theory", "strategy", "mechanism", "algorithm", "data structure", "logic", "code", "structure", "architecture", "style", "format", "performance", "efficiency", "sufficiency", "security", "compatibility", "scalability", "maintainability", "stability", "generality", "usability", "readability", "real-time", "limitation", "more friendly", "more specialized", "more accurate", "problem", "configuration", "priority", "inconsistent", "unreasonable", "inconvenient", "convenient", "not clear", "inaccurate", "not intuitive", "not pretty", "incongruous", "not smooth", "inconformity", "incomplete", "abnormity", "defect", "impact", "experience", "habit", "operation", "difficulty", "delay", "UI", "risk", "optimize", "refactor", "SonarQube"

There are 114 key words in total, of which 104 are Chinese words and 10 are English words. As some words express the same or similar meanings, we merged those words. All these words to some extent indicate or imply the concept of technical debt from different perspectives. Specifically:

• "at present", "now", "current", "previously", "in the past", "in the future", "time": these words, indicating the concept of time, may imply accumulation.

• "strengthen", "change", "modify", "replace", "update", "delete", "cancel", "optimize", "simplify", "perfect", "improve", "refactor", "decouple", "again", "re-", "replant", "tidy", "integrate", "merge", "adjust", "extend": these words indicate the modification of code, design or architecture, or the enhancement of functionality, capability, performance, efficiency, etc.

• "security", "compatibility", "scalability", "maintainability", "stability", "generality", "usability", "readability", "real-time", "limitation": these words indicate concerned aspects of software quality attributes.

• "inconsistent", "unreasonable", "inconvenient", "convenient", "unclear", "inaccurate", "not intuitive", "not pretty", "incongruous", "not smooth", "inconformity", "incomplete", "abnormity", "defect", "limit", "impact", "experience", "habit", "operation", "difficulty", "delay": these words indicate defects or design limitations, such as an inconsistent UI style or an unreasonable design.
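As a rough sketch of this phase, the snippet below segments invented sample texts and extracts candidate key words with Jieba's two built-in algorithms (jieba.analyse.extract_tags for TF-IDF, jieba.analyse.textrank for TextRank), then takes their union. The sample texts and the topK value are assumptions.

import jieba
import jieba.analyse

# Invented sample issue texts (the real data set is confidential).
issue_texts = [
    "设计变更: 优化协议识别算法, 提高效率",  # "design change: optimize the protocol identification algorithm, improve efficiency"
    "界面风格不一致, 影响用户体验",          # "inconsistent UI style, hurts the user experience"
]
corpus = "\n".join(issue_texts)

# Segmentation: Chinese has no spaces, so each text is split into a word list.
print(jieba.lcut(issue_texts[0]))

# Key word extraction with both algorithms, then the union of the two sets.
tfidf_words = set(jieba.analyse.extract_tags(corpus, topK=100))  # TF-IDF
textrank_words = set(jieba.analyse.textrank(corpus, topK=100))   # TextRank
candidates = tfidf_words | textrank_words
# The authors then removed domain-specific words from this union and added
# further words by intuition, ending with 114 key words.
print(sorted(candidates))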
D. Phase 3: Extracting features

Once the key words were extracted from the issue data set, features for text classification can be derived by checking the presence or absence of each key word in each issue text. Suppose the set of key words is ["users", "change", "modify", "improve", "refactor", "decouple", "priority", "button", "architecture", "deploy", "rules"], and consider this issue description: "design change: to keep a consistent design with different pages, we are moving the clear-all-rules button to the front of the deploy rules table. (Consistent with event page)". First, we tokenize the text into a sequence of words and remove stop words (words that are too common to carry any semantic meaning for our text classification). The text is thus converted into a string list: ["design", "change", "keep", "consistent", "design", "different", "pages", "moving", "clear-all-rules", "button", "front", "deploy", "rules", "table"]. Then we can check whether this string list contains each of the key words, i.e. [contain("users"), contain("change"), contain("modify"), contain("improve"), contain("refactor"), contain("decouple"), contain("priority"), contain("button"), contain("architecture"), contain("deploy"), contain("rules")]. This vector of checks for the presence or absence of each key word is called the feature space; its dimension depends on the size of the set of key words. Finally, we obtain the feature vector of the issue sample based on the feature space: [false, true, false, false, false, false, false, true, false, true, true].

The feature space actually includes not only unigram features, each a single word such as "design" or "decouple", but also bigram and trigram features comprising adjacent word pairs and triplets respectively, such as "design change" and "improve unit test". That is to say, the feature space in the previous example can be extended to [contain("users"), contain("change"), contain("modify"), contain("improve"), contain("refactor"), contain("decouple"), contain("priority"), contain("button"), contain("architecture"), contain("deploy"), contain("rules"), contain("design change"), contain("improve unit test")], and the feature vector then becomes [false, true, false, false, false, false, false, true, false, true, true, true, false]. Figure 2 shows the process of feature extraction.

Fig. 2. Feature Extraction (the example text is tokenized into a word list t, and the feature vector V(t) is obtained by evaluating contain(·) for every key term in the feature space S, e.g. contain("change") = true, contain("users") = false)
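A minimal sketch of the feature extraction just described follows. The tokenizer, the stop word list and the substring-based contain(·) check are our simplifications of the paper's pipeline.

KEY_TERMS = ["users", "change", "modify", "improve", "refactor", "decouple",
             "priority", "button", "architecture", "deploy", "rules",
             "design change", "improve unit test"]  # unigrams plus n-grams

STOP_WORDS = {"to", "a", "the", "we", "are", "of", "with"}  # assumed list

def extract_features(text: str) -> dict:
    # Tokenize crudely and drop stop words (a simplification of proper
    # tokenization and stop word removal).
    tokens = [w.strip(".,:()") for w in text.lower().split()]
    tokens = [w for w in tokens if w and w not in STOP_WORDS]
    joined = " ".join(tokens)
    # One boolean feature per key term; substring matching on the joined
    # token sequence stands in for exact unigram/bigram/trigram matching.
    return {"contain(%s)" % term: term in joined for term in KEY_TERMS}

feats = extract_features("design change: to keep a consistent design with "
                         "different pages, we are moving the clear-all-rules "
                         "button to the front of the deploy rules table.")
# e.g. feats["contain(change)"] is True, feats["contain(users)"] is False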
E. Phase 4: Creating a binary Naïve Bayes classifier

Naïve Bayes is a simple classification algorithm based on the assumption that the features are conditionally independent of each other given the category. It determines the category of a given sample with n-dimensional features (x_1, ..., x_n) by calculating the probability that the sample belongs to each category and then assigning the most probable category c to it, which can be described as

c = \arg\max_{k \in \{1, \dots, K\}} p(c_k \mid x_1, \dots, x_n),

where c_k is the k-th category and K is the size of the set of categories. Using Bayes' theorem, the conditional probability p(c_k \mid x_1, \dots, x_n) can be decomposed as

p(c_k \mid x_1, \dots, x_n) = \frac{p(x_1, \dots, x_n \mid c_k)\, p(c_k)}{\sum_{h=1}^{K} p(x_1, \dots, x_n \mid c_h)\, p(c_h)}.

With the conditional independence assumption, this becomes

p(c_k \mid x_1, \dots, x_n) = \frac{p(c_k) \prod_{j=1}^{n} p(x_j \mid c_k)}{\sum_{h=1}^{K} p(c_h) \prod_{j=1}^{n} p(x_j \mid c_h)}.

To perform our experiments, we used NLTK (http://www.nltk.org) [22], a popular natural language toolkit for building Python programs that process human language data. We employed the implementation provided by NLTK instead of creating a binary Naïve Bayes classifier from scratch.
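Continuing the sketch, the classifier itself can be built with NLTK's NaiveBayesClassifier, as the study did; the two training examples here are invented, and extract_features refers to the sketch above.

import nltk

# Invented labeled examples; in the study there was one (features, label)
# pair per manually tagged issue.
train_set = [
    (extract_features("refactor the deploy logic to decouple modules"), "technical debt"),
    (extract_features("add a new export button for users"), "not technical debt"),
]

classifier = nltk.NaiveBayesClassifier.train(train_set)

new_issue = extract_features("simplify and improve the rules page design")
print(classifier.classify(new_issue))

# Likelihood ratios of the kind reported in Table III come from:
classifier.show_most_informative_features(20)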
IV. EXPERIMENTAL RESULTS AND ANALYSIS

Repeated random sub-sampling validation was performed to validate our approach to the identification of technical debt: we repeatedly split the full data set into randomly distributed 80%/20% partitions, trained and tested the classifier on each split, and recorded the performance results.
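The validation loop can be sketched as follows, assuming labeled_data holds one (features, label) pair per tagged issue; the split sizes and the 10 repetitions come from the text, the rest is our assumption.

import random
import nltk

def precision_recall(classifier, test_set):
    tp = fp = fn = 0
    for feats, gold in test_set:
        pred = classifier.classify(feats)
        if pred == "technical debt" and gold == "technical debt":
            tp += 1
        elif pred == "technical debt":
            fp += 1
        elif gold == "technical debt":
            fn += 1
    # Guard against empty counts in this toy setting.
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)

precisions, recalls = [], []
for _ in range(10):                      # 10 repetitions
    random.shuffle(labeled_data)
    cut = int(0.8 * len(labeled_data))   # 80/20 split
    clf = nltk.NaiveBayesClassifier.train(labeled_data[:cut])
    p, r = precision_recall(clf, labeled_data[cut:])
    precisions.append(p)
    recalls.append(r)

print(sum(precisions) / len(precisions))  # the authors report 0.72
print(sum(recalls) / len(recalls))        # and 0.81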
RQ1: How do software practitioners communicate technical debt issues in issue trackers?

We searched the issue data set for the term "technical debt" and the corresponding Chinese term and found no matches. All the technical debt instances in this issue tracker were expressed implicitly, using other technical-debt-related words such as redesign, design change, refactor, cleanup, decouple, etc. Through communication with the developers of this product, we learned that they did not have a strong awareness of technical debt. Some of them had never even heard of the concept, although they recognized, once we explained what technical debt is, that they had much experience in incurring it. To track, prioritize and pay off technical debt effectively, we suggested that they adopt technical debt as an issue type in the issue tracker, so as to communicate technical debt explicitly.
TABLE III. THE 20 MOST INFORMATIVE FEATURES FOR DETECTING TECHNICAL DEBT

Feature | Likelihood Ratio (Technical Debt : Not Technical Debt)
协议识别优化 (protocol identification optimization) = 1 | 155.2 : 1.0
增强 (strengthen) = 1 | 128.2 : 1.0
不方便 (inconvenient) = 1 | 128.2 : 1.0
提高 (improve) = 1 | 117.4 : 1.0
优化 (optimize) = 1 | 90.8 : 1.0
整改 (change or modify) = 1 | 87.7 : 1.0
风格 (style) = 1 | 65.2 : 1.0
体验 (experience) = 1 | 64.4 : 1.0
改进 (improve) = 1 | 60.7 : 1.0
不容易 (not easy) = 1 | 47.2 : 1.0
改善 (improve) = 1 | 44.5 : 1.0
效率 (efficiency) = 1 | 44.5 : 1.0
简化 (simplify) = 1 | 38.2 : 1.0
解决方案 (solution) = 1 | 35.8 : 1.0
困难 (difficulty) = 1 | 33.7 : 1.0
前期 (previously) = 1 | 33.7 : 1.0
不美观 (not pretty) = 1 | 33.7 : 1.0
risk = 1 | 33.7 : 1.0
算法 (algorithm) = 1 | 31.8 : 1.0
习惯 (habit) = 1 | 31.8 : 1.0
TABLE IV. RESULTS OF REPEATED RANDOM SUB-SAMPLING VALIDATION

Category | Average Precision | Average Recall | Average F1-score
Technical Debt | 0.72 | 0.81 | 0.76
RQ2: Are there text patterns indicating the existence of technical debt that can be used to identify potential technical debt automatically with NLP and machine learning techniques?

The experimental results demonstrate that text patterns indicating technical debt do exist and can be used to identify it. In general, technical debt issues are characterized by two properties: rework, whether code refactoring or feature redesign, and accumulation, which is implied by words indicating time, such as "previously", "at present" and "in the future". The 20 most informative features, those most strongly correlated with technical debt, are shown in Table III. Each of these features may contribute differently to the identification of different types of technical debt. Intuitively, the presence of "style" and "experience" may indicate UI debt, while "simplify" and "efficiency" are more likely to be indicators of design debt.

To evaluate the performance of our classifier, the average precision and recall were calculated over 10 repetitions of random sub-sampling validation. Precision measures the fraction of the technical debt instances identified by our classifier that proved to be correctly classified; recall measures the fraction of correctly classified technical debt items out of the total number of technical debt issues. In our experiments, the average precision and recall were 72% and 81% respectively, as shown in Table IV.

V. THREATS TO VALIDITY

There are two main threats to the validity of our study: threats to internal validity and threats to external validity. Threats to internal validity can be caused by the level of subjectivity in the manual analysis and classification of issues, as we inevitably have personal biases in our understanding of issue descriptions. To counter these threats, we had an expert external to our research team classify random samples of the issues and resolved our discrepancies through discussion. We also discussed with the developers of the product the issues we were not sure we had classified correctly, to gain further insight. Threats to external validity concern the generalization of our findings. We performed a case study on an issue data set from a single commercial software project. This data set may not be representative; that is to say, we cannot guarantee that the same results will be obtained when our approach is applied to other commercial software projects. In particular, our approach may not be applicable to projects in which issue trackers are not used to record issues.

VI. CONCLUSION AND FUTURE WORK

This paper presents an exploratory study of applying NLP and machine learning techniques to identify technical debt issues through issue trackers. We have demonstrated that the process of detecting technical debt issues in issue trackers can be automated with acceptable performance using NLP and machine learning techniques. We found that some common words in software engineering are directly or indirectly related to technical debt, and that these words can be used as features to decide whether a certain issue is technical debt. We believe the performance of our classifier will improve further when more sophisticated feature extraction and classification techniques are applied.

This exploratory study was based on a rather limited data set of 8,149 issues. Our approach needs to be validated on issue data sets from a wider range of software projects. Furthermore, we plan to improve the performance of our classifier by exploring more sophisticated feature extraction techniques, such as mapping phrases with regular expressions and extracting semantically meaningful information from context, and by applying other classification techniques such as random forests, SVMs and deep learning. In addition, we will develop a multi-class classifier to identify technical debt of specific types.
ACKNOWLEDGMENT

We would like to acknowledge the support of Jean Su, Steven Zhu and Billy Liu for this research. We also thank Robert Nord, Rob Fuller and Randy Hsu for their valuable review comments. This research was partially supported by a Canada NSERC grant, a MITACS grant and the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC.

REFERENCES

[1] Avgeriou, P., et al., "Managing Technical Debt in Software Engineering (Dagstuhl Seminar 16162)," in Dagstuhl Reports, 2016, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[2] Cunningham, W., "The WyCash portfolio management system," SIGPLAN OOPS Messenger, 1992, 4(2), pp. 29-30.
[3] Fowler, M. and Beck, K., Refactoring: Improving the Design of Existing Code, Addison-Wesley Professional, 1999.
[4] Marinescu, R., "Detection strategies: Metrics-based rules for detecting design flaws," in Proceedings of the 20th IEEE International Conference on Software Maintenance, 2004, IEEE.
[5] Munro, M.J., "Product metrics for automatic identification of 'bad smell' design problems in Java source-code," in 11th IEEE International Software Metrics Symposium, 2005, IEEE.
[6] Olbrich, S., et al., "The evolution and impact of code smells: A case study of two open source systems," in Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, 2009, IEEE Computer Society.
[7] Wong, S., et al., "Detecting software modularity violations," in Proceedings of the 33rd International Conference on Software Engineering, 2011, ACM.
[8] de Freitas Farias, M.A., et al., "A contextualized vocabulary model for identifying technical debt on code comments," in 2015 IEEE 7th International Workshop on Managing Technical Debt (MTD), 2015, IEEE.
[9] Maldonado, E., Shihab, E. and Tsantalis, N., "Using natural language processing to automatically detect self-admitted technical debt," IEEE Transactions on Software Engineering, 2017.
[10] Brondum, J. and Zhu, L., "Visualising architectural dependencies," in Proceedings of the Third International Workshop on Managing Technical Debt, 2012, IEEE Press, Zurich, Switzerland, pp. 7-14.
[11] Li, Z., et al., "An empirical investigation of modularity metrics for indicating architectural technical debt," in Proceedings of the 10th International ACM SIGSOFT Conference on Quality of Software Architectures, 2014, ACM, Marcq-en-Baroeul, France, pp. 119-128.
[12] Li, Z., Liang, P. and Avgeriou, P., "Architectural technical debt identification based on architecture decisions and change scenarios," in 2015 12th Working IEEE/IFIP Conference on Software Architecture (WICSA), 2015, IEEE.
[13] Bellomo, S., et al., "Got technical debt? Surfacing elusive technical debt in issue trackers," in 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), 2016, IEEE.
[14] Antoniol, G., et al., "Is it a bug or an enhancement?: A text-based approach to classify change requests," in Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds, 2008, ACM.
[15] Runeson, P., Alexandersson, M. and Nyholm, O., "Detection of duplicate defect reports using natural language processing," in Proceedings of the 29th International Conference on Software Engineering, 2007, IEEE Computer Society.
[16] Wang, X., et al., "An approach to detecting duplicate bug reports using natural language and execution information," in 2008 ACM/IEEE 30th International Conference on Software Engineering (ICSE), 2008, IEEE.
[17] Jalbert, N. and Weimer, W., "Automated duplicate detection for bug tracking systems," in 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN), 2008, IEEE.
[18] Sun, C., et al., "A discriminative model approach for accurate duplicate bug report retrieval," in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, Volume 1, 2010, ACM.
[19] Sureka, A. and Jalote, P., "Detecting duplicate bug report using character n-gram-based features," in 2010 17th Asia Pacific Software Engineering Conference (APSEC), 2010, IEEE.
[20] Cois, C.A. and Kazman, R., "Natural language processing to quantify security effort in the software development lifecycle," in SEKE, 2015.
[21] Sun, J., "Jieba" Chinese word segmentation tool, 2012.
[22] Bird, S., "NLTK: The Natural Language Toolkit," in Proceedings of the COLING/ACL on Interactive Presentation Sessions, 2006, Association for Computational Linguistics.
Copyright © 2017 for this paper by its authors.