                5th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2017)



       Detecting Technical Debt through Issue Trackers
                                                       Ke Dai and Philippe Kruchten
                                            Department of Electrical and Computer Engineering
                                                     University of British Columbia
                                                        Vancouver, BC, Canada
                                                         kedai, pbk@ece.ubc.ca


    Abstract—Managing technical debt effectively to prevent it from accumulating too quickly is of great concern to software stakeholders. To pay off technical debt regularly, software developers must be conscious of the existence of technical debt items. The first step is to make technical debt explicit, that is, to identify it. Although many kinds of static source code analysis tools exist to identify code-level technical debt, identifying non-code-level technical debt is very challenging and needs deep exploration. This paper proposes an approach to identifying non-code-level technical debt through issue tracking data sets using natural language processing and machine learning techniques, and validates the feasibility and performance of this approach using an issue tracking data set recorded in Chinese from a commercial software project. We found that some common words can be used as indicators of technical debt. Based on these key words, we achieved a precision of 0.72 and a recall of 0.81 for identifying technical debt items using machine learning techniques.

    Keywords—technical debt; identification; issue tracking data sets; natural language processing; machine learning

                         I. INTRODUCTION

    Technical debt refers to delayed tasks and immature artifacts that constitute a “debt” because they incur extra costs in the future in the form of increased cost of change during evolution and maintenance [1]. An appropriate amount of technical debt can accelerate the process of software development; however, too much of it can impede progress and even abort the project [2]. Typically, some startup software companies tend to incur technical debt strategically to speed up development at the early stage of the development process for the purpose of capturing the market. But with the growth of the size and complexity of the software, it may become increasingly difficult to maintain and evolve the product, due to intertwined dependencies between modules or components, without paying off technical debt regularly. As a result, software stakeholders need to pay off technical debt regularly to prevent it from accumulating too quickly. Unlike bugs or defects existing in a software system, technical debt is invisible: the software often works well from the users’ perspective, and even developers are often unaware of the existence of technical debt. The invisibility of technical debt significantly increases the risks of rigid software design and huge maintenance costs in the future. Therefore, it is critical for development teams to be able to identify the technical debt items existing in the current software system at any point in time, as this is the prerequisite for conducting other technical debt management activities, including measurement of technical debt, estimation of the effort to be expended, payment of technical debt, risk evaluation, etc. Once technical debt can be identified systematically, software development teams would be able to estimate future budget, prioritize future tasks, allocate limited resources, and evaluate potential risks. They could also make informed decisions about when technical debt should be paid off to maximize their profits.

    Due to the importance of technical debt identification, a number of studies have empirically explored various approaches to detecting technical debt. Some of these studies focused on employing source code analysis techniques to detect technical debt. Code smells and automatic static analysis (ASA) are the two most-used source code analysis techniques for the identification of technical debt. The term code smell was first introduced by Fowler et al. to describe violations of object-oriented design principles (e.g., abstraction, encapsulation and inheritance) [3], whereas ASA techniques aim at identifying violations of recommended programming practices that might degrade some software quality attributes (e.g., maintainability, efficiency).

    Other studies aimed to identify technical debt of larger granularity that is undetectable by source code analysis techniques, such as architecture and requirement technical debt [10] [11] [12] [13]. Compared to code-level technical debt, the identification of non-code-level technical debt has not been studied sufficiently and the available approaches are limited. To our knowledge, none of the existing approaches can identify all types of technical debt.

    As a complement to existing approaches, we try to identify non-code-level technical debt through issue trackers. We hope to acquire developers’ points of view on technical debt and understand how they communicate technical debt in issue trackers, since they use issue trackers to record, track, and prioritize various kinds of issues in software projects. Further, developers’ standpoints on technical debt will in turn help refine our understanding of technical debt and should be taken into consideration for an improved definition of technical debt.

    However, it is difficult and impractical to identify technical debt manually through issue trackers due to the substantial effort involved, especially when a large project comprises a large number of issues. In this context, we exploited natural language processing (NLP) and machine learning techniques to automate the process. NLP techniques were applied to extract features from unstructured text data, and machine learning techniques were used to decide whether a certain issue is an instance of technical debt or not. We performed an exploratory study on a commercial software project to validate the efficacy of our
     Partially sponsored by a Canada NSERC grant, a MITACS grant, and the
Institute for Computing, Information and Cognitive Systems (ICICS) at UBC


       Copyright © 2017 for this paper by its authors.

approach to the identification of technical debt through issue trackers. Experimental results demonstrate that our approach is effective in identifying non-code-level technical debt, especially requirement debt, design debt, and UI debt, which cannot be detected by source code analysis techniques.

    We address the following questions through this research:

    •    RQ1: How do software practitioners communicate technical debt issues in issue trackers?

    •    RQ2: Are there text patterns indicating the existence of technical debt that can be used to identify potential technical debt automatically using NLP and machine learning techniques?

    The rest of this paper is organized as follows: Section II discusses related work. Section III describes our approach. Section IV reports and analyzes the experimental results of our exploratory study. Section V presents the threats to validity. Finally, we conclude our research and envision future work in Section VI.

                      II. RELATED WORK

A. Identification of Technical Debt
    Much research has been done on identifying code-level technical debt. This kind of technical debt can be detected using static program analysis tools based on the measurement of various source code metrics. Marinescu proposed metric-based detection strategies to help engineers directly localize classes or methods affected by violations of object-oriented design principles, and validated the approach in multiple large industrial case studies [4]. Munro et al. refined Marinescu’s detection strategies by introducing some new metrics and a justification for choosing the metrics, and evaluated the performance of the approach in identifying two kinds of code smells (lazy class and temporary field) in two case studies [5]. Olbrich et al. investigated the relationship between two kinds of code smells (god class and shotgun surgery) and maintenance cost by analyzing the historical data of two major open source projects, Apache Lucene and Apache Xerces 2 J [6]. Wong et al. proposed a strategy to detect modularity violations and evaluated the approach using Hadoop and Eclipse [7]. Besides, some researchers explored identifying technical debt through comments in source code [8] [9].

    Other studies aimed at exploring approaches to identifying other types of technical debt, such as architecture technical debt. Brondum et al. proposed a modelling approach to visualizing architecture technical debt based on analysis of the structural code [10]. Li et al. proposed to use two modularity metrics, Index of Package Changing Impact (IPCI) and Index of Package Goal Focus (IPGF), as indicators of architecture technical debt [11]. Further, they proposed an architecture technical debt identification approach based on architecture decisions and change scenarios [12]. The work closest to ours is that of Bellomo et al., where a manual examination was conducted on 1,264 issues in four issue trackers from open source and government projects, and 109 examples of technical debt were identified using a categorization method they developed [13]. The major difference is that we partially automated the process of identification, while they identified technical debt items manually. To our knowledge, our study is the first one that applies NLP and machine learning techniques to detect technical debt through issue trackers.

B. Mining Issue Tracking Databases
    Issue tracking systems are widely used in open source projects as well as in the software industry to record, triage and track the different kinds of issues that occur during the lifecycle of software: bug finding, defect fixing, adding new features, future tasks, requirement updates, etc. They play an important role in helping software development teams manage development and maintenance activities, and thus promote the success of software projects. Several studies have focused on mining issue tracking databases to retrieve valuable information for improved definition, development management, quality evaluation, predictive models, etc.

    Antoniol et al. applied NLP and machine learning techniques (alternating decision trees, naïve Bayes classifier, and logistic regression) to automate the process of distinguishing bugs from other kinds of issues, compared the performance of this approach with that of regular expression matching, and concluded that machine learning techniques outperform regular expression matching in terms of predictive accuracy [14].

    Runeson et al. developed a prototype tool which detects duplicate defect reports in issue tracking systems using NLP techniques, evaluated the identification capabilities of this approach in a case study, and concluded that about 2/3 of the duplicates can possibly be found using this approach [15]. Wang et al., Jalbert and Weimer, Sun et al., and Sureka and Jalote performed similar research to address the same problem [16] [17] [18] [19].

    Other work focused on specific software quality attributes, such as security. Cois et al. proposed an approach to detecting security-related text snippets in issue tracking systems using NLP and machine learning techniques [20].

                        III. METHOD

    For this research, we cooperated with a local software company to access the issue data set of a commercial software product which has been in development for more than two years, rather than just using issue data sets from open source software projects, in order to make the classifier we developed more adaptable to the style of issue data from commercial software products. The issues are recorded mainly in Mandarin with a few English words, as the developers of this product are Chinese.

A. Phase 0: Exporting issue data
    We first exported the issue data set and saved it as a spreadsheet, which makes it easier for researchers to read the issues and to process the data. The fields or variables of the data set we used are id, type, priority, state, summary, description and label. We also tuned the character encoding format so that Chinese characters are displayed correctly, and removed issues with garbled characters to render the data set clean and tidy. Finally, we got 8,149 issues in total. Figure 1 shows an overview of our approach.
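Phase 0 amounts to a small extract-and-clean step. The sketch below (standard library only) loads issues with the seven fields used in the study and drops rows whose text failed to decode; the sample rows and the replacement-character heuristic are illustrative assumptions, not the authors' actual export.

```python
import csv
import io

# The seven fields used in the study.
FIELDS = ["id", "type", "priority", "state", "summary", "description", "label"]

def load_issues(fileobj):
    """Read exported issues, keeping only clean rows with the fields above."""
    reader = csv.DictReader(fileobj)
    issues = []
    for row in reader:
        record = {f: (row.get(f) or "").strip() for f in FIELDS}
        # Skip rows left with undecodable characters by a bad encoding.
        if "\ufffd" in record["summary"] or "\ufffd" in record["description"]:
            continue
        issues.append(record)
    return issues

# Illustrative two-row export: the second row simulates a garbled issue.
sample = io.StringIO(
    "id,type,priority,state,summary,description,label\n"
    "1,task,high,open,refactor parser,clean up dead code,\n"
    "2,bug,low,open,\ufffd\ufffd garbled,broken text,\n"
)
issues = load_issues(sample)
```

The same filter would run unchanged over a CSV export of the real 8,149-issue data set.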








Fig. 1. Approach Overview: Issue Tracking Database → Export Issue Data → Manual Analysis and Tagging → Extract Key Words → Extract Features → Naïve Bayes Classification
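The pipeline of Fig. 1 can be read as plain function composition. In the sketch below every stage is a stub standing in for the real step (spreadsheet export, manual tagging, Jieba keyword extraction, featurization, NLTK classification); the single-keyword rule is invented purely to make the flow runnable.

```python
# Fig. 1 pipeline as function composition; all stage bodies are stubs.
def export_issue_data(db):
    return list(db)                       # Phase 0: pull raw issue records

def tag_manually(issues):
    # Phase 1: human labelling, stubbed here by a trivial keyword rule.
    return [(i, "Technical Debt" if "refactor" in i else "Not Technical Debt")
            for i in issues]

def extract_key_words(tagged):
    return {"refactor"}                   # Phase 2: TF-IDF/TextRank union (stub)

def extract_features(issue, key_words):
    return {w: w in issue for w in key_words}    # Phase 3: presence features

def classify(features):
    # Phase 4 stand-in for the trained Naïve Bayes classifier.
    return "Technical Debt" if any(features.values()) else "Not Technical Debt"

db = ["please refactor the parser", "add export button"]
issues = export_issue_data(db)
key_words = extract_key_words(tag_manually(issues))
predictions = [classify(extract_features(i, key_words)) for i in issues]
```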


                         TABLE I.      THE CLASSIFICATION CRITERIA OF ISSUES

  Label               | Subtype                  | Description
  --------------------+--------------------------+---------------------------------------------------------------
  Not Technical Debt  | Requirement Change       | The request for requirement change from the client
                      | New Features             | Tasks to add new functions or introduce new features
                      | Insufficient Description | The description is insufficient to make a decision
                      | Critical Defects         | Critical functions or features are not implemented correctly
  --------------------+--------------------------+---------------------------------------------------------------
  Technical Debt      | Defect Debt              | Temporarily tolerable defects that will be fixed in the future
                      | Requirement Debt         | Requirements are not implemented accurately or are implemented
                      |                          | only partially
                      | Design Debt              | Violations of good object-oriented design principles such as
                      |                          | god class and long method
                      | Code Debt                | Bad coding practices such as dead code or no proper comments
                      | UI Debt                  | UI-related issues such as inconsistent UI style or ugly UI
                      |                          | elements
                      | Architecture Debt        | Design limitations at the architecture level such as violations
                      |                          | of modularity
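The label/subtype scheme of Table I can be written down directly as a lookup table. A minimal sketch follows, with subtype names exactly as in the table; the helper function is hypothetical, not part of the study's tooling.

```python
# Subtype-to-label mapping of Table I.
SUBTYPE_LABEL = {
    "Requirement Change":       "Not Technical Debt",
    "New Features":             "Not Technical Debt",
    "Insufficient Description": "Not Technical Debt",
    "Critical Defects":         "Not Technical Debt",
    "Defect Debt":              "Technical Debt",
    "Requirement Debt":         "Technical Debt",
    "Design Debt":              "Technical Debt",
    "Code Debt":                "Technical Debt",
    "UI Debt":                  "Technical Debt",
    "Architecture Debt":        "Technical Debt",
}

def is_technical_debt(subtype):
    """Return True if the subtype counts as technical debt under Table I."""
    return SUBTYPE_LABEL[subtype] == "Technical Debt"
```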


B. Phase 1: Tagging issues manually
    We tagged each issue or task in the issue data set as technical debt or not technical debt manually, by reading the summary and description, based on the following classification criteria:

    1. Is it a request for requirement change from the client?

    If yes, we tag this issue as not technical debt.

    2. Is it a task to add new functions or introduce new features to the product?

    If yes, we also tag this issue as not technical debt.

    3. Is the description of the issue too short or insufficient to decide whether the issue is a technical debt item?

    In this case, we tag this issue as not technical debt.

    4. Is it a defect in which important and critical functions or features are not implemented correctly?

    If yes, we tag this issue as not technical debt.

    5. Is it a defect that is not critical from the client’s perspective but weakens the performance and capabilities of the system, and will be fixed in the future?

    If yes, we tag this issue as defect debt.

    6. Is it a task to redesign some function or feature because the current design does not meet the requirements, or meets them only partially?

    If yes, we tag this issue as requirement debt.

    7. Is it a limitation of the design that may pose a threat to the performance of the system or to its evolution and maintenance?

    If yes, we tag this issue as design debt.

    8. Is it an issue related to bad coding practices, such as dead code or missing comments?

    If yes, we tag this issue as code debt.

    9. Is it a UI-related issue, such as inconsistent UI style or ugly UI elements, that degrades the user experience?

    If yes, we tag this issue as UI debt.

    10. Is it a design limitation at the architecture level, such as a violation of modularity, that may exert a negative impact on the performance of the system or on its evolution and maintenance?

    If yes, we tag this issue as architecture debt.

    The 10 cases listed above are the typical cases we encountered when tagging the issues but do not cover all the




types of issues existing in the issue tracker. Actually, some issues can be tagged as either technical debt or not technical debt, which, to a large extent, depends on one’s personal understanding of technical debt. Typically, there exist wide discrepancies among researchers and developers regarding whether defects should be viewed as a type of technical debt. In this study, we divided defects into two categories: (1) critical defects that may cause fatal errors when using the software; (2) tolerable defects that exert only a marginal negative impact on the use of the software and are not fixed immediately after being detected. We tagged the first type of defects as not technical debt and the second type as technical debt.

    After we finished tagging all the issues, we asked an expert in software engineering and technical debt, external to our research team, to validate the results of our manual classification. The expert classified a random subset of the issues independently. Where our classifications of an issue disagreed, we exchanged our respective points of view on why we classified it as we did, to resolve the discrepancy. If we did not reach agreement on the classification of a certain issue, we discussed the issue with the developers to gain insight into the issue itself and their opinions on the classification.

    Finally, we found 331 technical debt issues in total, whose distribution is shown in Table II. Requirement debt and design debt are the main technical debt types, with 105 and 141 instances respectively.

  TABLE II.        THE NUMBER OF DIFFERENT TYPES OF TECHNICAL DEBT ISSUES

        Technical Debt Type    | Number
        -----------------------+-------
        Requirement Debt       |  105
        Architecture Debt      |    6
        Design Debt            |  141
        Defect Debt            |   15
        UI Debt                |   35
        Code Debt              |   20
        Other                  |    9

C. Phase 2: Extracting key words and phrases
    Different from English, Chinese is written without spaces between words. So before extracting key words from the Chinese texts, we have to convert each text to a word sequence using a Chinese text segmentation tool. For this research, we used Jieba (https://github.com/fxsjy/jieba/) [21], a popular open source Chinese text segmentation tool, to split Chinese texts into sequences of words.

    After conducting Chinese text segmentation, we extracted key words using Jieba, which integrates two key word extraction algorithms: TF-IDF and TextRank. We used both of them to extract key words for detecting technical debt. We took the union of the two sets of key words extracted by these two algorithms, removed the key words referring to domain knowledge from the union set, and finally added some key words based on our intuition. To make this paper more readable, we list only the meanings of the key words instead of the original Chinese characters below:

    “at present”, “now”, “current”, “previously”, “in the past”, “in the future”, “time”, “actually”, “in reality”, “users”, “clients”, “strengthen”, “change”, “modify”, “replace”, “update”, “delete”, “cancel”, “suggest”, “optimize”, “simplify”, “perfect”, “improve”, “refactor”, “decouple”, “again”, “re-”, “replant”, “tidy”, “integrate”, “merge”, “adjust”, “extend”, “expect”, “plan”, “management”, “maintenance”, “function”, “requirement”, “design”, “rule”, “theory”, “strategy”, “mechanism”, “algorithm”, “data structure”, “logic”, “code”, “structure”, “architecture”, “style”, “format”, “performance”, “efficiency”, “sufficiency”, “security”, “compatibility”, “scalability”, “maintainability”, “stability”, “generality”, “usability”, “readability”, “real-time”, “limitation”, “more friendly”, “more specialized”, “more accurate”, “problem”, “configuration”, “priority”, “inconsistent”, “unreasonable”, “inconvenient”, “convenient”, “not clear”, “inaccurate”, “not intuitive”, “not pretty”, “incongruous”, “not smooth”, “inconformity”, “incomplete”, “abnormity”, “defect”, “impact”, “experience”, “habit”, “operation”, “difficulty”, “delay”, “UI”, “risk”, “optimize”, “refactor”, “SonarQube”

    There are 114 key words in total, among which 104 are Chinese words and 10 are English words. As some words express similar or the same meanings, we merged those words. All these words to some extent indicate or imply the concept of technical debt from different perspectives. To be specific:

    •    “at present”, “now”, “current”, “previously”, “in the past”, “in the future”, “time”

    These words, indicating the concept of time, may imply accumulation.

    •    “strengthen”, “change”, “modify”, “replace”, “update”, “delete”, “cancel”, “optimize”, “simplify”, “perfect”, “improve”, “refactor”, “decouple”, “again”, “re-”, “replant”, “tidy”, “integrate”, “merge”, “adjust”, “extend”

    These words indicate the modification of code, design or architecture, or the enhancement of functionality, capability, performance, efficiency, etc.

    •    “security”, “compatibility”, “scalability”, “maintainability”, “stability”, “generality”, “usability”, “readability”, “real-time”, “limitation”

    These words indicate concerned aspects of software quality attributes.

    •    “inconsistent”, “unreasonable”, “inconvenient”, “convenient”, “unclear”, “inaccurate”, “not intuitive”, “not pretty”, “incongruous”, “not smooth”, “inconformity”, “incomplete”, “abnormity”, “defect”, “limit”, “impact”, “experience”, “habit”, “operation”, “difficulty”, “delay”

    These words indicate defects or design limitations, such as inconsistent UI style, unreasonable design, etc.
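The keyword-set construction described above (union of TF-IDF and TextRank candidates, minus domain-specific words, plus hand-picked additions) can be sketched as follows. The candidate sets are stubbed with small English stand-ins so the merging logic is runnable without the Chinese corpus; in the study itself they would come from `jieba.analyse.extract_tags` and `jieba.analyse.textrank`.

```python
# Candidate key words as they might come back from Jieba's two extractors;
# the English stand-ins below replace real Chinese corpus output.
tfidf_keywords = {"refactor", "optimize", "login-module", "improve", "delay"}
textrank_keywords = {"refactor", "inconsistent", "login-module", "update"}

# Words tied to product/domain knowledge are removed from the union ...
domain_words = {"login-module"}
# ... and a few intuition-based indicator words are added by hand.
manual_words = {"decouple", "in the future"}

candidates = (tfidf_keywords | textrank_keywords) - domain_words
key_words = sorted(candidates | manual_words)
```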





D. Phase 3: Extracting features
    Once key words were extracted from the issue data set, features for text classification can be derived by checking the presence or absence of each key word in each issue text. Given that the set of key words is [“users”, “change”, “modify”, “improve”, “refactor”, “decouple”, “priority”, “button”, “architecture”, “deploy”, “rules”], consider this issue description: “design change: to keep a consistent design with different pages, we are moving the clear-all-rules button to the front of the deploy rules table. (Consistent with event page)”. First, we tokenized the text into a sequence of words and removed stop words (words that are too common to indicate any semantic meaning for our classification). Thus, the text is converted to the string list [“design”, “change”, “keep”, “consistent”, “design”, “different”, “pages”, “moving”, “clear-all-rules”, “button”, “front”, “deploy”, “rules”, “table”]. Then we could check whether this string list contains each of the key words, i.e. [contain(“users”), contain(“change”), contain(“modify”), contain(“improve”), contain(“refactor”), contain(“decouple”), contain(“priority”), contain(“button”), contain(“architecture”), contain(“deploy”), contain(“rules”)]. This vector checking the presence or absence of each key word is called the feature space. The dimension of the feature space depends on the size of the set of key words. Finally, we got the feature vector of the issue sample based on the feature space: [false, true, false, false, false, false, false, true, false, true, true].

    The feature space actually not only includes unigram features that are a single word, like “design” and “decouple”, but also bigram and trigram features comprising adjacent word pairs and triplets respectively, such as “design change” and “improve unit test”; that is to say, the feature space in the previous example can be extended to [contain(“users”), contain(“change”), contain(“modify”), contain(“improve”), contain(“refactor”), contain(“decouple”), contain(“priority”),

    The Naïve Bayes classifier computes the posterior probability of class c_k given the features x_1, …, x_n as

        p(c_k | x_1, …, x_n) = ( ∏_{j=1}^{n} p(x_j | c_k) / ∑_{h=1}^{K} ∏_{j=1}^{n} p(x_j | c_h) p(c_h) ) · p(c_k).

    To perform our experiments, we used NLTK (http://www.nltk.org) [22], a popular natural language toolkit for building Python programs that process human language data. We employed the implementations provided by NLTK instead of creating a binary Naïve Bayes classifier from scratch.

Fig. 2. Feature Extraction. (The figure walks through the running example: the issue text is tokenized into the list t; the feature space S = [contain(“users”), contain(“change”), contain(“modify”), …, contain(“rules”), contain(“design change”), contain(“improve unit test”)] is checked against t; and the resulting feature vector is V(t) = [false, true, false, …, true, true, false].)
contain(“button”), contain(“architecture”), contain(“deploy”),
contain(“rules”), contain(“design change”), contain(“improve                               IV. EXPERIMENTAL RESULTS AND ANALYSIS
unit test”)]. Then the feature vector is turned into [false, true,
false, false, false, false, false, true, false, true, true, true, false].             Repeated random sub-sampling validation was performed to
Figure 2 shows the process of feature extraction.                                 validate our approach to the identification of technical debt by
                                                                                  repeatedly splitting the full data set into 80/20% randomly
E. Phase 4: Creating a binary Naïve Bayes classifier                              distributed partitions, training and testing the classifier for each
    Naïve Bayes is a simple classification algorithm that is based                split, and recording performance results.
on an assumption that the features are conditionally independent                      RQ1: How do software practitioners communicate
of each other given the category. It determines the category of a                     technical debt issues in issue trackers?
given sample with n-dimensional features ( 𝑥1 , … , 𝑥𝑛 ) by
calculating the probability that the sample belongs to each                           We searched for the term “technical debt” and the
category and then assigning the most probable category c to it,                   corresponding Chinese term in the issue data set and found no
which can be described as:                                                        positive results. All the technical debt instances in this issue
                                                                                  tracker were implicitly expressed using other technical debt
                  𝑐 = arg max 𝑝(𝑐𝑘 | 𝑥1 , … , 𝑥𝑛 ),                               related words such as redesign, design change, refactor, cleanup,
                            𝑘∈{1,…,K}
                                                                                  decouple, etc. By means of communication with developers of
where 𝑐𝑘 is the k category, and K is the size of the set of
                    th
                                                                                  this product, we learned that they did not have strong awareness
categories. Using Bayes’ theorem, the conditional probability                     of technical debt. Some of them had even never heard about the
𝑝(𝑐𝑘 | 𝑥1, … , 𝑥𝑛 ) can be decomposed as:                                         concept of technical debt although they recognized that they had
                                     𝑝(𝑥1,…,𝑥𝑛 | 𝑐𝑘)                              much experience in incurring technical debt when we explained
          𝑝(𝑐𝑘 | 𝑥1, … , 𝑥𝑛 ) = ∑𝐾                          𝑝(𝑐𝑘 ).               what is technical debt. To track, prioritize and pay off technical
                                    ℎ=1 𝑝(𝑥1 ,…,𝑥𝑛 | 𝑐ℎ )
                                                                                  debt effectively, we suggested they take technical debt as an
With the conditional independence assumptions, the conditional                    issue type in the issue tracker to communicate technical debt
probability 𝑝(𝑐𝑘 | 𝑥1 , … , 𝑥𝑛 ) can be transformed into:                         explicitly.




       Copyright © 2017 for this paper by its authors.                       63
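The feature-extraction steps illustrated in Fig. 2 can be sketched in pure Python. This is a minimal illustration, not the authors' actual implementation (which relied on Jieba and NLTK); the stop-word list and key n-grams below are chosen only to reproduce the running example:

```python
# Minimal sketch of the feature-extraction step; the stop-word list and
# key n-grams are illustrative, not the paper's actual key-word set.

STOP_WORDS = {"to", "a", "the", "of", "we", "are", "with"}

def tokenize(text):
    # Lowercase, strip surrounding punctuation, drop stop words.
    words = [w.strip('.,:()"') for w in text.lower().split()]
    return [w for w in words if w and w not in STOP_WORDS]

def extract_features(tokens, key_ngrams):
    # Feature vector: presence/absence of each key unigram/bigram/trigram.
    grams = set(tokens)
    grams |= {" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)}
    grams |= {" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)}
    return [g in grams for g in key_ngrams]

KEY_NGRAMS = ["users", "change", "modify", "improve", "refactor",
              "decouple", "priority", "button", "architecture",
              "deploy", "rules", "design change", "improve unit test"]

text = ("design change: to keep a consistent design with different pages, "
        "we are moving the clear-all-rules button to the front of the "
        "deploy rules table. (Consistent with event page)")
vector = extract_features(tokenize(text), KEY_NGRAMS)
# vector == [False, True, False, False, False, False, False, True,
#            False, True, True, True, False], matching the example above
```

A vector of booleans like this (or the equivalent NLTK feature dict) is what the classifier in Phase 4 consumes.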
    RQ2: Are there text patterns that indicate the existence of technical debt and that can be used to identify potential technical debt automatically using NLP and machine learning techniques?

    The experimental results demonstrate that text patterns indicating technical debt do exist and can be used to identify technical debt. In general, technical debt issues are characterized by two properties: rework, whether code refactoring or feature redesign, and accumulation, which is implied by words indicating time such as previously, at present, and in the future. The 20 most informative features that are strongly correlated with technical debt are shown in Table III. Each of these features may contribute differently to the identification of different types of technical debt. Intuitively, the presence of “style” and “experience” may indicate UI debt, while “simplify” and “efficiency” are more likely to be indicators of design debt.

    TABLE III. 20 MOST INFORMATIVE FEATURES FOR DETECTING TECHNICAL DEBT

    Feature                                            | Likelihood Ratio (Technical Debt : not Technical Debt)
    协议识别优化 (protocol identification optimization) = 1 | 155.2 : 1.0
    增强 (strengthen) = 1                               | 128.2 : 1.0
    不方便 (inconvenient) = 1                           | 128.2 : 1.0
    提高 (improve) = 1                                  | 117.4 : 1.0
    优化 (optimize) = 1                                 | 90.8 : 1.0
    整改 (change or modify) = 1                         | 87.7 : 1.0
    风格 (style) = 1                                    | 65.2 : 1.0
    体验 (experience) = 1                               | 64.4 : 1.0
    改进 (improve) = 1                                  | 60.7 : 1.0
    不容易 (not easy) = 1                               | 47.2 : 1.0
    改善 (improve) = 1                                  | 44.5 : 1.0
    效率 (efficiency) = 1                               | 44.5 : 1.0
    简化 (simplify) = 1                                 | 38.2 : 1.0
    解决方案 (strategy) = 1                             | 35.8 : 1.0
    困难 (difficulty) = 1                               | 33.7 : 1.0
    前期 (previously) = 1                               | 33.7 : 1.0
    不美观 (not pretty) = 1                             | 33.7 : 1.0
    risk = 1                                            | 33.7 : 1.0
    算法 (algorithm) = 1                                | 31.8 : 1.0
    习惯 (habit) = 1                                    | 31.8 : 1.0

    To evaluate the performance of our classifier, the average precision and recall were calculated over 10 repeated random sub-sampling validations. Precision measures the fraction of technical debt instances identified by our classifier that proved to be correctly classified. Recall measures the fraction of correctly classified technical debt items out of the total number of technical debt issues. In our experiments, the average precision and recall were 72% and 81% respectively over the 10 repeated random sub-sampling validations, as shown in Table IV.

    TABLE IV. THE RESULT FOR REPEATED RANDOM SUB-SAMPLING VALIDATION

    Category       | Average Precision | Average Recall | Average F1-score
    Technical Debt | 0.72              | 0.81           | 0.76

                       V. THREATS TO VALIDITY

    There are two main threats to the validity of our study: threats to internal validity and threats to external validity. Threats to internal validity arise from the level of subjectivity in the manual analysis and classification of issues, as we certainly have personal bias in our understanding of issue descriptions. To counter these threats, we had an expert external to our research team classify random samples of the issues and resolved our discrepancies by discussion. We also held discussions with the developers of the product to gain insight into the issues that we were not sure we had classified correctly. Threats to external validity concern the generalization of our findings. We performed a case study on an issue data set from a single commercial software project. This data set may not be representative; that is to say, we cannot guarantee that the same results will be obtained when our approach is applied to other commercial software projects. In particular, our approach may not be applicable to projects that do not use issue trackers to record issues.

                VI. CONCLUSION AND FUTURE WORK

    This paper presents an exploratory study of applying NLP and machine learning techniques to identify technical debt issues through issue trackers. We have demonstrated that the detection of technical debt issues in issue trackers can be automated with acceptable performance using NLP and machine learning techniques. We found that some words common in software engineering are directly or indirectly related to technical debt, and that these words can be used as features to decide whether a given issue represents technical debt. We believe the performance of our classifier will improve further when more sophisticated feature extraction and classification techniques are applied.

    This exploratory study was based on a rather limited data set of 8,149 issues. Our approach needs to be validated with issue data sets from a wider range of software projects. Furthermore, we will improve the performance of our classifier by exploring more sophisticated feature extraction techniques, such as matching phrases with regular expressions and extracting semantically meaningful information from context, and by applying other classification techniques such as random forests, SVM, and deep learning. In addition, we will also develop a multi-class classifier to identify technical debt of a specific type.
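The Naïve Bayes decision rule from Phase 4 and the repeated random sub-sampling validation from Section IV can be sketched together in a self-contained toy example. This is an illustration only: the data set and the two boolean features are invented, and the paper itself used NLTK's classifier rather than this hand-rolled one.

```python
import random

def train_nb(samples, labels):
    # Estimate the prior p(c) and per-feature likelihood p(x_i = True | c)
    # with add-one (Laplace) smoothing, for boolean feature vectors.
    classes = sorted(set(labels))
    n_feat = len(samples[0])
    prior, cond = {}, {}
    for c in classes:
        rows = [s for s, l in zip(samples, labels) if l == c]
        prior[c] = len(rows) / len(samples)
        cond[c] = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                   for i in range(n_feat)]
    return prior, cond

def classify(model, x):
    # c = argmax_k p(c_k) * prod_i p(x_i | c_k), per the decision rule above.
    prior, cond = model
    def score(c):
        p = prior[c]
        for xi, pi in zip(x, cond[c]):
            p *= pi if xi else (1.0 - pi)
        return p
    return max(prior, key=score)

def precision_recall(model, samples, labels, positive=1):
    # Precision: correct positives among predicted positives;
    # recall: correct positives among actual positives.
    tp = fp = fn = 0
    for x, y in zip(samples, labels):
        pred = classify(model, x)
        if pred == positive and y == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif y == positive:
            fn += 1
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

# Toy data: two boolean features per issue (e.g. contains "refactor",
# contains "crash"); label 1 marks a technical-debt issue.
data = [([True, False], 1), ([True, True], 1),
        ([False, True], 0), ([False, False], 0)] * 25

# Repeated random sub-sampling validation: 10 random 80%/20% splits.
random.seed(0)
precisions, recalls = [], []
for _ in range(10):
    random.shuffle(data)
    cut = int(0.8 * len(data))
    train, test = data[:cut], data[cut:]
    model = train_nb([x for x, _ in train], [y for _, y in train])
    p, r = precision_recall(model, [x for x, _ in test], [y for _, y in test])
    precisions.append(p)
    recalls.append(r)
avg_precision = sum(precisions) / len(precisions)
avg_recall = sum(recalls) / len(recalls)
```

Averaging over the splits, as in Table IV, smooths out the variance that any single random 80/20 partition would introduce.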

                         ACKNOWLEDGMENT

    We would like to acknowledge the support of Jean Su, Steven Zhu, and Billy Liu for this research. We also thank Robert Nord, Rob Fuller, and Randy Hsu for their valuable review comments. This research was partially supported by a Canada NSERC grant, a MITACS grant, and the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC.

                           REFERENCES

[1]  Avgeriou, P., et al. Managing technical debt in software engineering (Dagstuhl Seminar 16162). Dagstuhl Reports. 2016. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[2]  Cunningham, W. The WyCash portfolio management system. SIGPLAN OOPS Messenger, 1992. 4(2): p. 29-30.
[3]  Fowler, M. and K. Beck. Refactoring: Improving the Design of Existing Code. 1999: Addison-Wesley Professional.
[4]  Marinescu, R. Detection strategies: Metrics-based rules for detecting design flaws. In Proceedings of the 20th IEEE International Conference on Software Maintenance. 2004. IEEE.
[5]  Munro, M.J. Product metrics for automatic identification of "bad smell" design problems in Java source-code. In Proceedings of the 11th IEEE International Software Metrics Symposium. 2005. IEEE.
[6]  Olbrich, S., et al. The evolution and impact of code smells: A case study of two open source systems. In Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement. 2009. IEEE Computer Society.
[7]  Wong, S., et al. Detecting software modularity violations. In Proceedings of the 33rd International Conference on Software Engineering. 2011. ACM.
[8]  de Freitas Farias, M.A., et al. A contextualized vocabulary model for identifying technical debt on code comments. In Proceedings of the 7th IEEE International Workshop on Managing Technical Debt (MTD). 2015. IEEE.
[9]  Maldonado, E., E. Shihab, and N. Tsantalis. Using natural language processing to automatically detect self-admitted technical debt. IEEE Transactions on Software Engineering, 2017.
[10] Brondum, J. and L. Zhu. Visualising architectural dependencies. In Proceedings of the Third International Workshop on Managing Technical Debt. 2012, IEEE Press: Zurich, Switzerland. p. 7-14.
[11] Li, Z., et al. An empirical investigation of modularity metrics for indicating architectural technical debt. In Proceedings of the 10th International ACM SIGSOFT Conference on Quality of Software Architectures. 2014, ACM: Marcq-en-Baroeul, France. p. 119-128.
[12] Li, Z., P. Liang, and P. Avgeriou. Architectural technical debt identification based on architecture decisions and change scenarios. In Proceedings of the 12th Working IEEE/IFIP Conference on Software Architecture (WICSA). 2015. IEEE.
[13] Bellomo, S., et al. Got technical debt? Surfacing elusive technical debt in issue trackers. In Proceedings of the 13th Working Conference on Mining Software Repositories (MSR). 2016. IEEE/ACM.
[14] Antoniol, G., et al. Is it a bug or an enhancement? A text-based approach to classify change requests. In Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds. 2008. ACM.
[15] Runeson, P., M. Alexandersson, and O. Nyholm. Detection of duplicate defect reports using natural language processing. In Proceedings of the 29th International Conference on Software Engineering. 2007. IEEE Computer Society.
[16] Wang, X., et al. An approach to detecting duplicate bug reports using natural language and execution information. In Proceedings of the 30th ACM/IEEE International Conference on Software Engineering (ICSE). 2008. IEEE.
[17] Jalbert, N. and W. Weimer. Automated duplicate detection for bug tracking systems. In Proceedings of the IEEE International Conference on Dependable Systems and Networks (DSN). 2008. IEEE.
[18] Sun, C., et al. A discriminative model approach for accurate duplicate bug report retrieval. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, Volume 1. 2010. ACM.
[19] Sureka, A. and P. Jalote. Detecting duplicate bug report using character n-gram-based features. In Proceedings of the 17th Asia Pacific Software Engineering Conference (APSEC). 2010. IEEE.
[20] Cois, C.A. and R. Kazman. Natural language processing to quantify security effort in the software development lifecycle. In SEKE. 2015.
[21] Sun, J. 'Jieba' Chinese word segmentation tool. 2012.
[22] Bird, S. NLTK: The natural language toolkit. In Proceedings of the COLING/ACL Interactive Presentation Sessions. 2006. Association for Computational Linguistics.



