5th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2017)

Detecting Technical Debt through Issue Trackers

Ke Dai and Philippe Kruchten
Department of Electrical and Computer Engineering
University of British Columbia
Vancouver, BC, Canada
kedai, pbk@ece.ubc.ca

Abstract—Managing technical debt effectively to prevent it from accumulating too quickly is of great concern to software stakeholders. To pay off technical debt regularly, software developers must be conscious of the existence of technical debt items. The first step is to make technical debt explicit, that is, to identify it. Although many kinds of static source code analysis tools exist to identify code-level technical debt, identifying non-code-level technical debt remains challenging and needs deeper exploration. This paper proposes an approach to identifying non-code-level technical debt in issue tracking data sets using natural language processing and machine learning techniques, and validates the feasibility and performance of this approach on an issue tracking data set, recorded in Chinese, from a commercial software project. We found that there are indeed common words that can serve as indicators of technical debt. Based on these key words, our machine learning classifier achieved a precision of 0.72 and a recall of 0.81 in identifying technical debt items.

Keywords—technical debt; identification; issue tracking data sets; natural language processing; machine learning

I. INTRODUCTION

Technical debt refers to delayed tasks and immature artifacts that constitute a "debt" because they incur extra costs in the future in the form of increased cost of change during evolution and maintenance [1]. An appropriate amount of technical debt can accelerate the process of software development; too much of it, however, can impede progress and even abort the project [2].
Typically, startup software companies tend to incur technical debt strategically to speed up development in the early stages and capture the market. But as the size and complexity of the software grow, it may become increasingly difficult to maintain and evolve the product, due to intertwined dependencies between modules or components, unless technical debt is paid off regularly. As a result, software stakeholders need to pay off technical debt regularly to prevent it from accumulating too quickly. Unlike bugs or defects in a software system, technical debt is invisible: the software often works well from the users' perspective, and even developers are often unaware that technical debt exists. This invisibility significantly increases the risk of rigid software design and of huge maintenance costs in the future. It is therefore critical that development teams be able to identify the technical debt items in the current software system at any point in time, as identification is the prerequisite for all other technical debt management activities, including measuring technical debt, estimating the effort to be expended, paying off the debt, evaluating risks, and so on. Once technical debt can be identified systematically, software development teams can estimate future budgets, prioritize future tasks, allocate limited resources, and evaluate potential risks. They can also make informed decisions about when technical debt should be paid off to maximize their profits.

Given the importance of identifying technical debt, a number of studies have empirically explored approaches to detecting it. Some of this research focuses on employing source code analysis techniques. Code smells and automatic static analysis (ASA) are the two most widely used source code analysis techniques for identifying technical debt. Code smells were first introduced by Fowler et al. to describe violations of object-oriented design principles (e.g., abstraction, encapsulation, and inheritance) [3], whereas ASA techniques aim to identify violations of recommended programming practices that might degrade software quality attributes (e.g., maintainability, efficiency).

Other studies aim to identify technical debt of larger granularity that is undetectable by source code analysis, such as architecture and requirement technical debt [10][11][12][13]. Compared to code-level technical debt, the identification of non-code-level technical debt has not been studied sufficiently, and the available approaches are limited. To our knowledge, none of the existing approaches can identify all types of technical debt.

As a complement to existing approaches, we try to identify non-code-level technical debt through issue trackers. We hope to capture developers' points of view on technical debt and to understand how they communicate technical debt in issue trackers, since they use issue trackers to record, track, and prioritize various kinds of issues in software projects. Further, developers' standpoints on technical debt will in turn help refine our understanding of technical debt and should be taken into consideration for an improved definition of it. However, identifying technical debt manually through issue trackers is difficult and impractical because of the substantial effort involved, especially when a large project comprises a large number of issues. In this context, we exploited natural language processing (NLP) and machine learning techniques to automate the process: NLP techniques were applied to extract features from the unstructured text data, and machine learning techniques were used to decide whether a given issue is an instance of technical debt. We performed an exploratory study on a commercial software project to validate the efficacy of our approach to identifying technical debt through issue trackers. The experimental results demonstrate that our approach is effective in identifying non-code-level technical debt, especially requirement debt, design debt, and UI debt, which cannot be detected by source code analysis techniques.

We address the following questions through this research:

• RQ1: How do software practitioners communicate technical debt issues in issue trackers?
• RQ2: Are there text patterns that indicate the existence of technical debt and that can be used to identify potential technical debt automatically with NLP and machine learning techniques?

The rest of this paper is organized as follows: Section II discusses related work. Section III describes our approach. Section IV reports and analyzes the experimental results of our exploratory study. Section V presents the threats to validity. Finally, we conclude our research and envision future work in Section VI.

II. RELATED WORK

A. Identification of Technical Debt

Much research has been done on identifying code-level technical debt. This kind of technical debt can be detected with static program analysis tools based on the measurement of various source code metrics. Marinescu proposed metric-based detection strategies to help engineers directly localize classes or methods affected by violations of object-oriented design principles, and validated the approach on multiple large industrial case studies [4]. Munro et al. refined Marinescu's detection strategies by introducing new metrics, with justification for choosing them, and evaluated the performance of the approach in identifying two kinds of code smells (lazy class and temporary field) in two case studies [5]. Olbrich et al. investigated the relationship between two kinds of code smells (god class and shotgun surgery) and maintenance cost by analyzing the historical data of two major open source projects, Apache Lucene and Apache Xerces2 Java [6]. Wong et al. proposed a strategy to detect modularity violations and evaluated the approach on Hadoop and Eclipse [7].
In addition, some researchers have explored identifying technical debt through comments in source code [8][9].

Other research has aimed at identifying other types of technical debt, such as architecture technical debt. Brondum and Zhu proposed a modelling approach to visualizing architecture technical debt based on analysis of the structural code [10]. Li et al. proposed two modularity metrics, the Index of Package Changing Impact (IPCI) and the Index of Package Goal Focus (IPGF), as indicators of architecture technical debt [11]; they further proposed an architecture technical debt identification approach based on architecture decisions and change scenarios [12]. The work closest to ours is that of Bellomo et al., who manually examined 1,264 issues in four issue trackers from open source and government projects and, using a categorization method they developed, identified 109 examples of technical debt [13]. The major difference is that we partially automated the identification process, whereas they identified technical debt items manually. To our knowledge, our study is the first to apply NLP and machine learning techniques to detect technical debt through issue trackers.

B. Mining Issue Tracking Databases

Issue tracking systems are widely used in open source projects as well as in the software industry to record, triage, and track the different kinds of issues that occur during the software lifecycle: finding bugs, fixing defects, adding new features, planning future tasks, updating requirements, and so on. They play an important role in helping software development teams manage development and maintenance activities, and thus in promoting the success of software projects. Several studies have focused on mining issue tracking databases to retrieve valuable information for improved development management, quality evaluation, predictive models, and more.

Antoniol et al. applied NLP and machine learning techniques (alternating decision trees, naïve Bayes classifiers, and logistic regression) to automate the process of distinguishing bugs from other kinds of issues, compared the performance of this approach with that of regular expression matching, and concluded that the machine learning techniques outperform regular expression matching in terms of predictive accuracy [14]. Runeson et al. developed a prototype tool that detects duplicate defect reports in issue tracking systems using NLP techniques, evaluated its identification capabilities in a case study, and concluded that about two thirds of the duplicates can be found with this approach [15]. Wang et al., Jalbert and Weimer, Sun et al., and Sureka and Jalote performed similar research addressing the same problem [16][17][18][19].

Other work has focused on particular software quality attributes, such as security. Cois and Kazman proposed an approach to detecting security-related text snippets in issue tracking systems using NLP and machine learning techniques [20].

III. METHOD

For this research, we cooperated with a local software company to access the issue data set of a commercial software product that had been in development for more than two years, rather than using issue data sets from open source software projects, in order to make our classifier more adaptable to the style of issue data from commercial software products. The issues are recorded mainly in Mandarin, with a few English words, as the developers of this product are Chinese.

A. Phase 0: Exporting issue data

We first exported the issue data set and saved it in a spreadsheet, which makes it easier to read the issues and to process the data. The fields of the data set we used are id, type, priority, state, summary, description, and label. We also tuned the character encoding so that Chinese characters display correctly, and removed issues with garbled text to render the data set clean and tidy. In the end, we had 8,149 issues in total. Figure 1 shows an overview of our approach.

[Fig. 1. Approach Overview: Issue Tracking Database → Export Issue Data → Manual Analysis and Tagging → Extract Key Words → Extract Features → Naïve Bayes Classification]
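As an illustration of this phase, here is a minimal sketch assuming a CSV export read with pandas; the file name and the garbled-text filter are hypothetical details for illustration, not the study's exact procedure:

```python
# Hypothetical sketch of Phase 0: loading an exported issue data set.
# The file name and encoding handling are illustrative assumptions.
import pandas as pd

issues = pd.read_csv("issues_export.csv", encoding="utf-8")

# Keep only the fields used in the study.
issues = issues[["id", "type", "priority", "state",
                 "summary", "description", "label"]]

# Drop rows whose Chinese text failed to decode (replacement characters),
# leaving a clean data set for manual tagging.
garbled = issues["summary"].str.contains("\ufffd", na=False)
issues = issues[~garbled]
```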
TABLE I. THE CLASSIFICATION CRITERIA OF ISSUES

| Label | Subtype | Description |
|---|---|---|
| Not Technical Debt | Requirement Change | The request for a requirement change from the client |
| Not Technical Debt | New Features | Tasks to add new functions or introduce new features |
| Not Technical Debt | Insufficient Description | The description is insufficient to make a decision |
| Not Technical Debt | Critical Defects | Critical functions or features are not implemented correctly |
| Technical Debt | Defect Debt | Temporarily tolerable defects that will be fixed in the future |
| Technical Debt | Requirement Debt | Requirements are not implemented accurately or are only partially implemented |
| Technical Debt | Design Debt | Violations of good object-oriented design principles, such as god class and long method |
| Technical Debt | Code Debt | Bad coding practices, such as dead code or missing comments |
| Technical Debt | UI Debt | UI-related issues, such as an inconsistent UI style or unattractive UI elements |
| Technical Debt | Architecture Debt | Design limitations at the architecture level, such as violations of modularity |

B. Phase 1: Tagging issues manually

We manually tagged each issue or task in the data set as technical debt or not technical debt by reading its summary and description, based on the following classification criteria:

1. Is it a request for a requirement change from the client? If yes, we tag this issue as not technical debt.

2. Is it a task to add new functions or introduce new features to the product? If yes, we likewise tag this issue as not technical debt.

3. Is the description of the issue too short or insufficient to decide whether the issue is a technical debt item? In this case, we tag the issue as not technical debt.

4. Is it a defect in which important, critical functions or features are not implemented correctly? If yes, we tag this issue as not technical debt.

5. Is it a defect that is not critical from the client's perspective but weakens the performance and capabilities of the system and will be fixed in the future? If yes, we tag this issue as defect debt.

6. Is it a task to redesign some function or feature because the current design does not meet the requirement, or meets it only partially? If yes, we tag this issue as requirement debt.

7. Is it a design limitation that may pose a threat to the performance of the system or to its evolution and maintenance? If yes, we tag this issue as design debt.

8. Is it an issue related to bad coding practices, such as dead code or missing comments? If yes, we tag this issue as code debt.

9. Is it a UI-related issue, such as an inconsistent UI style or unattractive UI elements, that degrades the user experience? If yes, we tag this issue as UI debt.

10. Is it a design limitation at the architecture level, such as a violation of modularity, that may exert a negative impact on the performance of the system or on its evolution and maintenance? If yes, we tag this issue as architecture debt.

The ten cases listed above are the typical cases we encountered when tagging the issues, but they do not cover all the types of issues in the issue tracker. In fact, some issues could be tagged either as technical debt or as not technical debt, depending to a large extent on one's personal understanding of technical debt. Typically, there are wide discrepancies among researchers and developers about whether defects should be viewed as a type of technical debt. In this study, we divided defects into two categories: (1) critical defects that may cause fatal errors when using the software, and (2) tolerable defects that have only a marginal negative impact on the use of the software and are not fixed immediately after being detected. We tagged the first type of defect as not technical debt and the second type as technical debt.
After we finished tagging all the issues, we asked an expert in software engineering and technical debt, external to our research team, to validate the results of our manual classification. The expert classified a random subset of the issues independently. Where there were discrepancies, we exchanged our respective points of view on why we had classified a given issue into a particular category in order to resolve them. If we could not reach agreement on the classification of an issue, we discussed it with the developers to gain insight into the issue itself and into their opinions on its classification.

In total, we found 331 technical debt issues; their distribution is shown in Table II. Requirement debt and design debt are the main technical debt types, with 105 and 141 instances respectively.

TABLE II. THE NUMBER OF DIFFERENT TYPES OF TECHNICAL DEBT ISSUES

| Technical Debt Type | Number |
|---|---|
| Requirement Debt | 105 |
| Architecture Debt | 6 |
| Design Debt | 141 |
| Defect Debt | 15 |
| UI Debt | 35 |
| Code Debt | 20 |
| Other | 9 |

C. Phase 2: Extracting key words and phrases

Unlike English, Chinese is written without spaces between words. So before extracting key words from the Chinese texts, we have to convert each text into a word sequence using a Chinese text segmentation tool. For this research, we used Jieba (https://github.com/fxsjy/jieba/) [21], a popular open source Chinese text segmentation tool, to split the Chinese texts into sequences of words.

After segmentation, we extracted key words, again using Jieba. Jieba integrates two key word extraction algorithms, TF-IDF and TextRank, and we used both of them to extract key words for detecting technical debt. We took the union of the two sets of key words extracted by these two algorithms, removed from the union the key words referring to domain knowledge, and finally added some key words based on our intuition.
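A minimal sketch of this segmentation and extraction step, assuming the Jieba library and a hypothetical `corpus` string standing in for the concatenated issue texts (`extract_tags` is Jieba's TF-IDF extractor and `textrank` its TextRank extractor):

```python
# Sketch of Phase 2: segmenting Chinese issue text and extracting candidate
# key words with Jieba's two built-in algorithms (TF-IDF and TextRank).
# `corpus` is a hypothetical stand-in for the concatenated issue texts.
import jieba
import jieba.analyse

corpus = "当前界面风格不一致，建议优化。代码结构混乱，需要重构，提高可维护性。"

# Chinese has no spaces between words, so each text is first segmented.
words = jieba.lcut(corpus)

# Candidate key words from both extraction algorithms.
tfidf_keys = set(jieba.analyse.extract_tags(corpus, topK=100))
textrank_keys = set(jieba.analyse.textrank(corpus, topK=100))

# Union of the two sets; domain-specific terms would then be removed and
# some intuition-based words added by hand, as described above.
candidate_keywords = tfidf_keys | textrank_keys
```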
To make this paper more readable, we list below only the meanings of the key words rather than the original Chinese characters:

"at present", "now", "current", "previously", "in the past", "in the future", "time", "actually", "in reality", "users", "clients", "strengthen", "change", "modify", "replace", "update", "delete", "cancel", "suggest", "optimize", "simplify", "perfect", "improve", "refactor", "decouple", "again", "re-", "replant", "tidy", "integrate", "merge", "adjust", "extend", "expect", "plan", "management", "maintenance", "function", "requirement", "design", "rule", "theory", "strategy", "mechanism", "algorithm", "data structure", "logic", "code", "structure", "architecture", "style", "format", "performance", "efficiency", "sufficiency", "security", "compatibility", "scalability", "maintainability", "stability", "generality", "usability", "readability", "real-time", "limitation", "more friendly", "more specialized", "more accurate", "problem", "configuration", "priority", "inconsistent", "unreasonable", "inconvenient", "convenient", "not clear", "inaccurate", "not intuitive", "not pretty", "incongruous", "not smooth", "inconformity", "incomplete", "abnormity", "defect", "impact", "experience", "habit", "operation", "difficulty", "delay", "UI", "risk", "optimize", "refactor", "SonarQube"

There are 114 key words in total, of which 104 are Chinese words and 10 are English words. As some words express similar or identical meanings, we merged those words. All these words to some extent indicate or imply the concept of technical debt from different perspectives. To be specific:

• "at present", "now", "current", "previously", "in the past", "in the future", "time". These words indicate a time concept and may imply accumulation.

• "strengthen", "change", "modify", "replace", "update", "delete", "cancel", "optimize", "simplify", "perfect", "improve", "refactor", "decouple", "again", "re-", "replant", "tidy", "integrate", "merge", "adjust", "extend". These words indicate the modification of code, design, or architecture, or the enhancement of functionality, capability, performance, efficiency, etc.

• "security", "compatibility", "scalability", "maintainability", "stability", "generality", "usability", "readability", "real-time", "limitation". These words indicate aspects of software quality attributes of concern.

• "inconsistent", "unreasonable", "inconvenient", "convenient", "unclear", "inaccurate", "not intuitive", "not pretty", "incongruous", "not smooth", "inconformity", "incomplete", "abnormity", "defect", "limit", "impact", "experience", "habit", "operation", "difficulty", "delay". These words indicate defects or design limitations, such as an inconsistent UI style or an unreasonable design.

D. Phase 3: Extracting features

Once key words were extracted from the issue data set, features for text classification could be derived by checking the presence or absence of each key word in each issue text. Suppose the set of key words is ["users", "change", "modify", "improve", "refactor", "decouple", "priority", "button", "architecture", "deploy", "rules"], and consider this issue description: "design change: to keep a consistent design with different pages, we are moving the clear-all-rules button to the front of the deploy rules table. (Consistent with event page)". First, we tokenize the text into a sequence of words and remove stop words (words that are too common to carry any semantic meaning for our text classification). The text is thus converted into the string list ["design", "change", "keep", "consistent", "design", "different", "pages", "moving", "clear-all-rules", "button", "front", "deploy", "rules", "table"]. We can then check whether this list contains each of the key words, i.e., [contain("users"), contain("change"), contain("modify"), contain("improve"), contain("refactor"), contain("decouple"), contain("priority"), contain("button"), contain("architecture"), contain("deploy"), contain("rules")]. This vector of presence/absence checks over the key words is called the feature space; its dimension depends on the size of the key word set. Finally, we obtain the feature vector of the issue sample based on the feature space: [false, true, false, false, false, false, false, true, false, true, true].

The feature space actually includes not only unigram features, i.e., single words such as "design" or "decouple", but also bigram and trigram features comprising adjacent word pairs and triplets, such as "design change" and "improve unit test". That is, the feature space in the previous example can be extended to [contain("users"), contain("change"), contain("modify"), contain("improve"), contain("refactor"), contain("decouple"), contain("priority"), contain("button"), contain("architecture"), contain("deploy"), contain("rules"), contain("design change"), contain("improve unit test")], and the feature vector then becomes [false, true, false, false, false, false, false, true, false, true, true, true, false]. Figure 2 shows the process of feature extraction.

[Fig. 2. Feature Extraction: the example description is tokenized into a word list t; checking t against the feature space S = [contain("users"), contain("change"), …, contain("rules"), contain("design change"), contain("improve unit test")] yields the feature vector V(t) = [false, true, …, true, true, false].]
E. Phase 4: Creating a binary Naïve Bayes classifier

Naïve Bayes is a simple classification algorithm based on the assumption that the features are conditionally independent of each other given the category. It determines the category of a given sample with n-dimensional features $(x_1, \dots, x_n)$ by calculating the probability that the sample belongs to each category and then assigning the most probable category $c$ to it, which can be described as:

$c = \arg\max_{k \in \{1,\dots,K\}} p(c_k \mid x_1, \dots, x_n)$,

where $c_k$ is the $k$-th category and $K$ is the size of the set of categories. Using Bayes' theorem, the conditional probability $p(c_k \mid x_1, \dots, x_n)$ can be decomposed as:

$p(c_k \mid x_1, \dots, x_n) = \dfrac{p(x_1, \dots, x_n \mid c_k)\, p(c_k)}{\sum_{h=1}^{K} p(x_1, \dots, x_n \mid c_h)\, p(c_h)}$.

With the conditional independence assumption, this becomes:

$p(c_k \mid x_1, \dots, x_n) = \dfrac{p(c_k) \prod_{j=1}^{n} p(x_j \mid c_k)}{\sum_{h=1}^{K} p(c_h) \prod_{j=1}^{n} p(x_j \mid c_h)}$.

To perform our experiments, we used NLTK (http://www.nltk.org) [22], a popular natural language toolkit for building Python programs that process human language data. We employed NLTK's implementation rather than creating a binary Naïve Bayes classifier from scratch.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

Repeated random sub-sampling validation was performed to validate our approach to the identification of technical debt: we repeatedly split the full data set into randomly distributed 80%/20% partitions, trained and tested the classifier on each split, and recorded the performance results.
RQ1: How do software practitioners communicate technical debt issues in issue trackers?

We searched the issue data set for the term "technical debt" and the corresponding Chinese term and found no occurrences. All the technical debt instances in this issue tracker were expressed implicitly, using other technical-debt-related words such as redesign, design change, refactor, cleanup, and decouple. Through communication with the developers of this product, we learned that they did not have a strong awareness of technical debt. Some of them had never even heard of the concept, although, once we explained what technical debt is, they recognized that they had much experience in incurring it. To track, prioritize, and pay off technical debt effectively, we suggested that they adopt technical debt as an issue type in the issue tracker so as to communicate it explicitly.

RQ2: Are there text patterns that indicate the existence of technical debt and that can be used to identify potential technical debt automatically with NLP and machine learning techniques?

The experimental results demonstrate that text patterns indicating technical debt do exist and can be used to identify it. In general, technical debt issues are characterized by two kinds of properties: rework, whether code refactoring or feature redesign, and accumulation, which is implied by words indicating time, such as previously, at present, and in the future. The 20 most informative features, i.e., those most strongly correlated with technical debt, are shown in Table III. Each of these features may contribute differently to the identification of different types of technical debt. Intuitively, the presence of "style" and "experience" may indicate UI debt, while "simplify" and "efficiency" are more likely to be indicators of design debt.

TABLE III. 20 MOST INFORMATIVE FEATURES FOR DETECTING TECHNICAL DEBT

| Feature | Likelihood Ratio (Technical Debt : Not Technical Debt) |
|---|---|
| 协议识别优化 (protocol identification optimization) = 1 | 155.2 : 1.0 |
| 增强 (strengthen) = 1 | 128.2 : 1.0 |
| 不方便 (inconvenient) = 1 | 128.2 : 1.0 |
| 提高 (improve) = 1 | 117.4 : 1.0 |
| 优化 (optimize) = 1 | 90.8 : 1.0 |
| 整改 (change or modify) = 1 | 87.7 : 1.0 |
| 风格 (style) = 1 | 65.2 : 1.0 |
| 体验 (experience) = 1 | 64.4 : 1.0 |
| 改进 (improve) = 1 | 60.7 : 1.0 |
| 不容易 (not easy) = 1 | 47.2 : 1.0 |
| 改善 (improve) = 1 | 44.5 : 1.0 |
| 效率 (efficiency) = 1 | 44.5 : 1.0 |
| 简化 (simplify) = 1 | 38.2 : 1.0 |
| 解决方案 (strategy) = 1 | 35.8 : 1.0 |
| 困难 (difficulty) = 1 | 33.7 : 1.0 |
| 前期 (previously) = 1 | 33.7 : 1.0 |
| 不美观 (not pretty) = 1 | 33.7 : 1.0 |
| risk = 1 | 33.7 : 1.0 |
| 算法 (algorithm) = 1 | 31.8 : 1.0 |
| 习惯 (habit) = 1 | 31.8 : 1.0 |

To evaluate the performance of our classifier, the average precision and recall were calculated over 10 repeated random sub-sampling validations. Precision measures the fraction of the technical debt instances identified by our classifier that proved to be correctly classified. Recall measures the fraction of correctly classified technical debt items out of the total number of technical debt issues.
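In standard terms, letting TP, FP, and FN denote true positives, false positives, and false negatives with respect to the technical debt class, these measures and the F1-score reported in Table IV can be written as:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```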
In our experiments, the average precision and recall over the 10 repeated random sub-sampling validations were 72% and 81% respectively, as shown in Table IV.

TABLE IV. THE RESULTS OF REPEATED RANDOM SUB-SAMPLING VALIDATION

| Category | Average Precision | Average Recall | Average F1-score |
|---|---|---|---|
| Technical Debt | 0.72 | 0.81 | 0.76 |

V. THREATS TO VALIDITY

There are two main threats to the validity of our study: threats to internal validity and threats to external validity. Threats to internal validity stem from the level of subjectivity in the manual analysis and classification of issues, as we certainly have personal biases in interpreting issue descriptions. To counter these threats, we had an expert external to our research team classify random samplings of the issues, and we resolved our discrepancies through discussion. We also discussed with the developers of the product the issues that we were not sure we had classified correctly. Threats to external validity concern the generalization of our findings. We performed a case study on an issue data set from a single commercial software project. This data set may not be representative; that is, we cannot guarantee that the same results will be obtained when our approach is applied to other commercial software projects. In particular, our approach may not be applicable to projects that do not use issue trackers to record issues.

VI. CONCLUSION AND FUTURE WORK

This paper presents an exploratory study of applying NLP and machine learning techniques to identify technical debt issues through issue trackers. We have demonstrated that the detection of technical debt issues in issue trackers can be automated with acceptable performance using these techniques. We found that some common words in software engineering are directly or indirectly related to technical debt, and that these words can be used as features to decide whether a given issue is technical debt. We believe the performance of our classifier will improve further when more sophisticated feature extraction and classification techniques are applied.

This exploratory study was based on a rather limited data set of 8,149 issues. Our approach needs to be validated on issue data sets from a wider range of software projects. Furthermore, we will improve the performance of our classifier by exploring more sophisticated feature extraction techniques, such as mapping phrases with regular expressions and extracting semantically meaningful information from context, and by applying other classification techniques such as random forests, SVMs, and deep learning. In addition, we will develop a multi-class classifier to identify technical debt of specific types.

ACKNOWLEDGMENT

We would like to acknowledge the support of Jean Su, Steven Zhu, and Billy Liu for this research. We also thank Robert Nord, Rob Fuller, and Randy Hsu for their valuable review comments. This research was partially supported by a Canada NSERC grant, a MITACS grant, and the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC.

REFERENCES

[1] Avgeriou, P., et al. Managing Technical Debt in Software Engineering (Dagstuhl Seminar 16162). In Dagstuhl Reports. 2016. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[2] Cunningham, W. The WyCash portfolio management system. SIGPLAN OOPS Messenger, 1992. 4(2): p. 29-30.
[3] Fowler, M. and K. Beck. Refactoring: Improving the Design of Existing Code. 1999: Addison-Wesley Professional.
[4] Marinescu, R. Detection strategies: Metrics-based rules for detecting design flaws. In Proceedings of the 20th IEEE International Conference on Software Maintenance. 2004. IEEE.
[5] Munro, M.J. Product metrics for automatic identification of "bad smell" design problems in Java source-code. In Proceedings of the 11th IEEE International Software Metrics Symposium. 2005. IEEE.
[6] Olbrich, S., et al. The evolution and impact of code smells: A case study of two open source systems. In Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement. 2009. IEEE Computer Society.
[7] Wong, S., et al. Detecting software modularity violations. In Proceedings of the 33rd International Conference on Software Engineering. 2011. ACM.
[8] de Freitas Farias, M.A., et al. A Contextualized Vocabulary Model for identifying technical debt on code comments. In Proceedings of the 2015 IEEE 7th International Workshop on Managing Technical Debt (MTD). 2015. IEEE.
[9] Maldonado, E., E. Shihab, and N. Tsantalis. Using natural language processing to automatically detect self-admitted technical debt. IEEE Transactions on Software Engineering, 2017.
[10] Brondum, J. and L. Zhu. Visualising architectural dependencies. In Proceedings of the Third International Workshop on Managing Technical Debt. 2012. IEEE Press: Zurich, Switzerland. p. 7-14.
[11] Li, Z., et al. An empirical investigation of modularity metrics for indicating architectural technical debt. In Proceedings of the 10th International ACM SIGSOFT Conference on Quality of Software Architectures. 2014. ACM: Marcq-en-Baroeul, France. p. 119-128.
[12] Li, Z., P. Liang, and P. Avgeriou. Architectural technical debt identification based on architecture decisions and change scenarios. In Proceedings of the 12th Working IEEE/IFIP Conference on Software Architecture (WICSA). 2015. IEEE.
[13] Bellomo, S., et al. Got technical debt? Surfacing elusive technical debt in issue trackers. In Proceedings of the 13th IEEE/ACM Working Conference on Mining Software Repositories (MSR). 2016. IEEE.
[14] Antoniol, G., et al. Is it a bug or an enhancement?: A text-based approach to classify change requests. In Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds. 2008. ACM.
[15] Runeson, P., M. Alexandersson, and O. Nyholm. Detection of duplicate defect reports using natural language processing. In Proceedings of the 29th International Conference on Software Engineering. 2007. IEEE Computer Society.
[16] Wang, X., et al. An approach to detecting duplicate bug reports using natural language and execution information. In Proceedings of the 30th ACM/IEEE International Conference on Software Engineering (ICSE '08). 2008. IEEE.
[17] Jalbert, N. and W. Weimer. Automated duplicate detection for bug tracking systems. In Proceedings of the IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN 2008). 2008. IEEE.
[18] Sun, C., et al. A discriminative model approach for accurate duplicate bug report retrieval. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1. 2010. ACM.
[19] Sureka, A. and P. Jalote. Detecting duplicate bug report using character n-gram-based features. In Proceedings of the 17th Asia Pacific Software Engineering Conference (APSEC). 2010. IEEE.
[20] Cois, C.A. and R. Kazman. Natural Language Processing to Quantify Security Effort in the Software Development Lifecycle. In SEKE. 2015.
[21] Sun, J. 'Jieba' Chinese word segmentation tool. 2012.
[22] Bird, S. NLTK: The Natural Language Toolkit. In Proceedings of the COLING/ACL on Interactive Presentation Sessions. 2006. Association for Computational Linguistics.