=Paper=
{{Paper
|id=Vol-1705/04-paper
|storemode=property
|title=Evaluating an Assistant for Creating Bug Report Assignment Recommenders
|pdfUrl=https://ceur-ws.org/Vol-1705/04-paper.pdf
|volume=Vol-1705
|authors=John Anvik
|dblpUrl=https://dblp.org/rec/conf/eics/Anvik16
}}
==Evaluating an Assistant for Creating Bug Report Assignment Recommenders==
Evaluating an Assistant for Creating Bug Report Assignment Recommenders John Anvik Abstract University of Lethbridge Software development projects receive many change re- Lethbridge, Canada quests each day and each report must be examined to de- john.anvik@uleth.ca cide how the request will be handled by the project. One decision that is frequently made is to which software devel- oper to assign the change request. Efforts have been made toward semiautomating this decision, with most approaches using machine learning algorithms. However, using ma- chine learning to create an assignment recommender is a complex process that must be tailored to each individ- ual software development project. The Creation Assistant for Easy Assignment (CASEA) tool leverages a project member’s knowledge for creating an assignment recom- mender. This paper presents the results of a user study using CASEA. The user study shows that users with limited project knowledge can quickly create accurate bug report assignment recommenders. Author Keywords bug report triage; assignment recommendation; machine learning; recommender creation; computer supported work Copyright is held by the author/owner(s). ACM Classification Keywords EICS’16, June 21-24, 2016, Bruxelles, Belgium. H.5.m [Information interfaces and presentation (e.g., HCI)]: Miscellaneous; I.2.4 [Programming Languages and Soft- ware: Expert system tools and techniques]; I.2.7 [Natural Language Processing: Text analysis] 26 Introduction This paper proceeds as follows. First, an overview of CASEA Large software development projects can receive hundreds is presented. Next, the results from a user study involving of bug reports per day [5, 6]. Each of these bug reports subjects creating assignment recommenders for the Eclipse needs to be analyzed and decisions made about how the Platform project are presented. The paper then concludes report will be handled by the project. In cases where a with a discussion of some of the threats to the validity of change to the source code is needed, a decision is made this work, related work and possible future improvements about to whom the work will be assigned. This decision pro- to make CASEA more practical for software development cess is called bug triage and must be done for all incoming projects. reports. Background Bug triage takes significant time and resources [12]. Bug This section presents background information about bug report assignment recommenders have been proposed as a reports, their life cycles, and machine learning. method for reducing this overhead. Many researchers have investigated different approaches for assignment recom- Bug reports mender creation, with most focusing on the use of machine Bug reports, also known as change requests, provide a learning [5, 7, 15, 29, 31]. means for users to communicate software faults or fea- ture requests to software developers. They also provide Conceptually, the creation of an assignment recommender a means for developers to manage software development using machine learning is straightforward [3]. However in tasks. Bug reports contain a variety of information, some practice creating an assignment recommender for a specific of which is categorical and some of which is descriptive. software development project is challenging. The Creation Categorical information includes such items as the report’s Assistant for Easy Assignment (CASEA) tool [1] was cre- identification number (i.e. bug id), its resolution status (e.g., ated to assist software development projects in creating NEW or RESOLVED), the component the report is believed machine learning assignment recommenders tailored to a to involve, and which developer has been assigned the specific project. work. Descriptive information includes the title of the report, the description of the report, and discussions about possi- This paper presents the results of a user study using CASEA ble approaches to resolving the report. Finally, a report may to create assignment recommenders for a large open source contain other information, such as attachments or links to project, and is the first such study. The study found that other reports. subjects could quickly create an accurate assignment rec- ommender using the tool, despite the users having no spe- Bug report lifecycle cific knowledge about the software project. All bug reports have a lifecycle. When a bug report first en- ters a project’s issue tracking system (ITS), it is in a state To our best knowledge, CASEA is the first system to ad- such as UNCONFIRMED or NEW. The bug report will then dress the bug report assignment recommender creation move through different states, depending on the project’s problem, and this paper presents the first study of its use. development process, and arrive at a resolution state, such 27 as FIXED or INVALID. The lifecycle of a bug report can be these steps. used to categorize bug reports [5]. Figure 1 shows an ex- ample life cycle state graphs from the Bugzilla ITS [14]. Data collection The first step in recommender creation is to gather the data Machine learning algorithms to be used for creating the recommender. Specifically, bug Machine learning algorithms Machine learning is the devel- reports are extracted from the project’s issue tracking sys- opment of algorithms and techniques that allow computers tem (ITS). The project member provides the URL of the to learn [21]. Machine learning algorithms fall under three project’s ITS, a date range for data collection, and an op- categories: supervised learning, unsupervised learning, tional maximum limit for number of reports to gather. Re- and reinforcement learning. Bug report assignment recom- ports that have a resolution status of RESOLVED, VERI- menders primarily use supervised learning algorithms, such FIED or CLOSED are gathered chronologically, with every as Support Vector Machines (SVM) [17], Naive Bayes [25] tenth report selected as a testing report to create an unbi- and ML-KNN [30]. Understanding how a machine learning ased set for evaluation. algorithm creates a recommender requires understanding three concepts: the feature, the instance and the class. A Data preparation feature is a specific piece of information that is used to de- Having collected the data from the project’s ITS, the next termine the class, such as a term that appears in one or step is to filter the data to produce the highest quality train- more of a set of bug reports. An instance is a collection of ing set. Two types of filtering are performed: automatic and features that have specific values, such as all of the terms assisted. in the description of a specific bug report. Finally, a class is The automatic filtering performs three actions on the textual the collection of instances that all belong to the same cate- data. First, terms that are stopwords (i.e. common words gory, such as all of the bug reports fixed by a developer. In such as ’a’ and ’the’) are removed. Next, stemming is per- supervised machine learning, training instances are labeled formed to reduce all of the terms to their respective root with their class. A recommender is created from a set of in- values so that words such as ’user’ and ’users’ are treated stances and the output of the recommender is a subset of as the same word, ensuring a common vocabulary between the classes predicted for a new instance. the reports. Finally, punctuation and numeric values are re- moved, except where the punctuation is important to the Creation assistant for easy assignment term, such as URLs or class names (e.g. “org.eclipse.jdt”). The Creation Assistant for Easy Assignment (CASEA) [1] is a software tool to assist a software project in creating CASEA assists the user with two types of filtering: label fil- and maintaining bug report assignment recommenders. tering and instance filtering. To assist with label filtering, CASEA guides a project member through the assignment CASEA presents the user with a label frequency graph. For recommender creation process in four steps: Data Col- an assignment recommender, this graph presents bug fix- lection, Data Preparation, Recommender Training, and ing statistics, a type of activity profile [23], for the project Recommender Evaluation. The remainder of this section developers based on a random sample of all of the bug presents an overview of how CASEA assists with each of reports in the training data set. Figure 2 shows the Con- 28 Figure 1: Bug report life cycle state diagram from the Bugzilla ITS. 29 Figure 2: The Creation Assistant for Easy Assignment (Configuration tab). 30 figuration tab and an example of label filtering. As can be seen in the graph, developer activity follows a Pareto dis- tribution curve with a few developers contributing the bulk of the work, and many other developers making small con- tributions [11, 18, 22, 26]. CASEA visualizes the project’s development activity and allows the user to select a thresh- old using a slider, such that only a core set of developers are recommended. In Figure 2, a cutoff of 21 has been se- lected. Figure 3: Recommender evaluation in CASEA (Analysis Tab). Instance filtering is done using project-specific heuristics. The heuristics have two parts: a grouping rule and a label heuristics to be applied, to a maximum of ten; six was cho- source. The grouping rule is used to categorize the data sen in Figure 2. into groups for which the label source will be used for la- beling the instances. For an assignment recommender, Recommender Training the grouping rule is a bug report lifecycle (called “Path After filtering the data to create the set of training and eval- Group” in Figure 2) and the label source (i.e. data source uation instances for the recommender, the data is then for- in Figure 2) is either a field from the bug report, such as the matted for use with the machine learning algorithm. Once assigned-to field, or other labelling information that can be the user has filtered the data and the data is formatted, the extracted from the bug report, such as the user that last at- recommender is created using a multi-class Support Vector tached a patch or the developer who marked the report as Machines (SVM) algorithm with a Gaussian kernel. SVM is resolved. a commonly used algorithm for assignment recommenda- tion [5, 7]. Figure 2 shows an example of instance filtering in CASEA. All of the training reports are used to determine the specific Recommender Evaluation bug report life cycles for the project and the user is pre- Once the user starts the recommender creation process, sented with a statistical summary of the categories. The the user is moved to the Analysis tab that presents the rec- figure shows that for the data set for the Eclipse Product ommender evaluation results (Figure 3 ). The user can then project, 30.6the reports have a NEW→FIXED (NF) lifecy- return to the Configuration tab, adjust the values for label cle, followed by 22.6% NEW→FIXED→VERIFIED (NFV), and instance filtering, and create a new recommender. This and 15.6% being NEW (N). Using this information, the user process continues until the user is either satisfied with the can create heuristics for the most common occurring cate- created recommender, or the user has determined that an gories. In this case, “FixedBy” was chosen for the NF cat- assignment recommender cannot be created with a high egory, “Resolver” for the NFV category, and “Reporter” for enough accuracy to benefit the project. At any time the user the N category. The user can also choose the number of can save the recommender configuration and return at a later date. 31 CASEA uses the metrics of precision and recall to evalu- about their prior knowledge and experience in two areas: ate a created recommender. It presents results for the top technical experience and technical knowledge. For techni- recommendation (top-1), the top 3 recommendations (top- cal experience, subjects were asked about their level of ex- 3), and the top 5 recommendations (top-5). Figure 3 shows perience with issue tracking systems, open source projects an example of the evaluation results for a recommender and software testing. To assess prior technical knowledge, after eighteen trials. It shows that the first four configura- subjects were asked about their familiarity with bug reports, tions did not create very accurate recommenders as the machine learning algorithms, classifiers or recommender activity threshold was too low, but when the threshold was systems, user interface design principles, and data mining. raised, a more accurate recommender was produced. After A Likert scale (Very Experienced/Familar, Some Experi- about five more trials, a good heuristic configuration was ence/Familarity, Little Experience/ Familarity, Heard Of, No determined that produced an assignment recommender Experience/Familarity) was used for the self-reporting of with a high top-1 accuracy and reasonable top-3 and top- their experience level. After completing the prior experi- 5 accuracies. Further experimentation was done with the ence and knowledge survey, subjects were asked to cre- heuristics with varying results, before determining that the ate an assignment recommender for the Eclipse Platform configuration from trial #10 was the best configuration. project within fifteen minutes. Once the subjects expressed that they were done using CASEA, a debriefing interview User study of CASEA was conducted. The subjects were asked to explain their A small user study was conducted to assess the potential approach to creating an assignment recommender using for CASEA to assist software projects in creating assign- CASEA and what they recommended as improvements to ment recommenders. Specifically, the study sought to an- the tool, as well as any other comments about their experi- swer qualitative questions such as “What aspects of the ence with CASEA. recommender creation process and the CASEA interface do users find helpful? challenging? or confusing?". This Prior knowledge and experience study was similar in intent to Stumpf et al [27], who con- Figure 4 shows a summary of the responses from subjects ducted a user study to determine how users interact with regarding their prior experience. As shown, most of the a machine learning system. The study was conducted us- subjects had prior experience with issue tracking systems, ing a sample of eight computer science graduate and un- with four reporting some experience and four reporting little dergraduate students. This study population was selected experience. Overall testing experience was a bit less, with under the assumption that using a group with no specific six subjects reporting little experience, two reporting “heard project knowledge would provide a lowerbound for future of” and one reporting no experience. Subjects had the least in-field user studies. overall experience with contributing to open source projects, with two reporting little experience, three reporting “heard User Study Setup of” and four reporting no experience.1 For technical knowl- The user study was conducted in the following manner. edge, Figure 5 shows a summary of the responses. Most of First, subjects were asked to complete a prior knowledge the subjects reported either being very familar (2 subjects) and experience survey. Specifically, subjects were asked 1 Not all subjects provided answers to all of the questions. 32 or having some familiarity (6 subjects) with user interface the columns. design principles from either recently or currently taking an undergraduate course about this topic. Half of the subjects Table 2 shows the threshold and heuristic configurations for reported familiarity with machine learning algorithms, and the best assignment recommender created by each of the slightly more (5 subjects) reported familiarity with classi- eight subjects. The first two columns list the top ten path fiers and recommender systems. The knowledge in these groups for the data set and how much of the data set is areas came from either taking an undergraduate course in covered by the path group. As shown by the table, 77% computational intelligence or from other course projects. of the data set is covered by the first five path groups, and Subjects reported the least familiarity with the bug report most of the subjects specified heuristics for six or fewer lifecycle and data mining. path groups. Also, with For technical knowledge, Figure 5 shows a summary of the one exception, the “Assigned” data source was used for the responses. Most of the subjects reported either being very remaining 11%-27% not covered by the specified heuris- familar (2 subjects) or having some familiarity (6 subjects) tics. Half of the subjects chose values less than 10 for the with user interface design principles from either recently or threshold and the others used values greater than 20. The currently taking an undergraduate course about this topic. results show that the subjects were usually able to create Half of the subjects reported familiarity with machine learn- a reasonably accurate assignment recommender in 10 tri- ing algorithms, and slightly more (5 subjects) reported fa- als or less. The two most accurate recommenders (created miliarity with classifiers and recommender systems. The by Subjects #7 and #8) had the same configuration (i.e. knowledge in these areas came from either taking an un- threshold and heuristic values), as shown in Table 2. dergraduate course in computational intelligence or from Qualitative Results and Observations other course projects. Subjects reported the least familiarity Based on observations during the study and responses with the bug report lifecycle and data mining. from the debriefing interview, subjects were found to em- Quantitative Results ploy two strategies for assignment recommender creation Table 1 shows the quantitative results from the eight sub- using CASEA. Some subjects were found to be very ex- jects. The first column identifies the subjects. The next two perimental in their approach, making many changes be- columns present both the number of trials a subject con- fore creating a new recommender. Other users were more ducted before creating their most accurate assignment rec- methodical, making small changes and testing the results. ommender, and the total number of trials that the subject Figure 6 shows a categorization of the different types of conducted in creating an assignment recommender using changes (heuristic change, threshold change or both) made CASEA. The next three columns show the Top-1, Top-3 and by each subject. As expected, subjects changed the heuris- Top-5 precision and recall values for the best Eclipse Plat- tic configurations the most, and most subjects only changed form assignment recommender created by the subject. The the threshold three times or fewer. last three rows of Table 1 show a summary of the results, One subject commented that the best strategy was to make presenting the maximum, minimum, and median values for small incremental changes, and that CASEA made it easy 33 Figure 4: Responses by subjects regarding prior technical experience. Figure 5: Responses by subjects regarding prior technical knowledge. 34 Identifier Trial to Best Max Trials Top (1%) Top (3%) Top (5%) Precision Recall Precision Recall Precision Recall Subject #1 5 5 68.63 1.25 54.98 3.01 66.27 6.04 Subject #2 10 10 39.95 1.14 41.26 3.52 45.1 6.42 Subject #3 5 16 68.63 2.08 46.49 4.22 35.25 5.33 Subject #4 3 5 37.99 2.32 27.53 5.04 22.99 7.02 Subject #5 2 3 81.62 1.49 62.66 3.43 51.62 4.7 Subject #6 16 18 68.87 4.07 34.97 6.2 34.66 10.24 Subject #7 11 20 89.22 3.48 56.45 6.6 52.7 10.27 Subject #8 10 19 89.22 3.48 56.45 6.6 52.7 10.27 Max 16 20 89.22 4.07 62.66 6.6 66.27 10.27 Min 2 3 37.99 1.14 27.53 3.01 22.99 4.7 Median 7.5 13 68.75 2.2 50.735 4.63 48.36 6.72 Table 1: Best Eclipse Platform assignment recommenders created by subjects. Covers Sub. #1 Sub. #2 Sub. #3 Sub. #4 Sub. #5 Sub. #6 Sub. #7 Sub. #8 NF 30.6% FirstResp. Resolver Assigned FixedBy FirstResp. FixedBy FixedBy FixedBy NFV 22.6% Resolver Resolver Assigned Assigned FixedBy Assigned Resolver Resolver N 15.1% Reporter Assigned FixedBy Assigned FirstResp. Assigned Assigned Assigned NM 5.1% FixedBy FirstResp. FirstResp. Assigned Assigned Assigned FirstResp. FirstResp. NAFV 3.9% FixedBy Resolver FirstResp. Assigned Assigned Assigned Assigned NAF 3.2% Resolver Reporter Assigned Assigned NC 3.0% FirstResp. Reporter NX 2.5% FirstResp. Assigned NFRFV 1.8% FixedBy NA 1.1% Assigned Other Assigned Assigned Assigned Assigned Assigned FixedBy Assigned Assigned Activity Threshold 47 31 3 5 5 5 22 22 Table 2: Best assignment recommender configurations of subjects. 35 configuration. As was mentioned, the subjects did not have specific knowl- edge about the project, such as who formed the core group of developers. This led to some subjects choosing a low ac- tivity cutoff so as to not exclude developers, and resulted in recommenders that were not accurate and took longer to create. This behavior would not be expected from an actual project member using CASEA, as they would have knowl- edge about the core development team. Threats to validity Figure 6: Types of changes made by subjects. This section highlights some of the threats to the internal validity, external validity, and construct validity of this work. to employ this strategy. Another subject observed that cre- Threats to the internal validity of this work relate to the po- ating an assignment recommender using CASEA was sim- tential sources of error in the evaluation of CASEA. A po- ilar to trying to get a high score in a game, where the score tential source of error is with the creation of the data set was the precision and recall values. used for the evaluation. Although a random sample of bug reports was examined to establish that the data collection As part of the recommender evaluation, CASEA provides procedure was correct, there may have been some bug re- information about how long it takes to create a recom- ports that contained incorrect data. mender. This led some subjects to work towards an incor- rect goal of minimizing the recommender creation time. Threats to external validity relate to the generalizability of the results to other projects or user groups. In this work, Although subjects were provided with a brief tutorial of subjects with no project-specific knowledge were used to CASEA and a high level explanation of the recommender evaluate the usability of CASEA. Therefore, these results creation process at the beginning of the study session, sub- would not generalize to those with project-specific knowl- jects encountered a number of problems related to under- edge, but could be considered as a lower-bound for such a standing terminology or concepts. Specifically, the term group. “Path Group" was used to describe the categorization of bug reports, and subjects found this term unintuitive. This Threats to construct validity refers to the suitability of the led to some initial confusion about the options in the heuris- evaluation measures. The method used to determine the tic configuration panel. Also, the meaning of the precision set of developers that could have fixed a bug report, used and recall metrics was not initially well understood by sub- for calculating precision and recall, is known to overesti- jects. However, once their meaning was understood, sub- mate the group [4]. This results in precision values that jects felt that they made more intelligent choices about the are overvalued, and recall values that are undervalued. 36 However, the evaluation results in CASEA show the rela- Poulin et al. [28] developed ExplainD, a framework for ex- tive differences between different configurations, so even if plaining decisions made by classifiers that use additive evi- the precision and recall values are over or under their true dence, such as Naive Bayes. The framework was used in a value, CASEA still provides meaningful information to the bioinformatics web-based system called Proteome Analyst. user. Strumbelj and Konoenko [19] presented a method for ex- Related work plaining classifier predictions that used coalitional game This section presents related work in the areas of assist- theory. The method used a sampling-based approach to ing with triage, assisting with recommender creation, and reduce the computational complexity of explaining the con- explaining machine learning. tributions of individual feature values. Their approach was applied to explaining the results of various machine learning Assisting with Bug Report Triage algorithms, including Naive Bayes and SVM. Like CASEA, Porchlight [10] and it’s predecessor Team- Bugs [9] seek to provide a tool to assist project triagers in Kulesza et al. [20] created an end user debugging approach making their tasks more efficient. Porchlight allows a triager for intelligent assistants, such as bug report assignment to group similar bug reports together using tags, and then recommenders. The system allowed the user to ask ‘why’ apply a triage decision to the group. This tagging is similar questions about predictions and then change the answers to the path groups in CASEA, which also groups bug re- to debug current and future predictions. ports into categories for specifying and applying labelling Basilio Noris developed a visualization tool for machine heuristics. learning called MLDemos [24] MLDemos assists in under- Assisting with Recommender Creation standing how different machine learning algorithms func- SkyTree Infinity [16] and BigML [8] both provide means for tion. It also demonstrates how the parameters of the algo- guiding a user through the creation of machine-learning rithms affect and modify the results in classification prob- recommenders. However, using Skytree Infinity still re- lems. quires advanced knowledge of machine learning and statis- tics [13]. BigML provides no support for data preparation or Conclusion visualization, and creates recommenders using decision- This paper presented the results of a pilot study of CASEA, trees, which was shown to be ineffective for the bug report a tool to assist in the creation of bug report assignment rec- assignment problem [5]. ommenders. CASEA assists a user in labelling and filtering the bug reports used for creating a project-specific assign- Explaining Machine Learning ment recommender, as well as providing feedback on the One avenue toward making the use of recommender sys- effectiveness of the configured assignment recommender. tems practical is to assist in their creation and evaluation. The study found that users with little to no project-specific This is the approach taken by CASEA. An alternative ap- knowledge were able to quickly create effective assignment proach to making their use practical is by explaining their recommenders for the Eclipse Platform project. results. 37 Based on feedback and the results of the user study, a [6] S. Banitaan and M. Alenezi. 2013. TRAM: An ap- number of future improvements were idenitified for CASEA, proach for assigning bug reports using their Metadata. including having CASEA first attempt to tune a recom- In 2013 Third International Conference on Communi- mender automatically and then have the user tweak the cations and Information Technology. 215–219. configuration, extending CASEA to assist with the creation [7] Pamela Bhattacharya, Iulian Neamtiu, and Christian R of other triage recommenders, supporting other machine Shelton. 2012. Automated, highly-accurate, bug as- learning algorithms, and providing other evaluation met- signment using machine learning and tossing graphs. rics, such as F1. An improved version of CASEA, called Journal of Systems and Software 85, 10 (2012), 2275– the Creation Assistant Supporting Triage Recommenders 2292. (CASTR) [2] was created to incorporate these changes in [8] BigML. 2016. BigML is Machine Learning for every- preparation for a field study with project developers. one. (2016). (Mar 10, 2016) https://bigml.com. [9] Gerald Bortis and André Van der Hoek. 2011. Team- References bugs: A Collaborative Bug Tracking Tool. In Proceed- [1] John Anvik, Marshall Brooks, Henry Burton, and Justin ings of the 4th International Workshop on Cooperative Canada. 2014. Assisting Software Projects with Bug and Human Aspects of Software Engineering (CHASE Report Assignment Recommender Creation. In Pro- ’11). ACM, New York, NY, USA, 69–71. ceedings of the 26th International Conference on Soft- [10] G. Bortis and A. van der Hoek. 2013. PorchLight: A ware Engineering and Knowledge Engineering. 470– tag-based approach to bug triaging. In 2013 35th Inter- 473. national Conference on Software Engineering (ICSE). [2] John Anvik, Marshall Brooks, Henry Burton, Justin 342–351. Canada, and Ted Henders. 2016. CASTR - Creation [11] Gerardo Canfora and Luigi Cerulo. 2006. Supporting Assistant Supporting Triage Recommenders. (2016). Change Request Assignment in Open Source Devel- (July 25, 2016) https://bitbucket.org/bugtriage/castr. opment. In Proceedings of the 2006 ACM Symposium [3] John Anvik, Lyndon Hiew, and Gail C. Murphy. 2006. on Applied Computing (SAC ’06). ACM, New York, NY, Who Should Fix This Bug?. In Proceedings of the 28th USA, 1767–1772. International Conference on Software Engineering [12] Yguaratã Cerqueira Cavalcanti, Paulo Anselmo Mota (ICSE ’06). ACM, New York, NY, USA, 361–370. Silveira Neto, Ivan do Carmo Machado, Tassio Ferreira [4] John Anvik and Gail C. Murphy. 2007. Determining Vale, Eduardo Santana Almeida, and Silvio Romero Implementation Expertise from Bug Reports. In Fourth de Lemos Meira. 2014. Challenges and opportunities International Workshop on Mining Software Reposito- for software change request repositories: a systematic ries (MSR’07:ICSE Workshops 2007). 2–9. mapping study. Journal of Software: Evolution and [5] John Anvik and Gail C. Murphy. 2011. Reducing Process 26, 7 (2014), 620–653. the Effort of Bug Report Triage: Recommenders [13] S. Charrington. 2012. Three New Tools Bring Ma- for Development-oriented Decisions. ACM Trans. chine Learning Insights to the Masses. (2012). Softw. Eng. Methodol. 20, 3, Article 10 (Aug. 2011), (Feb 27, 2012) http://readwrite.com/2012/02/27/ 35 pages. three-new-tools-bring-machine. 38 [14] Mozilla Corporation. 2014. Bugzilla. (2014). (Nov 24, [24] B. Noris. 2016. ML Demos. (2016). (Mar 10, 2016) 2014) http://www.bugzilla.org/. http://mldemos.epfl.ch/. [15] Davor Čubranić. 2004. Automatic bug triage using [25] Jason D Rennie, Lawrence Shih, Jaime Teevan, text categorization. In Proceedings of the Sixteenth David R Karger, and others. 2003. Tackling the poor International Conference on Software Engineering and assumptions of naive bayes text classifiers. In ICML, Knowledge Engineering. 92–97. Vol. 3. 616–623. [16] Skytree Inc. 2016. Skytree Server. (2016). (Mar 10, [26] Gregorio Robles, Stefan Koch, Jesús M GonZÁlEZ- 2016) http://www.skytree.net. BARAHonA, and Juan Carlos. 2004. Remote analysis [17] Thorsten Joachims. 1998. Text categorization with and measurement of libre software systems by means support vector machines: Learning with many relevant of the CVSAnalY tool. In Proceedings of the 2nd ICSE features. In Proceedings of the 10th European confer- Workshop on Remote Analysis and Measurement of ence on machine learning. Springer, 137–142. Software Systems (RAMSS). 51–55. [18] Stefan Koch and Georg Schneider. 2002. Effort, Cop- [27] Simone Stumpf, Vidya Rajaram, Lida Li, Weng-Keen eration and Coordination in an Open Source Software Wong, Margaret Burnett, Thomas Dietterich, Erin Sulli- Project: GNOME. Information Systems Journal 12, 1 van, and Jonathan Herlocker. 2009. Interacting mean- (2002), 27–42. ingfully with machine learning systems: Three exper- [19] Igor Kononenko and others. 2010. An efficient expla- iments. International Journal of Human-Computer nation of individual classifications using game theory. Studies 67, 8 (2009), 639–662. Journal of Machine Learning Research 11, Jan (2010), [28] Duane Szafron, Brett Poulin, Roman Eisner, Paul Lu, 1–18. Russ Greiner, David Wishart, Alona Fyshe, Brandon [20] Todd Kulesza, Simone Stumpf, Weng-Keen Wong, Pearcy, Cam Macdonell, and John Anvik. 2006. Vi- Margaret M Burnett, Stephen Perona, Andrew Ko, and sual explanation of evidence in additive classifiers. In Ian Oberst. 2011. Why-oriented End-user Debugging Proceedings of the 18th Conference on Innovative Ap- of Naive Bayes Text Classification. ACM Transactions plications of Artificial Intelligence, Vol. 2. 1822–1829. on Interactive Intelligent Systems (TiiS) 1, 1 (2011), 2. [29] Xin Xia, David Lo, Xinyu Wang, and Bo Zhou. 2013. [21] Tom M Mitchell. 1997. Machine learning. McGraw-Hill. Accurate developer recommendation for bug resolu- [22] Audris Mockus, Roy T Fielding, and James D Herb- tion. In In Proceedings of the 20th Working Conference sleb. 2002. Two case studies of open source software on Reverse Engineering. IEEE, 72–81. development: Apache and Mozilla. ACM Transac- [30] Min-Ling Zhang and Zhi-Hua Zhou. 2007. ML-KNN: A tions on Software Engineering and Methodology 11, 3 lazy learning approach to multi-label learning. Pattern (2002), 309–346. recognition 40, 7 (2007), 2038–2048. [23] Hoda Naguib, Nitesh Narayan, Bernd Brügge, and [31] Tao Zhang and Byungjeong Lee. 2013. A hybrid bug Dina Helal. 2013. Bug report assignee recommenda- triage algorithm for developer recommendation. In Pro- tion using activity profiles. In Proceedings of the 10th ceedings of the 28th annual ACM symposium on ap- IEEE Working Conference on Mining Software Reposi- plied computing. ACM, 1088–1094. tories (MSR ’13). IEEE, 22–30. 39