Evaluating an Assistant for Creating Bug Report Assignment Recommenders

John Anvik
University of Lethbridge
Lethbridge, Canada
john.anvik@uleth.ca

Abstract
Software development projects receive many change requests each day, and each report must be examined to decide how the request will be handled by the project. One decision that is frequently made is to which software developer to assign the change request. Efforts have been made toward semi-automating this decision, with most approaches using machine learning algorithms. However, using machine learning to create an assignment recommender is a complex process that must be tailored to each individual software development project. The Creation Assistant for Easy Assignment (CASEA) tool leverages a project member's knowledge for creating an assignment recommender. This paper presents the results of a user study using CASEA. The user study shows that users with limited project knowledge can quickly create accurate bug report assignment recommenders.

Author Keywords
bug report triage; assignment recommendation; machine learning; recommender creation; computer supported work

ACM Classification Keywords
H.5.m [Information interfaces and presentation (e.g., HCI)]: Miscellaneous; I.2.4 [Programming Languages and Software: Expert system tools and techniques]; I.2.7 [Natural Language Processing: Text analysis]

Copyright is held by the author/owner(s).
EICS'16, June 21-24, 2016, Bruxelles, Belgium.
Introduction
Large software development projects can receive hundreds of bug reports per day [5, 6]. Each of these bug reports needs to be analyzed and decisions made about how the report will be handled by the project. In cases where a change to the source code is needed, a decision is made about to whom the work will be assigned. This decision process is called bug triage and must be done for all incoming reports.

Bug triage takes significant time and resources [12]. Bug report assignment recommenders have been proposed as a method for reducing this overhead. Many researchers have investigated different approaches for assignment recommender creation, with most focusing on the use of machine learning [5, 7, 15, 29, 31].

Conceptually, the creation of an assignment recommender using machine learning is straightforward [3]. However, in practice, creating an assignment recommender for a specific software development project is challenging. The Creation Assistant for Easy Assignment (CASEA) tool [1] was created to assist software development projects in creating machine learning assignment recommenders tailored to a specific project.

This paper presents the results of a user study using CASEA to create assignment recommenders for a large open source project. The study found that subjects could quickly create an accurate assignment recommender using the tool, despite the users having no specific knowledge about the software project. To the best of our knowledge, CASEA is the first system to address the bug report assignment recommender creation problem, and this paper presents the first study of its use.

This paper proceeds as follows. First, an overview of CASEA is presented. Next, the results from a user study involving subjects creating assignment recommenders for the Eclipse Platform project are presented. The paper then concludes with a discussion of some of the threats to the validity of this work, related work, and possible future improvements to make CASEA more practical for software development projects.

Background
This section presents background information about bug reports, their life cycles, and machine learning.

Bug reports
Bug reports, also known as change requests, provide a means for users to communicate software faults or feature requests to software developers. They also provide a means for developers to manage software development tasks. Bug reports contain a variety of information, some of which is categorical and some of which is descriptive. Categorical information includes such items as the report's identification number (i.e., bug id), its resolution status (e.g., NEW or RESOLVED), the component the report is believed to involve, and which developer has been assigned the work. Descriptive information includes the title of the report, the description of the report, and discussions about possible approaches to resolving the report. Finally, a report may contain other information, such as attachments or links to other reports.

Bug report lifecycle
All bug reports have a lifecycle. When a bug report first enters a project's issue tracking system (ITS), it is in a state such as UNCONFIRMED or NEW. The bug report will then move through different states, depending on the project's development process, and arrive at a resolution state, such as FIXED or INVALID. The lifecycle of a bug report can be used to categorize bug reports [5]. Figure 1 shows an example life cycle state graph from the Bugzilla ITS [14].
Machine learning algorithms
Machine learning is the development of algorithms and techniques that allow computers to learn [21]. Machine learning algorithms fall under three categories: supervised learning, unsupervised learning, and reinforcement learning. Bug report assignment recommenders primarily use supervised learning algorithms, such as Support Vector Machines (SVM) [17], Naive Bayes [25], and ML-KNN [30]. Understanding how a machine learning algorithm creates a recommender requires understanding three concepts: the feature, the instance, and the class. A feature is a specific piece of information that is used to determine the class, such as a term that appears in one or more of a set of bug reports. An instance is a collection of features that have specific values, such as all of the terms in the description of a specific bug report. Finally, a class is the collection of instances that all belong to the same category, such as all of the bug reports fixed by a developer. In supervised machine learning, training instances are labeled with their class. A recommender is created from a set of instances, and the output of the recommender is a subset of the classes predicted for a new instance.
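To make these concepts concrete, the following minimal sketch (illustrative only, not CASEA's code; the reports are invented) shows how resolved bug reports could become labeled instances for supervised learning, with terms as features and the fixing developer as the class:

    from sklearn.feature_extraction.text import CountVectorizer

    # Hypothetical resolved bug reports: (description, fixing developer).
    reports = [
        ("NullPointerException when opening the Java editor", "alice"),
        ("Editor crashes on save with large files", "alice"),
        ("Toolbar icons render incorrectly on Linux", "bob"),
    ]
    descriptions = [text for text, _ in reports]
    labels = [developer for _, developer in reports]  # class of each instance

    # Each report becomes an instance: a vector of term-count features.
    vectorizer = CountVectorizer()
    instances = vectorizer.fit_transform(descriptions)

    print(vectorizer.get_feature_names_out())  # the features (terms)
    print(instances.toarray())                 # the instances
    print(labels)                              # the classes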
Creation assistant for easy assignment
The Creation Assistant for Easy Assignment (CASEA) [1] is a software tool to assist a software project in creating and maintaining bug report assignment recommenders. CASEA guides a project member through the assignment recommender creation process in four steps: Data Collection, Data Preparation, Recommender Training, and Recommender Evaluation. The remainder of this section presents an overview of how CASEA assists with each of these steps.

Data collection
The first step in recommender creation is to gather the data to be used for creating the recommender. Specifically, bug reports are extracted from the project's issue tracking system (ITS). The project member provides the URL of the project's ITS, a date range for data collection, and an optional maximum limit on the number of reports to gather. Reports that have a resolution status of RESOLVED, VERIFIED, or CLOSED are gathered chronologically, with every tenth report selected as a testing report to create an unbiased set for evaluation.
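A minimal sketch of this selection policy, assuming the resolved reports have already been fetched and sorted chronologically (the code is illustrative, not CASEA's implementation):

    def split_reports(reports):
        """Hold out every tenth chronologically ordered report for an
        unbiased evaluation set; the rest become training data."""
        training, testing = [], []
        for i, report in enumerate(reports, start=1):
            (testing if i % 10 == 0 else training).append(report)
        return training, testing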
Data preparation
Having collected the data from the project's ITS, the next step is to filter the data to produce the highest quality training set. Two types of filtering are performed: automatic and assisted.

The automatic filtering performs three actions on the textual data. First, terms that are stopwords (i.e., common words such as 'a' and 'the') are removed. Next, stemming is performed to reduce all of the terms to their respective root values, so that words such as 'user' and 'users' are treated as the same word, ensuring a common vocabulary between the reports. Finally, punctuation and numeric values are removed, except where the punctuation is important to the term, such as in URLs or class names (e.g., "org.eclipse.jdt").
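A rough sketch of this automatic filtering, using NLTK's stopword list and Porter stemmer as stand-ins (the paper does not name the libraries CASEA uses, and the pattern protecting URLs and class names is an illustrative assumption):

    import re
    from nltk.corpus import stopwords   # requires nltk.download('stopwords')
    from nltk.stem import PorterStemmer

    STOPWORDS = set(stopwords.words("english"))
    STEMMER = PorterStemmer()
    # Assumed pattern for tokens kept intact: URLs and dotted class names.
    PROTECTED = re.compile(r"^(?:\w+://\S+|(?:\w+\.)+\w+)$")

    def filter_terms(text):
        terms = []
        for token in text.lower().split():
            if PROTECTED.match(token):
                terms.append(token)                # e.g. "org.eclipse.jdt"
                continue
            token = re.sub(r"[^a-z]", "", token)   # drop punctuation, digits
            if token and token not in STOPWORDS:   # drop 'a', 'the', ...
                terms.append(STEMMER.stem(token))  # 'users' -> 'user'
        return terms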



CASEA assists the user with two types of filtering: label filtering and instance filtering. To assist with label filtering, CASEA presents the user with a label frequency graph. For an assignment recommender, this graph presents bug fixing statistics, a type of activity profile [23], for the project developers based on a random sample of all of the bug reports in the training data set. Figure 2 shows the Configuration tab and an example of label filtering. As can be seen in the graph, developer activity follows a Pareto distribution curve, with a few developers contributing the bulk of the work and many other developers making small contributions [11, 18, 22, 26]. CASEA visualizes the project's development activity and allows the user to select a threshold using a slider, such that only a core set of developers are recommended. In Figure 2, a cutoff of 21 has been selected.
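The threshold cutoff itself is simple to picture; a sketch (with a hypothetical fixed_by field) that keeps only developers whose fix count meets the slider value:

    from collections import Counter

    def core_developers(training_reports, threshold):
        """Keep developers whose fix counts meet the activity threshold,
        so that only the core set is recommended."""
        fixes = Counter(report["fixed_by"] for report in training_reports)
        return {dev for dev, count in fixes.items() if count >= threshold}

    # With the cutoff of 21 selected in Figure 2:
    # core = core_developers(training_reports, threshold=21)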
Figure 1: Bug report life cycle state diagram from the Bugzilla ITS.




Figure 2: The Creation Assistant for Easy Assignment (Configuration tab).




Instance filtering is done using project-specific heuristics. The heuristics have two parts: a grouping rule and a label source. The grouping rule is used to categorize the data into groups, for which the label source will be used for labeling the instances. For an assignment recommender, the grouping rule is a bug report lifecycle (called "Path Group" in Figure 2) and the label source (i.e., "data source" in Figure 2) is either a field from the bug report, such as the assigned-to field, or other labelling information that can be extracted from the bug report, such as the user who last attached a patch or the developer who marked the report as resolved.
Figure 2 shows an example of instance filtering in CASEA. All of the training reports are used to determine the specific bug report life cycles for the project, and the user is presented with a statistical summary of the categories. The figure shows that for the Eclipse Platform project data set, 30.6% of the reports have a NEW→FIXED (NF) lifecycle, followed by 22.6% NEW→FIXED→VERIFIED (NFV), and 15.6% being NEW (N). Using this information, the user can create heuristics for the most commonly occurring categories. In this case, "FixedBy" was chosen for the NF category, "Resolver" for the NFV category, and "Reporter" for the N category. The user can also choose the number of heuristics to be applied, to a maximum of ten; six was chosen in Figure 2.
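Such a heuristic configuration can be pictured as a mapping from path group to label source; the sketch below mirrors the example configuration, with hypothetical field names standing in for the actual ITS data:

    # Label sources keyed by path group, as configured in the example above.
    HEURISTICS = {
        "NF":  lambda r: r["fixed_by"],   # "FixedBy"
        "NFV": lambda r: r["resolver"],   # "Resolver"
        "N":   lambda r: r["reporter"],   # "Reporter"
    }

    def label_report(report):
        """Label a training report using the heuristic for its path group,
        falling back to the assigned-to field ("Assigned")."""
        source = HEURISTICS.get(report["path_group"],
                                lambda r: r["assigned_to"])
        return source(report)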
Recommender Training
After filtering the data to create the set of training and evaluation instances for the recommender, the data is formatted for use with the machine learning algorithm. Once the data is formatted, the recommender is created using a multi-class Support Vector Machine (SVM) algorithm with a Gaussian kernel. SVM is a commonly used algorithm for assignment recommendation [5, 7].
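As a sketch of this training step (the paper does not say which SVM library CASEA uses; scikit-learn is one plausible stand-in), term features feed a multi-class SVC with an RBF, i.e. Gaussian, kernel:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    def train_recommender(train_texts, train_labels):
        """Fit a multi-class SVM with a Gaussian (RBF) kernel over the
        filtered term features; train_labels are developer names."""
        recommender = make_pipeline(
            CountVectorizer(),
            SVC(kernel="rbf", probability=True),  # probabilities allow top-k
        )
        recommender.fit(train_texts, train_labels)
        return recommender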
Recommender Evaluation
Once the user starts the recommender creation process, the user is moved to the Analysis tab, which presents the recommender evaluation results (Figure 3). The user can then return to the Configuration tab, adjust the values for label and instance filtering, and create a new recommender. This process continues until the user is either satisfied with the created recommender, or the user has determined that an assignment recommender cannot be created with a high enough accuracy to benefit the project. At any time the user can save the recommender configuration and return at a later date.

Figure 3: Recommender evaluation in CASEA (Analysis Tab).

CASEA uses the metrics of precision and recall to evaluate a created recommender. It presents results for the top recommendation (top-1), the top three recommendations (top-3), and the top five recommendations (top-5). Figure 3 shows an example of the evaluation results for a recommender after eighteen trials. It shows that the first four configurations did not create very accurate recommenders, as the activity threshold was too low, but when the threshold was raised, a more accurate recommender was produced. After about five more trials, a good heuristic configuration was determined that produced an assignment recommender with a high top-1 accuracy and reasonable top-3 and top-5 accuracies. Further experimentation with the heuristics produced varying results, before it was determined that the configuration from trial #10 was the best.
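One way to compute such top-k figures is sketched below, assuming each evaluation report carries the set of developers who could have fixed it (the set construction follows [4]); this is an illustration, not CASEA's code:

    import numpy as np

    def top_k_precision_recall(recommender, test_texts, relevant_sets, k):
        """Average top-k precision and recall over the evaluation reports.
        relevant_sets[i] holds the developers who could have fixed report i."""
        classes = recommender.classes_
        probabilities = recommender.predict_proba(test_texts)
        precisions, recalls = [], []
        for row, relevant in zip(probabilities, relevant_sets):
            top_k = set(classes[np.argsort(row)[::-1][:k]])
            hits = len(top_k & relevant)
            precisions.append(hits / k)
            recalls.append(hits / len(relevant))
        return float(np.mean(precisions)), float(np.mean(recalls))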
User study of CASEA
A small user study was conducted to assess the potential for CASEA to assist software projects in creating assignment recommenders. Specifically, the study sought to answer qualitative questions such as "What aspects of the recommender creation process and the CASEA interface do users find helpful, challenging, or confusing?". This study was similar in intent to that of Stumpf et al. [27], who conducted a user study to determine how users interact with a machine learning system. The study was conducted using a sample of eight computer science graduate and undergraduate students. This study population was selected under the assumption that using a group with no specific project knowledge would provide a lower bound for future in-field user studies.

User Study Setup
The user study was conducted in the following manner. First, subjects were asked to complete a prior knowledge and experience survey. Specifically, subjects were asked about their prior knowledge and experience in two areas: technical experience and technical knowledge. For technical experience, subjects were asked about their level of experience with issue tracking systems, open source projects, and software testing. To assess prior technical knowledge, subjects were asked about their familiarity with bug reports, machine learning algorithms, classifiers or recommender systems, user interface design principles, and data mining. A Likert scale (Very Experienced/Familiar, Some Experience/Familiarity, Little Experience/Familiarity, Heard Of, No Experience/Familiarity) was used for the self-reporting of their experience level. After completing the prior experience and knowledge survey, subjects were asked to create an assignment recommender for the Eclipse Platform project within fifteen minutes. Once the subjects expressed that they were done using CASEA, a debriefing interview was conducted. The subjects were asked to explain their approach to creating an assignment recommender using CASEA and what they recommended as improvements to the tool, as well as any other comments about their experience with CASEA.

Prior knowledge and experience
Figure 4 shows a summary of the responses from subjects regarding their prior experience. As shown, most of the subjects had prior experience with issue tracking systems, with four reporting some experience and four reporting little experience. Overall testing experience was a bit less, with six subjects reporting little experience, two reporting "heard of" and one reporting no experience. Subjects had the least overall experience with contributing to open source projects, with two reporting little experience, three reporting "heard of" and four reporting no experience.¹

For technical knowledge, Figure 5 shows a summary of the responses. Most of the subjects reported either being very familiar (2 subjects) or having some familiarity (6 subjects) with user interface design principles, from either recently or currently taking an undergraduate course about this topic. Half of the subjects reported familiarity with machine learning algorithms, and slightly more (5 subjects) reported familiarity with classifiers and recommender systems. The knowledge in these areas came from either taking an undergraduate course in computational intelligence or from other course projects. Subjects reported the least familiarity with the bug report lifecycle and data mining.

¹Not all subjects provided answers to all of the questions.
Quantitative Results
Table 1 shows the quantitative results from the eight subjects. The first column identifies the subjects. The next two columns present both the number of trials a subject conducted before creating their most accurate assignment recommender, and the total number of trials that the subject conducted in creating an assignment recommender using CASEA. The next three columns show the Top-1, Top-3, and Top-5 precision and recall values for the best Eclipse Platform assignment recommender created by the subject. The last three rows of Table 1 show a summary of the results, presenting the maximum, minimum, and median values for the columns.

Table 2 shows the threshold and heuristic configurations for the best assignment recommender created by each of the eight subjects. The first two columns list the top ten path groups for the data set and how much of the data set is covered by each path group. As shown by the table, 77% of the data set is covered by the first five path groups, and most of the subjects specified heuristics for six or fewer path groups. Also, with one exception, the "Assigned" data source was used for the remaining 11%-27% not covered by the specified heuristics. Half of the subjects chose values less than 10 for the threshold and the others used values greater than 20. The results show that the subjects were usually able to create a reasonably accurate assignment recommender in 10 trials or fewer. The two most accurate recommenders (created by Subjects #7 and #8) had the same configuration (i.e., threshold and heuristic values), as shown in Table 2.

Qualitative Results and Observations
Based on observations during the study and responses from the debriefing interview, subjects were found to employ two strategies for assignment recommender creation using CASEA. Some subjects were very experimental in their approach, making many changes before creating a new recommender. Other subjects were more methodical, making small changes and testing the results. Figure 6 shows a categorization of the different types of changes (heuristic change, threshold change, or both) made by each subject. As expected, subjects changed the heuristic configurations the most, and most subjects changed the threshold only three times or fewer.

One subject commented that the best strategy was to make small incremental changes, and that CASEA made it easy to employ this strategy. Another subject observed that creating an assignment recommender using CASEA was similar to trying to get a high score in a game, where the score was the precision and recall values.
Figure 4: Responses by subjects regarding prior technical experience.




Figure 5: Responses by subjects regarding prior technical knowledge.




Identifier    Trial to Best   Max Trials   Top-1 (%)            Top-3 (%)            Top-5 (%)
                                           Precision   Recall   Precision   Recall   Precision   Recall
Subject #1          5              5         68.63       1.25     54.98       3.01     66.27       6.04
Subject #2         10             10         39.95       1.14     41.26       3.52     45.1        6.42
Subject #3          5             16         68.63       2.08     46.49       4.22     35.25       5.33
Subject #4          3              5         37.99       2.32     27.53       5.04     22.99       7.02
Subject #5          2              3         81.62       1.49     62.66       3.43     51.62       4.7
Subject #6         16             18         68.87       4.07     34.97       6.2      34.66      10.24
Subject #7         11             20         89.22       3.48     56.45       6.6      52.7       10.27
Subject #8         10             19         89.22       3.48     56.45       6.6      52.7       10.27
Max                16             20         89.22       4.07     62.66       6.6      66.27      10.27
Min                 2              3         37.99       1.14     27.53       3.01     22.99       4.7
Median            7.5             13         68.75       2.2      50.735      4.63     48.36       6.72

Table 1: Best Eclipse Platform assignment recommenders created by subjects.

Path Group   Covers   Sub. #1      Sub. #2      Sub. #3      Sub. #4    Sub. #5      Sub. #6    Sub. #7      Sub. #8
NF           30.6%    FirstResp.   Resolver     Assigned     FixedBy    FirstResp.   FixedBy    FixedBy      FixedBy
NFV          22.6%    Resolver     Resolver     Assigned     Assigned   FixedBy      Assigned   Resolver     Resolver
N            15.1%    Reporter     Assigned     FixedBy      Assigned   FirstResp.   Assigned   Assigned     Assigned
NM            5.1%    FixedBy      FirstResp.   FirstResp.   Assigned   Assigned     Assigned   FirstResp.   FirstResp.
NAFV          3.9%    FixedBy      Resolver     FirstResp.              Assigned     Assigned   Assigned     Assigned
NAF           3.2%                 Resolver     Reporter                                        Assigned     Assigned
NC            3.0%                 FirstResp.   Reporter
NX            2.5%                 FirstResp.   Assigned
NFRFV         1.8%                 FixedBy
NA            1.1%                 Assigned
Other                  Assigned    Assigned     Assigned     Assigned   Assigned     FixedBy    Assigned     Assigned
Activity
Threshold              47          31           3            5          5            5          22           22

Table 2: Best assignment recommender configurations of subjects.
Figure 6: Types of changes made by subjects.

As part of the recommender evaluation, CASEA provides information about how long it takes to create a recommender. This led some subjects to work towards an incorrect goal of minimizing the recommender creation time.

Although subjects were provided with a brief tutorial of CASEA and a high-level explanation of the recommender creation process at the beginning of the study session, subjects encountered a number of problems related to understanding terminology or concepts. Specifically, the term "Path Group" was used to describe the categorization of bug reports, and subjects found this term unintuitive. This led to some initial confusion about the options in the heuristic configuration panel. Also, the meaning of the precision and recall metrics was not initially well understood by subjects. However, once their meaning was understood, subjects felt that they made more intelligent choices about the configuration.

As was mentioned, the subjects did not have specific knowledge about the project, such as who formed the core group of developers. This led to some subjects choosing a low activity cutoff so as to not exclude developers, which resulted in recommenders that were not accurate and took longer to create. This behavior would not be expected from an actual project member using CASEA, as they would have knowledge about the core development team.

Threats to validity
This section highlights some of the threats to the internal validity, external validity, and construct validity of this work.

Threats to the internal validity of this work relate to the potential sources of error in the evaluation of CASEA. A potential source of error is with the creation of the data set used for the evaluation. Although a random sample of bug reports was examined to establish that the data collection procedure was correct, there may have been some bug reports that contained incorrect data.

Threats to external validity relate to the generalizability of the results to other projects or user groups. In this work, subjects with no project-specific knowledge were used to evaluate the usability of CASEA. Therefore, these results would not generalize to those with project-specific knowledge, but could be considered as a lower bound for such a group.

Threats to construct validity relate to the suitability of the evaluation measures. The method used to determine the set of developers that could have fixed a bug report, used for calculating precision and recall, is known to overestimate the group [4]. This results in precision values that are overvalued and recall values that are undervalued.
However, the evaluation results in CASEA show the relative differences between different configurations, so even if the precision and recall values are over or under their true values, CASEA still provides meaningful information to the user.

Related work
This section presents related work in the areas of assisting with triage, assisting with recommender creation, and explaining machine learning.

Assisting with Bug Report Triage
Like CASEA, Porchlight [10] and its predecessor TeamBugs [9] seek to provide a tool to assist project triagers in making their tasks more efficient. Porchlight allows a triager to group similar bug reports together using tags, and then apply a triage decision to the group. This tagging is similar to the path groups in CASEA, which also group bug reports into categories for specifying and applying labelling heuristics.

Assisting with Recommender Creation
SkyTree Infinity [16] and BigML [8] both provide means for guiding a user through the creation of machine learning recommenders. However, using SkyTree Infinity still requires advanced knowledge of machine learning and statistics [13]. BigML provides no support for data preparation or visualization, and creates recommenders using decision trees, which were shown to be ineffective for the bug report assignment problem [5].

Explaining Machine Learning
One avenue toward making the use of recommender systems practical is to assist in their creation and evaluation. This is the approach taken by CASEA. An alternative approach to making their use practical is to explain their results.

Poulin et al. [28] developed ExplainD, a framework for explaining decisions made by classifiers that use additive evidence, such as Naive Bayes. The framework was used in a bioinformatics web-based system called Proteome Analyst.

Štrumbelj and Kononenko [19] presented a method for explaining classifier predictions that used coalitional game theory. The method used a sampling-based approach to reduce the computational complexity of explaining the contributions of individual feature values. Their approach was applied to explaining the results of various machine learning algorithms, including Naive Bayes and SVM.

Kulesza et al. [20] created an end-user debugging approach for intelligent assistants, such as bug report assignment recommenders. The system allowed the user to ask 'why' questions about predictions and then change the answers to debug current and future predictions.

Basilio Noris developed a visualization tool for machine learning called MLDemos [24]. MLDemos assists in understanding how different machine learning algorithms function. It also demonstrates how the parameters of the algorithms affect and modify the results in classification problems.

Conclusion
This paper presented the results of a pilot study of CASEA, a tool to assist in the creation of bug report assignment recommenders. CASEA assists a user in labelling and filtering the bug reports used for creating a project-specific assignment recommender, as well as providing feedback on the effectiveness of the configured assignment recommender. The study found that users with little to no project-specific knowledge were able to quickly create effective assignment recommenders for the Eclipse Platform project.
Based on feedback and the results of the user study, a number of future improvements were identified for CASEA, including having CASEA first attempt to tune a recommender automatically and then have the user tweak the configuration, extending CASEA to assist with the creation of other triage recommenders, supporting other machine learning algorithms, and providing other evaluation metrics, such as F1. An improved version of CASEA, called the Creation Assistant Supporting Triage Recommenders (CASTR) [2], was created to incorporate these changes in preparation for a field study with project developers.

References
[1] John Anvik, Marshall Brooks, Henry Burton, and Justin Canada. 2014. Assisting Software Projects with Bug Report Assignment Recommender Creation. In Proceedings of the 26th International Conference on Software Engineering and Knowledge Engineering. 470–473.
[2] John Anvik, Marshall Brooks, Henry Burton, Justin Canada, and Ted Henders. 2016. CASTR - Creation Assistant Supporting Triage Recommenders. (2016). (July 25, 2016) https://bitbucket.org/bugtriage/castr.
[3] John Anvik, Lyndon Hiew, and Gail C. Murphy. 2006. Who Should Fix This Bug?. In Proceedings of the 28th International Conference on Software Engineering (ICSE '06). ACM, New York, NY, USA, 361–370.
[4] John Anvik and Gail C. Murphy. 2007. Determining Implementation Expertise from Bug Reports. In Fourth International Workshop on Mining Software Repositories (MSR'07: ICSE Workshops 2007). 2–9.
[5] John Anvik and Gail C. Murphy. 2011. Reducing the Effort of Bug Report Triage: Recommenders for Development-oriented Decisions. ACM Trans. Softw. Eng. Methodol. 20, 3, Article 10 (Aug. 2011), 35 pages.
[6] S. Banitaan and M. Alenezi. 2013. TRAM: An approach for assigning bug reports using their Metadata. In 2013 Third International Conference on Communications and Information Technology. 215–219.
[7] Pamela Bhattacharya, Iulian Neamtiu, and Christian R. Shelton. 2012. Automated, highly-accurate, bug assignment using machine learning and tossing graphs. Journal of Systems and Software 85, 10 (2012), 2275–2292.
[8] BigML. 2016. BigML is Machine Learning for everyone. (2016). (Mar 10, 2016) https://bigml.com.
[9] Gerald Bortis and André Van der Hoek. 2011. TeamBugs: A Collaborative Bug Tracking Tool. In Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE '11). ACM, New York, NY, USA, 69–71.
[10] G. Bortis and A. van der Hoek. 2013. PorchLight: A tag-based approach to bug triaging. In 2013 35th International Conference on Software Engineering (ICSE). 342–351.
[11] Gerardo Canfora and Luigi Cerulo. 2006. Supporting Change Request Assignment in Open Source Development. In Proceedings of the 2006 ACM Symposium on Applied Computing (SAC '06). ACM, New York, NY, USA, 1767–1772.
[12] Yguaratã Cerqueira Cavalcanti, Paulo Anselmo Mota Silveira Neto, Ivan do Carmo Machado, Tassio Ferreira Vale, Eduardo Santana Almeida, and Silvio Romero de Lemos Meira. 2014. Challenges and opportunities for software change request repositories: a systematic mapping study. Journal of Software: Evolution and Process 26, 7 (2014), 620–653.
[13] S. Charrington. 2012. Three New Tools Bring Machine Learning Insights to the Masses. (2012). (Feb 27, 2012) http://readwrite.com/2012/02/27/three-new-tools-bring-machine.
[14] Mozilla Corporation. 2014. Bugzilla. (2014). (Nov 24, 2014) http://www.bugzilla.org/.
[15] Davor Čubranić. 2004. Automatic bug triage using text categorization. In Proceedings of the Sixteenth International Conference on Software Engineering and Knowledge Engineering. 92–97.
[16] Skytree Inc. 2016. Skytree Server. (2016). (Mar 10, 2016) http://www.skytree.net.
[17] Thorsten Joachims. 1998. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the 10th European Conference on Machine Learning. Springer, 137–142.
[18] Stefan Koch and Georg Schneider. 2002. Effort, Cooperation and Coordination in an Open Source Software Project: GNOME. Information Systems Journal 12, 1 (2002), 27–42.
[19] Igor Kononenko and others. 2010. An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research 11, Jan (2010), 1–18.
[20] Todd Kulesza, Simone Stumpf, Weng-Keen Wong, Margaret M. Burnett, Stephen Perona, Andrew Ko, and Ian Oberst. 2011. Why-oriented End-user Debugging of Naive Bayes Text Classification. ACM Transactions on Interactive Intelligent Systems (TiiS) 1, 1 (2011), 2.
[21] Tom M. Mitchell. 1997. Machine Learning. McGraw-Hill.
[22] Audris Mockus, Roy T. Fielding, and James D. Herbsleb. 2002. Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology 11, 3 (2002), 309–346.
[23] Hoda Naguib, Nitesh Narayan, Bernd Brügge, and Dina Helal. 2013. Bug report assignee recommendation using activity profiles. In Proceedings of the 10th IEEE Working Conference on Mining Software Repositories (MSR '13). IEEE, 22–30.
[24] B. Noris. 2016. MLDemos. (2016). (Mar 10, 2016) http://mldemos.epfl.ch/.
[25] Jason D. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger, and others. 2003. Tackling the poor assumptions of naive bayes text classifiers. In ICML, Vol. 3. 616–623.
[26] Gregorio Robles, Stefan Koch, Jesús M. González-Barahona, and Juan Carlos. 2004. Remote analysis and measurement of libre software systems by means of the CVSAnalY tool. In Proceedings of the 2nd ICSE Workshop on Remote Analysis and Measurement of Software Systems (RAMSS). 51–55.
[27] Simone Stumpf, Vidya Rajaram, Lida Li, Weng-Keen Wong, Margaret Burnett, Thomas Dietterich, Erin Sullivan, and Jonathan Herlocker. 2009. Interacting meaningfully with machine learning systems: Three experiments. International Journal of Human-Computer Studies 67, 8 (2009), 639–662.
[28] Duane Szafron, Brett Poulin, Roman Eisner, Paul Lu, Russ Greiner, David Wishart, Alona Fyshe, Brandon Pearcy, Cam Macdonell, and John Anvik. 2006. Visual explanation of evidence in additive classifiers. In Proceedings of the 18th Conference on Innovative Applications of Artificial Intelligence, Vol. 2. 1822–1829.
[29] Xin Xia, David Lo, Xinyu Wang, and Bo Zhou. 2013. Accurate developer recommendation for bug resolution. In Proceedings of the 20th Working Conference on Reverse Engineering. IEEE, 72–81.
[30] Min-Ling Zhang and Zhi-Hua Zhou. 2007. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition 40, 7 (2007), 2038–2048.
[31] Tao Zhang and Byungjeong Lee. 2013. A hybrid bug triage algorithm for developer recommendation. In Proceedings of the 28th Annual ACM Symposium on Applied Computing. ACM, 1088–1094.