7th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2019)


Studying Multifaceted Collaboration of OSS Developers and its Impact on their Bug Fixing Performance

Amit Kumar
Department of Information Technology, Indian Institute of Information Technology, Allahabad, India
Email: amitchandramunityagi@gmail.com

Mahen Gandhi
Area of Computer Science and Engineering, NIIT University, Neemrana, India
Email: mahenm.gandhi@st.niituniversity.in

Yugandhar Desai
Area of Computer Science and Engineering, NIIT University, Neemrana, India
Email: yugandhard.desai@st.niituniversity.in

Sonali Agarwal
Department of Information Technology, Indian Institute of Information Technology, Allahabad, India
Email: sonali@iiita.ac.in

Abstract: Developers often collaborate to fix complex bugs, even in open source software (OSS) systems, where collaboration largely occurs through discussions in the bug tracker. Implicit Developer Social Networks (DSNs) are created as a result of these discussions. Past research has investigated the usefulness of such DSNs in addressing many Software Engineering problems (e.g. defect prediction, evolution of collaboration patterns, etc.). However, the multifaceted nature of DSNs constructed from bug report data has been ignored in most past studies. That is, in most past studies a link between developers exists only if they comment on the same bug report, while in reality developers may also be connected indirectly (e.g. a pair of developers is connected even if they comment on two different bug reports that are associated with the same software component). Such unexplored relationships among developers can be used to define new measures for identifying important developers in an OSS system, which is otherwise not trivial to do. In this paper, we study this implicit multifaceted nature of collaboration among developers by extending the single-layer DSN to a Multi-layer DSN (MDSN). Our experiments on bug data of Eclipse and NetBeans show that the structure of the DSNs and their evolution differ significantly across layers, and that the performance of developers in the bug fixing process is not only significantly correlated with their network centrality scores (Pearson correlation coefficient up to 0.74) but that this correlation also varies across the layers of the MDSN, signifying the usefulness of the layers in determining the crucial and important developers in software systems.

Index Terms: Developer Social Network, Multidimensional Developer Social Network, Multilayered Developer Social Network, Multifaceted Developer Social Network

I. INTRODUCTION

Issue Tracking Systems are used not only to archive bug reports and related information but also to help developers collaborate and discuss issues (bugs or features). Developers typically interact by commenting on bug reports. These interactions form an implicit developer social network (DSN).

Due to the readily available data from issue trackers, researchers have started investigating DSNs to solve software development problems. For example, DSNs have been used to study community structures of software developers and their evolution [1], to categorize bug reports [2], and to help in defect prediction [3].

However, most of these studies explore only one type of link among the developers (e.g. in a DSN constructed from bug report data, developers are connected if they have worked together to fix the same bug report), while developers are also indirectly connected through various other avenues. For instance, developers who have not commented on the same bug report but have commented on two different bug reports found in the same component of a software product are indirectly connected. In this paper, we consider many such indirect connections among the developers and build the Multi-layered / Multi-faceted Developer Social Network (MDSN). In our Multi-layered DSN, each layer represents a different DSN whose links capture a different type of proximity among developers. We believe that a holistic view of these different kinds of proximity, and an investigation of the MDSN, can shed more light on the nature of developer collaboration on issue tracking systems.

Towards our goal of investigating the MDSN, we first attempt to answer the fundamental question of whether the structure of the DSN varies significantly from layer to layer. The network structure of DSNs has been characterized by many global social network properties in past studies [4] [5]. We also use global social network properties to characterize and investigate the DSNs of the various layers and hence ask the following research question:

RQ1: How significantly do the global network properties of the DSN vary across the layers of the MDSN?

Past studies have reported that a DSN does not remain static but evolves. Studying such evolution of a DSN is important as it allows us to comprehend how relationships


Copyright © 2019 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


among developers evolve. The MDSN has many layers of DSNs, and hence it is important to study and compare the evolution of each DSN. This sheds light on the dynamics of the DSN at each layer. In particular, we pose our second research question as follows:

RQ2: How does the evolution of DSNs differ at each layer? Do some DSNs evolve faster than others?

The first two research questions are posed to investigate whether the Multifaceted/Multilayered approach to studying DSNs adds value to the understanding of the developer communication structure.

Past studies have also used DSNs to characterize the traits, performance, and importance of developers [6] [7] [8] [9]. Node centrality measures in DSNs and the entropy of developers' contributions have been widely used to characterize the importance of developers in the collaboration network [10]. In the MDSN, collaboration happens at various levels, and hence it is interesting to see how the importance of nodes in each DSN is associated with their bug fixing performance. To measure the importance of a node (developer), we compute graph entropy-based measures along with various node centrality measures at each layer of the MDSN. In particular, we ask our third research question as follows:

RQ3: How significantly do various metrics measuring the importance of developers in DSNs correlate with their bug fixing performance? How significantly do these correlations differ across the layers of the MDSN?

To answer these research questions, we first construct the Multilayered Developer Social Networks from the bug report data of two popular Java IDE projects, Eclipse and NetBeans. We then use several global network properties (characterizing the entire network), node importance measures, and measures of the bug-fixing performance of developers to answer our research questions.

II. RELATED WORK

Leveraging archival data to facilitate ongoing software development is a key tenet of software engineering research. One such kind of data that researchers have started using is the implicit social networks that are created by developer interactions: when developers work on the same file or task, or communicate regarding an issue or task. Here we sample a subset of the research involving DSNs that is related to our work. For example, Canfora et al. [9] mine data from mailing lists to identify experienced developers who actively interact with newcomers, in order to identify mentors. Bird et al. [1], on the other hand, have used DSNs from mailing lists to investigate the social status of OSS participants based on the network structure and the relationship between email and commit activities via these networks. Hong et al. [4] have investigated how a DSN evolves and compared it to the evolution of other social networks such as Facebook and Twitter. The network structure properties of DSNs have also been studied to identify the structures that correlate with efficiency in the bug fixing process [5].

Fig. (1) Single-layered DSN

Zanetti et al. [2] found the centrality of users in a communication network between bug reporters and developers to be indicative of the quality of a bug report. Cataldo and Herbsleb [6] observed that the core developers in the communication structure of organizations are the top contributors. Meneely et al. [3] and Wolf et al. [11] used developer social networks for failure prediction. More recently, Wang and Nagappan [12] studied the distribution of collaboration patterns and used them to assess the impact of such patterns on the quality of the project from a security point of view.

Our study is similar to the work described above in that we also study developer social networks. However, instead of using a single-layer DSN, we study a multilayered (multifaceted) DSN. Each layer in our Multifaceted DSN represents a different sort of relationship among developers, making it a richer framework for depicting more complex proximities among them. Our model of the MDSN is inspired by Kazienko et al. [13]. However, to the best of our knowledge, we are the first to calibrate and use it in the Software Engineering domain. We also investigate what the positions of developers in the networks represented by the various layers of the proposed Multi-faceted DSN convey about their performance in bug fixing activities. This investigation also adds to the novelty of the work proposed in this paper.

III. BACKGROUND AND METHODOLOGY

A. Developer Social Network (DSN) and Multi-layered DSN (MDSN)

Issue Tracking Systems (ITS) have been used as a communication platform by developers while they work on the bugs reported by users and fellow developers. Developers usually do so by commenting on the bug reports. A typical issue report in the ITS of a large software ecosystem such as NetBeans or Eclipse has many fields which provide details about the issue, e.g. short and long description of the issue, product and component of

                                                    TABLE (I) Example Bug Reports
              Bug Id    Assigned To   Severity   Priority   Product Id   Component Id     Reporter   Operating System   Commenters
               B1           D1          S1         P1           2             3             R1            Linux           D1, D4
               B2           D1          S1         P3           1            11             R2        Windows XP          D2, D3
               B3           D1          S2         P2           2             3             R3            Linux            D2
               B4           D2          S1         P1           2             4             R1         Mac OS X           D2, D4
               B5           D3          S3         P3           3            11             R3         Mac OS X           D1, D3
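
The toy data in Table I is small enough to build every layer by hand, which makes it a convenient check on the layer definitions. The following is a minimal sketch, not the authors' implementation: the function names (`same_bug_layer`, `cross_bug_layer`, `build_mdsn`, `density`) are hypothetical, and it assumes one reading of the paper's rule, namely that in layers L2-D1, L2-D2, L3 and L4 two distinct developers are linked whenever they commented on two different bug reports that agree on the layer's attribute.

```python
from itertools import combinations

# Toy bug reports transcribed from Table I (only the fields the layers use).
BUGS = {
    "B1": {"product": 2, "component": 3, "reporter": "R1", "os": "Linux", "commenters": {"D1", "D4"}},
    "B2": {"product": 1, "component": 11, "reporter": "R2", "os": "Windows XP", "commenters": {"D2", "D3"}},
    "B3": {"product": 2, "component": 3, "reporter": "R3", "os": "Linux", "commenters": {"D2"}},
    "B4": {"product": 2, "component": 4, "reporter": "R1", "os": "Mac OS X", "commenters": {"D2", "D4"}},
    "B5": {"product": 3, "component": 11, "reporter": "R3", "os": "Mac OS X", "commenters": {"D1", "D3"}},
}


def same_bug_layer(bugs):
    """L1: link developers who commented on the same bug report."""
    edges = set()
    for bug in bugs.values():
        edges.update(combinations(sorted(bug["commenters"]), 2))
    return edges


def cross_bug_layer(bugs, key):
    """Link developers who commented on two *different* reports sharing key(bug).

    Assumption: a pair is linked as soon as one such pair of reports exists,
    regardless of whether the same pair is also linked in L1.
    """
    edges = set()
    for first, second in combinations(bugs.values(), 2):
        if key(first) != key(second):
            continue
        for a in first["commenters"]:
            for b in second["commenters"]:
                if a != b:  # no self-loops
                    edges.add(tuple(sorted((a, b))))
    return edges


def build_mdsn(bugs):
    """Return the five layers as sets of undirected edges."""
    return {
        "L1": same_bug_layer(bugs),
        "L2-D1": cross_bug_layer(bugs, lambda b: b["product"]),
        "L2-D2": cross_bug_layer(bugs, lambda b: (b["product"], b["component"])),
        "L3": cross_bug_layer(bugs, lambda b: b["reporter"]),
        "L4": cross_bug_layer(bugs, lambda b: b["os"]),
    }


def density(edges, n_nodes):
    """Edges present divided by the maximum possible edges (no self-loops)."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


if __name__ == "__main__":
    for name, edges in build_mdsn(BUGS).items():
        print(name, sorted(edges), round(density(edges, 4), 2))
```

Keeping each layer as a plain set of sorted pairs makes the layers directly comparable with set operations, for example to see which links exist only outside L1.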



                                                          TABLE (II) Dataset
  Project    Duration    #Bug Reports   #Assignees   #Comments   #Commenters   #Reporters   #OS   #Products   #Components   #Versions
  Eclipse    2001-2005      27213           759        149340        2767          1911      23       59          325           88
  NetBeans   2001-2005      21345           374        134040        1905          1559      16       36          328           21


the given product where the issue is likely to be present, operating system (the product is used/tested with), priority of the issue, severity of the issue, reporter who reported the issue, assignee (to whom the issue is assigned), etc. The discussion on Issue Tracking Systems has been leveraged to construct the Developer Social Network among developers. However, the DSN can be modeled as a single-layered DSN as well as a multilayered DSN. Most past studies [4] [5] [8] consider the DSN as single-layered, where developers are nodes and an edge between a pair of developers exists if they have commented on the same bug report.

Fig. (2) Multi-layered DSN

We model our DSN as a Multi-layered DSN so that multiple facets of the collaboration among developers can be investigated. In our MDSN, each layer represents a different kind of relationship among the developers. In a single-layer DSN (the DSN considered by past studies), developers are connected by an edge if they have commented on the same bug report. In the MDSN, developers are also connected if they comment on different bugs/issues which are found in the same product, found in the same component of the product, reported by the same reporter, or discovered while the software product was used with the same operating system. In total, we have five layers in our MDSN. To further illustrate the difference between the single-layered DSN and the MDSN, let us consider the toy example data set shown in Table I. The table lists five bug reports with those attributes that are relevant to our study. The single-layered DSN constructed from this dataset is shown in Figure 1, while the MDSN consisting of five layers is shown in Figure 2. Note that the single-layered DSN is contained in the MDSN as one layer (L1), which makes the MDSN a richer network framework for depicting the deeper relationships among developers. The other layers in the MDSN of Figure 2 can be understood similarly: L2-D1 represents the network where developers are connected by an edge if they have commented on different bug reports found in the same product. L2-D2, L3, and L4 have similar semantics: developers in L2-D2, L3 and L4 are connected by an edge if they have commented on different bug reports found in the same product as well as the same component, on two bug reports reported by the same reporter, or on two bug reports associated with the same operating system, respectively. It should be noted that we deliberately avoided trivial links in layers L2-D1, L2-D2, L3, and L4 (replication of links in these layers due to L1) by connecting only those developers commenting on different bug reports. We did this to analyze the exclusive nature and power of the DSN at each layer. We used the DSN at each layer to answer our research questions by exploring its global network properties and various node importance measures.

B. Global Network Properties

To investigate the difference in the nature and evolution of the DSNs at every layer, we used global network properties similar to those used by Hong et al. [4]. We compared the DSNs at each layer based on the following global network properties:

1) Network density: Network density is defined as the ratio of the number of edges present in the network to the maximum possible number of edges that can exist in the network (excluding self-loops). A higher network density indicates higher levels of inter-developer communication.

2) Modularity: Modularity is an important measure, as a higher value of modularity denotes a stronger community structure in the network. We used the modularity definition given by Newman [14]:

$$M = \sum_{i=1}^{n} \left( x_i - y_i^2 \right)$$

where $x_i$ denotes the proportion of the edges between the vertices of community $i$, while $y_i$ denotes the proportion of the edges that are not part of the community.

3) Average Path Length (APL): It is the average length of the shortest paths between each pair of nodes in the network. A shorter APL shows that developers are well connected and easily accessible to each other in the network.

4) Average Clustering Coefficient: The clustering coefficient in a graph is the degree of clustering of a node with its neighboring nodes. In an undirected graph, the clustering coefficient of a vertex can be defined as follows:

$$P_i = \frac{2 x_i}{y_i (y_i - 1)}$$

Here $x_i$ is the number of edges between neighbors of node $i$ and $y_i$ is the number of node $i$'s neighbors. Average




Network Clustering Coefficient is the average of the clustering coefficients of all nodes in the network. A significantly higher average clustering coefficient indicates that the network follows the small world phenomenon [15].

5) Average Degree (AD): The degree of a node in the graph is the total number of edges incident on it. Since each edge has two endpoints and counts towards the degree of both, the average degree of an undirected graph is defined as:

$$AD = 2 \times \frac{|E|}{|V|}$$

C. Node Importance Measures

Past studies have leveraged the position of developers in the DSN to investigate their importance in the developer community. In particular, various node-based centrality measures have been used to measure the importance of developers in the DSN. To answer our third research question, we used the following node importance measures defined for the DSN:

1) Eigenvector Centrality: In graph theory, eigenvector centrality (also called eigencentrality) is a measure of a node's importance in a network. The idea is to assign proportional score values to all network nodes. Let $G(V, E)$ be a graph consisting of vertices $V$ and edges $E$. Let $A = (a_{v,t})$ be the adjacency matrix, i.e. $a_{v,t} = 1$ if vertex $v$ is linked to vertex $t$, and $a_{v,t} = 0$ otherwise. The relative centrality score of vertex $v$ as defined by Phillip Bonacich [16] is:

$$x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t \in G} a_{v,t} \, x_t$$

where $M(v)$ is the set of neighbours of $v$ and $\lambda$ is a constant. Mathematically, this can be written in vector notation as the famous eigenvector equation,

$$A x = \lambda x$$

The principal eigenvector of the above equation gives the centrality of all network nodes (here, a node is a developer).

2) Betweenness Centrality: The betweenness centrality of the graph is determined by the propensity of a single vertex to be more central than any other vertex in the graph. In other words, it measures how often a node appears on shortest paths between nodes in the network. The following is the standard measure given by Freeman [17]:

$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the number of shortest paths from $s \in V$ to $t \in V$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.

3) Closeness Centrality: A node's closeness centrality in a connected graph is measured as the reciprocal of the sum of the shortest path lengths between the node and all other nodes in the graph. Therefore, the more central the node, the closer it is to all the other nodes. The closeness centrality as defined by Sabidussi [18] is as follows:

$$C_C(v) = \frac{1}{\sum_{t \in V} d_G(v, t)}$$

4) Entropy based measure: The entropy of a system measures the randomness or uncertainty in it. The entropy of a graph measures the diversity of edges incident on its nodes. The higher the entropy of the graph, the more uniform the distribution of its edges over its nodes. Dehmer and Mowshowitz [19] provide a good survey on graph entropy. Graph entropy can be used to measure the importance of the nodes in the graph. For instance, if the entropy of a graph changes significantly after removing a node from it, the node is considered important. Though many definitions of graph entropy are available in the literature, we define the entropy of a graph as follows: Let $G = (V, E)$ be an undirected graph. The entropy of the graph $G$, denoted as $H(G)$, is defined as:

$$H(G) = \sum_{i \in V} -p_i \log p_i$$

where $p_i$ is the degree of node $i$ divided by the sum of the degrees of all nodes in the graph. The graph entropy based importance $H(G_i)$ of node $i$ is defined as:

$$H(G_i) = |E_2 - E_1|$$

where $E_1$ is the entropy of the graph $G$ with node $i$ and $E_2$ is the entropy of the graph $G$ without node $i$. This helps us determine how important a node is in the graph. The higher the value of $H(G_i)$, the greater the importance of developer $i$.

D. Performance of developers in bug fixing process

To answer our third research question, we require two sets of measures: the node importance measures defined in the previous subsection, and measures to quantify the performance of developers in the bug fixing process. We used the following measures of the performance of developers in the bug fixing process.

1) Average fix time: The Average Fix Time (AFT) for an assignee $a$ is estimated over a certain period of time using the equation below. To calculate the developer's efficiency in a certain time period, we only consider the bugs that are opened and fixed during that time period.

$$AFT = \frac{\sum_{i=1}^{n} \left( t_{2,b_i} - t_{1,b_i} \right)}{n}$$

where,
   $b_i$ = $i$th bug in the set of bugs assigned to the assignee $a$.
   $n$ = Total bugs assigned to the assignee $a$.
   $t_1$ = Time when the bug was assigned to the assignee.
   $t_2$ = Time when the FIXED label was added to the bug report for the first time.

2) Aggregate Priority Points: This metric is used to measure the importance of the developer with respect to the type of bugs he/she fixes. A developer who fixes bugs with higher priority is considered to be more important. We assign priority points to each developer based on the types of bugs he/she fixes. First we assign a weightage to each priority type as follows:

                TABLE (III) Weightage of Priorities
                Priority   P1   P2   P3   P4   P5
                Points      5    4    3    2    1
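
The average fix time and the Table III weights can be illustrated with a few lines of code. The following is a minimal sketch under stated assumptions, not the authors' tooling: the function names are hypothetical, fix time is reported in hours (the paper does not state a unit), and the aggregate priority points are read as the mean of the Table III points over the bugs assigned to a developer.

```python
from datetime import datetime

# Weights transcribed from Table III.
PRIORITY_POINTS = {"P1": 5, "P2": 4, "P3": 3, "P4": 2, "P5": 1}


def average_fix_time_hours(bugs):
    """Mean of (t2 - t1) over an assignee's bugs, in hours (unit is an assumption).

    `bugs` is an iterable of (assigned_at, first_fixed_at) datetime pairs,
    already restricted to bugs opened and fixed in the period of interest.
    """
    bugs = list(bugs)
    if not bugs:
        raise ValueError("no bugs opened and fixed in the period")
    total_seconds = sum((fixed - assigned).total_seconds() for assigned, fixed in bugs)
    return total_seconds / 3600 / len(bugs)


def aggregate_priority_points(priorities):
    """Mean Table III weight over the priorities of an assignee's bugs.

    Assumption: each bug contributes the points allocated to its priority.
    """
    priorities = list(priorities)
    if not priorities:
        raise ValueError("assignee has no bugs")
    return sum(PRIORITY_POINTS[p] for p in priorities) / len(priorities)


if __name__ == "__main__":
    history = [
        (datetime(2004, 1, 1, 0, 0), datetime(2004, 1, 1, 12, 0)),
        (datetime(2004, 1, 2, 0, 0), datetime(2004, 1, 3, 12, 0)),
    ]
    print(average_fix_time_hours(history))
    print(aggregate_priority_points(["P1", "P3"]))
```

A lower average fix time and a higher priority score both point to a stronger performer, so the two values should be interpreted with opposite signs when they are correlated with centrality.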


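
The entropy-based node importance of Section III-C follows directly from node degrees, so it can be sketched without a graph library. This is an illustrative sketch only: the adjacency-dictionary representation and the function names are hypothetical, the natural logarithm is assumed because the paper does not state a base, and isolated nodes are taken to contribute zero to the entropy.

```python
import math


def graph_entropy(adj):
    """H(G) = -sum_i p_i log p_i, with p_i = deg(i) / sum of all degrees.

    `adj` maps each node to the set of its neighbours (undirected graph).
    Nodes of degree zero contribute nothing (assumption: 0 * log 0 = 0).
    """
    total_degree = sum(len(neighbours) for neighbours in adj.values())
    if total_degree == 0:
        return 0.0
    entropy = 0.0
    for neighbours in adj.values():
        if neighbours:
            p = len(neighbours) / total_degree
            entropy -= p * math.log(p)  # natural log is an assumption
    return entropy


def remove_node(adj, node):
    """Return a copy of the graph without `node` and its incident edges."""
    return {v: neighbours - {node} for v, neighbours in adj.items() if v != node}


def entropy_importance(adj, node):
    """H(G_i) = |E2 - E1|: entropy without the node versus with the node."""
    with_node = graph_entropy(adj)
    without_node = graph_entropy(remove_node(adj, node))
    return abs(without_node - with_node)


if __name__ == "__main__":
    # Star-shaped toy network: one hub developer linked to three others.
    star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
    for developer in star:
        print(developer, round(entropy_importance(star, developer), 4))
```

In the star example, removing the hub deletes every edge, so its importance equals the entropy of the whole graph, while removing a leaf changes the entropy far less.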


Then we calculate the priority points for each developer. A higher value of priority points for a developer signifies that he/she fixes bugs with relatively higher priorities, making him/her a more important developer than those with lower priority points.

The equation for estimating the aggregate priority points of a developer over a certain period of time is as follows:

$$APP = \frac{\sum_{i=1}^{n} p_i \times c_p}{n}$$

where,
   $n$ = Total number of bugs assigned to assignee $a$.
   $p_i$ = Priority of bug.
   $c_p$ = Points allocated to priority $p$.

3) Aggregate Severity Points: This is very similar to the Aggregate Priority Points. The points allocated for each severity are as follows.

                TABLE (IV) Weightage of Severities
     Severity    trivial   minor   normal   major   critical   blocker
     Points         1        2       3        4        5          6

Note that NetBeans and Eclipse allow users to request new features, which are not technically real bugs. Therefore, we do not consider those bug reports where the severity attribute is set to enhancement, because this category is reserved for feature requests or improvements to the product. The formula for calculating the aggregate severity points is as follows:

$$ASP = \frac{\sum_{i=1}^{n} s_i \times c_s}{n}$$

where,
   $n$ = Total number of bugs assigned to assignee $a$
   $s_i$ = Severity of bug
   $c_s$ = Points allocated to severity $s$

4) Total Components Developer Works Upon: This measure is the total number of modules/components that the assignee has worked on during a certain time period. It denotes the diversity in the work profile of the developer.

bug report. Hence, in answering our research questions, we constructed single-layered and Multi-layered DSNs out of the comments of the bug reports in our dataset. However, it should be noted that to answer RQ3, the node importance measures and performance metrics are calculated and analyzed only for assignees, as they are most responsible for fixing the assigned bug/issue. We conjecture that assignees with good node importance measures in the MDSN are good performers. In particular, the node importance measures of an assignee in different layers influence her performance differently. For computing the various global network-based metrics and node importance measures we used the Gephi network analysis tool [20].

IV. RESULTS AND DISCUSSION

In this section, we discuss our results and their implications with respect to our research questions.

RQ1: How significantly do the global network properties of the DSN vary across the layers of the MDSN?

To answer this research question, we computed all the network measures defined in Section III-B with the help of Gephi [20] and compared the global network properties of the DSNs formed at each layer of the MDSN. Past research [4] [5] has also used a similar approach to compare various networks. The difference in network measures across the layers of the MDSN signifies the importance of the individual layers, making the MDSN a more useful framework for studying the collaboration patterns among developers. Our results are shown in Figure 3. It can be seen from the figure that while the density, average path length, and modularity of the DSN vary significantly across the layers of the MDSN, the clustering coefficient remains relatively stable across the layers. The modularity of the network describes its community structure, and past studies have leveraged the modularity of DSNs for community detection and for discovering team structures in OSS projects [1] [21] [4]. Variation in modularity across different layers suggests that the different community structure
 E. Experimental Set up and Dataset                                       and team structure could be discovered using our MDSN
    To carry out our work,we performed our experiments with               approach improving the knowledge about the team structure
 bug reports of two common open-source software projects-                 in OSS maintenance activities. Furthermore, other network
 Eclipse and NetBeans. We chose these projects because they               measures e.g. network density, average path length etc. have
 are very popular among the community of software engineer-               been used to predict the defects in software modules [11] and
 ing, developed around a similar time as OSS projects and                 hence it would be interesting to investigate if the accuracy
 have similar functionalities that make them a good choice to             of defect prediction models could also be improved by
 test our proposed Multi-layered Developer Social Network. In             incorporating network measures computed based on MDSN.
 total, Our dataset has 283380 comments made upon 48258 bug               In nutshell, it is clear from our results shown in Figure 3
 reports between 2001 and 2005. We chose this period as both              that network structure of DSNs formed at various layers is
 the projects during this time were in their initial phase making         significantly different from each other and encourages to
 them ideal to study their evolution. The complete details of the         leverage MDSN to investigate their usefulness in solving
 dataset are shown in Table II.                                           popular research problems e.g. community detection, defect
    In Issue Tracking System like Bugzilla (used by Eclipse               prediction etc.
 and NetBeans), though the issue is assigned to one person
 i.e. Assignee, it is fixed collaboratively by OSS contributors.            RQ2: How does the evolution of DSNs differ at each layer?
 The collaboration happens through comments made on the                   Do some DSNs evolve faster than others?


Copyright © 2019 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


                              Fig. (3) Variation in Network Structure across various layers of MDSN
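    Two of the global properties compared in Figure 3, density and the average clustering
 coefficient, can be illustrated with a minimal pure-Python sketch. The study itself computed
 these values with Gephi, so the code below is only an illustration of the standard definitions
 for an undirected graph, not the tool's implementation. The layer labels are reused from the
 paper, but the developer names and edge lists are hypothetical toy data.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops.

    Only developers that appear in at least one edge become nodes.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Fraction of possible undirected edges that are present: 2m / (n(n-1))."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among the neighbours of this node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical toy layers: same developers, different link structure per layer.
layers = {
    "L1": [("ann", "bob"), ("bob", "cat"), ("ann", "cat")],
    "L4": [("ann", "bob"), ("bob", "cat"), ("cat", "dan")],
}
for name, edges in layers.items():
    adj = build_adjacency(edges)
    print(name, round(density(adj), 3), round(average_clustering(adj), 3))
```

    Comparing such per-layer values side by side is the kind of comparison the figure
 summarizes; average path length and modularity would need additional code or a graph library.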
    To answer this research question, we first split the 5 years of bug report data of Eclipse
 and NetBeans (2001-2005) into chunks of 6 months. This way, we have 10 samples of bug report
 data for each of the projects. Then we compute the global network properties for each sample to
 see their evolution over time. Figure 4 shows the evolution of network properties of DSN over
 time for both Eclipse and NetBeans. It can be seen from the figure that almost all the network
 properties at layers L2-D1 (edges between developer nodes if they comment on different bug
 reports associated with the same product) and L2-D2 (edges between developer nodes if they
 comment on different bug reports associated with the same product as well as the same
 component) evolve faster than properties at other layers. This faster evolution of these layers
 is observed for both NetBeans and Eclipse. Graph-based metrics such as density, modularity and
 their evolution have been used to study software evolution and to predict some of its important
 aspects [22]. Our study extends the past study on DSN based software evolution and suggests
 that studying the evolution of DSN at various layers of MDSN can provide new insights to the
 software evolution research. Variation in evolution of DSNs at various layers also suggests
 that predicting future aspects of some DSNs is more difficult than others. For instance,
 predicting the developers leaving the project might be much more difficult in layers L2-D1 and
 L2-D2 in comparison to other layers where evolution is relatively smoother. Overall, the answer
 to this research question is affirmative based on the results shown in Figure 4, adding value
 to our study.

    RQ3: How significantly do various metrics measuring the importance of the developers in DSN
 correlate with their bug fixing performance? How significantly do these correlations differ
 across different layers of MDSN?

                                                    TABLE (V) Correlation Analysis

                                      Avg Fixed Time            Total Components          Aggregate Priority Points Aggregate Severity Points
Layers        Metric                  Eclipse      NetBeans     Eclipse      NetBeans     Eclipse      NetBeans     Eclipse      NetBeans
L1            Betweenness Centrality  -0.079       -0.161       0.209**      0.506**      0.742**      0.563**      0.744**      0.578**
              Closeness Centrality    -0.195**     -0.434*      0.181**      0.646**      0.336**      0.636**      0.338**      0.648**
              Eigenvector Centrality  -0.161*      -0.443*      0.315**      0.656**      0.482**      0.656**      0.475**      0.651**
              Entropy Based Measure   -0.162*      -0.222       0.409**      0.005        0.635**      -0.087       0.631**      -0.087
L2 - D1       Betweenness Centrality  -0.059       -0.171       0.282**      0.506**      0.269**      0.358        0.261**      0.396*
              Closeness Centrality    -0.141*      -0.386*      0.092        0.646**      0.162*       0.472*       0.157*       0.489**
              Eigenvector Centrality  -0.096       -0.424*      0.063        0.451*       0.032        0.442*       0.03         0.448*
              Entropy Based Measure   0.162*       0.026        -0.097       0.045        -0.146*      -0.061       -0.139*      -0.029
L2 - D2       Betweenness Centrality  -0.064       -0.119       0.297**      0.48**       0.275**      0.294        0.269**      0.326
              Closeness Centrality    -0.141*      -0.434*      0.106        0.521**      0.175**      0.464*       0.172*       0.473**
              Eigenvector Centrality  -0.092       -0.384*      0.086        0.48**       0.035        0.452*       0.034        0.485**
              Entropy Based Measure   -0.156*      0.022        0.10*        -0.212       0.191**      -0.093       0.185**      -0.092
L3            Betweenness Centrality  -0.11        -0.18        0.223**      0.468*       0.74**       0.642**      0.745**      0.647**
              Closeness Centrality    -0.271**     -0.314       0.274**      0.641**      0.364**      0.651**      0.366**      0.654**
              Eigenvector Centrality  -0.242**     -0.523**     0.311**      0.553**      0.357**      0.526**      0.356**      0.525**
              Entropy Based Measure   -0.232**     -0.212       0.365**      0.08         0.474**      -0.01        0.472**      -0.012
L4            Betweenness Centrality  -0.25**      -0.631**     0.246**      0.524**      0.428**      0.45*        0.436**      0.466*
              Closeness Centrality    -0.347**     -0.656**     0.329**      0.442*       0.285**      0.416*       0.287**      0.422*
              Eigenvector Centrality  -0.393**     -0.788**     0.279**      0.385*       0.204**      0.371*       0.203**      0.373*
              Entropy Based Measure   -0.39**      -0.365       0.301**      0.164        0.251**      0.213        0.252**      0.222
Combined DSN  Betweenness Centrality  -0.263**     -0.656**     0.255**      0.438*       0.32**       0.402*       0.325**      0.407*
              Closeness Centrality    -0.358**     -0.711**     0.277**      0.397*       0.242**      0.379*       0.243**      0.38*
              Eigenvector Centrality  -0.424**     -0.743**     0.247**      0.376*       0.192**      0.36         0.192**      0.36
              Entropy Based Measure   -0.418**     -0.243       0.26**       0.125        0.216**      0.115        0.218**      0.137

                                                        *p < .05, **p < 0.01, ***p < .001
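
    Each cell of Table V is a Pearson correlation coefficient between a node importance measure
 and a performance measure, taken over the developers of one project. As a minimal sketch of
 how one such coefficient could be computed, the pure-Python function below implements the
 standard formula. The function name and the sample values are hypothetical and are not taken
 from the paper's dataset. The significance stars in the table additionally require a p-value,
 which this sketch does not compute; a statistics library such as SciPy (`scipy.stats.pearsonr`)
 reports one alongside the coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-developer values (illustrative only, one entry per assignee).
eigenvector_centrality = [0.10, 0.25, 0.40, 0.55, 0.80]
avg_fix_time_days = [30.0, 24.0, 20.0, 12.0, 6.0]

r = pearson_r(eigenvector_centrality, avg_fix_time_days)
print(f"r = {r:.3f}")  # negative: higher centrality goes with shorter fix time here
```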




       Fig. (4) Layer-wise evolution of global network properties of DSN (Eclipse, NetBeans) from 2001 to 2005.
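
    The six-month sampling behind Figure 4 (10 samples per project over 2001-2005) could be
 reproduced roughly as sketched below. The paper does not state how the windows are aligned, so
 the calendar halves used here (January-June, July-December) are an assumption, and the record
 layout `(date, commenter, bug_id)` and all sample values are hypothetical.

```python
from datetime import date


def half_year_index(day, start_year=2001):
    """Map a date to a 0-based six-month window counted from January of start_year."""
    return (day.year - start_year) * 2 + (0 if day.month <= 6 else 1)


def split_into_windows(comments, start_year=2001, end_year=2005):
    """Group (date, commenter, bug_id) records into consecutive six-month samples.

    Records outside the [start_year, end_year] range are dropped.
    """
    n_windows = (end_year - start_year + 1) * 2
    windows = [[] for _ in range(n_windows)]
    for record in comments:
        idx = half_year_index(record[0], start_year)
        if 0 <= idx < n_windows:
            windows[idx].append(record)
    return windows


# Hypothetical comment records (assumed calendar-half alignment, see lead-in).
comments = [
    (date(2001, 3, 14), "ann", 101),
    (date(2001, 9, 2), "bob", 101),
    (date(2005, 12, 30), "cat", 977),
    (date(2006, 1, 5), "dan", 980),  # outside 2001-2005, so it is dropped
]
windows = split_into_windows(comments)
print([len(w) for w in windows])
```

    A DSN would then be built from each window separately, and the global network properties
 computed per window to trace their evolution.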
    The main finding of this paper is the answer to this research question. To answer this
 research question, we first computed various node importance measures for the DSN at each layer
 as defined in section III-C and the measures to characterize the performance of the developers
 as defined in section III-D. Then we used the Pearson correlation coefficient to see how
 effectively and strongly these two sets of measures correlate with each other. We chose the
 Pearson correlation coefficient to see the strength of association between the two types of
 measures as it has been found useful by many past studies [23] [24]. Table V shows a summary of
 our results. The values of correlation coefficients where the p-value is less than 0.05 are
 specifically highlighted. To see the cumulative effect of node importance measures on the
 performance of the developers, we merged the DSNs of all the layers into one. In this merged
 integrated DSN, an edge exists between a pair of developers if there exists a link between them
 in any of the layers L1, L2-D1, L2-D2, L3, L4. In general, our results are encouraging. For
 instance, the eigenvector centrality value of a node in DSN is negatively correlated with the
 average fix time, suggesting that developers who enjoy a good eigenvector centrality value fix
 issues faster than their peers with lower eigenvector centrality. The value of the correlation
 coefficient between average fix time and other node importance measures is also significant. It
 can also be seen that the correlation coefficient between average fix time and other node
 importance measures is maximum for layer-4 out of all the individual layers. This is
 interesting for two reasons. First, past studies show that predicting the fix-time of a bug in
 OSS is hard. Bhattacharya and Neamtiu [25] reported that many of the features/attributes
 considered by researchers to build predictors of the average fix time of a bug are not
 relevant. Our results suggest that node importance based measures for the assignee can prove to
 be good features for such predictors. Second, most of the past studies considered the
 node-based centrality at layer-1 only, while our study suggests that similar centrality
 measures perform better if we leverage MDSN instead of a single-layered DSN.
    A closer look at Table V shows that node importance measures are also significantly
 correlated with the total number of components the developer worked upon in fixing the issues.
 This suggests that developers with high node importance measures gain diverse expertise in
 fixing issues, making them more crucial to the organization. A significant correlation between
 node importance measures of developers and the aggregate priority points as well as aggregate
 severity points of the bugs fixed by them shows that the node importance measures selected in
 our study are good measures to identify crucial and important developers (developers who are
 good at fixing bugs with high priority and high severity).

    Interestingly, the MDSN approach of investigating the impact of various node importance
 measures on developers' bug fixing performance provides new insight, as many of the results are
 counter-intuitive, e.g., the best correlation is found for layer-4 (where developers are
 connected if they have commented on




 two different bug reports associated with the same operating system). Though the correlation is
 found significant for both the projects, making the finding general enough, the values of
 correlation coefficients are found to be higher with NetBeans data.

    Overall, there are two takeaways from our results. First, the significant correlation
 between various node-based measures and the performance of developers suggests that these
 measures can be used to identify important developers. Second, the measures based on different
 layers of MDSN are differently correlated with the performance of developers, making the MDSN
 framework worth trying for identifying the crucial and important developers in OSS.

                        V. CONCLUSION AND FUTURE WORK

    In this research, we proposed MDSN to investigate the multifaceted nature of collaboration
 among developers while they fix bugs and collaborate through the Issue Reporting System. There
 are many takeaways from our research. First, since the structure of networks varies
 significantly across various layers of MDSN, replicating the past studies on community
 detection and identifying team formation in OSS on the MDSN framework may provide new insights.
 Second, our results show that many node importance measures, i.e., node centrality based
 metrics and graph entropy-based measures, have a significant correlation with the performance
 of the developers in the bug fixing process. Further, such correlations vary significantly
 across the layers, suggesting that MDSN could be more useful to identify important and crucial
 developers in the developer community of OSS. Though our results are consistent across both the
 case studies we selected for our research, there are a few threats to validity. First, comments
 on ITS (Issue Tracking Systems) are not made only by developers, and hence considering all
 commenters as developers could be a threat to the validity of our results. Second, ITS is not
 the only platform where developers collaborate. Past research has also used version control
 data to study collaboration among developers. Hence, performing our study on version control
 data can complement our study. We plan this in our future work.

                                 REFERENCES

  [1] C. Bird, A. Gourley, P. Devanbu, M. Gertz, and A. Swaminathan, “Mining email social
      networks,” in Proceedings of the 2006 international workshop on Mining software
      repositories, pp. 137–143, ACM, 2006.
  [2] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, “Categorizing bugs with
      social networks: a case study on four open source software communities,” in Proceedings of
      the 2013 International Conference on Software Engineering, pp. 1032–1041, IEEE Press, 2013.
  [3] A. Meneely, L. Williams, W. Snipes, and J. Osborne, “Predicting failures with developer
      networks and social network analysis,” in Proceedings of the 16th ACM SIGSOFT
      International Symposium on Foundations of software engineering, pp. 13–23, ACM, 2008.
  [4] Q. Hong, S. Kim, S. C. Cheung, and C. Bird, “Understanding a developer social network and
      its evolution,” in 2011 27th IEEE international conference on software maintenance (ICSM),
      pp. 323–332, IEEE, 2011.
  [5] A. Kumar and A. Gupta, “Evolution of developer social network and its impact on bug fixing
      process,” in Proceedings of the 6th India Software Engineering Conference, pp. 63–72, ACM,
      2013.
  [6] M. Cataldo and J. D. Herbsleb, “Communication networks in geographically distributed
      software development,” in Proceedings of the 2008 ACM conference on Computer supported
      cooperative work, pp. 579–588, ACM, 2008.
  [7] M. Joblin, S. Apel, and W. Mauerer, “Evolutionary trends of developer coordination: A
      network approach,” Empirical Software Engineering, vol. 22, no. 4, pp. 2050–2094, 2017.
  [8] J. Xuan, H. Jiang, Z. Ren, and W. Zou, “Developer prioritization in bug repositories,” in
      2012 34th International Conference on Software Engineering (ICSE), pp. 25–35, IEEE, 2012.
  [9] G. Canfora, M. Di Penta, R. Oliveto, and S. Panichella, “Who is going to mentor newcomers
      in open source projects?,” in Proceedings of the ACM SIGSOFT 20th International Symposium
      on the Foundations of Software Engineering, p. 44, ACM, 2012.
 [10] Q. C. Taylor, J. E. Stevenson, D. P. Delorey, and C. D. Knutson, “Author entropy: A metric
      for characterization of software authorship patterns,” in Third International Workshop on
      Public Data about Software Development (WoPDaSD08), p. 6, 2008.
 [11] T. Wolf, A. Schroter, D. Damian, and T. Nguyen, “Predicting build failures using social
      network analysis on developer communication,” in Proceedings of the 31st International
      Conference on Software Engineering, pp. 1–11, IEEE Computer Society, 2009.
 [12] S. Wang and N. Nagappan, “Characterizing and understanding software developer networks in
      security development,” arXiv preprint arXiv:1907.12141, 2019.
 [13] P. Kazienko, K. Musial, and T. Kajdanowicz, “Multidimensional social network in the social
      recommender system,” IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems
      and Humans, vol. 41, no. 4, pp. 746–759, 2011.
 [14] M. E. Newman, “Modularity and community structure in networks,” Proceedings of the
      National Academy of Sciences, vol. 103, no. 23, pp. 8577–8582, 2006.
 [15] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature,
      vol. 393, no. 6684, p. 440, 1998.
 [16] P. Bonacich, “Power and centrality: A family of measures,” American Journal of Sociology,
      vol. 92, no. 5, pp. 1170–1182, 1987.
 [17] L. C. Freeman, “A set of measures of centrality based on betweenness,” Sociometry,
      vol. 40, no. 1, pp. 35–41, 1977.
 [18] G. Sabidussi, “The centrality index of a graph,” Psychometrika, vol. 31, pp. 581–603,
      Dec 1966.
 [19] M. Dehmer and A. Mowshowitz, “A history of graph entropy measures,” Information Sciences,
      vol. 181, no. 1, pp. 57–78, 2011.
 [20] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: an open source software for exploring and
      manipulating networks,” in Third international AAAI conference on weblogs and social
      media, 2009.
 [21] C. Bird, D. Pattison, R. D’Souza, V. Filkov, and P. Devanbu, “Latent social structure in
      open source projects,” in Proceedings of the 16th ACM SIGSOFT International Symposium on
      Foundations of software engineering, pp. 24–35, ACM, 2008.
 [22] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos, “Graph-based analysis and
      prediction for software evolution,” in 2012 34th International Conference on Software
      Engineering (ICSE), pp. 419–429, IEEE, 2012.
 [23] Z. Zhang, W. K. Chan, T. Tse, P. Hu, and X. Wang, “Is non-parametric hypothesis testing
      model robust for statistical fault localization?,” Information and Software Technology,
      vol. 51, no. 11, pp. 1573–1585, 2009.
 [24] T. Zimmermann, R. Premraj, and A. Zeller, “Predicting defects for eclipse,” in Third
      International Workshop on Predictor Models in Software Engineering (PROMISE’07: ICSE
      Workshops 2007), pp. 9–9, IEEE, 2007.
 [25] P. Bhattacharya and I. Neamtiu, “Bug-fix time prediction models: can we do better?,” in
      Proceedings of the 8th Working Conference on Mining Software Repositories, pp. 207–210,
      ACM, 2011.

