7th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2019)
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Studying Multifaceted Collaboration of OSS Developers and its Impact on their Bug Fixing Performance

Amit Kumar
Department of Information Technology, Indian Institute of Information Technology, Allahabad, India
Email: amitchandramunityagi@gmail.com

Mahen Gandhi
Area of Computer Science and Engineering, NIIT University, Neemrana, India
Email: mahenm.gandhi@st.niituniversity.in

Yugandhar Desai
Area of Computer Science and Engineering, NIIT University, Neemrana, India
Email: yugandhard.desai@st.niituniversity.in

Sonali Agarwal
Department of Information Technology, Indian Institute of Information Technology, Allahabad, India
Email: sonali@iiita.ac.in

Abstract: Developers often collaborate to fix complex bugs, even in open source software (OSS) systems, where collaboration largely occurs through discussions in the bug tracker. These discussions give rise to implicit Developer Social Networks (DSNs). Past research has investigated the usefulness of such DSNs in addressing many Software Engineering problems (e.g., defect prediction and the evolution of collaboration patterns). However, most past studies have ignored the multifaceted nature of DSNs constructed from bug report data. In most of them, a link between two developers exists only if they comment on the same bug report, whereas in reality developers may also be connected indirectly (e.g., a pair of developers is connected if they comment on two different bug reports that are associated with the same software component). Such unexplored relationships among developers can be used to define new measures for identifying important developers in an OSS system, which is otherwise not trivial to do. In this paper, we study this implicit multifaceted nature of collaboration among developers by extending the single-layer DSN to a Multi-layer DSN (MDSN). Our experiments on bug data from Eclipse and NetBeans show that the structure of the DSNs and their evolution differ significantly across layers. They also show that the performance of developers in the bug fixing process is significantly correlated with their network centrality scores (Pearson correlation coefficient up to 0.74) and that this correlation varies across the layers of the MDSN, which signifies the usefulness of the layers in determining the crucial and important developers in a software system.

Index Terms: Developer Social Network, Multidimensional Developer Social Network, Multilayered Developer Social Network, Multifaceted Developer Social Network

I. INTRODUCTION

Issue Tracking Systems are used not only to archive bug reports and related information but also to help developers collaborate and discuss issues (bugs or features). Developers typically interact by commenting on bug reports. These interactions form an implicit developer social network (DSN).

Because data from issue trackers is readily available, researchers have started investigating DSNs to solve software development problems. For example, DSNs have been used to study community structures of software developers and their evolution [1], to categorize bug reports [2], and to help in defect prediction [3].

However, most of these studies explore only one type of link among developers (e.g., in a DSN constructed from bug report data, developers are connected if they have worked together to fix the same bug report), although developers are also connected indirectly through various other avenues. For instance, developers who have not commented on the same bug report but have commented on two different bug reports found in the same component of a software product are indirectly connected. In this paper, we consider many such indirect connections among developers and build the Multi-layered / Multi-faceted Developer Social Network (MDSN). In our Multi-layered DSN, each layer represents a different DSN whose links capture a different type of proximity among developers. We believe that a holistic view of these different kinds of proximity, and an investigation of the Multi-faceted Developer Social Network (MDSN), can shed more light on the nature of developer collaboration on issue tracking systems.

Towards our goal of investigating the MDSN, we first attempt to answer the fundamental question of whether the structure of the DSN varies significantly from layer to layer. The network structure of DSNs has been characterized by many global social network properties in past studies [4] [5]. We also use global social network properties to characterize and investigate the DSNs of the various layers and hence ask the following research question:

RQ1: How significantly do the global network properties of the DSN vary across the layers of the MDSN?

Past studies have reported that a DSN does not remain invariant but evolves. Studying this evolution is important because it allows us to comprehend how relationships among developers evolve. The MDSN has many layers of DSN, and hence it is important to study and compare the evolution of each DSN. This sheds light on the dynamics of the DSN at each layer. In particular, we pose our second research question as follows:

RQ2: How does the evolution of DSNs differ at each layer? Do some DSNs evolve faster than others?
The first two research questions are posed to investigate whether the multifaceted/multilayered approach of studying DSNs adds value to the understanding of the developer communication structure.

Past studies have also used DSNs to characterize the traits, performance, and importance of developers [6] [7] [8] [9]. Node centrality measures in the DSN and the entropy of developers' contributions have been widely used to characterize the importance of developers in the collaboration network [10]. In the MDSN, collaboration happens at various levels, and hence it is interesting to see how the importance of nodes in the DSN is associated with their bug fixing performance. To measure the importance of a node (developer), we compute graph entropy-based measures along with various node centrality measures at each layer of the DSN. In particular, we ask our third research question as follows:

RQ3: How significantly do the various metrics measuring the importance of developers in DSNs correlate with their bug fixing performance? How significantly do these correlations differ across the layers of the MDSN?

To answer these research questions, we first construct the Multilayered Developer Social Networks from the bug report data of two popular Java IDE projects, Eclipse and NetBeans. We then use several global network properties (characterizing the entire network), node importance measures, and measures of the bug fixing performance of developers.

II. RELATED WORK

Leveraging archival data to facilitate ongoing software development is a key tenet of software engineering research. One kind of data that researchers have started using is the implicit social networks created by developer interactions: when developers work on the same file or task, or communicate regarding an issue or task. Here we sample a subset of the research involving DSNs that is related to our work.

For example, Canfora et al. [9] mine data from mailing lists to identify mentors, i.e., experienced developers who actively interact with newcomers. Bird et al. [1], on the other hand, have used DSNs from mailing lists to investigate the social status of OSS participants based on the network structure and the relationship between email and commit activities via these networks. Hong et al. [4] have investigated how a DSN evolves and compared it to the evolution of other social networks such as Facebook and Twitter. The network structure properties of DSNs have also been studied to identify the structures that correlate with efficiency in the bug fixing process [5].

Zanetti et al. [2] found the centrality of users in a communication network between bug reporters and developers to be indicative of the quality of a bug report. Cataldo and Herbsleb [6] observed that the core developers in the communication structure of an organization are the top contributors. Meneely et al. [3] and Wolf et al. [11] used developer social networks for failure prediction. More recently, Wang and Nagappan [12] studied the distribution of collaboration patterns and used them to assess the impact of such patterns on the quality of a project from the security point of view.

Our study is similar to the work described above in that we also study developer social networks. However, instead of using a single-layer DSN, we study a multilayered (multifaceted) DSN. Each layer in our Multifaceted DSN represents a different sort of relationship among developers, making it a richer framework for depicting more complex proximities among them. Our model of the MDSN is inspired by Kazienko et al. [13]. However, to the best of our knowledge, we are the first to calibrate and use it in the Software Engineering domain. We also investigate what the positions of developers in the networks of the various layers of the proposed Multi-faceted DSN convey about their performance in bug fixing activities. This investigation adds to the novelty of the work proposed in this paper.

III. BACKGROUND AND METHODOLOGY

A. Developer Social Network (DSN) and Multi-layered DSN (MDSN)

Issue Tracking Systems (ITS) are used as a communication platform by developers while they work on the bugs reported by users and fellow developers. Developers usually communicate by commenting on the bug reports. A typical issue report in the ITS of a large software ecosystem such as NetBeans or Eclipse has many fields that provide details about the issue, e.g., a short and a long description of the issue, the product and the component of that product in which the issue is likely to be present, the operating system the product is used or tested with, the priority of the issue, the severity of the issue, the reporter who reported the issue, and the assignee to whom the issue is assigned. The discussion on Issue Tracking Systems has been leveraged to construct Developer Social Networks. A DSN can be modeled as a single-layered DSN as well as a multilayered DSN. Most past studies [4] [5] [8] consider the DSN as single-layered, where developers are nodes and an edge exists between a pair of developers if they have commented on the same bug report.

We model our DSN as a Multi-layered DSN so that multiple facets of the collaboration among developers can be investigated. In our MDSN, each layer represents a different kind of relationship among the developers. In the single-layer DSN (the DSN considered by past studies), developers are connected by an edge if they have commented on the same bug report. In the MDSN, developers are also connected if they comment on different bugs/issues that are found in the same product, found in the same component of the product, reported by the same reporter, or discovered while the software product was used with the same operating system. In total, we have five layers in our MDSN. To further illustrate the difference between the single-layered DSN and the MDSN, consider the toy example data set shown in Table I. The table lists five bug reports with those attributes that are relevant to our study. The single-layered DSN constructed from this data set is shown in Figure 1, while the MDSN consisting of five layers can be constructed as shown in Figure 2. Note that the single-layered DSN is contained in the MDSN as one layer (L1), which makes the MDSN a richer network framework for depicting deeper relationships among developers.

TABLE I: Example Bug Reports
| Bug Id | Assigned To | Severity | Priority | Product Id | Component Id | Reporter | Operating System | Commenters |
|---|---|---|---|---|---|---|---|---|
| B1 | D1 | S1 | P1 | 2 | 3 | R1 | Linux | D1, D4 |
| B2 | D1 | S1 | P3 | 1 | 11 | R2 | Windows XP | D2, D3 |
| B3 | D1 | S2 | P2 | 2 | 3 | R3 | Linux | D2 |
| B4 | D2 | S1 | P1 | 2 | 4 | R1 | Mac OS X | D2, D4 |
| B5 | D3 | S3 | P3 | 3 | 11 | R3 | Mac OS X | D1, D3 |

TABLE II: Dataset
| Dataset | Duration | #Bug Reports | #Assignees | #Comments | #Commenters | #Reporters | #OS | #Products | #Components | #Versions |
|---|---|---|---|---|---|---|---|---|---|---|
| Eclipse | 2001-2005 | 27213 | 759 | 149340 | 2767 | 1911 | 23 | 59 | 325 | 88 |
| NetBeans | 2001-2005 | 21345 | 374 | 134040 | 1905 | 1559 | 16 | 36 | 328 | 21 |

Fig. 1: Single-layered DSN

Fig. 2: Multi-layered DSN

The other layers of the MDSN in Figure 2 can be understood easily. L2-D1 represents the network in which developers are connected by an edge if they have commented on different bug reports found in the same product. L2-D2, L3, and L4 have similar semantics: developers are connected by an edge if they have commented on different bug reports found in the same product and the same component (L2-D2), on two bug reports reported by the same reporter (L3), or on two bug reports associated with the same operating system (L4). Note that we deliberately avoided trivial links in layers L2-D1, L2-D2, L3, and L4 (replication of L1 links in these layers) by connecting only those developers who comment on different bug reports. We did this to analyze the exclusive nature and power of the DSN at each layer. We used the DSN at each layer to answer our research questions by exploring its global network properties and various node importance measures.

B. Global Network Properties

To investigate the differences in the nature and evolution of the DSNs at every layer, we used global network properties similar to those used by Hong et al. [4]. We compared the DSNs at each layer based on the following global network properties:

1) Network density: Network density is the ratio of the number of edges present in the network to the maximum number of edges that can exist in the network (excluding self-loops). A higher network density indicates higher levels of inter-developer communication.

2) Modularity: Modularity is an important measure because a higher modularity value denotes a stronger community structure in the network. We used the modularity definition given by Newman [14]:

$$M = \sum_{i=1}^{n} \left( x_i - y_i^2 \right)$$

where $x_i$ denotes the proportion of the edges between the vertices of community $i$, while $y_i$ denotes the proportion of the edges that are not part of the community.

3) Average Path Length (APL): The APL is the average length of the shortest paths between each pair of nodes in the network. A shorter APL shows that developers are well connected and easily accessible to each other.

4) Average Clustering Coefficient: The clustering coefficient in a graph is the degree to which a node clusters with its neighboring nodes. In an undirected graph, the clustering coefficient of a vertex can be defined as follows:

$$P_i = \frac{2 x_i}{y_i (y_i - 1)}$$

Here $x_i$ is the number of edges between the neighbors of node $i$ and $y_i$ is the number of neighbors of node $i$. The average network clustering coefficient is the average of the clustering coefficients of all nodes in the network. A significantly higher average clustering coefficient indicates that the network follows the small world phenomenon [15].

5) Average Degree (AD): The degree of a node is the total number of edges incident on it. Since each edge has two end vertices and counts towards the degree of both, the average degree of an undirected graph is defined as:

$$AD = \frac{2 \times |E|}{|V|}$$

C. Node Importance Measures

Past studies have leveraged the position of developers in the DSN to investigate their importance in the developer community. In particular, various node-based centrality measures have been used to measure the importance of developers in the DSN. To answer our third research question, we used the following node importance measures defined for the DSN:

1) Eigenvector Centrality: In graph theory, eigenvector centrality (also called eigencentrality) is a measure of a node's importance in a network. The idea is to assign relative scores to all nodes such that the score of a node is proportional to the sum of the scores of its neighbours. Let $G(V, E)$ be a graph consisting of vertices $V$ and edges $E$. Let $A = (a_{v,t})$ be the adjacency matrix, i.e., $a_{v,t} = 1$ if vertex $v$ is linked to vertex $t$, and $a_{v,t} = 0$ otherwise. The relative centrality score of vertex $v$ as defined by Phillip Bonacich [16] is:

$$x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t \in G} a_{v,t} \, x_t$$

where $M(v)$ is the set of neighbours of $v$ and $\lambda$ is a constant. In vector notation, this can be written as the well-known eigenvector equation

$$A x = \lambda x$$

The principal eigenvector of the above equation gives the centrality of all network nodes (here, a node is a developer).

2) Betweenness Centrality: Betweenness centrality is determined by the propensity of a single vertex to be more central than any other vertex in the graph. In other words, it measures how often a node appears on the shortest paths between other nodes in the network. The standard measure given by Freeman [17] is:

$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

where $\sigma_{st}$ is the number of shortest paths from $s \in V$ to $t \in V$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.

3) Closeness Centrality: The closeness centrality of a node in a connected graph is the reciprocal of the sum of the shortest path lengths between the node and all other nodes in the graph. Therefore, the more central a node is, the closer it is to all the other nodes. The closeness centrality as defined by Sabidussi [18] is:

$$C_C(v) = \frac{1}{\sum_{t \in V} d_G(v, t)}$$

where $d_G(v, t)$ is the shortest path length between $v$ and $t$.

4) Entropy based measure: The entropy of a system measures the randomness or uncertainty in it. The entropy of a graph measures the diversity of the edges incident on its nodes. The higher the entropy of the graph, the more uniform the distribution of its edges over its nodes. Dehmer and Mowshowitz [19] provide a good survey on graph entropy. Graph entropy can be used to measure the importance of the nodes in a graph: if the entropy of a graph changes significantly after a node is removed from it, that node is considered important. Though many definitions of graph entropy are available in the literature, we define the entropy of a graph as follows. Let $G = (V, E)$ be an undirected graph. The entropy of $G$, denoted $H(G)$, is defined as:

$$H(G) = \sum_{i \in V} -p_i \log p_i$$

where $p_i$ is the degree of node $i$ divided by the sum of the degrees of all nodes in the graph. The graph entropy based importance $H(G_i)$ of node $i$ is defined as:

$$H(G_i) = |E_2 - E_1|$$

where $E_1$ is the entropy of the graph $G$ with node $i$ and $E_2$ is the entropy of the graph $G$ without node $i$. This helps us determine how important a node is in the graph. The higher the value of $H(G_i)$, the greater the importance of developer $i$.

D. Performance of developers in bug fixing process

To answer our third research question, we require two sets of measures: the node importance measures defined in the previous subsection, and measures that quantify the performance of developers in the bug fixing process. We used the following measures for the latter.

1) Average fix time: The Average Fix Time (AFT) of an assignee $a$ is estimated over a certain period of time using the equation below. To calculate a developer's efficiency in a given time period, we only consider the bugs that are opened and fixed during that period.

$$AFT = \frac{\sum_{i=1}^{n} \left( t2_{b_i} - t1_{b_i} \right)}{n}$$

where
$b_i$ = the $i$-th bug in the set of bugs assigned to the assignee $a$,
$n$ = the total number of bugs assigned to the assignee $a$,
$t1$ = the time when the bug was assigned to the assignee,
$t2$ = the time when the FIXED label was added to the bug report for the first time.

2) Aggregate Priority Points: This metric measures the importance of a developer with respect to the type of bugs the developer fixes. A developer who fixes bugs with higher priority is considered to be more important. We assign priority points to each developer based on the types of bugs the developer fixes. First, we assign a weight to each priority type as follows:

TABLE III: Weightage of Priorities
| Priority | P1 | P2 | P3 | P4 | P5 |
|---|---|---|---|---|---|
| Points | 5 | 4 | 3 | 2 | 1 |

Then we calculate the priority points for each developer.
A higher value of priority points signifies that a developer fixes bugs with relatively higher priorities, which makes that developer more important than developers with fewer priority points.

The aggregate priority points of a developer over a certain period of time are estimated as follows:

$$APP = \frac{\sum_{i=1}^{n} p_i \times c_p}{n}$$

where
$n$ = the total number of bugs assigned to assignee $a$,
$p_i$ = the priority of the bug,
$c_p$ = the points allocated to priority $p$.

3) Aggregate Severity Points: This metric is very similar to the Aggregate Priority Points. The points allocated to each severity are as follows.

TABLE IV: Weightage of Severities
| Severity | trivial | minor | normal | major | critical | blocker |
|---|---|---|---|---|---|---|
| Points | 1 | 2 | 3 | 4 | 5 | 6 |

Note that NetBeans and Eclipse allow users to request new features, which are not technically real bugs. Therefore, we do not consider bug reports whose severity attribute is set to enhancement, because this category is reserved for feature requests or improvements to the product. The aggregate severity points are calculated as follows:

$$ASP = \frac{\sum_{i=1}^{n} s_i \times c_s}{n}$$

where
$n$ = the total number of bugs assigned to assignee $a$,
$s_i$ = the severity of the bug,
$c_s$ = the points allocated to severity $s$.

4) Total Components Developer Works Upon: This measure is the total number of modules/components that the assignee has worked on during a certain time period. It denotes the diversity in the work profile of the developer.

E. Experimental Set up and Dataset

We performed our experiments on the bug reports of two well-known open-source software projects, Eclipse and NetBeans. We chose these projects because they are very popular in the software engineering community, were developed as OSS projects around the same time, and have similar functionality, which makes them a good choice for testing our proposed Multi-layered Developer Social Network. In total, our dataset has 283380 comments made on 48258 bug reports between 2001 and 2005. We chose this period because both projects were in their initial phase at the time, which makes them ideal for studying their evolution. The complete details of the dataset are shown in Table II.

In an Issue Tracking System like Bugzilla (used by Eclipse and NetBeans), an issue is assigned to one person, the assignee, but it is fixed collaboratively by OSS contributors. The collaboration happens through comments made on the bug report. Hence, to answer our research questions, we constructed the single-layered and Multi-layered DSNs from the comments on the bug reports in our dataset. Note, however, that to answer RQ3 the node importance measures and performance metrics are calculated and analyzed only for assignees, as they are the most responsible for fixing the assigned bug/issue. We conjecture that assignees with good node importance measures in the MDSN are good performers, and in particular that the node importance measures of an assignee in different layers relate to the assignee's performance differently. To compute the global network metrics and node importance measures, we used the Gephi network analysis tool [20].

IV. RESULTS AND DISCUSSION

In this section, we discuss our results and their implications with respect to our research questions.

RQ1: How significantly do the global network properties of the DSN vary across the layers of the MDSN?

To answer this research question, we computed all the network measures defined in Section III-B with the help of Gephi [20] and compared the global network properties of the DSNs formed at each layer of the MDSN. Past research [4] [5] has used a similar approach to compare networks. A difference in network measures across the layers of the MDSN signifies the importance of the individual layers and makes the MDSN a more useful framework for studying collaboration patterns among developers. Our results are shown in Figure 3. The figure shows that while the density, average path length, and modularity of the DSN vary significantly across the layers of the MDSN, the clustering coefficient remains relatively stable across the layers. The modularity of a network describes its community structure, and past studies have leveraged the modularity of DSNs for community detection and for discovering team structures in OSS projects [1] [21] [4]. The variation in modularity across layers suggests that different community and team structures could be discovered using our MDSN approach, improving the knowledge about team structure in OSS maintenance activities. Furthermore, other network measures, e.g., network density and average path length, have been used to predict defects in software modules [11], and hence it would be interesting to investigate whether the accuracy of defect prediction models could also be improved by incorporating network measures computed on the MDSN. In a nutshell, our results in Figure 3 show that the network structures of the DSNs formed at the various layers differ significantly from each other, which encourages leveraging the MDSN and investigating its usefulness in solving popular research problems such as community detection and defect prediction.

Fig. 3: Variation in Network Structure across various layers of MDSN

RQ2: How does the evolution of DSNs differ at each layer? Do some DSNs evolve faster than others?

To answer this research question, we first split the five years of bug report data of Eclipse and NetBeans (2001-2005) into chunks of six months. This gives us 10 samples of bug report data for each project. We then compute the global network properties for each sample to see their evolution over time. Figure 4 shows the evolution of the network properties of the DSN over time for both Eclipse and NetBeans. The figure shows that almost all the network properties at layer L2-D1 (edges between developer nodes if they comment on different bug reports associated with the same product) and layer L2-D2 (edges between developer nodes if they comment on different bug reports associated with the same product as well as the same component) evolve faster than the properties at other layers. This faster evolution of these two layers is observed for both NetBeans and Eclipse. Graph-based metrics such as density and modularity, and their evolution, have been used to study software evolution and to predict some of its important aspects [22]. Our study extends the past work on DSN-based software evolution and suggests that studying the evolution of the DSN at the various layers of the MDSN can provide new insights for software evolution research. The variation in the evolution of DSNs across layers also suggests that predicting future aspects of some DSNs is more difficult than of others. For instance, predicting which developers will leave the project might be much more difficult in layers L2-D1 and L2-D2 than in the other layers, where the evolution is relatively smoother. Overall, based on the results shown in Figure 4, the answer to this research question is affirmative, which adds value to our study.

RQ3: How significantly do the various metrics measuring the importance of developers in DSNs correlate with their bug fixing performance? How significantly do these correlations differ across the layers of the MDSN?
The main finding of this paper is the answer to this research question. To answer it, we first computed the node importance measures defined in Section III-C for the DSN at each layer, and the measures of developer performance defined in Section III-D. We then used the Pearson correlation coefficient to see how strongly these two sets of measures correlate with each other. We chose the Pearson correlation coefficient to measure the strength of association between the two types of measures because it has been found useful by many past studies [23] [24]. Table V shows a summary of our results. The values of the correlation coefficients for which the p-value is less than 0.05 are specifically highlighted.

TABLE V: Correlation Analysis
| Layer | Metric | Avg Fixed Time (Eclipse) | Avg Fixed Time (NetBeans) | Total Components (Eclipse) | Total Components (NetBeans) | Aggregate Priority Points (Eclipse) | Aggregate Priority Points (NetBeans) | Aggregate Severity Points (Eclipse) | Aggregate Severity Points (NetBeans) |
|---|---|---|---|---|---|---|---|---|---|
| L1 | Betweenness Centrality | -0.079 | -0.161 | 0.209** | 0.506** | 0.742** | 0.563** | 0.744** | 0.578** |
| L1 | Closeness Centrality | -0.195** | -0.434* | 0.181** | 0.646** | 0.336** | 0.636** | 0.338** | 0.648** |
| L1 | Eigenvector Centrality | -0.161* | -0.443* | 0.315** | 0.656** | 0.482** | 0.656** | 0.475** | 0.651** |
| L1 | Entropy Based Measure | -0.162* | -0.222 | 0.409** | 0.005 | 0.635** | -0.087 | 0.631** | -0.087 |
| L2-D1 | Betweenness Centrality | -0.059 | -0.171 | 0.282** | 0.506** | 0.269** | 0.358 | 0.261** | 0.396* |
| L2-D1 | Closeness Centrality | -0.141* | -0.386* | 0.092 | 0.646** | 0.162* | 0.472* | 0.157* | 0.489** |
| L2-D1 | Eigenvector Centrality | -0.096 | -0.424* | 0.063 | 0.451* | 0.032 | 0.442* | 0.03 | 0.448* |
| L2-D1 | Entropy Based Measure | 0.162* | 0.026 | -0.097 | 0.045 | -0.146* | -0.061 | -0.139* | -0.029 |
| L2-D2 | Betweenness Centrality | -0.064 | -0.119 | 0.297** | 0.48** | 0.275** | 0.294 | 0.269** | 0.326 |
| L2-D2 | Closeness Centrality | -0.141* | -0.434* | 0.106 | 0.521** | 0.175** | 0.464* | 0.172* | 0.473** |
| L2-D2 | Eigenvector Centrality | -0.092 | -0.384* | 0.086 | 0.48** | 0.035 | 0.452* | 0.034 | 0.485** |
| L2-D2 | Entropy Based Measure | -0.156* | 0.022 | 0.10* | -0.212 | 0.191** | -0.093 | 0.185** | -0.092 |
| L3 | Betweenness Centrality | -0.11 | -0.18 | 0.223** | 0.468* | 0.74** | 0.642** | 0.745** | 0.647** |
| L3 | Closeness Centrality | -0.271** | -0.314 | 0.274** | 0.641** | 0.364** | 0.651** | 0.366** | 0.654** |
| L3 | Eigenvector Centrality | -0.242** | -0.523** | 0.311** | 0.553** | 0.357** | 0.526** | 0.356** | 0.525** |
| L3 | Entropy Based Measure | -0.232** | -0.212 | 0.365** | 0.08 | 0.474** | -0.01 | 0.472** | -0.012 |
| L4 | Betweenness Centrality | -0.25** | -0.631** | 0.246** | 0.524** | 0.428** | 0.45* | 0.436** | 0.466* |
| L4 | Closeness Centrality | -0.347** | -0.656** | 0.329** | 0.442* | 0.285** | 0.416* | 0.287** | 0.422* |
| L4 | Eigenvector Centrality | -0.393** | -0.788** | 0.279** | 0.385* | 0.204** | 0.371* | 0.203** | 0.373* |
| L4 | Entropy Based Measure | -0.39** | -0.365 | 0.301** | 0.164 | 0.251** | 0.213 | 0.252** | 0.222 |
| Combined DSN | Betweenness Centrality | -0.263** | -0.656** | 0.255** | 0.438* | 0.32** | 0.402* | 0.325** | 0.407* |
| Combined DSN | Closeness Centrality | -0.358** | -0.711** | 0.277** | 0.397* | 0.242** | 0.379* | 0.243** | 0.38* |
| Combined DSN | Eigenvector Centrality | -0.424** | -0.743** | 0.247** | 0.376* | 0.192** | 0.36 | 0.192** | 0.36 |
| Combined DSN | Entropy Based Measure | -0.418** | -0.243 | 0.26** | 0.125 | 0.216** | 0.115 | 0.218** | 0.137 |

* p < .05, ** p < .01, *** p < .001

Fig. 4: Layer-wise evolution of global network properties of DSN (Eclipse, NetBeans) from 2001 to 2005.

To see the cumulative effect of the node importance measures on the performance of developers, we merged the DSNs of all the layers into one. In this merged, integrated DSN, an edge exists between a pair of developers if there is a link between them in any of the layers L1, L2-D1, L2-D2, L3, or L4. In general, our results are encouraging. For instance, the eigenvector centrality of a node in the DSN is negatively correlated with the average fix time, suggesting that developers with high eigenvector centrality fix issues faster than their peers with lower eigenvector centrality. The correlation between average fix time and the other node importance measures is also significant in many cases. It can also be seen that, among the individual layers, the correlation between average fix time and the node importance measures is largest in magnitude for layer L4. This is interesting for two reasons. First, past studies show that predicting the fix time of a bug in OSS is hard. Bhattacharya and Neamtiu [25] reported that many of the features/attributes considered by researchers for building predictors of bug fix time are not relevant. Our results suggest that node importance measures of the assignee can prove to be good features for such predictors. Second, most past studies considered node-based centrality at layer L1 only, while our study suggests that similar centrality measures perform better if we leverage the MDSN instead of the single-layered DSN.

A closer look at Table V shows that the node importance measures are also significantly correlated with the total number of components a developer worked on while fixing issues. This suggests that developers with high node importance measures gain diverse expertise in fixing issues, which makes them more crucial for the organization. The significant correlation of the node importance measures of developers with the aggregate priority points and the aggregate severity points of the bugs they fixed shows that the node importance measures selected in our study are good measures for identifying crucial and important developers (developers who are good at fixing bugs with high priority and high severity).

Interestingly, the MDSN approach of investigating the relationship between node importance measures and bug fixing performance provides new insight, as many of the results are counter-intuitive; e.g., the best correlation is found for layer L4 (where developers are connected if they have commented on two different bug reports associated with the same operating system). Though the correlation is significant for both projects, which makes the finding reasonably general, the correlation coefficients are higher for the NetBeans data.

Overall, there are two takeaways from our results. First, the significant correlation between the node-based measures and the performance of developers suggests that these measures can be used to identify important developers. Second, the measures based on different layers of the MDSN are differently correlated with the performance of developers, making the

[6] M. Cataldo and J. D. Herbsleb, "Communication networks in geographically distributed software development," in Proceedings of the 2008 ACM conference on Computer supported cooperative work, pp. 579–588, ACM, 2008.
[7] M. Joblin, S. Apel, and W. Mauerer, "Evolutionary trends of developer coordination: A network approach," Empirical Software Engineering, vol. 22, no. 4, pp. 2050–2094, 2017.
[8] J. Xuan, H. Jiang, Z. Ren, and W. Zou, "Developer prioritization in bug repositories," in 2012 34th International Conference on Software Engineering (ICSE), pp. 25–35, IEEE, 2012.
[9] G. Canfora, M. Di Penta, R. Oliveto, and S. Panichella, "Who is going to mentor newcomers in open source projects?," in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, p. 44, ACM, 2012.
[10] Q. C. Taylor, J. E. Stevenson, D. P. Delorey, and C. D.
Knutson, “Author MDSN framework worthy enough to try for identifying the entropy: A metric for characterization of software authorship patterns,” crucial and important developers in OSS. in Third International Workshop on Public Data about Software Devel- opment (WoPDaSD08), p. 6, 2008. [11] T. Wolf, A. Schroter, D. Damian, and T. Nguyen, “Predicting build V. C ONCLUSION AND F UTURE W ORK failures using social network analysis on developer communication,” in Proceedings of the 31st International Conference on Software Engi- In this research, we proposed MDSN to investigate the neering, pp. 1–11, IEEE Computer Society, 2009. multifaceted nature of collaboration among the developers [12] S. Wang and N. Nagappan, “Characterizing and understanding soft- while they fix the bugs and collaborate through the Issue Re- ware developer networks in security development,” arXiv preprint arXiv:1907.12141, 2019. porting System. There are many takeaways from our research. [13] P. Kazienko, K. Musial, and T. Kajdanowicz, “Multidimensional social First, since the structure of networks varies significantly network in the social recommender system,” IEEE Transactions on across various layers of MDSN, replicating the past studies Systems, Man, and Cybernetics-Part A: Systems and Humans, vol. 41, no. 4, pp. 746–759, 2011. on community detection and identifying team formation in [14] M. E. Newman, “Modularity and community structure in networks,” OSS on the MDSN framework may provide new insights. Proceedings of the national academy of sciences, vol. 103, no. 23, Second, Our results show that many node importance measures pp. 8577–8582, 2006. [15] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small- i.e. node centrality based metrics and graph entropy-based world’networks,” nature, vol. 393, no. 6684, p. 440, 1998. measures have a significant correlation with the performance [16] P. 
Bonacich, “Power and centrality: A family of measures,” American of the developers in the bug fixing process. Further, such Journal of Sociology, vol. 92, no. 5, pp. 1170–1182, 1987. [17] L. C. Freeman, “A set of measures of centrality based on betweenness,” correlations vary significantly across the layers suggesting that Sociometry, vol. 40, no. 1, pp. 35–41, 1977. MDSN could be more useful to identify important and crucial [18] G. Sabidussi, “The centrality index of a graph,” Psychometrika, vol. 31, developers in the developer community of OSS. Though our pp. 581–603, Dec 1966. [19] M. Dehmer and A. Mowshowitz, “A history of graph entropy measures,” results are consistent with both the case studies, we selected Information Sciences, vol. 181, no. 1, pp. 57–78, 2011. for our research, there are few threats to its validity. First, [20] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: an open source soft- comments are not made only by developers on ITS (Issue ware for exploring and manipulating networks,” in Third international AAAI conference on weblogs and social media, 2009. Tracking Systems) and hence considering all commenters as [21] C. Bird, D. Pattison, R. D’Souza, V. Filkov, and P. Devanbu, “Latent developers could be a threat to the validity of our results. social structure in open source projects,” in Proceedings of the 16th Second, ITS is not the only platform where developers col- ACM SIGSOFT International Symposium on Foundations of software engineering, pp. 24–35, ACM, 2008. laborate. Past research has also used version control data to [22] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos, “Graph- study collaboration among developers. Hence, performing our based analysis and prediction for software evolution,” in 2012 34th study on version control data can complement our study. We International Conference on Software Engineering (ICSE), pp. 419–429, IEEE, 2012. plan this in our future work. [23] Z. Zhang, W. K. Chan, T. Tse, P. Hu, and X. 
Wang, “Is non-parametric hypothesis testing model robust for statistical fault localization?,” In- R EFERENCES formation and Software Technology, vol. 51, no. 11, pp. 1573–1585, 2009. [1] C. Bird, A. Gourley, P. Devanbu, M. Gertz, and A. Swaminathan, “Min- [24] T. Zimmermann, R. Premraj, and A. Zeller, “Predicting defects for ing email social networks,” in Proceedings of the 2006 international eclipse,” in Third International Workshop on Predictor Models in workshop on Mining software repositories, pp. 137–143, ACM, 2006. Software Engineering (PROMISE’07: ICSE Workshops 2007), pp. 9–9, [2] M. S. Zanetti, I. Scholtes, C. J. Tessone, and F. Schweitzer, “Categoriz- IEEE, 2007. ing bugs with social networks: a case study on four open source software [25] P. Bhattacharya and I. Neamtiu, “Bug-fix time prediction models: can communities,” in Proceedings of the 2013 International Conference on we do better?,” in Proceedings of the 8th Working Conference on Mining Software Engineering, pp. 1032–1041, IEEE Press, 2013. Software Repositories, pp. 207–210, ACM, 2011. [3] A. Meneely, L. Williams, W. Snipes, and J. Osborne, “Predicting failures with developer networks and social network analysis,” in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering, pp. 13–23, ACM, 2008. [4] Q. Hong, S. Kim, S. C. Cheung, and C. Bird, “Understanding a developer social network and its evolution,” in 2011 27th IEEE international conference on software maintenance (ICSM), pp. 323–332, IEEE, 2011. [5] A. Kumar and A. Gupta, “Evolution of developer social network and its impact on bug fixing process,” in Proceedings of the 6th India Software Engineering Conference, pp. 63–72, ACM, 2013. Copyright © 2019 for this paper by its authors. 44 Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
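Supplementary sketch. The correlation analysis summarized in Table V (merge the per-layer DSNs, score each developer with a node importance measure, then correlate the scores with a performance measure) might be outlined as below. This is only an illustration under stated assumptions, not the procedure or tooling used in the study: the developer names, edges, and fix times are hypothetical toy data, the helper names `merge_layers`, `closeness`, and `pearson` are invented here, closeness is computed in one common normalized form (reachable nodes divided by the sum of shortest-path distances) that may differ from the definition in Section III-C, and no p-values are computed.

```python
from collections import deque
from math import sqrt


def merge_layers(layers):
    """Build a combined DSN: two developers are linked if any layer links them."""
    combined = {}
    for edges in layers.values():
        for u, v in edges:
            combined.setdefault(u, set()).add(v)
            combined.setdefault(v, set()).add(u)
    return combined


def closeness(graph, source):
    """Normalized closeness: reachable nodes divided by the sum of BFS distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (NOT taken from the Eclipse or NetBeans datasets).
layers = {
    "L1": [("ann", "bob"), ("bob", "cat")],  # commented on the same bug report
    "L4": [("cat", "dan"), ("bob", "dan")],  # bug reports share an operating system
}
avg_fix_days = {"ann": 30.0, "bob": 5.0, "cat": 12.0, "dan": 14.0}

combined = merge_layers(layers)
developers = sorted(combined)
scores = [closeness(combined, d) for d in developers]
times = [avg_fix_days[d] for d in developers]
r = pearson(scores, times)
print(f"closeness vs. average fix time: r = {r:.3f}")
```

In this toy network the best-connected developer also has the shortest average fix time, so the coefficient comes out negative, which mirrors the sign of the Avg Fixed Time columns in Table V. Running the same steps on a single layer instead of the merged graph would give the per-layer rows.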