<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Studying Multifaceted Collaboration of OSS Developers and its Impact on their Bug Fixing Performance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amit Kumar</string-name>
          <email>amitchandramunityagi@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yugandhar Desai</string-name>
          <email>yugandhard.desai@st.niituniversity.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mahen Gandhi</string-name>
          <email>mahenm.gandhi@st.niituniversity.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sonali Agarwal</string-name>
          <email>sonali@iiita.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Area of Computer Science and Engineering, NIIT University</institution>
          ,
          <addr-line>Neemrana</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Information Technology, Indian Institute of Information Technology</institution>
          ,
          <addr-line>Allahabad</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>37</fpage>
      <lpage>44</lpage>
      <abstract>
        <p>Developers often collaborate to fix complex bugs, even in open source software (OSS) systems, where collaboration largely occurs through discussions in the bug tracker. Implicit Developer Social Networks (DSNs) are created as a result of these discussions. Past research has investigated the usefulness of such DSNs in addressing many software engineering problems (e.g., defect prediction and the evolution of collaboration patterns). However, the multifaceted nature of DSNs constructed from bug report data has been ignored in most past studies. That is, in most past studies a link between two developers exists only if they comment on the same bug report, while in reality developers may also be connected indirectly (e.g., a pair of developers is connected even if they comment on two different bug reports that are associated with the same software component). Such unexplored relationships among developers can be used to define new measures for identifying important developers in an OSS system, which is otherwise not trivial to do. In this paper, we study this implicit multifaceted nature of collaboration among developers by extending the single-layer DSN to a Multi-layer DSN (MDSN). Our experiments on bug data of Eclipse and NetBeans show that the structure of DSNs and their evolution differ significantly across layers, and that the performance of developers in the bug fixing process is not only significantly correlated (Pearson correlation coefficient up to 0.74) with their network centrality scores, but that this correlation also varies across the layers of the MDSN, signifying the usefulness of these scores in identifying the crucial developers in software systems. Index Terms: Developer Social Network, Multidimensional Developer Social Network, Multilayered Developer Social Network, Multifaceted Developer Social Network</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>Issue Tracking Systems are not only used to archive bug
reports and the related information but also to help developers
to collaborate and have a discussion on issues (bugs or
features). Developers typically interact by commenting on bug
reports. These interactions form an implicit developer social
network (DSN).</p>
      <p>
        Due to the readily available data from issue trackers,
researchers have started investigating DSNs to solve software
development problems. For example, DSNs have been used to
study community structures of software developers and their
evolution [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], to categorize bug reports [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and to help in
defect prediction [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>However, most of these studies explore only one type of
link among developers (e.g., in a DSN constructed from
bug report data, developers are connected if they have
worked together to fix the same bug report), although developers
are also indirectly connected through various other avenues. For
instance, developers who have not commented on the same
bug report but have commented on two different bug reports
found in the same component of a software product are
indirectly connected. In this paper, we consider many such
indirect connections among developers and build a
Multi-layered (multifaceted) Developer Social Network (MDSN).
In our MDSN, each layer represents a different
DSN whose links capture a
different type of proximity among developers. We believe that a
holistic view of these different kinds of proximity, obtained by
investigating the MDSN, can shed more light on the nature of
developer collaboration on issue tracking systems.</p>
      <p>
        Towards our goal of investigating the MDSN, we first attempt
to answer the fundamental question of whether the structure of the DSN
varies significantly from one layer to another. The network
structure of DSNs has been characterized by many global
social network properties in past studies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We also use
global social network properties to characterize and investigate
the DSN at each layer, and hence ask the following research
question:
RQ1: How significantly do the global network properties of the DSN
vary across the layers of the MDSN?
      </p>
      <p>Past studies have reported that DSNs do not remain
static but evolve over time. Studying this evolution is
important because it allows us to understand how relationships
among developers change. An MDSN has many layers of DSNs,
and hence it is important to study and compare the evolution
of the DSN at each layer. This sheds light on the dynamics of the DSNs at
each layer. In particular, we pose our second research question
as follows:
RQ2: How does the evolution of DSNs differ at each layer?
Do some DSNs evolve faster than others?</p>
      <p>The first two research questions are posed to investigate if
the Multifaceted/Multilayered approach of studying DSN adds
some value to the understanding of developer communication
structure or not.</p>
      <p>
        Furthermore, past studies have used DSNs to characterize the
traits, performance, and importance of developers [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Node centrality measures in DSNs and the entropy of
developers' contributions have been widely used to characterize
the importance of developers in the collaboration network
of developers [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In an MDSN, collaboration happens at
various levels, and hence it is interesting to see how
the importance of nodes in each DSN is associated with their bug
fixing performance. To measure the importance of a node
(developer), we compute a graph entropy-based measure
along with various node centrality measures at each layer
of the MDSN. In particular, we ask our third research question as
follows:
RQ3: How significantly do various metrics measuring the
importance of developers in DSNs correlate with their bug
fixing performance? How significantly do these correlations differ
across different layers of the MDSN?
      </p>
      <p>To answer these research questions, we first construct the
Multilayered Developer Social Networks from the bug report
data of two popular Java IDE projects, Eclipse and NetBeans.
We then use several global network properties (characterizing
the entire network), node importance
measures, and measures of the bug-fixing
performance of developers to answer our research questions.</p>
      <p>II. RELATED WORK</p>
      <p>
        Leveraging archival data to facilitate ongoing software
development is a key tenet of software engineering research. One
such data source that researchers have started using is the implicit
social network created by developer
interactions: when developers work on the same file or task, or communicate
regarding an issue or task. Here we sample a subset of research
involving DSNs that is related to our work. For example,
Canfora et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] mine data from mailing lists to identify
mentors, i.e., experienced developers who actively interact with
newcomers. Bird et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] on the other hand, have
used the DSNs from mailing lists to investigate the social
status of OSS participants based on the network structure and
the relationship between email and commit activities via these
networks. Hong et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] have investigated how a DSN evolves
and compared it to the evolution in other social networks like
Facebook, Twitter, etc. The network structure properties of
DSNs have also been studied to identify the structures that
correlate with efficiency in the bug fixing process [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Zanetti et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] found the centrality of users in a
communication network between bug reporters and developers to be
indicative of the quality of a bug report. Cataldo and Herbsleb
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] observed that the core developers in the communication
structure of the organizations are top contributors. Meneely
et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Wolf et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] used developer social networks
for failure prediction. More recently, Wang and Nagappan [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
studied the distribution of collaboration patterns and used them
to see the impact of such patterns on the quality of the project
from the security point of view.
      </p>
      <p>
        Our study is similar to the work described above in that we
also study developer social networks. However, instead of using
a single-layer DSN, we study a multilayered (multifaceted) DSN.
Each layer in our multifaceted DSN represents a different kind
of relationship among developers, making it a richer framework
for depicting more complex proximities among them. Our
model of MDSN is inspired by Kazienko et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. However,
to the best of our knowledge, we are the first to calibrate and use
it in the software engineering domain. We also investigate
what the positions of developers in the networks represented by
the various layers of the proposed multifaceted DSN convey about
their performance in bug fixing activities. This investigation
also adds to the novelty of the work proposed in this paper.
      </p>
    </sec>
    <sec id="sec-2">
      <title>III. BACKGROUND AND METHODOLOGY</title>
      <p>A. Developer Social Network (DSN) and Multi-layered DSN
(MDSN)</p>
      <p>
        Issue Tracking Systems (ITS) are used as a communication
platform by developers while they work on the bugs reported
by users and fellow developers. Developers usually do so by
commenting on the bug reports. A typical issue report in the ITS
of a large software ecosystem such as NetBeans or Eclipse has
many fields that provide details about the issue, e.g., short
and long descriptions of the issue, the product and the component of
that product where the issue is likely to be present, the
operating system (that the product is used or tested with), the priority of
the issue, the severity of the issue, the reporter who reported the issue,
and the assignee (to whom the issue is assigned). The discussion
on issue tracking systems has been leveraged to construct
the Developer Social Network among developers. However,
a DSN can be modeled as a single-layered DSN as well
as a multilayered DSN. Most past studies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
consider DSN as single-layered, where developers are nodes
and edges between the pair of developers exist if they have
commented on the same bug report.
      </p>
      <p>Fig. 2. Multi-layered DSN.</p>
      <p>We model our DSN as a Multi-layered DSN so that multiple
facets of the collaboration among developers can be
investigated. In our MDSN, each layer represents a different kind
of relationship among developers. In a
single-layer DSN (the DSN considered by past studies), two developers
are connected by an edge if they have
commented on the same bug report. In the MDSN, developers are
also connected if they comment on different bugs/issues
that are found in the same product, in the same component
of the product, are reported by the same reporter, or
were discovered while the software product was used with the
same operating system. In total, we have five layers in our
MDSN. To further illustrate the difference between a
single-layered DSN and an MDSN, let us consider the toy example dataset
shown in Table I. The table lists five bug reports with those
of their attributes that are relevant to our study. The
single-layered DSN constructed from this dataset is shown
in Figure 1, while the MDSN consisting of five layers
is shown in Figure 2. Note that the
single-layered DSN is contained in the MDSN as one of its layers (L1),
which makes the MDSN a richer network framework for depicting the deeper
relationships among developers. Other layers in MDSN of</p>
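      <p>The layer construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used in the study: the bug records, their field names, and the layer names in LAYER_KEYS are hypothetical, and the sketch keeps same-bug pairs inside the attribute-based layers, a detail the text does not specify.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy bug reports (illustrative only, not the paper's Table I).
BUGS = [
    {"id": 1, "product": "P1", "component": "C1", "reporter": "r1",
     "os": "Linux", "commenters": {"a", "b"}},
    {"id": 2, "product": "P1", "component": "C1", "reporter": "r2",
     "os": "Windows", "commenters": {"c"}},
    {"id": 3, "product": "P1", "component": "C2", "reporter": "r1",
     "os": "Windows", "commenters": {"d"}},
]

# Each layer is defined by the attributes two bug reports must share.
# The layer names are assumptions made for this sketch.
LAYER_KEYS = {
    "same_bug": ("id",),
    "same_product": ("product",),
    "same_component": ("product", "component"),
    "same_reporter": ("reporter",),
    "same_os": ("os",),
}


def build_layers(bugs, layer_keys=LAYER_KEYS):
    """Return {layer name: set of undirected edges (frozensets of two developers)}."""
    layers = {}
    for name, keys in layer_keys.items():
        # Pool all commenters of bug reports that agree on the layer's attributes.
        groups = defaultdict(set)
        for bug in bugs:
            groups[tuple(bug[k] for k in keys)] |= bug["commenters"]
        # Connect every pair of developers inside each pool.
        edges = set()
        for developers in groups.values():
            edges |= {frozenset(pair) for pair in combinations(sorted(developers), 2)}
        layers[name] = edges
    return layers


layers = build_layers(BUGS)
```
]]></preformat>
      <p>In this toy input, developers c and d never comment on the same bug report, so they are unconnected in the same-bug layer, yet they are linked in the operating system layer because both commented on reports filed against the same operating system.</p>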
      <p>
        To investigate the difference in the nature and evolution of
various DSNs at every layer, we used similar global network
properties as used by Hong et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We compared the DSNs
at each layer based on the following global network properties:
1) Network density: Network density is defined as the ratio
of the number of edges present in the network to the maximum
number of edges that can exist in the network (excluding
self-loops). A higher network density indicates a higher level of
inter-developer communication.
      </p>
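      <p>As a small worked illustration of the density definition, the following Python sketch computes the ratio for a simple undirected graph. The function name and the example edge list are hypothetical; the maximum number of edges is taken to be |V|(|V| - 1)/2, which holds for an undirected graph without self-loops.</p>
      <preformat><![CDATA[
```python
def network_density(num_nodes, edges):
    """Density of a simple undirected graph: |E| / (|V| * (|V| - 1) / 2)."""
    if num_nodes < 2:
        return 0.0
    # Ignore self-loops and count each undirected edge only once.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return len(unique_edges) / (num_nodes * (num_nodes - 1) / 2)


# Four developers, three of whom form a triangle: 3 of 6 possible edges.
density = network_density(4, [("a", "b"), ("b", "c"), ("c", "a")])
```
]]></preformat>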
      <p>
        2) Modularity: The modularity of a network is an important
measure, as a higher value of modularity indicates a stronger
community structure in the network. We used the
modularity definition given by Newman [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
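      <p><disp-formula><tex-math>M = \sum_{i=1}^{n} \left( x_i - y_i^{2} \right)</tex-math></disp-formula>
where <italic>x<sub>i</sub></italic> denotes the proportion of the edges between the
vertices of community <italic>i</italic>, while <italic>y<sub>i</sub></italic> denotes the proportion
of the edges that are not part of the community.</p>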
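      <p>A minimal Python sketch of a modularity computation follows. It assumes Newman's standard formulation, in which the first term is the fraction of edges with both endpoints inside community i and the second term is the squared fraction of edge endpoints attached to community i; this reading of the second term is an assumption, since the wording above is looser. The community assignment is taken as given, and the example graph is hypothetical.</p>
      <preformat><![CDATA[
```python
def modularity(edges, community_of):
    """Newman modularity: sum over communities of (e_ii - a_i ** 2)."""
    m = len(edges)
    inside = {}  # number of edges with both endpoints in the community
    ends = {}    # number of edge endpoints attached to the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)


# Two triangles joined by a single bridge edge (hypothetical example).
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
COMMUNITY = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
q = modularity(EDGES, COMMUNITY)
```
]]></preformat>
      <p>Placing every node in a single community gives a modularity of zero under this formulation, which is a quick sanity check for an implementation.</p>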
      <p>3) Average Path Length (APL): This is the average length
of the shortest paths between each pair of nodes in the network.
A shorter APL shows that developers are well connected to each
other in the network and are easily accessible to each other.</p>
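      <p>The average path length of an unweighted network can be obtained with one breadth-first search per node. The sketch below is illustrative only; the adjacency-dictionary input format is an assumption, and pairs of nodes with no connecting path are simply skipped, which is one possible convention rather than the one necessarily used in the study.</p>
      <preformat><![CDATA[
```python
from collections import deque


def average_path_length(adjacency):
    """Mean shortest-path length over all reachable ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        # Breadth-first search yields shortest hop counts in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Path graph a - b - c (hypothetical example).
PATH = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
apl = average_path_length(PATH)
```
]]></preformat>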
      <p>4) Average Clustering Coefficient: The clustering
coefficient of a node measures how strongly the node's
neighboring nodes are connected to one another. In an undirected graph, the clustering coefficient of a vertex can
be defined as follows:
<disp-formula><tex-math>P_i = \frac{2 x_i}{y_i \left( y_i - 1 \right)}</tex-math></disp-formula></p>
      <p>
        Here <italic>x<sub>i</sub></italic> is the number of edges between the neighbors of
node <italic>i</italic>, and <italic>y<sub>i</sub></italic> is the number of neighbors of node <italic>i</italic>. The average
network clustering coefficient is the average of the clustering coefficients of all
nodes in the network. A significantly higher average
clustering coefficient indicates that the network follows the small-world
phenomenon [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
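      <p>The clustering coefficient defined above translates directly into code. In this hedged Python sketch the adjacency dictionary and the function names are hypothetical, and nodes with fewer than two neighbors are assigned a coefficient of zero, which is a common convention assumed here.</p>
      <preformat><![CDATA[
```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """2 * x_i / (y_i * (y_i - 1)) for one node of an undirected graph."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbours
    # Count the edges that connect two neighbours of the node.
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adjacency):
    """Mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adjacency, n) for n in adjacency) / len(adjacency)


# Triangle a-b-c with a pendant node d attached to a (hypothetical example).
GRAPH = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
avg_cc = average_clustering(GRAPH)
```
]]></preformat>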
      <p>5) Average Degree (AD): The degree of a node in a graph
is the total number of edges incident on it. Since each edge has
two endpoints and counts towards the degree of both, the
average degree of an undirected graph is defined as:
<disp-formula><tex-math>AD = \frac{2 |E|}{|V|}</tex-math></disp-formula></p>
      <sec id="sec-2-1">
        <title>C. Node Importance Measures</title>
        <p>
          Past studies have leveraged the position of developers in
the DSN to investigate their importance in the developer
community. In particular, various node centrality measures
have been used to measure the importance of
developers in a DSN. To answer our third research question, we
used the following node importance measures defined for a DSN:
1) Eigenvector Centrality: In graph theory, eigenvector
centrality (also called eigencentrality) is a measure of a node's
importance in a network. The idea is to assign each node a score
proportional to the sum of the scores of its neighbours. Let <italic>G</italic>(<italic>V</italic>, <italic>E</italic>) be a graph
consisting of vertices <italic>V</italic> and edges <italic>E</italic>. Let <italic>A</italic> = (<italic>a<sub>v,t</sub></italic>) be the
adjacency matrix, i.e., <italic>a<sub>v,t</sub></italic> = 1 if vertex <italic>v</italic> is linked to vertex <italic>t</italic>,
and <italic>a<sub>v,t</sub></italic> = 0 otherwise. The relative centrality score of vertex
<italic>v</italic>, as defined by Phillip Bonacich [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], is:
          <disp-formula><tex-math>x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t \in G} a_{v,t} \, x_t</tex-math></disp-formula>
        </p>
        <p>where <italic>M</italic>(<italic>v</italic>) is the set of neighbours of <italic>v</italic> and λ is a constant.
In vector notation, this can be written as the
well-known eigenvector equation
<disp-formula><tex-math>A x = \lambda x</tex-math></disp-formula>
The principal eigenvector of the above equation gives the
centrality of all network nodes (here, a node is a developer).</p>
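        <p>One common way to approximate the principal eigenvector is power iteration. The Python sketch below is illustrative and is not the procedure used by Gephi or by the study; the function name, the iteration limits, and the choice to iterate with A + I (which has the same eigenvectors as A) are assumptions, and the graph is assumed to be connected with at least one edge.</p>
        <preformat><![CDATA[
```python
def eigenvector_centrality(adjacency, iterations=200, tolerance=1e-10):
    """Approximate the principal eigenvector of the adjacency matrix by power iteration."""
    score = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Adding the node's own score is equivalent to iterating with (A + I),
        # which has the same eigenvectors as A and avoids oscillation on
        # bipartite graphs such as stars.
        updated = {n: score[n] + sum(score[t] for t in adjacency[n]) for n in adjacency}
        norm = max(updated.values())
        updated = {n: value / norm for n, value in updated.items()}
        change = max(abs(updated[n] - score[n]) for n in adjacency)
        score = updated
        if change < tolerance:
            break
    return score  # scaled so that the most central node has score 1.0


# Star graph: hub c connected to leaves x, y, z (hypothetical example).
STAR = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
scores = eigenvector_centrality(STAR)
```
]]></preformat>
        <p>For the star graph, the hub receives the highest score and every leaf receives the same lower score, matching the intuition that a developer linked to many others is more central.</p>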
        <p>
          2) Betweenness Centrality: The betweenness centrality of
a vertex captures the extent to which that vertex
lies between other vertices in the graph. In other
words, it measures how often a node appears on the shortest paths
between pairs of nodes in the network. The following is the standard
measure given by Freeman [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]:
        </p>
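        <p><disp-formula><tex-math>C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}</tex-math></disp-formula>
where σ<sub>st</sub> is the number of shortest paths from <italic>s</italic> ∈ <italic>V</italic> to <italic>t</italic> ∈ <italic>V</italic>, and σ<sub>st</sub>(<italic>v</italic>) is the number of those paths that pass through <italic>v</italic>.</p>
        <p>
3) Closeness Centrality: The closeness centrality of a node in
a connected graph is
measured as the reciprocal of the sum of the shortest path lengths
between the node and all other nodes in the graph. Therefore,
the more central the node, the closer it is to all the other
nodes. The closeness centrality as defined by Sabidussi [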
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]
is as follows:
        </p>
        <p>
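          <disp-formula><tex-math>C_C(v) = \frac{1}{\sum_{t \in V} d_G(v, t)}</tex-math></disp-formula>
where <italic>d<sub>G</sub></italic>(<italic>v</italic>, <italic>t</italic>) is the length of the shortest path between <italic>v</italic> and <italic>t</italic>.
        </p>
        <p>
4) Entropy based measure: The entropy of a system
measures the randomness or uncertainty in it. The entropy of a
graph measures the diversity of the edges incident on its nodes.
The higher the entropy of the graph, the more uniform the
distribution of its edges over its nodes. Dehmer and Mowshowitz
[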
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] provide a good survey of graph entropy. Graph entropy
can be used to measure the importance of the nodes in a
graph. For instance, if the entropy of a graph changes
significantly after a node is removed from it, that node is considered
important. Though many definitions of graph
entropy are available in the literature, we define the entropy of a graph as follows.
Let <italic>G</italic> = (<italic>V</italic>, <italic>E</italic>) be an undirected graph. The entropy of the
graph <italic>G</italic>, denoted <italic>H</italic>(<italic>G</italic>), is defined as:
        </p>
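        <p><disp-formula><tex-math>H(G) = - \sum_{i \in V} p_i \log p_i</tex-math></disp-formula>
where <italic>p<sub>i</sub></italic> is the degree of node <italic>i</italic> divided by the sum of the
degrees of all nodes in the graph. The graph entropy based
importance <italic>H</italic>(<italic>G<sub>i</sub></italic>) of node <italic>i</italic> is defined as:
<disp-formula><tex-math>H(G_i) = \left| E_2 - E_1 \right|</tex-math></disp-formula>
where <italic>E</italic><sub>1</sub> is the entropy of the graph <italic>G</italic> with node <italic>i</italic> and <italic>E</italic><sub>2</sub>
is the entropy of the graph <italic>G</italic> without node <italic>i</italic>. This helps us
determine how important a node is in the graph. The higher the
value of <italic>H</italic>(<italic>G<sub>i</sub></italic>), the greater the importance of developer <italic>i</italic>.</p>
        <p>D. Performance of developers in bug fixing process</p>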
        <p>To answer our third research question, we require two
sets of measures: the node importance measures defined in the
previous subsection, and measures that quantify the performance
of developers in the bug fixing process. We used the following
measures of developer performance in the bug
fixing process.</p>
        <p>1) Average fix time: The average fix time (AFT) for an
assignee <italic>a</italic> is estimated over a certain period of time using
the equation below. To calculate a developer's efficiency in
a given time period, we consider only the bugs that were opened
and fixed during that time period.</p>
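        <p><disp-formula><tex-math>AFT = \frac{\sum_{i=1}^{n} \left( {t2}_{b_i} - {t1}_{b_i} \right)}{n}</tex-math></disp-formula>
where,
<italic>b<sub>i</sub></italic> = the <italic>i</italic>th bug in the set of bugs assigned to the assignee <italic>a</italic>.
<italic>n</italic> = total number of bugs assigned to the assignee <italic>a</italic>.
<italic>t</italic>1 = time when the bug was assigned to the assignee.
<italic>t</italic>2 = time when the FIXED label was added to the bug
report for the first time.</p>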
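        <p>The average fix time can be computed as in the following Python sketch. The record fields (opened, assigned, fixed), the choice of days as the unit, and the example timestamps are hypothetical; only bugs opened and fixed inside the window are counted, following the description above.</p>
        <preformat><![CDATA[
```python
from datetime import datetime


def average_fix_time(bugs, start, end):
    """Mean (t2 - t1) in days over bugs opened and first marked FIXED within [start, end]."""
    durations = [
        (bug["fixed"] - bug["assigned"]).total_seconds() / 86400
        for bug in bugs
        if start <= bug["opened"] and bug["fixed"] <= end
    ]
    return sum(durations) / len(durations) if durations else None


# Hypothetical bugs assigned to a single assignee.
BUGS = [
    {"opened": datetime(2001, 1, 1), "assigned": datetime(2001, 1, 2),
     "fixed": datetime(2001, 1, 4)},
    {"opened": datetime(2001, 2, 1), "assigned": datetime(2001, 2, 1),
     "fixed": datetime(2001, 2, 5)},
    # Opened before the window, so it is excluded from the average.
    {"opened": datetime(2000, 12, 1), "assigned": datetime(2001, 1, 10),
     "fixed": datetime(2001, 1, 20)},
]
aft = average_fix_time(BUGS, datetime(2001, 1, 1), datetime(2001, 6, 30))
```
]]></preformat>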
        <p>2) Aggregate Priority Points: This metric is used to
measure the importance of a developer with respect to the type
of bugs he or she fixes. A developer who fixes bugs with
higher priority is considered to be more important. We assign
priority points to each developer based on the types of bugs
he or she fixes. First, we assign a weight to each priority type
as follows:</p>
        <p>TABLE (III)</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Weightage of Priorities</title>
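      <table-wrap>
        <table>
          <thead>
            <tr><th>Priority</th><th>Points</th></tr>
          </thead>
          <tbody>
            <tr><td>P1</td><td>5</td></tr>
            <tr><td>P2</td><td>4</td></tr>
            <tr><td>P3</td><td>3</td></tr>
            <tr><td>P4</td><td>2</td></tr>
            <tr><td>P5</td><td>1</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Then we calculate the priority points for each developer.
A higher value of priority points signifies
that the developer fixes bugs with relatively higher priorities, making
that developer more important than those with lower priority
points.</p>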
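      <p>The equation for estimating the aggregate priority points of a
developer over a certain period of time is as follows:
<disp-formula><tex-math>APP = \frac{\sum_{i=1}^{n} c_{p_i}}{n}</tex-math></disp-formula>
where,
<italic>n</italic> = total number of bugs assigned to assignee <italic>a</italic>.
<italic>p<sub>i</sub></italic> = priority of bug <italic>i</italic>.
<italic>c<sub>p</sub></italic> = points allocated to priority <italic>p</italic>.</p>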
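      <p>A short Python sketch of the aggregate priority points follows. It assumes that the points of the assignee's bugs are summed and divided by the number of bugs, which is one reading of the equation; the function name and the example list of priorities are hypothetical, while the point values are those of the priority table above.</p>
      <preformat><![CDATA[
```python
# Points per priority level, as listed in the priority weightage table.
PRIORITY_POINTS = {"P1": 5, "P2": 4, "P3": 3, "P4": 2, "P5": 1}


def aggregate_priority_points(priorities, points=PRIORITY_POINTS):
    """Mean number of points over the priorities of an assignee's bugs."""
    if not priorities:
        return None
    return sum(points[p] for p in priorities) / len(priorities)


# Hypothetical assignee with four bugs: (5 + 3 + 3 + 1) / 4.
app = aggregate_priority_points(["P1", "P3", "P3", "P5"])
```
]]></preformat>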
      <p>3) Aggregate Severity Points: This is very similar to the
Aggregate Priority Points. The points allocated for each
severity are as follows.</p>
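      <table-wrap>
        <label>TABLE IV</label>
        <caption><title>Weightage of Severities</title></caption>
        <table>
          <thead>
            <tr><th>Severity</th><th>Points</th></tr>
          </thead>
          <tbody>
            <tr><td>trivial</td><td>1</td></tr>
            <tr><td>minor</td><td>2</td></tr>
            <tr><td>normal</td><td>3</td></tr>
            <tr><td>major</td><td>4</td></tr>
            <tr><td>critical</td><td>5</td></tr>
            <tr><td>blocker</td><td>6</td></tr>
          </tbody>
        </table>
      </table-wrap>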
      <p>Note that NetBeans and Eclipse allow users to request new
features, which are not technically bugs. Therefore, we do
not consider bug reports whose severity attribute
is set to enhancement, because this category is reserved for
feature requests or improvements to the product. The formula
for calculating the aggregate severity points is as follows:
<disp-formula><tex-math>ASP = \frac{\sum_{i=1}^{n} c_{s_i}}{n}</tex-math></disp-formula>
where,
<italic>n</italic> = total number of bugs assigned to assignee <italic>a</italic>.
<italic>s<sub>i</sub></italic> = severity of bug <italic>i</italic>.
<italic>c<sub>s</sub></italic> = points allocated to severity <italic>s</italic>.</p>
      <p>4) Total Components Developer Works Upon: This measure
is the total number of modules/components that the assignee
has worked on during a certain time period. It denotes the
diversity in the work profile of the developer.</p>
      <sec id="sec-3-1">
        <title>E. Experimental Set up and Dataset</title>
        <p>We performed our experiments on the
bug reports of two widely used open-source software
projects, Eclipse and NetBeans. We chose these projects because they
are very popular in the software engineering
community, were developed around the same time as OSS projects, and
have similar functionality, which makes them a good choice for
testing our proposed Multi-layered Developer Social Network. In
total, our dataset has 283380 comments made on 48258 bug
reports between 2001 and 2005. We chose this period because both
projects were then in their initial phase, making
them ideal for studying their evolution. The complete details of the
dataset are shown in Table II.</p>
        <p>
          In an issue tracking system such as Bugzilla (used by Eclipse
and NetBeans), although an issue is assigned to one person,
the assignee, it is fixed collaboratively by OSS contributors.
The collaboration happens through comments made on the
bug report. Hence, to answer our research questions, we
constructed single-layered and multi-layered DSNs from the
comments on the bug reports in our dataset. However, it
should be noted that to answer RQ3, the node importance
measures and performance metrics are calculated and analyzed
only for assignees, as they bear the most responsibility for fixing
the assigned bug/issue. We conjecture that assignees with
high node importance in the MDSN are good performers, and,
in particular, that the node importance measures of an assignee in
different layers influence her performance differently. To
compute the global network metrics and node
importance measures, we used the Gephi network analysis
tool [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>IV. RESULTS AND DISCUSSION</title>
      <p>In this section, we discuss our results and their implications with respect to our research questions.</p>
      <p>RQ1: How significantly do the global network properties of
the DSN vary across the layers of the MDSN?</p>
      <p>
        To answer this research question, we computed all the
network measures defined in section III-B with the help of
Gephi [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and compared the global network properties of
DSNs formed at each layer of MDSN. Past research [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] has also used a similar approach to compare
networks. The difference in network measures across the
layers of the MDSN signifies the importance of the individual layers,
making the MDSN a more useful framework for studying the
collaboration patterns among developers. Our results are
shown in Figure 3. It can be seen from the figure that while
the density, average path length, and modularity of the DSN vary
significantly across the layers of the MDSN, the clustering
coefficient remains relatively stable across the layers. The
modularity of a network describes its community structure,
and past studies have leveraged the modularity of
DSNs for community detection and for discovering team structures
in OSS projects [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The variation in modularity across
layers suggests that different community
and team structures could be discovered using our MDSN
approach, improving our knowledge of team structure
in OSS maintenance activities. Furthermore, other network
measures, e.g., network density and average path length, have
been used to predict defects in software modules [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and
hence it would be interesting to investigate whether the accuracy
of defect prediction models could also be improved by
incorporating network measures computed from the MDSN.
In a nutshell, it is clear from the results shown in Figure 3
that the network structures of the DSNs formed at the various layers
differ significantly from each other, which encourages
leveraging the MDSN and investigating its usefulness in solving
popular research problems, e.g., community detection and defect
prediction.
      </p>
      <p>Fig. 3. Variation in network structure across various layers of MDSN.</p>
      <p>RQ2: How does the evolution of DSNs differ at each layer?
Do some DSNs evolve faster than others?</p>
      <p>
        To answer this research question, we first split the five years of bug
report data of Eclipse and NetBeans (2001-2005) into chunks
of six months. This way, we have 10 samples of bug report
data for each of the projects. We then compute the global
network properties for each sample to see their evolution over
time. Figure 4 shows the evolution of the network properties of
the DSNs over time for both Eclipse and NetBeans. It can be seen
from the figure that almost all the
network properties at layer L2-D1 (edges between developer
nodes if they comment on different bug reports associated
with the same product) and layer L2-D2 (edges between developer
nodes if they comment on different bug reports associated
with the same product as well as the same component) evolve
faster than the properties at other layers. This faster evolution
of these layers is observed for both NetBeans and
Eclipse. Graph-based metrics such as density and modularity, and their evolution, have been used to study software
evolution and to predict some of its important aspects [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
Our study extends past work on DSN-based software
evolution and suggests that studying the evolution of DSNs
at the various layers of an MDSN can provide new insights for
software evolution research. The variation in the evolution of DSNs
across layers also suggests that predicting future aspects
of some DSNs is more difficult than for others. For instance,
predicting which developers will leave the project might be much
more difficult in layers L2-D1 and L2-D2 than in
other layers, where evolution is relatively smoother. Overall,
the answer to this research question is affirmative, based on
the results shown in Figure 4, which adds value to our study.
      </p>
    </sec>
    <sec id="sec-5">
      <p>RQ3: How significantly do various metrics measuring the
importance of developers in the DSN correlate with their bug
fixing performance? How significantly do these correlations differ
across different layers of the MDSN?</p>
      <p>TABLE V. Rows: layers L1, L2-D1, L2-D2, L3, L4 and Combined DSN; for each layer, the metrics Betweenness Centrality, Closeness Centrality, Eigenvector Centrality and Entropy Based Measure.</p>
      <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>NetBeans) from 2001 to 2005.</p>
      <p>The main finding of this paper is the answer to this research
question. To answer it, we first computed
various node importance measures for the DSN at each layer,
as defined in Section III-C, and the measures that characterize
the performance of developers, as defined in Section III-D.</p>
      <p>
        We then used the Pearson correlation coefficient to see how
strongly these two sets of measures correlate with
each other. We chose the Pearson correlation coefficient to measure
the strength of association between the two types of measures because
it has been found useful by many past studies [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Table
V shows a summary of our results. The values of correlation
coefficients where the p-value is less than 0.05 are specifically
highlighted. To see the cumulative effect of node importance
measures on the performance of the developers, we merged the
DSNs of all the layers into one. In this merged DSN,
an edge exists between a pair of developers if there exists
a link between them in any of the layers (L1, L2-D1, L2-D2,
L3, L4). In general, our results are encouraging. For instance,
the eigenvector centrality of a node in the DSN is negatively
correlated with the average fix time, suggesting that developers
with high eigenvector centrality fix issues faster
than their peers with lower eigenvector centrality. The
correlation between average fix time and the other
node importance measures is also significant. It can also be
seen that the correlation between average fix time
and the node importance measures is strongest for layer-4
among all the individual layers. This is interesting because past
studies show that predicting the fix-time of a bug in OSS is
hard. Bhattacharya and Neamtiu [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] reported that many of
the features considered by researchers when building
predictors of the average fix time of a bug are not
relevant. Our results suggest that node importance
measures of the assignee can be good features for such
predictors. Moreover, most past studies considered
node-based centrality at layer-1 only, while our study suggests
that similar centrality measures perform better if we leverage
the MDSN instead of a single-layered DSN.
      </p>
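      <p>The correlation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in the study: the function name pearson and the five data points are hypothetical, and the significance test that yields the p-values is not shown.</p>
      <preformatted>
```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or 2 > len(xs):
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five developers (not data from the study).
eigenvector_centrality = [0.10, 0.25, 0.40, 0.55, 0.70]
average_fix_time_days = [40.0, 33.0, 30.0, 22.0, 15.0]
r = pearson(eigenvector_centrality, average_fix_time_days)
```
      </preformatted>
      <p>For these made-up values r is strongly negative, which is the pattern the text reports for eigenvector centrality and average fix time: higher centrality goes with shorter fix times.</p>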
      <p>A closer look at Table V shows that node importance
measures are also significantly correlated with the total number
of components the developer worked on while fixing issues.</p>
      <p>This suggests that developers with high node importance
measures gain diverse expertise in fixing issues, making
them more important to the organization. A significant
correlation between the node importance measures of developers
and the aggregate priority points as well as the aggregate severity
points of the bugs fixed by them shows that the node importance
measures selected in our study are good measures for identifying
crucial developers (developers who are good at
fixing bugs with high priority and high severity).</p>
      <p>Interestingly, the MDSN approach of investigating the impact
of various node importance measures on developers' bug fixing
performance provides new insight, as many of the results are
counter-intuitive; e.g., the best correlation is found for layer-4
(where developers are connected if they have commented on
two different bug reports associated with the same operating
system). Though the correlation is significant for both
projects, making the finding general enough, the values of the
correlation coefficients are higher for the NetBeans
data.</p>
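      <p>The layer-4 linking rule, and the merging of layers into one DSN described earlier, can be sketched as follows. The record fields (developer, bug_id, os), the function name build_layer and the sample data are assumptions made for illustration; the second layer is a made-up edge set, not one of the layers of the study.</p>
      <preformatted>
```python
from itertools import combinations


def build_layer(comments, attribute):
    """Link developers who commented on different bug reports sharing `attribute`.

    Each comment record is a dict with hypothetical keys "developer", "bug_id"
    and the chosen attribute (for example "os"). Edges are unordered pairs.
    """
    # attribute value -> {developer -> set of bug ids the developer commented on}
    groups = {}
    for record in comments:
        by_dev = groups.setdefault(record[attribute], {})
        by_dev.setdefault(record["developer"], set()).add(record["bug_id"])

    edges = set()
    for by_dev in groups.values():
        for dev_a, dev_b in combinations(sorted(by_dev), 2):
            # Two developers qualify only if two different bug reports are
            # involved, i.e. their combined bug sets hold at least two ids.
            if len(by_dev[dev_a] | by_dev[dev_b]) >= 2:
                edges.add(frozenset((dev_a, dev_b)))
    return edges


# Hypothetical comment records (not data from the study).
comments = [
    {"developer": "dev_a", "bug_id": 1, "os": "Linux"},
    {"developer": "dev_b", "bug_id": 2, "os": "Linux"},
    {"developer": "dev_c", "bug_id": 3, "os": "Windows"},
    {"developer": "dev_d", "bug_id": 3, "os": "Windows"},
]
layer4 = build_layer(comments, "os")

# Merged DSN: an edge exists if the pair is linked in any layer (set union).
other_layer = {frozenset(("dev_c", "dev_d"))}  # made-up edge set
merged = layer4 | other_layer
```
      </preformatted>
      <p>Here dev_c and dev_d are not linked by the operating-system rule because they commented on the same bug report, yet they still appear in the merged DSN through the other layer.</p>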
      <p>Overall, there are two takeaways from our results. First, the
significant correlation between various node-based measures
and the performance of developers suggests that these measures
can be used to identify important developers. Second, the
measures based on different layers of the MDSN are differently
correlated with the performance of developers, making the
MDSN framework worth trying for identifying the
crucial and important developers in OSS.</p>
    </sec>
    <sec id="sec-6">
      <title>V. CONCLUSION AND FUTURE WORK</title>
      <p>In this research, we proposed the MDSN to investigate the
multifaceted nature of collaboration among developers
while they fix bugs and collaborate through the Issue
Reporting System. There are several takeaways from our research.
First, since the structure of the networks varies significantly
across the layers of the MDSN, replicating past studies
on community detection and team formation in
OSS on the MDSN framework may provide new insights.
Second, our results show that many node importance measures,
i.e., node centrality based metrics and graph entropy-based
measures, have a significant correlation with the performance
of developers in the bug fixing process. Further, such
correlations vary significantly across the layers, suggesting that
the MDSN could be more useful for identifying important and crucial
developers in the developer community of OSS. Though our
results are consistent across both case studies we selected
for our research, there are a few threats to their validity. First,
comments on an ITS (Issue Tracking System) are not made only
by developers, and hence considering all commenters as
developers could be a threat to the validity of our results.
Second, the ITS is not the only platform where developers
collaborate. Past research has also used version control data to
study collaboration among developers. Hence, performing our
study on version control data can complement our study. We
plan this in our future work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gourley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Devanbu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gertz</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Swaminathan</surname>
          </string-name>
          , “
          <article-title>Mining email social networks</article-title>
          ,”
          <source>in Proceedings of the 2006 international workshop on Mining software repositories</source>
          , pp.
          <fpage>137</fpage>
          -
          <lpage>143</lpage>
          , ACM,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Zanetti</surname>
          </string-name>
          , I. Scholtes,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Tessone</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Schweitzer</surname>
          </string-name>
          , “
          <article-title>Categorizing bugs with social networks: a case study on four open source software communities</article-title>
          ,”
          <source>in Proceedings of the 2013 International Conference on Software Engineering</source>
          , pp.
          <fpage>1032</fpage>
          -
          <lpage>1041</lpage>
          , IEEE Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Meneely</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Snipes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Osborne</surname>
          </string-name>
          , “
          <article-title>Predicting failures with developer networks and social network analysis</article-title>
          ,”
          <source>in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering</source>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          , ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Cheung</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bird</surname>
          </string-name>
          , “
          <article-title>Understanding a developer social network and its evolution</article-title>
          ,”
          <source>in 2011 27th IEEE international conference on software maintenance (ICSM)</source>
          , pp.
          <fpage>323</fpage>
          -
          <lpage>332</lpage>
          , IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          , “
          <article-title>Evolution of developer social network and its impact on bug fixing process</article-title>
          ,”
          <source>in Proceedings of the 6th India Software Engineering Conference</source>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>72</lpage>
          , ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cataldo</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Herbsleb</surname>
          </string-name>
          , “
          <article-title>Communication networks in geographically distributed software development</article-title>
          ,”
          <source>in Proceedings of the 2008 ACM conference on Computer supported cooperative work</source>
          , pp.
          <fpage>579</fpage>
          -
          <lpage>588</lpage>
          , ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Joblin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Apel</surname>
          </string-name>
          , and W. Mauerer, “
          <article-title>Evolutionary trends of developer coordination: A network approach</article-title>
          ,”
          <source>Empirical Software Engineering</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>2050</fpage>
          -
          <lpage>2094</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Xuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ren</surname>
          </string-name>
          , and W. Zou, “
          <article-title>Developer prioritization in bug repositories</article-title>
          ,”
          <source>in 2012 34th International Conference on Software Engineering (ICSE)</source>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>35</lpage>
          , IEEE,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Canfora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Di</given-names>
            <surname>Penta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Oliveto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Panichella</surname>
          </string-name>
          , “
          <article-title>Who is going to mentor newcomers in open source projects?</article-title>
          ,”
          <source>in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering</source>
          , p.
          <fpage>44</fpage>
          ,
          ACM
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Q. C.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Stevenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Delorey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Knutson</surname>
          </string-name>
          , “
          <article-title>Author entropy: A metric for characterization of software authorship patterns</article-title>
          ,”
          <source>in Third International Workshop on Public Data about Software Development (WoPDaSD08)</source>
          , p.
          <fpage>6</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schroter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Damian</surname>
          </string-name>
          , and T. Nguyen, “
          <article-title>Predicting build failures using social network analysis on developer communication</article-title>
          ,”
          <source>in Proceedings of the 31st International Conference on Software Engineering</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          , IEEE Computer Society,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Nagappan</surname>
          </string-name>
          , “
          <article-title>Characterizing and understanding software developer networks in security development</article-title>
          ,” arXiv preprint arXiv:1907.12141,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kazienko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Musial</surname>
          </string-name>
          , and T. Kajdanowicz, “
          <article-title>Multidimensional social network in the social recommender system</article-title>
          ,”
          <source>IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans</source>
          , vol.
          <volume>41</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>746</fpage>
          -
          <lpage>759</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Newman</surname>
          </string-name>
          , “
          <article-title>Modularity and community structure in networks</article-title>
          ,”
          <source>Proceedings of the National Academy of Sciences</source>
          , vol.
          <volume>103</volume>
          , no.
          <issue>23</issue>
          , pp.
          <fpage>8577</fpage>
          -
          <lpage>8582</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Watts</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Strogatz</surname>
          </string-name>
          , “
          <article-title>Collective dynamics of 'small-world' networks</article-title>
          ,”
          <source>Nature</source>
          , vol.
          <volume>393</volume>
          , no.
          <issue>6684</issue>
          , p.
          <fpage>440</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonacich</surname>
          </string-name>
          , “
          <article-title>Power and centrality: A family of measures</article-title>
          ,”
          <source>American Journal of Sociology</source>
          , vol.
          <volume>92</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>1170</fpage>
          -
          <lpage>1182</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Freeman</surname>
          </string-name>
          , “
          <article-title>A set of measures of centrality based on betweenness</article-title>
          ,”
          <source>Sociometry</source>
          , vol.
          <volume>40</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sabidussi</surname>
          </string-name>
          , “
          <article-title>The centrality index of a graph</article-title>
          ,”
          <source>Psychometrika</source>
          , vol.
          <volume>31</volume>
          , pp.
          <fpage>581</fpage>
          -
          <lpage>603</lpage>
          ,
          Dec.
          <year>1966</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehmer</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Mowshowitz</surname>
          </string-name>
          , “
          <article-title>A history of graph entropy measures</article-title>
          ,”
          <source>Information Sciences</source>
          , vol.
          <volume>181</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>78</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bastian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Heymann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Jacomy</surname>
          </string-name>
          , “
          <article-title>Gephi: an open source software for exploring and manipulating networks</article-title>
          ,”
          <source>in Third international AAAI conference on weblogs and social media</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pattison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Filkov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Devanbu</surname>
          </string-name>
          , “
          <article-title>Latent social structure in open source projects</article-title>
          ,”
          <source>in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering</source>
          , pp.
          <fpage>24</fpage>
          -
          <lpage>35</lpage>
          , ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iliofotou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Neamtiu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          , “
          <article-title>Graph-based analysis and prediction for software evolution</article-title>
          ,”
          <source>in 2012 34th International Conference on Software Engineering (ICSE)</source>
          , pp.
          <fpage>419</fpage>
          -
          <lpage>429</lpage>
          , IEEE,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. K.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          , “
          <article-title>Is non-parametric hypothesis testing model robust for statistical fault localization?</article-title>
          ,”
          <source>Information and Software Technology</source>
          , vol.
          <volume>51</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>1573</fpage>
          -
          <lpage>1585</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Premraj</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zeller</surname>
          </string-name>
          , “
          <article-title>Predicting defects for Eclipse</article-title>
          ,”
          <source>in Third International Workshop on Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007)</source>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>9</lpage>
          , IEEE,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Neamtiu</surname>
          </string-name>
          , “
          <article-title>Bug-fix time prediction models: can we do better?</article-title>
          ,”
          <source>in Proceedings of the 8th Working Conference on Mining Software Repositories</source>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>210</lpage>
          , ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>