<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predicting Software Defectiveness through Network Analysis</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Informatics, University of Lugano</institution>
          ,
          <addr-line>Via Giuseppe Buffi 13, 6900 Lugano, Switzerland</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>531</fpage>
      <lpage>540</lpage>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>We used a complex network approach to
study the evolution of a large software system,
Eclipse, with the aim of statistically
characterizing software defectiveness over time.
We studied the software networks associated
with several releases of the system, focusing
specifically on their community
structure, modularity and clustering
coefficient. We found that the maximum average
defect density is related, directly or indirectly,
to two different metrics: the number of
communities detected inside a software network
and the clustering coefficient. Both
relationships follow a power law,
which leads to a linear correlation between
the clustering coefficient and the number of
communities. These results can be useful to make
predictions about the evolution of software
systems, especially with respect to their
defectiveness.</p>
      <p>Copyright © 2015 by the paper’s authors. Copying permitted
for private and academic purposes. This volume is published
and copyrighted by its editors.</p>
      <p>In: A.H. Bagge, T. Mens (eds.): Postproceedings of SATToSE
2015 Seminar on Advanced Techniques and Tools for Software
Evolution, University of Mons, Belgium, 6-8 July 2015,
published at http://ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>Modern software systems are large and complex
products, built according to a modular structure in which
modules (such as classes in object-oriented systems) are
connected with each other to enable software reuse,
encapsulation, information hiding, maintainability and
so on. Software modularization is acknowledged as
a good programming practice [Par72, BC99, SM96],
and particular emphasis is put on the prescription that
designing software with low coupling and high cohesion
increases its quality [CK94]. In this work we
present a study of the relationships between the
quality of software systems and their modular structure.
To perform this study we used an approach based on
the concept of complex networks.</p>
      <p>Since software systems are inherently complex,
one of the best ways to model them is through their
associated networks [Mye03, ŠB11, WKD07, ŠŽBB15, ZN08].
In a software network, nodes can be associated with software
modules (e.g. classes) and edges with connections between
software modules (e.g. inheritance and collaboration
relationships). We investigated the modular structure
of software, and its impact on software defectiveness,
by studying specific network properties: community
structure, modularity and clustering coefficient.</p>
      <p>A community inside a network is a subnetwork
formed by nodes that are densely connected to each other
compared to nodes outside the community [GN01].
Modularity is a function that measures how pronounced a
community structure (the way the nodes are arranged in
communities) is [NG04]. The clustering
coefficient is a measure of connectedness among the nodes
of a network [New03].</p>
      <p>We studied several releases of a large software
system, Eclipse, performing a longitudinal analysis of the
relationship between community structure, clustering
coefficient and software defectiveness. Our aim is to
determine whether the studied metrics can be used to
better understand the evolution of software defectiveness
over time and to predict the defectiveness of
future releases. The results shown in this paper are part
of broader research on different Java projects,
which is currently being consolidated. The authors
eventually aim to show that the results presented
in this work also hold for other widely used
Java projects.</p>
      <p>This paper is organized as follows. In Section 2
we review some recent literature on software network
analysis, community structure and defect prediction.
In Section 3 we introduce some background concepts
taken from the research on complex networks, whereas
in Section 4 we thoroughly report the adopted metrics
and the methodology. In Section 5 we present some of
our results and discuss them in Section 6. In Section 7
we illustrate the threats to validity and in Section 8 we
draw some conclusions and outline future work.</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>Since software systems are often large and complex, one of
the best candidates to represent them is the complex
network [FMS01], [Mye03], [TCM+11]. Many software
networks, such as class diagrams [RS02, VS03],
collaboration graphs [Mye03] and package dependency networks
[CL04], have already been shown to present the
typical properties of complex networks [BAJ00], such as
fractal and self-similar features [TMT12], scale-free
[CS05] and small-world properties, and consequently
power-law distributions for the node degree, for
bugs [CMM+11] and for refactored classes [MTM+12].</p>
      <p>Modelling a software system as a complex network
has been shown to have many applications to the study
of failures and defects. For example, Wen et al. [WKD07]
reported that scale-free software networks could be
robust against random failures. Other
methods have been applied to understand the relationship
between the number of defects and LOC [Zha09], while
Ostrand et al. used a negative binomial regression model
to show that the majority of bugs are contained
in only a fraction of the files (20%) [OWB05].</p>
      <p>Many other methods have been tried so far for bug
prediction [DLR10, HBB+12], especially using
dependency graphs [NAH10, ZN08], but only recently have
researchers focused their attention on community
structure as defined in social network analysis, namely
the division of a network into subgroups of nodes among which there
is a high density of connections compared to nodes
outside the community [NG04]. Being more densely
connected, elements belonging to the same community
might represent functional units or software modules,
leading to practical applications of community
detection in software engineering. Community
detection is usually performed with methods such as
hierarchical clustering and partitional clustering [For10].</p>
      <p>Newman et al. proposed several algorithms for
community detection [NG04, New04, CNM04], which are
now extensively used in the literature, along with the
definition of modularity, a quality function which
measures the strength of a network partition into
communities [NG04]. In this work we use one of the algorithms
proposed by Newman et al. to understand whether such a
division can be related to software modularity as defined
in software engineering and, eventually, whether
community metrics may be useful to predict bugs in future
releases. The issue of community structure and its
application to software engineering has recently been
addressed in a similar fashion by Šubelj and Bajec, who
applied several community detection algorithms
to multiple Java systems to show that their evident
community structure does not correspond to the package
structure devised by the designers [ŠB11].</p>
      <p>In this work we consider the concept of
modularity used in software engineering, which is often
associated with high values of cohesion and low values of
coupling metrics [MMCG99, MM07, ASTC11]. In
the literature there are previous attempts to use
software network theory to characterize a modularity
function and relate it to good programming
practice [MMCG99, MM07, ASTC11]. We show that a
modularity function based on pure network topology
can be used to assess the quality of a division into
clusters, and that it is related to the software engineering
concept of separation of components. Our work uses
a methodology based on community structure, which
is very lightweight from a computational point of
view. We also introduce for the first time some
concepts from social network analysis that allowed us
to draw the same conclusions as the authors of the
aforementioned papers, while also obtaining information
on the predictability of software defectiveness.</p>
    </sec>
    <sec id="sec-4">
      <title>Modularity and Community Structure</title>
      <p>The concept of community derives from social
networks. Nodes belonging to the same community are
densely connected to each other, while they are
poorly connected with nodes outside that
community. Inside a network, a community structure
is the specific way in which the nodes are arranged in
communities [For10]. Since more than one
community structure is possible, we need a quantitative measure
to identify the best division. The first and most widely used
such measure is modularity [NG04]. Although there
are some caveats to take into account when using it
[GMC10, FB07], modularity is considered the
standard measure of the quality of a community
structure.</p>
      <p>The original definition is based on the fact that a
random graph does not possess a community structure,
hence providing a null model for comparison with the
community structure of real networks [NG04].
Consider a complex network of $n$ nodes and $m$ edges. To
represent it we use the following definitions:
• Adjacency matrix:
$$A_{vw} = \begin{cases} 1 \text{ if } v \text{ and } w \text{ are connected,} \\ 0 \text{ otherwise,} \end{cases} \tag{1}$$
where $v$ and $w$ are two nodes belonging to
communities $c_v$ and $c_w$;
• Number of edges:
$$m = \frac{1}{2}\sum_{vw} A_{vw}; \tag{2}$$
• Node degree, namely the number of edges connected to the node:
$$k_v = \sum_{w} A_{vw}. \tag{3}$$</p>
      <p>If we postulate that nodes are grouped in
communities, then we can compute the fraction of edges
that fall within communities. For the community structure
to be significant, this fraction has to be large. Using
the community labels $c_v$ and $c_w$ of the endpoints of each
edge, this fraction can be written as follows:
$$\frac{\sum_{vw} A_{vw}\,\delta(c_v, c_w)}{\sum_{vw} A_{vw}} = \frac{1}{2m}\sum_{vw} A_{vw}\,\delta(c_v, c_w), \tag{4}$$
where $\delta(c_v, c_w)$ is the Kronecker $\delta$. To obtain
a reliable measure we need to compare this
value to a null model. The most widely used null model is a
graph with the same community structure and the same node
degrees, but random connections among its nodes. In this
random case, the expected fraction of edges joining nodes
in the same community would be given by:
$$\frac{1}{2m}\sum_{vw} \frac{k_v k_w}{2m}\,\delta(c_v, c_w). \tag{5}$$</p>
      <p>Subtracting (5) from (4), we get the modularity as
defined by Newman [New06]:
$$Q = \frac{1}{2m}\sum_{vw}\left(A_{vw} - \frac{k_v k_w}{2m}\right)\delta(c_v, c_w). \tag{6}$$</p>
      <p>A good community structure corresponds to values
of Q as close as possible to 1. However, in real
networks, modularity values that reveal a good
community structure fall in a range from 0.3 to 0.7 [NG04].
Lower values are associated with a weak community
structure, whereas strong community structures,
although rare in practice, may have modularity values
higher than 0.7, approaching 1.</p>
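      <p>To make Eq. (6) concrete, the following minimal Python sketch computes Q for an undirected, unweighted graph given as an edge list plus a node-to-community map. It is only an illustration, not the igraph code used in this work, and the function name and input format are our own assumptions. It relies on the equivalent per-community form of Eq. (6), Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c and d_c is the sum of the degrees of its nodes.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict


def modularity(edges, community):
    """Newman's modularity Q (Eq. 6) of an undirected, unweighted graph.

    edges: iterable of (v, w) pairs, each edge listed once, no self-loops.
    community: dict mapping every node that appears in edges to its label c_v.
    """
    degree = defaultdict(int)   # k_v, Eq. (3)
    internal = 0                # edges whose endpoints share a community
    for v, w in edges:
        degree[v] += 1
        degree[w] += 1
        if community[v] == community[w]:
            internal += 1
    m = sum(degree.values()) / 2          # Eq. (2)
    degree_sum = defaultdict(int)         # d_c: total degree of community c
    for node, k in degree.items():
        degree_sum[community[node]] += k
    observed = internal / m               # Eq. (4): fraction of internal edges
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())  # Eq. (5)
    return observed - expected            # Eq. (6)


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(round(modularity(edges, split), 4))               # 0.3571
    print(modularity(edges, {v: "all" for v in range(6)}))  # 0.0
```
</preformat>
      <p>Splitting the two triangles apart gives Q = 5/14 ≈ 0.357, while placing every node in a single community gives Q = 0, since the observed and expected fractions in Eq. (6) then coincide.</p>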
    </sec>
    <sec id="sec-6">
      <title>Experimental Setting</title>
      <p>In this work we analyze the structure and evolution
of the Eclipse IDE, a popular software system written in
Java, using its associated software network. We first
retrieved the network associated with each sub-project
into which the system is structured, by parsing its source code,
retrieved from the corresponding Source Control
Manager (SCM), and looking for relationships such as
collaboration and inheritance. This way we obtained
networks at class level, where nodes are classes and edges
are the relationships among classes mentioned above
(inheritance, collaboration, etc.). Afterwards we
annotated each class with its number of
bugs, retrieved using the procedure described in the
following subsection.</p>
      <sec id="sec-6-1">
        <title>Retrieving Defectiveness Data</title>
        <p>We considered the number of defects (bugs) as the
main indicator of software quality. We collected data
about the bugs of a software system by mining its
associated Bug Tracking System (BTS). Bugzilla is the
BTS adopted by Eclipse, where defects are tagged with
a unique ID number. An entry in the BTS is commonly called
an “Issue”, and there is usually no information about the
classes associated with defects, whereas the changes
performed on the source code are usually recorded
in the SCM. To obtain a correct mapping between
Issues and the related Java classes, we analyzed the
SCM log messages to identify commits associated with
maintenance operations in which Issues are fixed.</p>
        <p>We analyzed the text of commit messages,
looking for Issue-IDs. Every positive integer
(including dates, release numbers, copyright updates, etc.)
might be a potential Issue-ID in the BTS. To
avoid wrong mappings between a file and the
corresponding Issue, we filtered out any number that did
not refer to a bug fix. This was done
by associating Issue-IDs with files belonging to the same
release, and analyzing the commit logs to perform
the mapping between Issues and classes. We
associated with each release the Issues that are Bugs and that
were classified as “closed” in the BTS. Bugs labeled as
“closed” are very rarely re-opened, so they can be considered
permanently associated with a release. The
maintenance operations in Bugzilla are associated with
files, called Compilation Units (CUs), which may
contain one or more classes. Thus, when a file
contained more than one class, we decided to assign
all the defects to the biggest class of that
Compilation Unit. At the end of this process we obtained a
network in which each node is associated with the number
of bugs of the corresponding class.</p>
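      <p>The mapping rules described above can be sketched in Python as follows. This is a hypothetical illustration, not the authors’ tool. The data layout is an assumption: commit messages paired with the paths they touched, a per-release set of closed Bug IDs, and class sizes per Compilation Unit. Measuring the “biggest” class in lines of code is also our assumption.</p>
      <preformat preformat-type="code">
```python
import re
from collections import Counter

# Any standalone run of digits is a candidate Issue-ID (dates, versions, ...).
CANDIDATE_ID = re.compile(r"\b\d+\b")


def map_bugs_to_classes(commits, closed_bug_ids, classes_by_cu):
    """Count fixed bugs per class for one release (hypothetical data layout).

    commits: list of (message, [cu_path, ...]) pairs taken from the SCM log.
    closed_bug_ids: set of int IDs of BTS Issues that are Bugs marked "closed"
        for this release.
    classes_by_cu: dict cu_path -> list of (class_name, size_in_loc).
    """
    bugs_per_cu = {}
    for message, paths in commits:
        candidates = {int(tok) for tok in CANDIDATE_ID.findall(message)}
        # Drop every number that is not a closed Bug of this release.
        bug_ids = candidates.intersection(closed_bug_ids)
        if not bug_ids:
            continue
        for path in paths:
            bugs_per_cu.setdefault(path, set()).update(bug_ids)

    bugs_per_class = Counter()
    for path, bug_ids in bugs_per_cu.items():
        classes = classes_by_cu.get(path)
        if not classes:
            continue  # e.g. non-Java files touched by the same commit
        # A CU may contain several classes: give all its bugs to the biggest one.
        biggest, _size = max(classes, key=lambda c: c[1])
        bugs_per_class[biggest] += len(bug_ids)
    return bugs_per_class


if __name__ == "__main__":
    commits = [
        ("Fix for bug 101 (2013 cleanup)", ["a/Foo.java"]),
        ("Bug 102: NPE in parser", ["a/Foo.java", "b/Bar.java"]),
        ("Update copyright 2013", ["b/Bar.java"]),
    ]
    classes = {"a/Foo.java": [("Foo", 300), ("FooHelper", 40)],
               "b/Bar.java": [("Bar", 120)]}
    print(dict(map_bugs_to_classes(commits, {101, 102}, classes)))
    # {'Foo': 2, 'Bar': 1}
```
</preformat>
      <p>Intersecting the candidate numbers with the closed Bugs of the same release discards dates, version numbers and copyright years. Bugs are collected per CU first, so a bug fixed by several commits that touch the same file is counted only once.</p>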
      </sec>
      <sec id="sec-6-2">
        <title>Metrics Analyzed</title>
        <p>We computed the following metrics:
• System Size: the number of classes of the software
system.
• Average Bug Number (ABN), or bug density:
the number of defects found in a system
divided by its number of classes.
• Modularity: a measure of the strength of the
obtained community structure, as defined in Section
3.
• Number of Communities (NOC): the number of
disjoint communities into which the network is
partitioned.
• Clustering Coefficient (CC): the average
probability that, if vertex i is connected to vertex j and
vertex j to vertex k, then vertex i is also
connected to vertex k. It can be defined as
follows:</p>
        <p>$$C_i = \frac{\text{number of triangles containing node } i}{\text{number of connected triples centered on node } i}, \tag{7}$$
where a triangle is a set of three nodes all
connected with each other, and a triple centered
on node $i$ is a set composed of node $i$ and two
nodes connected to it.</p>
        <p>The clustering coefficient of the whole graph is
the average of the $C_i$ values:
$$C = \frac{1}{n}\sum_{i=1}^{n} C_i, \tag{8}$$
where $n$ is the number of nodes in the network
[New03].</p>
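      <p>Equations (7) and (8) can be sketched in Python as follows, for an undirected graph stored as an adjacency dictionary. Assigning C_i = 0 to nodes with fewer than two neighbours is our assumption. Conventions differ between tools, and some exclude such nodes from the average instead. The helper names are illustrative.</p>
      <preformat preformat-type="code">
```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency dict (node -> set of neighbours)."""
    adj = {}
    for v, w in edges:
        if v == w:
            continue  # self-loops do not form triangles
        adj.setdefault(v, set()).add(w)
        adj.setdefault(w, set()).add(v)
    return adj


def local_clustering(adj, i):
    """C_i (Eq. 7): triangles containing i over triples centered on i."""
    k = len(adj[i])
    if k in (0, 1):
        return 0.0  # no triple centered on i; 0 is an assumed convention
    triples = k * (k - 1) // 2
    triangles = sum(1 for u, w in combinations(adj[i], 2) if w in adj[u])
    return triangles / triples


def average_clustering(adj):
    """C (Eq. 8): mean of C_i over all n nodes of the network."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


if __name__ == "__main__":
    # A triangle (0, 1, 2) with a pendant node 3 attached to node 2.
    adj = adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
    print(local_clustering(adj, 2))           # 1 triangle over 3 triples
    print(round(average_clustering(adj), 4))  # 0.5833 = (1 + 1 + 1/3 + 0) / 4
```
</preformat>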
        <p>We analyzed 5 releases of Eclipse, whose main features
(size, number of sub-projects and number of defects) are
presented in Table 1.</p>
        <p>Each release is structured in almost independent
sub-projects. The total number of sub-projects
analyzed amounts to 375, with more than 60000 nodes
(classes) and more than 350000 defects.</p>
        <p>For each sub-project of each release we computed
the modularity and its associated community structure
using the Clauset-Moore-Newman (CMN)
community detection algorithm devised by Clauset et al.
[CNM04]. CMN is an agglomerative clustering
algorithm that performs a greedy optimization of the
modularity; the community structure retrieved
corresponds to the maximum value of the modularity found.
For each network we thus obtained the number of communities,
the corresponding maximum value of the modularity, and the nodes
belonging to each community. We used the CMN
implementation provided by the R package
igraph [CN06].</p>
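      <p>The CMN algorithm starts from singleton communities and repeatedly merges the pair of communities whose union yields the largest modularity increase, ΔQ = 2(e_ij − a_i a_j). Here e_ij is half the fraction of edges joining communities i and j, and a_i is the fraction of edge ends attached to community i [CNM04]. The Python sketch below reproduces this idea naively, recomputing all candidate gains at every step. It omits the heap and sparse-matrix bookkeeping that make CMN fast, so it only suits small graphs. In practice one would rely on an existing implementation, such as the igraph one used in this work (available in R as cluster_fast_greedy).</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict


def greedy_modularity(edges):
    """Naive greedy agglomerative modularity maximization, in the spirit of CMN.

    edges: list of undirected (v, w) pairs with comparable node labels,
    each edge listed once and no self-loops.
    Returns (best_q, communities) for the best partition seen while merging.
    """
    m = len(edges)
    nodes = sorted({v for edge in edges for v in edge})
    members = {v: {v} for v in nodes}   # community id -> set of nodes
    label = {v: v for v in nodes}       # node -> community id
    degree = defaultdict(int)
    for v, w in edges:
        degree[v] += 1
        degree[w] += 1
    # a_i: fraction of edge ends attached to community i.
    a = {v: degree[v] / (2 * m) for v in nodes}
    # Singletons have no internal edges, so only the null-model term remains.
    q = -sum(x * x for x in a.values())
    best_q, best = q, [set(c) for c in members.values()]

    while len(members) > 1:
        # Count edges between each pair of distinct communities.
        between = defaultdict(int)
        for v, w in edges:
            ci, cj = label[v], label[w]
            if ci != cj:
                between[(min(ci, cj), max(ci, cj))] += 1
        if not between:
            break  # the remaining communities are mutually disconnected
        # dQ = 2 * (e_ij - a_i * a_j), with e_ij = n_ij / (2m).
        gains = {
            (i, j): 2 * (n_ij / (2 * m) - a[i] * a[j])
            for (i, j), n_ij in between.items()
        }
        (i, j), dq = max(gains.items(), key=lambda item: item[1])
        # Merge community j into community i and update the bookkeeping.
        for v in members[j]:
            label[v] = i
        members[i] |= members.pop(j)
        a[i] += a.pop(j)
        q += dq
        if q > best_q:
            best_q, best = q, [set(c) for c in members.values()]
    return best_q, best


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    q, communities = greedy_modularity(edges)
    print(round(q, 4), sorted(sorted(c) for c in communities))
    # 0.3571 [[0, 1, 2], [3, 4, 5]]
```
</preformat>
      <p>On this toy graph the best partition found is the two triangles, with Q = 5/14, the same value obtained by evaluating Eq. (6) directly on that partition.</p>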
        <p>We then performed a correlation analysis between the
network metrics and the software metrics (size and
defectiveness), both for each release on its own and for the
entire dataset, in order to obtain statistically relevant results.
Finally, in order to investigate the system evolution, we
studied the relationship between network metrics and
software defectiveness by cumulating the first and the
second releases in a single set, then adding the third
release to this first set to obtain a second set, and so on.
Specifically, we evaluated whether, with a starting dataset of
N releases, the best fitting curve for the cumulated
first N−1 releases could also be a good fit for the N-th
release. To measure the forecast accuracy we adopted
a χ² test. This way we were able to make predictions
about the next release starting from the releases cumulated
so far.</p>
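      <p>The cumulative evaluation described above can be sketched in Python as follows. The paper does not specify how the χ² statistic was normalized. Here we assume Pearson’s χ², Σ (O − E)²/E, divided by the degrees of freedom (number of points minus the two fitted parameters). The data points are assumed to be (NOC, maximum ABN) pairs per release. The fit argument stands for any routine that returns the fitted curve as a callable. In the usage example it is replaced by a fixed, purely hypothetical power law.</p>
      <preformat preformat-type="code">
```python
from typing import Callable, Iterator, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (NOC, maximum ABN)
Model = Callable[[float], float]


def cumulative_splits(
    releases: Sequence[List[Point]],
) -> Iterator[Tuple[List[Point], List[Point]]]:
    """Yield (train, test) pairs: releases 1..N-1 pooled, release N held out.

    The first pair pools releases 1 and 2 and tests on release 3,
    the next pools releases 1-3 and tests on release 4, and so on.
    """
    for n in range(2, len(releases)):
        train = [p for release in releases[:n] for p in release]
        yield train, list(releases[n])


def normalized_chi2(points: List[Point], model: Model, n_params: int = 2) -> float:
    """Pearson chi-square of observed y against model(x), per degree of freedom."""
    dof = len(points) - n_params
    if 1 > dof:
        raise ValueError("need more points than fitted parameters")
    chi2 = sum((y - model(x)) ** 2 / model(x) for x, y in points)
    return chi2 / dof


def evaluate_evolution(
    releases: Sequence[List[Point]], fit: Callable[[List[Point]], Model]
) -> List[float]:
    """Fit on each cumulated set and score that fit on the next release."""
    return [
        normalized_chi2(test, fit(train))
        for train, test in cumulative_splits(releases)
    ]


if __name__ == "__main__":
    # Hypothetical per-release (NOC, max ABN) points.
    releases = [
        [(1, 20.0), (4, 10.0)],
        [(16, 5.0)],
        [(1, 20.0), (4, 10.0), (16, 5.0)],
        [(4, 12.0), (1, 20.0), (16, 5.0)],
    ]

    def fixed_fit(train: List[Point]) -> Model:
        # Placeholder "fit": ignores the data and returns y = 20 * x**-0.5.
        return lambda x: 20.0 * x ** -0.5

    print(evaluate_evolution(releases, fixed_fit))  # [0.0, 0.4]
```
</preformat>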
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Results</title>
      <p>We performed different analyses of the network
metrics and the software metrics for each release and for
the entire dataset. First and foremost, we noted a
saturation effect of the number of defects and of the
clustering coefficient as the size of the analyzed systems
increases. Our results show a general tendency for
certain metrics to converge to a narrow range of values
as the number of classes increases.</p>
      <p>Figures 1, 2 and 3 show the relationship between
systems’ size (number of classes) and, respectively,
modularity, average bug number (ABN) and clustering
coefficient (CC).</p>
      <p>All the metrics display roughly the same
behavior. For relatively small systems, with
fewer than about 100 classes, the metrics take
values in a wide range. Specifically, the defect density (or
ABN) ranges from 0 up to 25, while the clustering coefficient
and the modularity, whose maximum possible value is 1,
range from 0 to 0.6-0.7. For system sizes between roughly 100
and 500 classes, the variation ranges become smaller:
the ABN lies between 2 and 12, the clustering
coefficient between 0.05 and 0.2, and the modularity
between 0.3 and 0.6. Finally, for fairly large systems,
with more than about 500 classes, the
metrics stabilize, showing small oscillations and
eventually converging asymptotically to specific values.</p>
      <p>Another interesting result is the monotonic increase
of the NOC metric with system size, reported in
Figure 4. After an initial nonlinear behavior, the curve
follows a straight line. Additionally, our results
show a significant correlation between NOC and both
ABN and CC. Figure 5 displays a non-linear decay of
the maximum values of ABN and CC versus NOC. It
is worth pointing out that other network metrics,
such as the mean degree or the average path length,
do not show a similar trend.</p>
      <p>In particular, Figures 5a and 5b show, respectively,
the distributions of the maximum values of ABN and
CC with respect to NOC for all the sub-projects of
each release. Each point corresponds to the maximum
value of the corresponding metric computed over all
the projects with the same number of communities.</p>
      <p>As these Figures illustrate, these values seem to follow
a power-law-like trend.</p>
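      <p>The procedure behind Figures 5a and 5b can be sketched in Python. Group the sub-projects by NOC, keep the maximum value of the metric in each group, and fit a straight line to these maxima in log-log space. The slope of that line is the power-law exponent. The paper does not name the power-law fitting algorithm it used. Ordinary least squares on the logarithms is our assumption, and other estimators may give somewhat different parameters. The input layout is hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
from collections import defaultdict


def max_per_noc(projects):
    """Return sorted (noc, max_value) pairs: the maximum metric value per NOC.

    projects: iterable of (noc, value) pairs, one per sub-project
    (value can be ABN or CC).
    """
    best = defaultdict(lambda: -math.inf)
    for noc, value in projects:
        best[noc] = max(best[noc], value)
    return sorted(best.items())


def fit_power_law(points):
    """Fit y = a * x**b by least squares on (log x, log y); return (a, b).

    Points with non-positive x or y are skipped because they have no logarithm.
    """
    logs = [(math.log(x), math.log(y)) for x, y in points if x > 0 and y > 0]
    n = len(logs)
    mean_u = sum(u for u, _ in logs) / n
    mean_v = sum(v for _, v in logs) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in logs)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in logs)
    b = s_uv / s_uu                    # log-log slope = power-law exponent
    a = math.exp(mean_v - b * mean_u)  # intercept back on the linear scale
    return a, b


if __name__ == "__main__":
    # Hypothetical (NOC, ABN) values for a handful of sub-projects.
    projects = [(1, 10.0), (1, 20.0), (4, 10.0), (4, 3.0), (16, 5.0)]
    maxima = max_per_noc(projects)    # [(1, 20.0), (4, 10.0), (16, 5.0)]
    a, b = fit_power_law(maxima)
    print(round(a, 3), round(b, 3))   # 20.0 -0.5
    # Upper bound suggested by the fit for a release with 9 communities.
    print(round(a * 9 ** b, 3))       # 6.667
```
</preformat>
      <p>Evaluating a·NOC^b for the NOC of a new release gives the kind of upper bound on ABN (or CC) that this paper proposes to use for prediction.</p>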
      <p>When plotted on a log-log scale, the distributions of the
maximum values are well fitted by a straight
line, suggesting two power-law-like relationships between
the maximum values of both ABN and CC and the
number of communities, where each maximum is taken over
systems having the same number of communities. We applied
a power-law best fitting algorithm to check
this hypothesis, finding acceptable fits for the
maximum values of both CC and
ABN versus the number of communities. The power-law
parameters are reported in Tables 2 and 3. Table
4 reports the best fitting results, including the degrees
of freedom and the normalized χ² for the relationship
between these two metrics.</p>
      <p>We analyzed a large software system, Eclipse, using
complex network theory with the aim of achieving a
better understanding of software properties by means
of the associated software network. The application of
the CMN algorithm confirms that the analyzed
software networks present a meaningful community
structure [ŠŽBB15, CMOT13]. Furthermore, the results
show the existence of meaningful relationships between
software quality, represented by the average bug
number (ABN), and community metrics, in particular the
number of communities (NOC) and the clustering
coefficient (CC).</p>
      <p>The presence of a strong community structure in
a software system reflects a strong organization of
classes into groups, where the number of
dependencies among classes belonging to the same community
(internal dependencies) is higher than the
number of dependencies among classes belonging to
different communities (external dependencies).</p>
      <p>From a software engineering perspective, this organization
can be achieved by adopting good programming
practices, where class responsibilities are well defined,
classes are strongly interconnected in groups, and
coupling among groups is kept low. From this
perspective, network modularity can be seen as a proxy for
software modularity.</p>
      <p>Figure 1 shows that, with the exception of
sub-projects with fewer than 500 classes, the modularity does
not increase with size, converging to values
that range from 0.6 to 0.7. As reported in Section 3,
these values indicate that the community structure is
significant and well defined. At the same time, Figure
4 shows that there is a linear relationship between the
number of communities and the number of classes.</p>
      <p>Such a relationship is not trivial: the modularity and
the number of communities are theoretically
independent of the size [GMC10] and, in general, the number
of communities does not increase with network size.
Indeed, large networks may be divided into a small number
of communities, depending on their topology.
Consequently, our findings suggest that, in the examined
case, it is possible to partition the software networks into
a set of communities whose number is correlated
with system size.</p>
      <p>Figures 2 and 3 report, respectively, the
relationship of ABN and CC with system size.
Both metrics have a similar trend, with values
converging to a range between 4 and 12 for ABN and between
0.2 and 0.6 for CC. This means that as system
size increases, the number of defects per class stabilizes, and the
same happens to the clustering coefficient. We already
mentioned the significant increase in the number of
communities with system size. Since this increase
in NOC is not trivial, it led us to hypothesize that there
might be a relationship between the topology of software
networks, which determines the number of communities,
and the other metrics.</p>
      <p>Figures 5a and 5b show the distributions of the
maximum values of ABN and CC, respectively, for the projects with
the same number of communities. As previously
mentioned, these relationships seem to follow a power-law
trend. The power law relating NOC and the
maximum values of ABN indicates that community
metrics, specifically the number of communities, can
be exploited to evaluate the evolution of the
defectiveness of a software system. In other words,
once the relationship between NOC and the maximum
values of ABN is known, one can approximately estimate
the maximum ABN in a future release of the
same system by computing the number of
communities for that release.</p>
      <p>Hence, we might assume that systems with the
same number of communities should have a number of
defects per class lower than a given value. The same
argument applies to the clustering coefficient of
systems having the same number of communities: the
relationship between CC and NOC is again a power
law. This implies that if the NOC of an initial release
(or of a set of releases) is known, one can in
principle predict that in the following releases the clustering
coefficient will not exceed a certain value.</p>
      <p>Figures 6 and 7 show, in a log-log scatterplot, the
best fitting lines for the data discussed above. Each
color corresponds to one set of releases cumulated
according to the chronological order. The Figures
confirm that the power-law like relationship appears in
every cumulated release and is a regular and stable
behavior throughout software evolution.</p>
      <p>The power-law parameters for these
metrics and for each set of cumulated releases (see Section 4)
are reported in Tables 2 and 3. As we can see, they do
not change significantly from one cumulated set to
another. This suggests a progressively
more stable behavior during software evolution, where
the power-law fit becomes more accurate
and its parameters tend to fixed values as new releases are added to
the cumulated dataset. These results might help
developers estimate the expected maximum ABN for
software systems with a known community partition.</p>
      <p>The two power laws indirectly connect the
maximum values of ABN with the maximum values of CC
in systems having the same number of communities.
This relationship can be made explicit by plotting
the two metrics directly against each other, where each
metric is computed for the same number of
communities. Such plots are reported in Figure 6 for all the
cumulated releases, and show that the two metrics are
linearly correlated. Table 5 reports the correlation
coefficient as well as the degrees of freedom and the χ² for
these data for all cumulated releases. It shows that the
correlation coefficient increases and the χ² decreases as
new releases are added to the cumulated data,
indicating a more stable relationship between the two metrics
as the system evolves.</p>
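      <p>The linear correlation reported in Table 5 can be illustrated with the Python sketch below. It pairs the maximum CC and the maximum ABN obtained for each NOC value and computes a correlation coefficient on those pairs. The paper does not state which coefficient was used, so Pearson’s r is our assumption. The input layout is hypothetical.</p>
      <preformat preformat-type="code">
```python
import math


def paired_maxima(projects):
    """For each NOC value, pair the maximum CC with the maximum ABN.

    projects: iterable of (noc, cc, abn) tuples, one per sub-project.
    The two maxima may come from different sub-projects with the same NOC.
    Returns a list of (max_cc, max_abn) pairs sorted by NOC.
    """
    best = {}
    for noc, cc, abn in projects:
        old_cc, old_abn = best.get(noc, (-math.inf, -math.inf))
        best[noc] = (max(old_cc, cc), max(old_abn, abn))
    return [best[noc] for noc in sorted(best)]


def pearson(pairs):
    """Pearson correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    # Hypothetical (NOC, CC, ABN) values; here max ABN = 20 * max CC + 2 exactly.
    projects = [(2, 0.5, 10.0), (2, 0.3, 12.0), (5, 0.4, 10.0), (9, 0.2, 6.0)]
    pairs = paired_maxima(projects)   # [(0.5, 12.0), (0.4, 10.0), (0.2, 6.0)]
    print(round(pearson(pairs), 6))   # 1.0
```
</preformat>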
      <p>These results can be explained by noting that the
larger the clustering coefficient, the higher the
number of classes linked to each other and the higher the
probability of diffusion of defects among them. The
topology of a software network is characterized by
hubs, and the clustering coefficient in the area of the
graph around any hub is higher by definition. If one
hub is affected by one or more defects, it is more likely</p>
      <p>Internal Validity. We conclude that the
relationship between software defectiveness and community
metrics can indicate that a high level of network
modularity is related to low defectiveness, thus
suggesting good programming practices.
However, the relationships we found could be due to
other phenomena and deserve further
investigation. What we propose here is just one possible
formal explanation of our results.</p>
      <p>External Validity. We only considered one Java
system, Eclipse, and analyzed its evolution. Our
results should be validated on other systems to
make them more general. We are currently extending
the analysis to many releases of NetBeans.</p>
      <p>Construct Validity. The rules reported in Section 4.1
might be faulty in some cases, failing to
correctly map defects to CUs [AMADP07]. There</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusions</title>
      <p>In this work we presented a longitudinal analysis of
the evolution of a large software system with a
focus on software defectiveness. We used a complex
network approach to study the structure of the
system and its modularity by computing the community
structure of the associated network. After
retrieving the number of defects and associating them with
the software network classes, we performed a
topological analysis of the system defectiveness. We found
power-law relationships between the division into communities
of the software network and the maximum values of both
the clustering coefficient and the average bug number.
These lead to a linear relationship between the maximum
values of the clustering coefficient and of the average
bug number. We showed that such a relationship can
in principle be used as a predictor for the maximum
value of the average bug number in future releases.</p>
      <sec id="sec-8-1">
        <title>Gabor Csardi and Tamas Nepusz. The igraph software package for complex network research. InterJournal, Complex Systems:1695, 2006.</title>
      </sec>
      <sec id="sec-8-2">
        <title>Aaron Clauset, M. E. J. Newman, and Cristopher Moore. Finding community structure in very large networks. Physical Review E, pages 1–6, 2004.</title>
      </sec>
      <sec id="sec-8-3">
        <title>Chaoming Song, Shlomo Havlin, and Hernán A. Makse. Self-similarity of complex networks. Nature, 433(4):392–395, January 2005.</title>
      </sec>
      <sec id="sec-8-4">
        <title>Marco D’Ambros, Michele Lanza, and Romain Robbes. An extensive comparison of bug prediction approaches.</title>
        <p>In Mining Software Repositories (MSR),
2010 7th IEEE Working Conference on,
pages 31–41. IEEE, 2010.</p>
      </sec>
      <sec id="sec-8-5">
        <title>S. Fortunato and M. Barthélemy. Resolution limit in community detection.</title>
        <p>Proceedings of the National Academy of
Sciences, 104(1):36, 2007.</p>
      </sec>
      <sec id="sec-8-6">
        <title>Sergio Focardi, Michele Marchesi, and</title>
        <p>Giancarlo Succi. A stochastic model of
software maintenance and its
implications on extreme programming processes,
pages 191–206. Addison-Wesley
Longman Publishing Co., Inc., Boston, MA,
USA, 2001.</p>
      </sec>
      <sec id="sec-8-7">
        <title>Santo Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75 – 174, 2010.</title>
      </sec>
      <sec id="sec-8-8">
        <title>B.H. Good, Y.A. De Montjoye, and A. Clauset. Performance of modularity maximization in practical contexts. Physical Review E, 81(4):046106, 2010.</title>
      </sec>
      <sec id="sec-8-9">
        <title>M. Girvan and M. E. J. Newman. Community structure in social and biological networks.</title>
        <p>Proc. Natl. Acad. Sci. U.S.A., 99(12):7821–7826, 2002.</p>
      </sec>
      <sec id="sec-8-10">
        <title>Tracy Hall, Sarah Beecham, David</title>
        <p>Bowes, David Gray, and Steve Counsell.
A systematic literature review on fault
prediction performance in software
engineering. Software Engineering, IEEE
Transactions on, 38(6):1276–1304, 2012.</p>
      </sec>
      <sec id="sec-8-11">
        <title>Brian S. Mitchell and Spiros Mancoridis.</title>
        <p>On the evaluation of the Bunch
search-based software modularization
algorithm. Soft Comput., 12(1):77–93,
August 2007.</p>
        <p>Thomas J Ostrand, Elaine J Weyuker,
and Robert M Bell. Predicting the
location and number of faults in large
software systems. Software
Engineering, IEEE Transactions on, 31(4):340–
355, 2005.</p>
      </sec>
      <sec id="sec-8-12">
        <title>Thomas Zimmermann and Nachiappan Nagappan. Predicting defects using network analysis on dependency graphs.</title>
        <p>In Proceedings of the 30th
Interna</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [AMADP07]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ayari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Meshkinfam</surname>
          </string-name>
          , G. Antoniol, and
          <string-name>
            <given-names>M. Di</given-names>
            <surname>Penta</surname>
          </string-name>
          .
          <article-title>Threats on building models from CVS and Bugzilla repositories: the Mozilla case study</article-title>
          .
          <source>In Proceedings of the 2007 Conference of the Center for Advanced Studies on Collaborative Research</source>
          ,
          <source>CASCON '07</source>
          , pages
          <fpage>215</fpage>
          -
          <lpage>228</lpage>
          , New York, NY, USA,
          <year>2007</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Mahir</given-names>
            <surname>Arzoky</surname>
          </string-name>
          , Stephen Swift, Allan Tucker, and
          <string-name>
            <given-names>James</given-names>
            <surname>Cain</surname>
          </string-name>
          .
          <article-title>Munch: An efficient modularisation strategy to assess the degree of refactoring on sequential source code checkings</article-title>
          .
          <source>In ICST Workshops</source>
          , pages
          <fpage>422</fpage>
          -
          <lpage>429</lpage>
          . IEEE Computer Society,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>Scale-free characteristics of random networks: the topology of the world wide web</article-title>
          .
          <source>Physica A: Statistical Mechanics and its Applications</source>
          ,
          <volume>281</volume>
          :
          <fpage>69</fpage>
          -
          <lpage>77</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>Design Rules: The Power of Modularity, Volume 1</source>
          . MIT Press, Cambridge, MA, USA,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>IEEE Trans. Software Eng.</source>
          ,
          <volume>20</volume>
          (
          <issue>6</issue>
          ):
          <fpage>476</fpage>
          -
          <lpage>493</lpage>
          ,
          <year>June 1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>Bug propagation and debugging in asymmetric software structures</article-title>
          .
          <source>Phys. Rev. E</source>
          ,
          <volume>70</volume>
          (
          <issue>4</issue>
          ):
          <fpage>046109</fpage>
          ,
          <year>October 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>IEEE Trans. Software Eng.</source>
          ,
          <volume>37</volume>
          (
          <issue>6</issue>
          ):
          <fpage>872</fpage>
          -
          <lpage>877</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Giulio</given-names>
            <surname>Concas</surname>
          </string-name>
          , Cristina Monni, Matteo Orrù, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Tonelli</surname>
          </string-name>
          .
          <article-title>A study of the community structure of a complex software network</article-title>
          .
          <source>In Proceedings of the 2013 ICSE Workshop on Emerging Trends in Software Metrics</source>
          , WETSoM
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>