<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Revising the Newman-Girvan algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jana Coronicˇová Hurajová</string-name>
          <email>jana.coronicova.hurajova@euke.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomáš Madaras</string-name>
          <email>tomas.madaras@upjs.sk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The Faculty of Business Economics with seat in Košice, The University of Economics in Bratislava</institution>
          ,
          <addr-line>Tajovského 13, 041 30 Košice</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Faculty of Sciences, P.J. Šafárik University in Košice</institution>
          ,
          <addr-line>Jesenná 5, 040 01 Košice, Slovakia, WWW home page: umv.science.upjs.sk</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1649</volume>
      <fpage>200</fpage>
      <lpage>205</lpage>
      <abstract>
        <p>One of the common approaches for the community detection in complex networks is the Girvan-Newman algorithm [5] which is based on repeated deletion of edges having the maximum edge betweenness centrality. Although widely used, it may result in different dendrograms of hierarchies of communities if there are several edges eligible for removal (see [6]) thus leaving an ambiguous formation of network subgroups. We will present possible ways to overcome these issues using, instead of edge betweenness computation for single edges, the group edge betweenness for subsets of edges to be subjects of removal.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        One of fundamental analyses performed in the exploration
of complex networks concerns the detection of their
community structure, which means to find, within a graph
representing the network, certain clusters of vertices which
are, at one side, sparsely interconnected and, on the other
side, they have dense in-cluster links by many edges. The
vertices within a cluster show a kind of similarities and
form functionally compact units. As there is no general
definition of cluster, there are many ways to obtain
collections of network communities; a comprehensive overview
of contemporary state-of-art in this area can be found in
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Among the approaches that determine the graph
communities by breaking it into smaller parts, an important
role plays the Girvan-Newman algorithm described first
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It is based on successive deletion of edges which
have the maximum edge betweenness centrality which is
the quantity measuring the frequency of appearance of an
edge on geodesic paths in a graph. Formally, it is defined
σu,v(e)
as the sum B(e) = ∑ where σu,v is the
numu,v∈V (G) σu,v
ber of shortest u − v-paths and σu,v(e) is the number of
shortest u − v-paths which contain the edge e. Interpreting
∗Research supported by the project for young teachers, researchers
and PhD. students No. I-16-104-00
the edge betweenness as an amount of information flow
being propagated through a link between actors of a
complex network (and assuming that the information exchange
takes place mainly on shortest paths), one may argue that
distinct communities within a network are mutually
connected (and, hence, communicating) with relatively few
edges whose edge betweenness is higher than of those
ones between the actors of the same community. When
deleting those edges, the network tends to simplify,
eventually breaking into smaller subnetworks (note, however,
that after each deletion, edge betweenness centralities of
the resulting network shall be recalculated again). Thus,
we obtain a sequence of graphs starting from the
original one and ending with an edgeless graph, along with
the sequence of partitions the vertex set (the initial
partition is the whole set, the final one consists of isolated
vertices); when two consecutive graphs differ in their
connected components, we record the splitting (refining) of
the partition of the predecessing graph. In this way, the
sequence of partitions forms a dendrogram showing the
hierarchy of communities within the graph (the choice of the
appropriate level describing, in the best way, the
community structure of the graph, is a matter of external decision
and does not follow from algorithm).
      </p>
      <p>
        Despite the elegancy of Girvan and Newman approach
and the popularity of their algorithm, an attention recently
turns to other methods, mainly due to the fact that they are
quicker (the Girvan-Newman algorithm has, in general,
the complexity O(m2 · n), thus can be effectively used on
graphs up to n ∼ 10000, see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Furthermore, it seems
that many implementations of community detection
algorithms which are based on recursive edge deletion do
not make difference when equivalent edges (for example,
with the same edge betweenness) are considered for
deletion. This issue was adressed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] where it was
demonstrated how the random deletion of different edges
with the same maximum edge betweenness centrality
results in different hierarchies of partitions, when used
on the wheel graph W6. A possible obvious suggestion to
remove all such edges at once (as discussed, for example,
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] in the connection with possible speeding up the
original Girvan-Newman algorithm) would, however,
individualize all the vertices (thus producing no
reasonable hierarchy) – even at the very beginning of the
process – of edge transitive graphs, and, more generally,
of so called edge betweenness-uniform graphs (that is,
the graphs whose edges have the same value of edge
betweenness centrality). Such graphs are not so rare: in
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], it was shown that each strongly regular graph (that is,
an n-vertex k-regular graph with the property that any pair
of its adjacent vertices has λ common neighbours, and any
pair of its nonadjacent vertices has μ common neighbours,
for certain n, k, λ , μ) is edge betweenness-uniform; since
it is also known that, for particular n, k, λ , μ, the number
of nonisomorphic strongly regular graphs is at least
exponential in terms of number of vertices (see [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). We tested
all edge betweenness-uniform graphs on 3–10 vertices
(their list was published first in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) for communities using
the procedures FindGraphCommunities[...,Method
-&gt; "Centrality"] (to obtain the list of sets of vertices
forming communities) and CommunityGraphPlot[...]
(to visualize communities within a graph) of
Wolfram Mathematica, or using the procedure
IGCommunitiesEdgeBetweenness[...] from the
Wolfram Mathematica third-party package IgraphM
(see [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). The results for graphs on 3–9 vertices
are shown in Figure 1 (the brown clusters
correspond to graph communities based on
partitioning by FindGraphCommunities[...,Method -&gt;
"Centrality"] procedure, the yellow clusters to the
ones based on IGCommunitiesEdgeBetweenness
procedure); one can see that, on some graphs, the community
structure is different although the underlying algorithm
should be the same (the most remarkable difference can be
observed on the blue-highlighted 9-vertex graph of Figure
2 obtained from two 6-cycles by identifying the
corresponding vertices of their maximum independent sets).
Also, for many of these graphs it seems that they have
no community structure (as both built-in and IGraphM
community finding procedure aggregate all vertices
into a single cluster); hence, when being processed by
algorithm of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], one would have only two possibilities for
communities: either the whole vertex set or the partition
consisting of singletons (and the more reasonable choice
would be the single community). Nevertheless, our results
show that there are also edge betweenness-uniform graphs
for which both procedures (as well as other community
detection methods, like modularity maximization) return
non-trivial community structure which, however, cannot
be obtained by the algorithm of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>The revised Newman-Girvan algorithm</title>
      <p>In order to overcome – at least, on theoretical basis – the
problem to decide which edge has to be removed if there
are several ones with the same maximal edge
betweenness, we will utilize the concept of group edge
betweenness which is defined, for a subset A of edge set of a graph
σu,v(A)
G, as the sum B(A) = ∑ where σu,v(A) is the</p>
      <p>
        u,v∈V (G) σu,v
number of shortest u − v-paths which contain at least one
edge from A. Note that if |A| = 1 then we obtain the
standard edge betweenness as in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The revised
NewmanGirvan algorithm on a graph G then proceeds as follows:
starting with G0 = G, a sequence {Gi}ik=0 (where Gk is
edgeless graph) is constructed in such a way that, if there
appears, during the computation of edge betweennesses of
Gi, a set Mi of mi ≥ 2 edges all of them having the
maximum betweenness among the edges of Gi, then determine
the smallest ℓi such that there is unique subset Ebi ⊆ Mi of
ℓi edges with the property that the group edge
betweenness centrality of Ebi is the maximal among all subsets of
Mi consisting of ℓi edges (note that ℓi ≤ mi, thus it is well
defined). The graph Gi+1 is then obtained from Gi by
removing all edges from Ebi; if Gi+1 has more components
than Gi, the vertex sets of its components forms the new
level in hierarchy of partitions of V (G). The pseudocode
for this process is given in Algorithm 1; the used notation
follows the common standards of graph theory, the
particular specialized symbols are b0(G) (the zeroth Betti
number of G, that is, the number of its connected components),
hVii (the subgraph of G induced by the set Vi ⊆ V (G)) and
G \ Ei (the subgraph of G obtained by deleting all edges of
Ei).
      </p>
      <p>We have implemented the key elements of the
algorithm in Wolfram Mathematica 10 along with the
algorithm for edge group betweenness calculation. Since
the latter algorithm is – according to our
knowledge – not yet known to have effective
implementation, we used the straightforward approach which
determines, for each pair u, v of vertices of a graph
G, all shortest u − v-paths (in Wolfram language, this
can be done by calling procedure FindPath[G, u, v,
GraphDistance[G, u, v], All] and then checks how
many of them passes through an edge of the given edge
group. Our implementation of the edge group
betweenness algorithm – when being called on a single edge – is
also useful as an alternative for built-in Wolfram
Mathematica procedure EdgeBetweennessCentrality[..]
which returns numerical approximations (although with
high precision) of edge betweenness centralities whereas
our version returns exact values in the form of fractions.</p>
      <p>Let us note that our approach may lead, in
particular cases, to much worse performance of the
corresponding algorithm when compared with the original
NewmanGirvan algorithm; this is caused mainly by large number of
subsets of edges with the same maximum edge
betweenness which have to be checked to select the unique one
with the maximum group edge betweenness. This is,
however, the trade-off for getting rid of uncertainties in edge
removal.</p>
      <p>
        To show the difference of behaviour of our
algorithm in comparison with the original one or the one
of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], consider the graph of Figure 3. It contains
1 3 2 2
1 2 3 4
3 3 5
      </p>
      <p>2
1 5
5
6
6
1</p>
      <p>4
RevisedNewmanGirvan(G)
Data: a graph G
Result: the hierarchy of nested partitions of V (G)
G0 := G ;
P0 := {V (G)} ;
i := 0 ;
c := 0 ;
while E(Gi) 6= 0/ do</p>
      <p>Si := {B(e) : e ∈ E(Gi)} ;
mx := max Si ;
Mi := {e ∈ E(Gi) : B(e) = mx} ;
Ebi := Mi ;
ℓi := 1 ;
while |Ebi| &gt; 1 do
ℓi := ℓi + 1;
Uℓi := {B(A) : A ⊂ Mi ∧ |A| = ℓi} ;
gmx := maxUℓi ;
end</p>
      <p>Ebi := {A ⊂ Mi : |A| = ℓi ∧ B(A) = gmx} ;
end
return {Pi : i = 0, . . . , c}
Algorithm 1: The group edge-centrality based
NewmanGirvan algorithm for community detection.
five edges with the maximum edge betweenness, namely
{8, 11}, {6, 9}, {3, 10}, {3, 9} and {2, 11}. Now, these five
edges form ten 2-element subsets with group edge
betweenness centralities 532 , 532 , 532 , 439 , 532 , 439 , 532 , 16, 532 , 532 ,
and ten 3-element subsets with group edge
betweenness centralities 26, 25, 25, 734 , 25, 25, 731 , 26, 25, 734 ; we
see that, among these subsets, the uniqueness with
respect to the maximum group edge betweenness
is not preserved. But, for five 4-element
subsets of {{8, 11}, {6, 9}, {3, 10}, {3, 9}, {2, 11}}, the group
edge betweenness centralities are 937 , 1301 , 938 , 937 , 937 ,
thus there is unique 4-element subset – the set
{{8, 11}, {6, 9}, {3, 10}, {2, 11}} – reaching the
maximum value 1301 . Hence, the edges of this 4-element set are
removed, and the sequence of single edge removals
continues with {1, 12}, {6, 7}, {4, 9} after which there are
detected two edges ({4, 7} and {1, 7}) with the same highest
edge betweenness. After they are removed, the edge
removal continues with {2, 12} and then with {2, 5} where
the graph splits, for the first time, into two components
with vertex sets {1, 2, 4, 6, 10, 11} and {3, 5, 7, 8, 9, 12}.</p>
      <p>On the other hand, when the algorithm of</p>
      <p>
        Gi+1 := Gi \ Ebi ;
if b0(Gi+1) &gt; b0(Gi) then [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is used on the same graph, first, five edges
c := c + 1 ; {8, 11}, {6, 9}, {3, 10}, {3, 9}, {2, 11} are removed at
Pc := {V1, . . . ,Vrc : once, followed by sequential removals of {3, 12}, {7, 8}
hV1i, . . . , hVrc i are connected components of Gi+1} and {8, 12} where the graph splits into two components,
; one of them being the single edge {3, 8}. Therefore,
end we see that the hierarchies of nested partitions produced
i := i + 1 ; by our algorithm and the one of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] differ already at the
highest level. In addition, a particular run of the original
Newman-Girvan algorithm (using random selection of
an edge from the set of several edges with the same
maximum edge betweenness) on this graph may produce
yet another hierarchy: if the edge {3, 9} is removed first,
then the edges {3, 12}, {3, 8} and {3, 10} are removed
sequentially, thereby separating the single vertex 3 from
the rest of the graph.
      </p>
      <p>Note also that the existence of unique set of edges which
have to be removed depends heavily also on the edge
automorphism group Aut∗ of a graph. It is easy to see that, for
any edge automorphism ϕ of a graph G and any A ⊂ E(G),
B(A) = B(ϕ(A)) holds; consequently, if Aut∗(G) is
nontrivial and the edge automorphisms do not fix A, then there
are several different subsets of edges with the same group
edge betweenness as A. Thus the uniqueness of the edge
subset of particular size with the maximal group edge
betweenness cannot be guaranteed for graphs possessing a
lot of symmetries. Some particularly bad examples occur
among edge betweenness uniform graphs – in the wheel
W6, among all sets of edges of cardinality i ≤ 9, there
are always at least two distinct sets whose group edge
betweenness is maximal among all i-sets, hence, the unique
maximum group edge betweenness set coincides with the
whole edge set of W6, and the revised Newman-Girvan
algorithm breaks the vertex set of W6 into six singletons.</p>
      <p>
        Unfortunately, similar issues may appear also in real
networks. We illustrate this on the example of the
network of Zachary karate club [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] shown at Figure 4.
The standard Newman-Girvan algorithm removes the
edges with the maximum edge betweenness in the order
{32, 1}, {3, 1}, {9, 1}, {34, 14}, {34, 20}, {33, 3}, {31, 2},
{3, 2}, {4, 3} after which two edges with the same
maximum edge betweenness are detected, namely {14, 3}
and {8, 3}. After their simultaneous removal, the graph
splits into two components and the sequence of
removed edges continues with {34, 10}, {34, 28}, {10, 3}
after which again two maximum betweenness edges,
{7, 1} and {6, 1}, are detected; their removal yields
another pair of edges with the same maximum edge
betweenness, namely {1, 5} and {1, 11}. The
sequence of single edge removals continues with edges
{34, 32}, {33, 32}, {34, 29}, {26, 24}, {28, 24}, {9, 3}
followed by simultaneous removal of {34, 27} and {1, 12},
then by single removals of {30, 27}, {13, 1}, {13, 4}.
      </p>
      <p>Now here appears the situation when the graph
contains even 10 edges of maximum edge betweenness,
namely {34, 23}, {34, 21}, {34, 19}, {34, 16}, {34, 15},
{33, 23}, {33, 21}, {33, 19}, {33, 16} and {33, 15} (which
are all contained in the same star-like connected
component). However, the computation of group edge
betweenness for subsets of this edge set reveals that the
whole set has to be removed as there are always many
proper subsets of smaller sizes having the same maximum
group edge betweenness (this is most likely also caused
by symmetries of that particular connected component).</p>
      <p>
        Hence, for the network of Zachary karate club, the revised
algorithm just confirms that the order of edge removal
as obtained by the version of Newman-Girvan algorithm
from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is probably optimal; nevertheless, it would be
interesting to find an example of real network where the
revised algorithm would lead to different sequence of
removed edges, or even a different hierarchy of graph
communities.
      </p>
      <p>
        An area where the revised Newman-Girvan algorithm
would apply concerns the looking for "null models",
that is, the graphs without community structure (see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
pages 90–91). Based on the above considerations, we
propose to take, for such graphs, the ones which are
edge betweenness-uniform, have trivial edge
automorphism group and, moreover, for each i which is less than
the number of edges, there are always at least two sets
consisting of i edges such that their group edge betweenness
is the maximal among all i-subsets. For these graphs, the
hierarchy of communities produced by our revised
algorithm collapses into singletons although their trivial
automorphism group should prevent easy replication of subsets
of edges with high group edge betweenness. At the
moment, no infinite family of such graphs is known;
nevertheless, we believe that the candidate graphs might be found
among strongly regular graphs, where are known
examples with trivial vertex automorphism group, and, possibly,
also ones having trivial edge automorphism group.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Despalatovic</surname>
            <given-names>´</given-names>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          , Vojkovic´,
          <string-name>
            <surname>T.</surname>
          </string-name>
          ,
          <article-title>Vukicˇevicˇ: Community structure in networks: Girvan-Newman algorithm improvement</article-title>
          . MIPRO, Opatija, Croatia, May
          <volume>26</volume>
          -30,
          <year>2014</year>
          ,
          <fpage>997</fpage>
          -
          <lpage>1002</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>[2] Fon-der-</article-title>
          <string-name>
            <surname>Flaass</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>New prolific constructions of strongly regular graphs</article-title>
          .
          <source>Adv. Geom. 2</source>
          (
          <year>2002</year>
          )
          <fpage>301</fpage>
          -
          <lpage>306</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Fortunato</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Community detection in graphs</article-title>
          .
          <source>arXiv:0906.0612 [physics.soc-ph]</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Gago</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Coronicˇová Hurajová,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Madaras</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>Betweenness centrality in graphs</article-title>
          . In:
          <article-title>Quantitative graph theory: mathematical foundations and applications (Dehmer, M. and</article-title>
          <string-name>
            <surname>Emmert-Streib</surname>
          </string-name>
          , F., eds.), CRC Press,
          <year>2015</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Girvan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>M. E. J.:</given-names>
          </string-name>
          <article-title>Community structure in social and biological networks</article-title>
          .
          <source>Proc. Natl. Acad. Sci. USA</source>
          <volume>99</volume>
          (
          <year>2002</year>
          )
          <fpage>7821</fpage>
          --
          <lpage>7826</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Coronicˇová</given-names>
            <surname>Hurajová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Madaras</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>The edge betweenness centrality - theory and applications</article-title>
          .
          <source>Journal of innovations and applied statistics 5 (1)</source>
          (
          <year>2015</year>
          )
          <fpage>20</fpage>
          -
          <lpage>29</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] https://github.com/szhorvat/IGraphM</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Zachary</surname>
            ,
            <given-names>W. W.:</given-names>
          </string-name>
          <article-title>An information flow model for conflict and fission in small groups</article-title>
          .
          <source>Journal of Anthropological Research</source>
          <volume>33</volume>
          (
          <issue>4</issue>
          ) (
          <year>1977</year>
          )
          <fpage>452</fpage>
          -
          <lpage>473</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>