<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Strength of Co-authorship Ties in Clusters: a Comparative Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michele A. Brandão</string-name>
          <email>micheleabrandao@dcc.ufmg.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirella M. Moro</string-name>
          <email>mirella@dcc.ufmg.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidade Federal de Minas Gerais</institution>
          ,
          <addr-line>Belo Horizonte</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We analyze the strength of ties through three different clustering algorithms applied to co-authorship social networks from three different research areas. This study reveals whether tie strength metrics can be used to evaluate cluster quality. We obtain different results for each algorithm and observe that the Markov cluster algorithm provides the best results for co-authorship social networks. Also, researchers in overlapping communities detected by the clique percolation method work as bridges.</p>
      </abstract>
      <kwd-group>
        <kwd>Social Networks</kwd>
        <kwd>Tie Strength</kwd>
        <kwd>Clustering Algorithms</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Clustering is a classical problem in data mining and has many
applications across a plethora of domains. Hence, identifying which algorithm is
suitable for a given domain is a challenge per se. Likewise, evaluating the
quality of the created clusters is hard due to its problem-driven nature, as a good
clustering algorithm for one problem may not be as good for another [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        In the context of social networks (SN), clustering algorithms are useful for
detecting communities. Example studies explore regional
innovation systems, the clustering effect in scientific communities, and the concentration
of developers in a country [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Especially in academic SN, detecting clusters helps
to discover patterns that may increase researchers’ productivity, reveal the
impact on research policy, and explain group formation [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. However, once
again, the problem is how to verify the quality of the created clusters.
      </p>
      <p>
        Here, we apply clustering techniques to co-authorship SN, a type of academic
SN in which the nodes are researchers and there are edges between those who
have published together. By definition, a cluster in SN is a collection of
individuals with dense interaction patterns internally and sparse interactions externally
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Therefore, to evaluate the quality of the created clusters, we use existing
metrics to assess the strength of co-authorship ties intra and inter clusters.
      </p>
      <p>
        In summary, when the strength of ties is measured by metrics that consider
the neighborhood of nodes, the strength of ties intra cluster should be higher than
inter clusters. Hence, we measure tie strength using two metrics that provide such
information [
        <xref ref-type="bibr" rid="ref15 ref3 ref7">3,7,15</xref>
        ]: Neighborhood Overlap (NO), the collaboration between two
nodes regarding their neighbors; and co-authorship frequency (W), the absolute
number of publications between two persons.
      </p>
      <p>
        Brief Related Work. There are many clustering techniques and they are
applied to different types of networks, for example, similarity graphs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], directed
networks [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], social professional networks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and mobile SN [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. From these
techniques, we have chosen three that are commonly applied to undirected graphs
and represent good strategies for detecting communities in SN [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Also, there are different ways to measure clustering quality, such as BetaCV,
C-index and modularity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, identifying whether such metrics give the
expected answer for a graph is very difficult [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Moreover, most of these metrics
are biased and unreliable in larger real graphs. Hence, in this work, we
investigate whether tie strength metrics can be used to evaluate clustering quality.
This study represents a new direction in the evaluation of clustering algorithms
and may help to fill this gap in the state of the art.
      </p>
      <p>Contributions. Overall, our contributions are analyses of the distribution
of strong and weak ties intra and inter clusters, and of how the strength
of ties varies across different clustering algorithms. Such analyses reveal whether tie
strength metrics can be used to evaluate cluster quality based on the definition
that ties within a community should be strong and ties between communities should be weak.</p>
      <p>
        Next, we present the analysis setup, which includes creating real SNs (Section
2). Then, we analyze three clustering methods: the Louvain method (Section 3.1),
the clique percolation method (Section 3.2) and the Markov cluster algorithm (Section
3.3), and compare their results (Section 4). We have chosen these algorithms
because they are important for detecting core groups (a.k.a. clusters or
communities) on SN [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Thus, we use both terms interchangeably and maintain the
nomenclature of the clustering methods’ authors.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Analyses Setup</title>
      <p>
        A co-authorship social network can be modeled as a weighted graph G_w =
(V, E_w), with V the set of nodes and E_w the set of undirected weighted links.
Nodes are researchers (or authors), a tie between any two researchers exists if
they have published together, and the tie weight represents the absolute number
of publications between them, called co-authorship frequency or simply W,
which has been applied to measure the strength of ties [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Another topological property to measure the strength of ties is Neighborhood
Overlap (NO) [
        <xref ref-type="bibr" rid="ref3 ref7">3,7</xref>
        ]. The NO of an edge connecting researchers v_i and v_j is given
by the equation
        <disp-formula><tex-math>NO(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j) - \{v_i, v_j\}|},</tex-math></disp-formula>
where N(v_i) represents the co-authors of
researcher v_i, and N(v_j) the co-authors of v_j.
      </p>
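      <p>As an illustration of the definition above, the following minimal Python sketch computes NO for a single edge. The adjacency dictionary, the function name neighborhood_overlap and the toy network are hypothetical, and returning 0 when the two researchers have no other co-authors is an assumption, since that case is not specified here.</p>
      <preformat preformat-type="code">
```python
def neighborhood_overlap(adj, vi, vj):
    """NO of edge (vi, vj): shared co-authors over all other co-authors."""
    common = adj[vi].intersection(adj[vj])
    # All co-authors of either researcher, excluding the pair itself.
    others = adj[vi].union(adj[vj]) - {vi, vj}
    if not others:
        return 0.0  # assumption: no other co-authors means NO = 0
    return len(common) / len(others)


# Hypothetical toy network: researcher -> set of co-authors.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
no_ab = neighborhood_overlap(adj, "a", "b")  # one shared co-author (c) out of {c, d}
no_ad = neighborhood_overlap(adj, "a", "d")  # no shared co-author
no_bc = neighborhood_overlap(adj, "b", "c")  # the only other co-author (a) is shared
```
</preformat>
      <p>In this toy network, NO is 0.5 for the pair (a, b), 0 for the pair (a, d), and 1 for the pair (b, c), whose only other co-author is shared.</p>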
      <p>Here, we build three co-authorship SNs using the CiênciaBrasil datasets<sup>1</sup>.
The publications available in CiênciaBrasil are from Brazilian researchers and
were collected in November 2013 from Lattes, an online platform for archiving researchers’
curricula vitae. Each network represents the co-authorships
among researchers from one of three areas: computer science, medicine and sociology.
Table 1 has the dataset statistics: number of authors (researchers), number of
publications, average number of publications per author, and number of pairs of
co-authors (and number of distinct pairs of co-authors).</p>
      <sec id="sec-2-1">
        <title>1 Datasets available at http://www.dcc.ufmg.br/~mirella/projs/apoena</title>
        <p>
          Considering these datasets, we apply three clustering algorithms: the Louvain
method (LM), the clique percolation method (CPM) and the Markov cluster algorithm
(MCL). Then, we measure the strength of ties for each pair of researchers in
each cluster detected by the algorithms. Such strength is measured by using NO
and W. Following Brandão and Moro [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], we consider that a tie is weak when
NO is in the range [0, 0.2] and strong otherwise. Likewise, a tie is weak when W
is in the range [1, 5] and strong otherwise.
        </p>
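        <p>The two thresholds above can be written as a pair of small Python helpers. This is only a sketch: the constant and function names are hypothetical, and the boundary values 0.2 and 5 are treated as weak because both ranges are closed.</p>
        <preformat preformat-type="code">
```python
NO_WEAK_MAX = 0.2  # a tie is weak when NO is in [0, 0.2]
W_WEAK_MAX = 5     # a tie is weak when W is in [1, 5]


def classify_by_no(no):
    """Label a tie by its neighborhood overlap."""
    return "strong" if no > NO_WEAK_MAX else "weak"


def classify_by_w(w):
    """Label a tie by its co-authorship frequency."""
    return "strong" if w > W_WEAK_MAX else "weak"


# Hypothetical tie with NO = 0.5 and W = 3: the two metrics can disagree.
labels = (classify_by_no(0.5), classify_by_w(3))
```
</preformat>
        <p>Here the same tie is strong according to NO and weak according to W, which is why both metrics are reported separately.</p>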
        <p>
          Overall, the analyses provide insight into whether tie strength metrics can
be used to evaluate clustering quality. By the definition of clusters [
          <xref ref-type="bibr" rid="ref14 ref16 ref2">2,14,16</xref>
          ], intra-cluster
ties should be strong and inter-cluster ties should be weak. Therefore, a
cluster should have most of its pairs of researchers (ties) classified as strong, and most
ties that connect different clusters should be weak.
        </p>
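        <p>One way to check this expectation on a clustering result is to compute the share of strong ties among intra-cluster edges and the share of weak ties among inter-cluster edges. The Python sketch below does so under stated assumptions: each researcher belongs to exactly one cluster (so overlapping CPM communities would need an adapted membership test), ties are (u, v, W) triples, and the function name tie_profile and the toy data are hypothetical.</p>
        <preformat preformat-type="code">
```python
def tie_profile(edges, cluster_of, is_strong):
    """Return (share of strong intra-cluster ties, share of weak inter-cluster ties)."""
    intra = [e for e in edges if cluster_of[e[0]] == cluster_of[e[1]]]
    inter = [e for e in edges if cluster_of[e[0]] != cluster_of[e[1]]]
    strong_intra = sum(1 for e in intra if is_strong(e))
    weak_inter = sum(1 for e in inter if not is_strong(e))
    # None signals that there is no tie of that kind to evaluate.
    share_intra = strong_intra / len(intra) if intra else None
    share_inter = weak_inter / len(inter) if inter else None
    return share_intra, share_inter


# Hypothetical toy data: two clusters and ties weighted by W.
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
edges = [("a", "b", 8), ("b", "c", 7), ("a", "c", 2), ("d", "e", 9), ("c", "d", 1)]
profile = tie_profile(edges, cluster_of, lambda e: e[2] > 5)  # strong when W exceeds 5
```
</preformat>
        <p>In this toy example, three of the four intra-cluster ties are strong (0.75) and the single inter-cluster tie is weak (1.0), which matches the expected pattern.</p>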
        <p>
          One of the problems in evaluating clustering quality is the absence of a ground
truth for comparison [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Thus, we now verify the strength of ties in a synthetic
dataset that represents a situation with perfect clustering. According to Harman
et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a perfect clustering has a perfect modularization, i.e., all modules
in a cluster are connected to all other modules and there are no inter-cluster
connections. Thus, we build a graph with 17 nodes and 23 edges (two randomly
chosen prime numbers). We link the nodes so as to form four clusters, with
no connections among nodes from different clusters. Figures 1a and
1b present NO and W of a perfect clustering, respectively. Cluster #1 is the
largest one (7 nodes and 12 edges), cluster #2 is the second largest (4 nodes and
5 edges), and clusters #3 and #4 have the same size (3 nodes and 3 edges each).
        </p>
        <p>
          Note that the minimum value of NO is 0.2, i.e., most communities are composed
of strong ties. The smallest clusters have NO equal to 1 (i.e., all ties are strong),
because all nodes are connected to each other, but in a real social
network this hardly happens. We emphasize that a high NO indicates that pairs
of researchers are more connected to each other within a cluster. Also, the median W of
every cluster is higher than 20. This is a property strictly related to the
frequency of node interactions, which is not always found in real networks. However,
co-authorship SN with a high degree of collaboration tend to have a high W [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
Hence, most detected clusters should have more strong ties than weak ones.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluated Clustering Techniques</title>
      <p>
        In this section, considering three real co-authorship networks, we analyze three
clustering techniques: the Louvain method (Section 3.1), the clique percolation method
(Section 3.2) and the Markov cluster algorithm (Section 3.3). Due to space constraints,
all graphs are presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and we discuss only the main findings next.
      </p>
      <sec id="sec-3-1">
        <title>Community Detection Using Louvain Method</title>
        <p>
          The Louvain method (LM) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is a simple, efficient and very common method
for detecting communities in large networks. It greedily seeks to optimize
the modularity of a partition of the network, where modularity is a topological
property designed to measure the density of links intra communities [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. As it
works over unweighted networks, it allows us to study the links between researchers
in clusters that are formed by the modularity and the network topology.
        </p>
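        <p>To make the optimized quantity concrete, the Python sketch below evaluates the standard modularity of a given partition of an unweighted, undirected graph, i.e., for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. It only scores a partition and does not reproduce the greedy optimization of LM; the function name, the edge list and the partitions are hypothetical.</p>
        <preformat preformat-type="code">
```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside it
    degree_sum = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy graph: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
single = {node: "A" for node in range(6)}
q_split = modularity(edges, split)
q_single = modularity(edges, single)
```
</preformat>
        <p>Splitting the toy graph into its two triangles gives Q = 5/14, whereas placing all nodes in one community gives Q = 0, so a modularity-driven method would prefer the split.</p>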
        <p>Considering only computer science (CS), Figure 2 presents the intra- and
inter-community results for the communities created by LM using both NO and W.
Overall, medicine has more communities with smaller mean and median NO
values than computer science, but W of such communities is higher than in
computer science. Also, in both areas, the communities with the highest NO are not
the communities with the highest W. In sociology, NO and W of researchers
in each community are small, and again the communities with the highest NO do not have
the highest W. Such aspects suggest that the intensity of
co-authorships among researchers measured by W does not always correspond to
the strength of the interactions among researchers’ neighbors measured by NO.
Moreover, there are communities with NO equal to zero; in the three SN, all such
communities are composed of few edges, and all their edges have one node in common.
These smaller communities are detected by LM because the researchers are not
densely connected to other communities. Also, considering the outliers, the
communities have fewer outliers for NO than for W. For future work, the study of such
outliers might reveal interesting properties about co-authorships.</p>
        <p>Fig. 2: LM communities for computer science (CS): (a) intra-community with NO; (b) intra-community with W; (c) inter-community with NO; (d) inter-community with W.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Uncovering Communities with Clique Percolation Method</title>
        <p>
          The clique percolation method (CPM) locates the k-clique communities of
networks and considers that a typical node in a community is linked to many others,
but not necessarily to all other nodes in the community [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Overall, a
community is a union of smaller fully connected subgraphs that share nodes. Such
complete subgraphs are called k-cliques, where k refers to the number of nodes
in the subgraph. Then, a k-clique community is defined as the union of all k-cliques
that can be reached from each other through a series of adjacent k-cliques, i.e., k-cliques that share k-1 nodes [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
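        <p>For k=3, the definition above reduces to merging triangles that share an edge. The following Python sketch illustrates that special case only; it is not the CFinder implementation, it compares all pairs of triangles and is therefore suited to small graphs, and the function name and toy edge list are hypothetical.</p>
        <preformat preformat-type="code">
```python
from itertools import combinations


def three_clique_communities(edges):
    """Communities as unions of triangles reachable through shared edges (k = 3)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every triangle once, as a sorted tuple of its nodes.
    triangles = set()
    for u, v in edges:
        for w in adj[u].intersection(adj[v]):
            triangles.add(tuple(sorted((u, v, w))))
    triangles = sorted(triangles)

    # Union-find over triangles; adjacent triangles share k - 1 = 2 nodes.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(set(triangles[i]).intersection(triangles[j])) == 2:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of connected triangles.
    groups = {}
    for i, tri in enumerate(triangles):
        groups.setdefault(find(i), set()).update(tri)
    return sorted(groups.values(), key=sorted)


# Hypothetical toy graph: node 4 belongs to two communities, node 7 to none.
edges = [(1, 2), (2, 3), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6), (4, 6), (6, 7)]
communities = three_clique_communities(edges)
overlap = communities[0].intersection(communities[1])
```
</preformat>
        <p>In the toy graph, triangles (1, 2, 3) and (2, 3, 4) share an edge and form one community, while triangle (4, 5, 6) shares only node 4 with them and forms a second, overlapping community.</p>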
        <p>We apply this method using the algorithm implemented in CFinder<sup>2</sup> with
k=3.</p>
        <sec id="sec-3-2-1">
          <title>2 CFinder: http://www.cfinder.org</title>
          <p>Fig. 3: CPM intra-communities for medicine (Med): (a) with NO; (b) with W.</p>
          <p>
            By definition, a community is simply a connected graph when k=2 and
a set of disconnected nodes without any edge when k=1. The parameter k
determines the nature of the communities, which can be explored by using
different values for k [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. We have chosen k=3 in order to discover
triangles and because such a value is also used in most general cases [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. Finally, the
CPM allows overlap, i.e., a node can be a member of different communities at
the same time, and communities overlap with each other by sharing nodes.
          </p>
          <p>Figure 3a shows the communities uncovered by the CPM (k=3) applied to
the medicine SN. Note that CPM considers the social network as unweighted. The
NO values reveal that although the communities are formed by cliques, some of
them have only weak ties (i.e., pairs of researchers weakly connected regarding
NO): seven in medicine, ten in computer science, and two in sociology. In other
words, cliques formed by co-authorship of researchers do not have only strong
interactions. Additionally, each community may have ties linking different cliques,
and such ties are also weak in communities with only weak ties. Other
communities have only strong ties: eight in medicine, four in computer science, and
two in sociology. It is also interesting to investigate these communities in order
to identify patterns in their high cooperativeness. In the remaining communities,
there is a mix of strong and weak ties. Furthermore, most communities have
a median and mean different from the others, meaning that researchers have
distinct co-authorship behavior in each community.</p>
          <p>Regarding W as a measure of tie strength, Figure 3b shows that
most communities are composed of ties between researchers with small W. In
medicine and computer science, only one community has a tie with W greater
than 10. In sociology, W does not reach five. Although cliques compose such
communities, the high connectivity among groups of researchers does not indicate
a strong intensity of co-authorship.</p>
          <p>
            According to the CPM definition, one researcher may be in more than one
community. The number of overlaps is small in the three networks: in computer
science, only one researcher is in four communities; in sociology, there is no
overlap; and in medicine, one researcher is in three communities. An analysis
suggests that researchers in overlapped communities have weak ties with other
researchers and work as bridges (details available in [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]).
          </p>
          <p>Fig. 4: MCL intra-communities with NO and W for computer science (CS) and sociology (Soc): (a) CS with NO; (b) Soc with NO; (c) CS with W; (d) Soc with W. Cluster identifiers on the x axis are ordered by the size of communities.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Clustering with MCL Algorithm</title>
        <p>
          The Markov Cluster Algorithm (MCL) is an unsupervised clustering algorithm
for graphs (also known as networks) based on simulation of stochastic flow in graphs
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. MCL deterministically finds cluster structures by computing the probability
of random walks through the network. We use the implementation available from Micans<sup>3</sup>
and keep the default values of the parameters.
        </p>
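        <p>The flow simulation can be sketched in a few lines of plain Python by alternating expansion (squaring the column-stochastic matrix, i.e., taking longer random walks) and inflation (raising entries to a power and renormalizing, which favors strong flows). This is an illustrative sketch and not the Micans implementation: the added self-loops, the inflation value of 2.0, the fixed number of iterations, the 1e-6 threshold and the toy weights are all assumptions.</p>
        <preformat preformat-type="code">
```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(weights, inflation=2.0, iterations=50):
    """Cluster nodes of a weighted graph given as a symmetric matrix."""
    n = len(weights)
    # Assumption: add a self-loop of weight 1 to every node before normalizing.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two steps of the random walk).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: entry-wise power followed by column renormalization.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy network: two disconnected triangles with different weights.
n = 6
weights = [[0.0] * n for _ in range(n)]
for group, w in (((0, 1, 2), 8.0), ((3, 4, 5), 2.0)):
    for i in group:
        for j in group:
            if i != j:
                weights[i][j] = w
clusters = mcl(weights)
```
</preformat>
        <p>Because no flow can cross between the two disconnected triangles, the sketch returns them as two separate clusters, with no inter-cluster edges.</p>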
        <p>One input to MCL is a file describing the graph edges: the source and target
nodes, and the edge weight. MCL interprets the edge weight as a
similarity when clustering the nodes. In order to understand how NO and W influence
cluster formation, we run the algorithm twice, changing the value of the edge
weight (once with weights equal to NO and once with weights equal to W).</p>
        <sec id="sec-3-3-1">
          <title>3 Micans: http://micans.org/mcl</title>
          <p>Using NO, MCL
found 140 clusters in computer science, 35 in sociology and 139 in medicine.
Then, using W, MCL detected 82 clusters in computer science, 16 in
sociology and 68 in medicine. Some clusters are composed of only one node, and
they are more frequent among the clusters formed with NO as edge weight. This result
indicates that the similarity among researchers is lower considering NO than W.</p>
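          <p>The two runs differ only in the weight column of the edge file. As a sketch, assuming a label-based edge list in which each line holds a source label, a target label and a weight separated by whitespace, the two inputs could be generated as follows; the function name, the tie records and their values are hypothetical.</p>
          <preformat preformat-type="code">
```python
def to_edge_lines(ties, weight_key):
    """Format ties as 'source target weight' lines, one per edge."""
    return ["{}\t{}\t{}".format(t["source"], t["target"], t[weight_key])
            for t in ties]


# Hypothetical ties carrying both tie strength metrics.
ties = [
    {"source": "a", "target": "b", "NO": 0.5, "W": 8},
    {"source": "a", "target": "d", "NO": 0.0, "W": 1},
]
lines_no = to_edge_lines(ties, "NO")  # first run: NO as edge weight
lines_w = to_edge_lines(ties, "W")    # second run: W as edge weight
```
</preformat>
          <p>Each list can then be written to its own file, one line per edge, and passed to the clustering tool in a separate run.</p>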
          <p>Figure 4 shows the results ordered by the size of communities when NO and
W between researchers are used as edge weight. For clarity, the graphs do not include
clusters with only one node. There are communities formed only by
weak ties and only by strong ties, for example, the communities #25 and #34 in
computer science, respectively. However, most communities include both types
of tie. Considering cluster size, the biggest communities are at the beginning
of each graph. For example, clusters #0 and #1 are the largest in sociology.
The number of nodes in the largest clusters for NO and W as edge weight is,
respectively, 30 and 27 for computer science, four nodes (two communities of the
same size) and four nodes (four communities of the same size) for sociology, and 22
and 17 for medicine. Figure 4 also shows that the largest clusters are
formed more by strong ties than by weak ties, because the first quartile of these clusters
is greater than or equal to 0.2. In addition, the largest communities have high W, because
the third quartile exceeds 30 in the three areas.</p>
          <p>
            Lastly, MCL does not find ties connecting researchers from different
communities for the three co-authorship SN. This suggests that MCL provides a good
clustering result, since clustering algorithms aim to minimize inter-cluster edges [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ].
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Comparative Analyses</title>
      <p>We now compare the results of the three clustering methods. Figures 5 and 6
contrast the clusters of each method regarding NO and W, respectively. We
observe that LM tends to find fewer and larger clusters than the other two methods.
Also, MCL detects a large number of clusters, and some of them are singletons.
In the co-authorship context, although CPM allows community overlaps, as a
researcher may publish with researchers from other communities, MCL provides
the best clusters because most of the detected ones are composed of strong ties.</p>
      <p>Moreover, Figure 5 shows a high concentration of edges until NO reaches
0.6 in computer science and medicine. In sociology, the maximum value of NO
is 0.5, and edges are more concentrated between 0.2 and 0.3. Also, note
that CPM and MCL exclude edges with NO equal to 0. Some strong ties are
also removed in CPM, probably because these edges are in a 2-clique (we choose
k=3 for CPM). Here, MCL better differentiates the relationships by putting them in
distinct clusters. On the other hand, the concentration of points in CPM and LM
does not show the same differentiation. Additionally, Figure 6 shows a high
concentration of edges for W less than 100 in computer science and medicine,
and less than 10 in sociology. We also note that co-authorship frequencies equal
to zero are not removed in any clustering method. Overall, the three algorithms
form clusters with weak and strong ties. However, MCL mostly detects clusters
with more strong ties than weak ones.</p>
    </sec>
    <sec id="sec-5">
      <title>Concluding Remarks</title>
      <p>We applied three clustering algorithms to three co-authorship SN. For the
unweighted LM, the evaluation results showed that it identifies fewer clusters than the
others. When applying CPM, there was a small number of overlaps between
communities, and researchers in the overlaps have weak ties (they work as bridges).
We applied MCL twice to each network: once with NO as weight
and once with W. MCL identified a larger number of clusters than the other
methods. Furthermore, the inter-community tie strength tends to be weak for
LM and CPM, whereas MCL does not find inter-community edges.</p>
      <p>A main conclusion of using NO and W in clustering evaluation is that MCL is the
best clustering algorithm to apply to co-authorship SN when compared to
LM and CPM. Nevertheless, we also conclude that considering only
tie strength metrics is not enough to define clustering quality. Therefore, as next
steps, we plan to apply internal measures (like BetaCV, C-index, and so on) and
compare them with the results generated by the tie strength metrics. Also, we plan to
investigate how the network structure affects the clustering results.</p>
      <p>Acknowledgments. Work funded by CAPES, CNPq and FAPEMIG, Brazil.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Almeida</surname>
          </string-name>
          et al., H.:
          <article-title>Is there a best quality metric for graph clusters</article-title>
          ? In: ECMLPKDD. (
          <year>2011</year>
          )
          <fpage>44</fpage>
          -
          <lpage>59</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Blondel</surname>
          </string-name>
          et al., V.D.:
          <article-title>Fast unfolding of communities in large networks</article-title>
          .
          <source>Journal of Statistical Mechanics: Theory and Experiment</source>
          <year>2008</year>
          (
          <volume>10</volume>
          ) (
          <year>2008</year>
          ) P10008
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brandão</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moro</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          :
          <article-title>Analyzing the strength of co-authorship ties with neighborhood overlap</article-title>
          .
          <source>In: DEXA</source>
          . (
          <year>2015</year>
          )
          <fpage>527</fpage>
          -
          <lpage>542</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Brandão</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moro</surname>
            ,
            <given-names>M.M.:</given-names>
          </string-name>
          <article-title>A comparative analysis of the strength of co-authorship ties in clusters</article-title>
          .
          <source>Technical Report 4</source>
          ,
          <string-name>
            <surname>UFMG</surname>
          </string-name>
          (March
          <year>2017</year>
          ) http://www.dcc.ufmg.br/~mirella/projs/apoena.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Brandão</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moro</surname>
            ,
            <given-names>M.M.:</given-names>
          </string-name>
          <article-title>Social professional networks: A survey and taxonomy</article-title>
          .
          <source>Computer Communications</source>
          <volume>100</volume>
          (
          <year>2017</year>
          )
          <fpage>20</fpage>
          -
          <lpage>31</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Deb</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vishveshwara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vishveshwara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Understanding protein structure from a percolation perspective</article-title>
          .
          <source>Biophysical journal 97(6)</source>
          (
          <year>2009</year>
          )
          <fpage>1787</fpage>
          -
          <lpage>1794</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Easley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleinberg</surname>
          </string-name>
          , J.:
          <article-title>Networks, crowds, and markets: Reasoning about a highly connected world</article-title>
          . Cambridge University Press (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Harman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swift</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahdavi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>An empirical study of the robustness of two module clustering fitness functions</article-title>
          .
          <source>In: GECCO</source>
          . (
          <year>2005</year>
          )
          <fpage>1029</fpage>
          -
          <lpage>1036</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hassanzadeh</surname>
            et al.,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Framework for evaluating clustering algorithms in duplicate detection</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .
          <volume>2</volume>
          (
          <issue>1</issue>
          ) (
          <year>2009</year>
          )
          <fpage>1282</fpage>
          -
          <lpage>1293</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A detection of overlapping community in mobile social network</article-title>
          .
          <source>In: Procs. of ACM SAC</source>
          . (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kshitij</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>B.M.</given-names>
          </string-name>
          :
          <article-title>Embedded information structures and functions of co-authorship networks: Evidence from cancer research collaboration in India</article-title>
          .
          <source>Scientometrics</source>
          <volume>102</volume>
          (
          <issue>1</issue>
          ) (
          <year>2015</year>
          )
          <fpage>285</fpage>
          -
          <lpage>306</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Malliaros</surname>
            ,
            <given-names>F.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vazirgiannis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Clustering and community detection in directed networks: A survey</article-title>
          .
          <source>Physics Reports</source>
          <volume>533</volume>
          (
          <issue>4</issue>
          ) (
          <year>2013</year>
          )
          <fpage>95</fpage>
          -
          <lpage>142</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mishra</surname>
            et al.,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Clustering social networks</article-title>
          . In Bonato,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.R.K</surname>
          </string-name>
          ., eds.
          <source>: Algorithms and Models for the Web-Graph</source>
          . Springer (
          <year>2007</year>
          )
          <fpage>56</fpage>
          -
          <lpage>67</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Palla</surname>
            et al.,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Uncovering the overlapping community structure of complex networks in nature and society</article-title>
          .
          <source>Nature</source>
          <volume>435</volume>
          (
          <issue>7043</issue>
          ) (
          <year>2005</year>
          )
          <fpage>814</fpage>
          -
          <lpage>818</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Silva</surname>
            et al.,
            <given-names>T.H.P.</given-names>
          </string-name>
          :
          <article-title>Community-based endogamy as an influence indicator</article-title>
          .
          <source>In: JCDL</source>
          . (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Van Dongen</surname>
            ,
            <given-names>S.M.:</given-names>
          </string-name>
          <article-title>Graph clustering by flow simulation</article-title>
          .
          <source>PhD thesis</source>
          , Utrecht University (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>