<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semi-Supervised Overlapping Community Finding with Pairwise Constraints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elham Alghamdi</string-name>
          <email>elham.alghamdi@ucdconnect.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Derek Greene</string-name>
          <email>derek.greene@ucd.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science &amp; Informatics, University College Dublin</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In complex networks, we say that a network has community structure if subsets of its nodes form dense, highly-connected groups. Algorithms for detecting communities are generally unsupervised, relying solely on the network topology. However, such algorithms can often fail to uncover structure that re ects the underlying communities in the data, particularly when those communities are highly overlapping. One of the ways to improve accuracy is by harnessing additional background information (e.g. from domain experts), which can be used as a source of constraints to guide the community detection process. In this work, we explore the potential of semi-supervised strategies to improve algorithms for nding overlapping communities in networks. Speci cally, we propose a greedy approach for nding communities using a limited number of pairwise constraints.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Many applications of machine learning do not neatly correspond to the standard
distinction between supervised and unsupervised learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In many domains,
a limited degree of background knowledge will be available. Often supervision
will take the form of pairwise constraints, which describe the relations between
pairs of data objects. Such constraints have been used to guide and improve the
usefulness of clustering algorithms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The idea of semi-supervised learning also
extends to the area of network analysis. Tasks such as community detection can
potentially bene t from the introduction of limited supervision originating from
individual domain experts or crowdsourcing, where this knowledge might be
encoded as constraints indicating that a pair of nodes should always be assigned
to the same community or should never be assigned to the same community.
By harnessing this knowledge, we can potentially uncover communities of nodes
which are di cult to identify when analyzing large or noisy networks.
      </p>
      <p>
        Initial work in community detection focused on the development of
algorithms which produce disjoint groups, where each node belongs one community
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, in many real-world networks, nodes will naturally belong to
multiple communities. In the case of both online and o ine social networks, we can
observe pervasive overlap where individuals belong to many highly-overlapping
social groups [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. More recently, overlapping community nding algorithms have
been developed for application to these real networks [
        <xref ref-type="bibr" rid="ref16 ref2">2, 16</xref>
        ]. In contrast, work
on semi-supervised community nding continues to focus on cases where
communities are required to be disjoint.
      </p>
      <p>
        In this paper, we propose a semi-supervised approach for community
nding, based on the concept of greedy clique expansion [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which we refer to as
Pairwise Constrained GCE (PC-GCE). We introduce must-link and cannot-link
constraints to both the initialization phase of the process and to the
subsequent community expansion process. Experimental evaluations on benchmark
synthetic networks demonstrate that the introduction of a relatively small
number of constraints can improve our ability to correctly uncover the underlying
communities in these networks.
      </p>
      <p>The remainder of this paper is structured as follows: Section 2 provides a
summary of relevant work pertaining to semi-supervised learning and
community nding. In Section 3 we describe the proposed approach for community
nding. To demonstrate the e ectiveness of the approach, in Section 4 we
perform a benchmarking evaluation on several synthetic networks. Finally, Section
5 presents concluding remarks and suggestions for extending this work.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>2.1</p>
      <sec id="sec-2-1">
        <title>Community Finding</title>
        <p>
          Finding non-overlapping communities. Algorithms in this context can be
broadly grouped into three types. (1) Hierarchical algorithms construct a tree of
communities based on the network topology. These algorithms can be one of two
types: divisive algorithms [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] or agglomerative algorithms [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. (2)
Modularitybased algorithms optimize the well-known modularity objective function to
uncover communities in the network [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. (3) Other algorithms. This category
includes algorithms based on label propagation approach, spectral algorithms that
make use of the eigenvectors of Laplacian matrix or standard matrix, and
methods based on statistical models [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
          Finding overlapping communities. Existing algorithms in this context can
be classi ed into four main categories. (1) Node seeds and local expansion. These
algorithms detect communities by starting from a node or a group of nodes, then
expanding them into a community using a quality function. OSLOM [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] is an
example of such an algorithm, which uses a statistical function to evaluate the
node value to expand it to a community. Another example is MOSES [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], which
is an algorithm based on a statistical model and uses an objective function as a
greedy optimization technique. (2) Clique expansion. This type of method uses
a group of fully-connected nodes, called a clique, as the starting point for
expansion. CFinder [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and Greedy Clique Expansion (GCE) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] are examples of this
type of algorithm. (3) Link clustering. This category of algorithms detects
communities by splitting the links rather than the nodes [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. (4) Label propagation.
This strategy classi es each node into a community based on its neighboring
nodes a nities. An example is the COPRA algorithm [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
2.2
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Semi-Supervised Learning</title>
        <p>
          Several forms of prior knowledge have been used to guide the community
detection process. The most widely-used has been pairwise constraints ("must-link"
or "cannot-link"), which indicate that two nodes must be in the same community
or must be in di erent communities. Such constraints have been implemented in
several algorithms, including a modularity-based method [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], a spectral analysis
method [
          <xref ref-type="bibr" rid="ref12 ref23">12, 23</xref>
          ], and methods based on matrix factorization [
          <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
          ]. Instead of
constraints, other algorithms use node labels as prior knowledge to improve the
process of community detection [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. In [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], the authors propose a method that
uses a semi-supervised label propagation algorithm based on node labels and
negative information, where a node does not belong to a speci c community.
        </p>
        <p>
          The clear majority of semi-supervised algorithms in this area aim at detecting
disjoint communities, whereas many real-world networks contain overlapping
communities [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. To the best of our knowledge, very little work has been done
in the context of nding overlapping communities context. In [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a small set of
nodes called seed nodes was used, whose a nities to a community is provided
as prior knowledge to infer the rest of the nodes a nities in the network. In the
remainder of this paper we focus on the problem of semi-supervised community
nding suitable for application to networks containing overlapping communities.
3
3.1
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <sec id="sec-3-1">
        <title>Greedy Clique Expansion (GCE)</title>
        <p>
          The GCE [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] community nding algorithm initially nds maximal cliques as
seeds, and subsequently expands these seeds into larger communities in a greedy
fashion, by optimizing a local tness function. Given a network G, a
userspeci ed minimum clique size k, and a minimum community distance e, the
GCE algorithm involves the following steps:
1. Find the seeds, which are all maximal cliques in G with at least k nodes.
2. Choose the largest unexpanded seed and greedily expand it into a candidate
community C0 by using a community tness function FS until adding any
node would not increase the tness value.
3. Test the distance between C0 and all previously accepted communities. If
the distance between C0 and an existing community C is &lt; e, then C and
C0 are deemed to be near-duplicates, so discard C0. Otherwise, accept C0.
4. Repeat steps 2 and 3 until no more seeds remain
GCE employs a tness function de ned [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] to expand each seed, which de nes
the tness community of a given community S in terms of its internal and
external degrees (kin and kout) as follows
where the parameter typically takes values in the range [0:9; 1:5]. Generally,
we can summarize the greedy expansion step using the tness function FS as
follows:
1. For each node v in the frontier of a given seed S, calculate v's tness value,
i.e., the amount by which the community tness of S would change if the
node v was added to S.
2. Choose the node that has the maximum tness value, vmax.
3. If the tness value vmax is positive, then insert the node v into S and go
back to Step 1. Otherwise, terminate and return S.
        </p>
        <p>To discard near-duplicate communities in Step 4 of the GCE algorithm, the
authors proposed the use of an overlap-based measure of distance between two
communities:</p>
        <p>E (S; S0) = 1</p>
        <p>jS \ S0j
min(jSj; jS0j)
(2)
Given two communities S and S0, Eqn. 2 measures the number of nodes in the
smaller community that are not included in the larger one. So, for a given set of
communities, a near-duplicate community of a given community S would be all
the communities that are within a distance e of S.
3.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Pairwise Constraints in Community Detection</title>
        <p>Given a network that contains a set of nodes V , semi-supervised pairwise
constraints typically take two possible forms:
1. Must-link constraints specify that two nodes must be in the same community.</p>
        <p>Let CML be the must-link constraint set: 8 vi; vj 2 V where i 6= j, (vi; vj )
2 CML indicates that the two nodes vi and vj must be assigned to the same
community.
2. Cannot-link constraints specify that two nodes must be in di erent
communities. Let CCL be the cannot-link constraint set: 8 vi; vj 2 V where i 6= j,
(vi; vj ) 2 CCL indicates that the two nodes vi and vj must be assigned to
two distinct communities.</p>
        <p>The simplest approach to selecting this form of constraints is to naively select
a pair of nodes (vi; vj ) at random, and query the oracle about whether the pair
should share a must-link or cannot-link constraint. This process can be repeated
to select the required number of constraints or until some supervision budget is
exhausted.</p>
        <p>In non-overlapping community nding, must-link constraints have what is
referred to as a transitive property, where a third must-link relationship can be
inferred from two other associated must-link constraint pairs. So, if (vi; vj ) 2
CML, and (vi; vk) 2 CML, then we can also infer that (vj ; vk) 2 CML (see the
rst example in Fig. 1).</p>
        <p>However, incorporating constraints into the context of overlapping
communities is more challenging. This is because the transitive property does not hold
%&amp;
!"
!$
!#
%'</p>
        <p>%&amp;
!$
!"
%'
!#
%&amp;
!"
!$
!#
%'
Non-overlapping Case 1
Overlapping Case
Fig. 1: In the non-overlapping context, the transitive property allows us to infer a third
must-link constraint from two existing must-link constraints. However, this does not
automatically apply in the overlapping context, where two possible situations exist.
Must-link </p>
        <p>Set
Cannot-link </p>
        <p>Set
Must-link </p>
        <p>Set
Cannot-link </p>
        <p>Set
!" !#
!" !$
!" !%
!" !&amp;
!' !(
!) !*
!+ !,
!&amp; !%
!# !&amp;
!$ !%
!" !#
!" !$
!" !%
!" !&amp;
!' !(
!) !*
!+ !,
!- !,
!. !/
!0 !1
!&amp; !%
!# !&amp;
!$ !%
!$ !#
here (see the second example in Fig. 1). Speci cally, if (vi; vj ) 2 CML, and (vi; vk)
2 CML, there are two possible scenarios for the pair (vj ; vk). It can be the case
that either (vj ; vk) 2 CML or (vj ; vk) 2 CCL. This is because an overlapping
node vj can have a must-link constraint with both vi and vi, yet these two nodes
could belong to two di erent communities. However, it is also possible that all
three nodes are in fact in the same community. Unless we explicitly inform the
algorithm about whether a must-link or cannot-link constraint exists for the pair
(vj ; vk), we cannot reliably distinguish between the two cases. If the network has
highly-overlapping communities, then this problematic situation will occur more
frequently. If we naively attempt to incorporate pairwise constraints into
overlapping community nding without taking this situation into account, it is likely
that the quality of the resulting communities can potentially decrease even as
more constraints are added.</p>
        <p>To resolve this issue, after selecting pairwise constraints, we need to explicitly
detect every cannot-link pair that derived from any two connect must-link pairs
and insert it to the cannot link set. However, this will signi cantly increase the
set of cannot-link constraints. For example, if we want to feed only ve pairwise
constraints, each as shown in Fig. 2, inserting the required cannot-link pairs
will double the size of the cannot-link set. In the next section, we introduce an
approach to address this issue.
3.3</p>
      </sec>
      <sec id="sec-3-3">
        <title>Semi-Supervised GCE</title>
        <p>We now describe our proposed approach for overlapping community nding
with limited supervision, referred to as Pairwise Constrained GCE (PC-GCE).
This approach consists of two stages: The rst stage includes selecting and
preprocessing constraints and resolves the problem of the lack of the transitive
property for must-link constraints in the context of overlapping communities.
The second stage supplies the resulting constraints to the GCE algorithm to
process them during community detection. In the following, these stages are
described in more detail.</p>
        <p>Stage 1: Selecting and pre-processing constraints. In this stage, we can
treat the set of pairwise constraints as a new graph, where an edge exists between
two nodes from the original network if they share a pairwise constraint (either
must-link or cannot-link). Then we look for all possible forbidden triads among
the nodes involved in the must-link set. Given three nodes A, B, C, a forbidden
triad (or open triad ) occurs when A is connected to B and C, but no edge exists
between B and C. In our pre-processing step, we look for such cases | i.e. where
we do not know whether a must-link or cannot-link exists between a pair of nodes
B and C. To control the size of the constraints set, we greedily expand it until
we reach a pre-de ned maximum size. This stage can be summarized as follows
(see also Fig. 3):
1. Choose a small initial random set of both must-link and cannot-link sets.
2. Find all possible forbidden triad cases in the must-link set to query the oracle
about their relationship.
3. If their relationship is must-link insert it into must-link set, otherwise insert
it into cannot-link set.
4. Repeat all steps until the set reaches the maximum size.</p>
        <p>At the end of this stage, the pairwise constraints set is ready to be supplied to
the GCE algorithm for community detection.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Stage 2: Pairwise Constrained GCE (PC-GCE). During the community</title>
        <p>detection phase, we incorporate only cannot-link constraints into the existing
GCE algorithm as follows (see Fig. 4 for an illustration):
1. Find seeds, which are all maximal cliques in G with at least k nodes.
2. Choose the largest unexpanded seed and greedily expand it to a candidate
community C0 by using a community tness function (Eqn. 1) until adding
any node no longer increases the tness value. However, during this
expansion process, do not add any node, which has a cannot link relationship with
any existing node in the seed.
3. Check for the existence of any cannot-link constraints among all pairs of
nodes in C0. If such a pair exists, calculate the tness for both nodes relative
to C0, and remove the one with the lower value.</p>
        <p>The justi cation for using only cannot-link constraints in the community
detection process is as follows: because of the greedy nature of the seed expansion
step, incorporating must-link constraints into this step results in a smaller
number of considerably larger communities. Mainly using the tness function as a
technique of greedy local optimization to expand clique to community already
achieves some of the must-link relationships, and processing cannot-link set will
mostly detect the pairs that derived from two connected must-link pairs. Thus,
using must-link set to be explicitly processed by the algorithm will be as
processing extra non-informative constraints which cause noise to the algorithm and
reduction of the accuracy.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>
        In this section, the performance of the Pairwise Constrained GCE algorithm
(PC-GCE) is evaluated by running experiments on two groups of synthetic
benchmark networks containing overlapping communities. Since, to the best of
our knowledge, no work has been conducted in the literature regarding
pairwise constrained algorithms for nding overlapping communities, for the sake of
comparison, the PC-GCE results are compared with the following unsupervised
overlapping community detection algorithms: standard GCE [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], OSLOM [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
MOSES [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and COPRA [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
4.1
      </p>
      <sec id="sec-4-1">
        <title>Data</title>
        <p>
          The synthetic networks used in our experiments is generated using the
widelyused LFR benchmark generator [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], which can produce networks with properties
Step 1: Generate
small initial random
set of constraints.
        </p>
        <p>Step 2: Find all instances of
forbidden triads among the
must-link constraints.</p>
        <p>Step 3: Query the
oracle for constraints
for the missing pairs.</p>
        <p>Step 4: Generate
another random
set of constraints.</p>
        <p>Must</p>
        <p>Cannot
Must</p>
        <p>Cannot</p>
        <p>Must</p>
        <p>Cannot
Fig. 3: An illustration of the four steps involved in Stage 1 of semi-supervised GCE.
Step 1: Detect all maximal
cliques in the network
containing at least k nodes.</p>
        <p>Step 2: Expand each seed, Step 3: Process the Step 4: If C' overlaps with
skipping any node that has a cannot-link set for each any previously accepted
cannot-link with any existing resulting community C'. community, then discard it.</p>
        <p>node in the current seed. Otherwise, accept C'.</p>
        <p>Fig. 4: An illustration of the four steps involved in Stage 2 of semi-supervised GCE.
similar to real-world networks, which also contain embedded ground truth
communities. The full set of parameter values used to generate the networks is listed
in Table 1.</p>
        <p>
          In our experiments, we generate two di erent groups of networks, containing
small and large communities respectively. Small communities have 10{50 nodes,
while large communities have 20{100 nodes. Each group consists of 16 networks
with di erent combinations of the two parameters Om and On. The parameter
Om controls the number of communities per node, and On controls the
number of overlapping nodes. For the rst network in each set, all nodes belong to
two communities (Om = 2), then for each successive network this parameter
increments in value by 1 until Om = 5 is reached. For each value taken by the
parameter Om, we increase the fraction of overlapping nodes On by 25% until
100% of the nodes belong to more than one community.
To compare the performance of the di erent algorithms in our experiments, we
use the overlapping form of the standard Normalized Mutual Information (NMI)
measure, as proposed in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. This measures the level of agreement between the
communities produced by an algorithm on a network and the ground truth
communities in that network. A value close to 1 indicates a high level of agreement,
while a value close to 0 indicates that the algorithms communities are no better
than random.
        </p>
        <p>We have conducted two experiments. The rst experiment aims to measure
the current performance of the selected community detection algorithms. We
use these values as a baseline for evaluating the performance of the proposed
PC-GCE algorithm. The second experiment evaluates the performance of the
PC-GCE on di ering numbers of constraints, ranging from 1% to 5% of the
total number of possible constraints pairs in each network. In this experiment,
the initial pairwise constraints are selected at random. Therefore, we repeat the
process over 20 runs and average the NMI scores. Finally, we compare the results
obtained from PC-GCE with the selected benchmark algorithms.
oFvrearcltaipopninogf 255000 00..75563664 00..60103080 00..40905010 00..30509050 oFvrearcltaipopninogf 255000 00..75122117 00..60203050 00..40804040 00..40307070
We have two baselines for comparison and evaluation of the results. Firstly, we
compare the accuracy of PC-GCE to standard GCE (Tables 2). Secondly, we
compare the accuracy of PC-GCE algorithm to the other benchmark algorithms
(Table 3).</p>
        <p>As we observe from Tables 2, in most cases, regardless of the fraction of
overlapping nodes in the networks, PC-GCE outperforms the standard GCE
algorithm. To be more speci c, for all measures of fraction of overlapping nodes,
as the percentage of pairwise constraints increases, the accuracy of PC-GCE also
improves. On the other hand, as we increase the fraction of overlapping nodes
On from 25% to 100% of the total number of nodes, the NMI of GCE drops to
0 for almost 60% of the total set of results. For instance, in the case of small
community networks, the NMI score of GCE drops from 0.764 to 0 for Om = 3,
and from 0.648 to 0 for Om = 4. In contrast, the PC-GCE algorithm shows a
moderate decrease of accuracy as the value of On increases. This indicates that
PC-GCE outperform the standard GCE with highly-overlapping communities.
However, in general, both algorithms show better performance on the networks
containing smaller communities.</p>
        <p>We can see from Table 3 that PC-GCE algorithm outperforms COPRA
algorithm on both types of networks. When considering the smallest fraction of
overlapping nodes (i.e., On = 250), COPRA performs almost comparably with
PC-GCE. However, as the value of On increases, the performance of COPRA
drops to 0, whereas the performance of PC-GCE shows only a slight decrease in
the accuracy. Thus, it can be concluded that PC-GCE outperforms COPRA on
highly overlapping community networks. On the other hand, PC-GCE
outperforms MOSES on large networks but shows almost similar performance on small
networks. Finally, the NMI scores exhibit comparable overall performance with
PC-GCE in the case of OSLOM, on both types of networks.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we have explored the potential of semi-supervised strategies to
improve existing algorithms for nding overlapping communities in networks, and a
new algorithm for detecting overlapping communities with pairwise constraints
(PC-GCE) is proposed. Extensive experiments were carried out, and the results
show GCE algorithm with constraints (PC-GCE) outperforms unconstrained
algorithms with highly-overlapping communities, and its performance improves
with increasing number of pairwise constraints. This shows the potential of
using semi-supervised strategies for nding overlapping communities. However, in
most networks, only a small percentage of the nodes have informative pairwise
constraints. Thus, using random selection of pairwise constraints could have
adverse e ects by decreasing the accuracy of the community detection process
caused by the selection of non-informative nodes. Therefore, our future work
will aim to apply ideas from active learning for selecting informative pairwise
constraints. We will also explore the e ect of incorrect \noisy" constraints and
how they a ect algorithm performance.</p>
      <p>Acknowledgements. This publication has partly emanated from research
conducted with the nancial support of Science Foundation Ireland (SFI) under
Grant Number SFI/12/RC/2289.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adamcsek</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palla</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farkas</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Derenyi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vicsek</surname>
          </string-name>
          , T.:
          <article-title>C nder: locating cliques and overlapping modules in biological networks</article-title>
          .
          <source>Bioinformatics</source>
          <volume>22</volume>
          (
          <issue>8</issue>
          ),
          <volume>1021</volume>
          {
          <fpage>1023</fpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ahn</surname>
            ,
            <given-names>Y.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagrow</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Link communities reveal multiscale complexity in networks</article-title>
          .
          <source>Nature</source>
          <volume>466</volume>
          (
          <issue>7307</issue>
          ),
          <volume>761</volume>
          {
          <fpage>764</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Amelio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pizzuti</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Overlapping community discovery methods: a survey</article-title>
          .
          <source>In: Social Networks: Analysis and Case Studies</source>
          , pp.
          <volume>105</volume>
          {
          <fpage>125</fpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Basu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilenko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mooney</surname>
            ,
            <given-names>R.J.:</given-names>
          </string-name>
          <article-title>A probabilistic framework for semi-supervised clustering</article-title>
          .
          <source>In: Proc. 10th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining</source>
          . pp.
          <volume>59</volume>
          {
          <issue>68</issue>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guillaume</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lambiotte</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lefebvre</surname>
          </string-name>
          , E.:
          <article-title>Fast unfolding of communities in large networks</article-title>
          .
          <source>J. Stat. Mech</source>
          <volume>10008</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chapelle</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , Scholkopf,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Zien</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . (eds.):
          <article-title>Semi-Supervised Learning</article-title>
          . MIT Press, Cambridge (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Clauset</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Finding community structure in very large networks</article-title>
          .
          <source>Physical review E</source>
          <volume>70</volume>
          (
          <issue>6</issue>
          ),
          <volume>066111</volume>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dreier</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuinke</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Przybylski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reidl</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossmanith</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sikdar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Overlapping communities in social networks</article-title>
          .
          <source>arXiv preprint arXiv:1412.4973</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Fortunato</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Community detection in graphs</article-title>
          .
          <source>Physics reports 486(3)</source>
          ,
          <volume>75</volume>
          {
          <fpage>174</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Girvan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>Community structure in social and biological networks</article-title>
          .
          <source>Proceedings of the national academy of sciences 99(12)</source>
          ,
          <volume>7821</volume>
          {
          <fpage>7826</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Gregory</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Finding overlapping communities in networks by label propagation</article-title>
          .
          <source>New Journal of Physics</source>
          <volume>12</volume>
          (
          <issue>10</issue>
          ),
          <volume>103018</volume>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Habashi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>N.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ismail</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Enhanced community detection in social networks using active spectral clustering</article-title>
          .
          <source>In: Proceedings of the 31st Annual ACM Symposium on Applied Computing</source>
          . pp.
          <volume>1178</volume>
          {
          <fpage>1181</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lancichinetti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radicchi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramasco</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fortunato</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben-Jacob</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Finding statistically signi cant communities in networks</article-title>
          .
          <source>PLoS ONE</source>
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <year>e18961</year>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lancichinetti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fortunato</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kertesz</surname>
          </string-name>
          , J.:
          <article-title>Detecting the overlapping and hierarchical community structure in complex networks</article-title>
          .
          <source>New Journal of Physics</source>
          <volume>11</volume>
          (
          <issue>3</issue>
          ),
          <volume>033015</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lancichinetti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fortunato</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radicchi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Benchmark graphs for testing community detection algorithms</article-title>
          .
          <source>Physical review E</source>
          <volume>78</volume>
          (
          <issue>4</issue>
          ),
          <volume>046110</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reid</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDaid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurley</surname>
          </string-name>
          , N.:
          <article-title>Detecting highly overlapping community structure by greedy clique expansion</article-title>
          .
          <source>In: Workshop on Social Network Mining and Analysis</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Leng</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
          </string-name>
          , Y., Cheng, J.,
          <string-name>
            <surname>Lv</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Active semi-supervised community detection algorithm with label propagation</article-title>
          .
          <source>In: International Conference on Database Systems for Advanced Applications</source>
          . pp.
          <volume>324</volume>
          {
          <fpage>338</fpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Extremal optimization-based semisupervised algorithm with con ict pairwise constraints for community detection</article-title>
          .
          <source>In: Advances in Social Networks Analysis and Mining (ASONAM)</source>
          ,
          <year>2014</year>
          IEEE/ACM International Conference on. pp.
          <volume>180</volume>
          {
          <fpage>187</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sui</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
          </string-name>
          , G.:
          <article-title>E ective semi-supervised community detection using negative information</article-title>
          .
          <source>Mathematical Problems in Engineering</source>
          <year>2015</year>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>McDaid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurley</surname>
          </string-name>
          , N.:
          <article-title>Detecting highly overlapping communities with modelbased overlapping seed expansion</article-title>
          .
          <source>In: Advances in Social Networks Analysis and Mining (ASONAM)</source>
          , 2010 International Conference on. pp.
          <volume>112</volume>
          {
          <fpage>119</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Newman</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>Modularity and community structure in networks</article-title>
          .
          <source>Proceedings of the national academy of sciences 103(23)</source>
          ,
          <volume>8577</volume>
          {
          <fpage>8582</fpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Community detection in social network with pairwisely constrained symmetric non-negative matrix factorization</article-title>
          .
          <source>In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining</source>
          <year>2015</year>
          . pp.
          <volume>541</volume>
          {
          <fpage>546</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.Y.</given-names>
          </string-name>
          :
          <article-title>Community structure detection in complex networks with partial background information</article-title>
          .
          <source>EPL (europhysics letters)</source>
          <volume>101</volume>
          (
          <issue>4</issue>
          ),
          <volume>48005</volume>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>