Community Detection and Correlated Attribute Cluster
               Analysis on Multi-Attributed Graphs

             Hiroyoshi Ito† , Takahiro Komamizu‡ , Toshiyuki Amagasa‡ , and Hiroyuki Kitagawa‡
                           † Graduate School of Systems and Information Engineering, University of Tsukuba
                                           ‡ Center for Computational Sciences, University of Tsukuba

                    hiro.3188@kde.cs.tsukuba.ac.jp,taka-coma@acm.org,{amagasa,kitagawa}@cs.tsukuba.ac.jp

ABSTRACT                                                                                    However, community detection and extraction of semantics
Multi-attributed graphs, in which each node is characterized by                         in multi-attributed graphs remain challenging due to difficul-
multiple types of attributes, are ubiquitous in the real world. De-                     ties on integrating graph structures and multiple attributes of
tection and characterization of communities of nodes could have                         different types. Community detection and extraction of seman-
a significant impact on various applications. Although previous                         tics involves multiple steps. First, useful information from each
studies have attempted to tackle this task, it is still challenging                     attribute must be extracted because certain node attributes de-
due to difficulties in the integration of graph structures with mul-                    scribe different aspects. Second, all extracted information must
tiple attributes and the presence of noises in the graphs. There-                       be exploited to enhance community detection by effectively in-
fore, in this study, we have focused on clusters of attribute val-                      tegrating heterogeneous information. Notice that the previous
ues and strong correlations between communities and attribute-                          works [19, 23, 25, 26] do not differentiate multiple attributes, that
value clusters. The graph clustering methodology adopted in the                         is, they consider multiple attributes equally. Moreover, real-world
proposed study involves Community detection, Attribute-value                            graphs are often incomplete and noisy. That is, some edges or
clustering, and deriving Relationships between communities and                          nodes may be missing or attribute values may contain incorrect
attribute-value clusters (CAR for short). Based on these concepts,                      values, leading to inappropriate results.
the proposed multi-attributed graph clustering is modeled as                                To overcome these difficulties, we propose a novel clustering
CAR-clustering. To achieve CAR-clustering, a novel algorithm                            scheme based on the following two assumptions:
named CARNMF is developed based on non-negative matrix                                  (1) Relevant attribute values form clusters by attribute type. This
factorization (NMF) that can detect CAR in a cooperative man-                           is based on the observation that an attribute reflects a node’s
ner. Results obtained from experiments using real-world datasets                        interests in a network. Hence, an attribute tends to be associated
show that the CARNMF can detect communities and attribute-                              to a specific group of values related to an interest. For example, in
value clusters more accurately than existing comparable methods.                        a co-author network where the nodes correspond to the authors
Furthermore, clustering results obtained using the CARNMF indi-                         (researchers), each author typically has specific research interests
cate that CARNMF can successfully detect informative communi-                           (e.g., AI, data mining, and database). Thus, attributes (e.g., paper
ties with meaningful semantic descriptions through correlations                         title and conference) present biased values according to interests.
between communities and attribute-value clusters.                                       Consequently, it is possible to identify clusters of attributes values
                                                                                        (attribute-value clusters) reflecting a node’s interests.
                                                                                        (2) Communities are strongly correlated with attribute-value clus-
1    INTRODUCTION                                                                       ters. This is related to the previous assumption. Consider the
Community detection is a task to detect densely connected sub-                          example above. The nodes in a community share similar inter-
graphs as communities. Nodes in a community tend to share same                          ests (e.g., research interests) and consequently, similar attribute-
or similar properties, such phenomenon is called homophily ef-                          value clusters (e.g., research topics, and conferences). Conversely,
fect [11, 17], meaning that nodes having similar properties tend                        if some nodes (researchers) have similar attribute values, they
to link together. Because diverse applications are derived from                         should share similar interests and can be grouped in the same
the nature of real communities, community detection is impor-                           community.
tant in graph/network analyses. Examples include node property                              Exploiting the correlation between communities and multiple
estimations [7, 9, 24], community-wise information recommen-                            attributes should improve the quality of community detection
dations [10], and semantic reasoning for nodes/edges [1].                               as well as attribute-value clustering. Using the information from
   Moreover, using the attributes in a graph is advantageous to                         different sources (attributes) to alleviate the effect of noise (e.g.,
realize high-quality community detection as well as to under-                           missing values and errors), we simultaneously implement com-
stand the characteristics of communities. Multi-attributed graphs                       munity detection and attribute-value clustering.
are reasonable models of real-world networks such as social net-                            Based on the aforementioned ideas, we study a novel clustering
works, co-author networks, protein-protein interaction networks,                        scheme for multi-attributed graphs, called CAR-clustering. CAR
etc. In fact, several works have proposed algorithms that employ                        includes Community detection, Attribute-value clustering, and
attribute information (i.e., shared interests or functional behav-                      deriving Relationships between communities and attribute-value
iors of each community) to detect not only communities but also                         clusters for multi-attributed graphs. Additionally, we develop a
their semantic meanings [19, 23, 25, 26].                                               novel clustering algorithm called CARNMF, which employs a
                                                                                        non-negative matrix factorization (NMF).
© 2018 Copyright held by the owner/author(s). Published in the Workshop                     The contributions of this paper are summarized as follows:
Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna,
Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted           • We propose a novel clustering scheme CAR-clustering to
under the terms of the Creative Commons license CC-by-nc-nd 4.0.                              address two technical questions. (i) Given a multi-attributed


                                                                                    2
      graph, how can community detection and attribute-value               individually, but solves community detection and attribute-value
      clustering be performed for different types of attributes in         clustering in a unified manner.
      a cooperative manner? (ii) How should reasonable relation-
      ships be determined between communities and attribute-               3     PROBLEM STATEMENT
      value clusters for each type of attribute?                           In this work, we deal with multi-attributed graphs, where each
    • We develop a novel algorithm CARNMF, which achieves                  node is characterized by two or more attributes. Given such a
      CAR-clustering. Specifically, a dedicated loss function is           graph, CAR-clustering is used to solve the following three sub-
      designed to perform multiple NMFs simultaneously.                    problems: community detection, attribute-value clustering, and
    • We conduct experiments using real-world datasets (DBLP               derivation of relationships between communities and attribute-
      computer science bibliography and arXiv physics bibliog-             value clusters, which have been independently studied. Below,
      raphy). The accuracy of CARNMF with respect to commu-                we provide the formal definitions which are necessary to define
      nity detection and attribute-value clustering and a compar-          the clustering scheme.
      ison to other methods are examined. Relative to compara-
      tive methods, CARNMF achieves a better accuracy of up to             3.1    Multi-Attributed Graph
      11% for community detection and up to 22% for attribute-
      value clustering. Furthermore, CARNMF detects informa-               Multi-attributed graph G is defined by extending weighted graph
      tive communities and their rich semantic descriptions by             G ′ with several attributed graphs Gt for attribute t ∈ T. The
      correlating multiple types of attribute-value clusters.              following are formal definitions.

                                                                              Definition 1 (Weighted graph). Weighted graph G ′ is de-
2   RELATED WORK                                                           fined by a triplet, ⟨V, E, W⟩, where V is a set of nodes, E(⊆ V × V)
                                                                           is a set of edges, and W : E → R+ is a map of edge weights. □
Community detection in graphs is a current topic of interest in
graph analysis and AI research. Existing works for non-attributed             Definition 2 (Attributed garaph). Attributed graph Gt =
graphs can be categorized according to the techniques used:                ⟨V ∪ Xt , Et , Wt ⟩ of attribute t ∈ T is a bipartite graph consisting
graph separation [9, 20], probabilistic generative model [27], and         of set V of nodes, set Xt of attribute-values, a set of edges Et ⊆
matrix factorization [12, 18, 24]. [9] defined modularity, which           V × Xt , and Wt : Et → R+ is a map of edge weights. □
indicates how separated a community is from other nodes. More
comprehensive surveys can be found in [6, 22].                                Definition 3 (Multi-attributed graph). Given weighted
   Recently, several works have addressed the problem of de-               graph G ′ = ⟨V, E, W⟩ and a set of attributed graphs {Gt }t ∈T
tecting communities and their semantic descriptions on node-               where Gt = ⟨V ∪ Xt , Et , Wt ⟩, multi-attributed graph
attributed graphs. [25] proposed CESNA, where communities and              G = ⟨G ′, {Gt }t ∈T ⟩ is a union of these graphs. □
their attributes are simultaneously detected in an efficient man-
ner. [23] proposed SCI to detect communities and their semantics           3.2    CAR-clustering
using NMF. [19] proposed a probabilistic generative model called           Given a multi-attributed graph, information can be extracted
the author-topic model to model communities and related top-               from different perspectives. In this work, we extract communities,
ics. [2] proposed COMODO to detect communities with shared                 attribute-value clusters, and the relationship between them.
properties using subgroup discovery techniques. Likewise, [26]                Community. For a multi-attributed graph, a set of nodes with
proposed LCTA, where communities and their topics are mod-                 the following properties is regarded as a community. (1) Nodes in
eled separately, and then their relationships are modeled using            a community are densely connected with each other and sparsely
a probabilistic generative model. A comprehensive survey over              connected with other nodes. (2) Nodes in a community tend to
these works can be found in [5].                                           share common values in distinct attributes. This study assumes
   The aforementioned works only consider single textual at-               that communities can overlap. That is, each node belongs to
tributes or uniformly handle multiple attributes without any dis-          more than one community. This assumption is reasonable for
tinction. In reality, each attribute represents different aspects of       real applications. Formally, given the number of communities
the nodes. In our research, we deal with heterogeneous attributes          ℓ, node n ∈ V belonging to community c ∈ C is described by
individually. In addition to community detection, we perform               probability distribution p(n | c), where |C| = ℓ.
clustering over attribute values for each attribute, which, in turn,          Attribute-value cluster. For attribute t ∈ T in a multi-attributed
can be used to improve the quality of communities detected.                graph, similar or highly correlated attribute values can be grouped
   Some works have investigated clustering over networks con-              into attribute-value clusters. Herein, we assume overlapping clus-
taining different types of nodes and/or edges. [3] studied com-            ters. That is, each attribute-value belongs to more than one cluster.
munity detection with characterization from multidimensional               Formally, given the number of clusters kt of attribute t ∈ T, clus-
networks, which is defined as a graph consisting of a set of nodes         ter member x ∈ Xt for attribute-value cluster st ∈ St is described
and multiple types of edges. [4] studied subgraph detection from           by probability distribution p(x | st ), where |St | = kt .
multi-layer graphs with edge labels. In contrast, we assume a                 Relationship between a community and an attribute-
different model where each node is characterized by multiple               value cluster. Nodes in a community often share common attribute-
attributes. [21] proposed a scheme of ranking-based clustering             value clusters. Detecting such relationship is useful in many ap-
for multi-typed heterogeneous networks, where two or more                  plications. Given community c ∈ C and attribute-value cluster
types of nodes are included. Similarly, [16] proposed an NMF-              st ∈ St of attribute t ∈ T, the probability that c is related to st
based method for such networks. These methods differ from ours             is described as the relationship between c and st . In this work,
in that they define a cluster consisting of all types of nodes. In         a community may be related to more than one attribute-value
other words, these methods cannot handle each attribute in a               cluster. Formally, this is described by probability distribution
unique way. In contrast, our work deals with different attributes          p(st | c).


                                                                       3
   CAR-clustering. CAR-clustering is formally defined by Def-                      a matrix V (t ) ∈ R |Xt |×kt , where each row and column corre-
inition 4.                                                                         spond to an attribute x ∈ Xt and an attribute cluster st ∈ St ,
                                                                                                            (t )
                                                                                   respectively. A cell Vx,st represents probability p(x | st ).
   Definition 4 (CAR-clustering). Given a multi-attributed
                                                                                      To derive V from X (t ) , we introduce a matrix U (t ) ∈ R |V|×kt ,
                                                                                                     (t )
graph G, CAR-clustering is to perform community detection, attribute-
                                                                                   which denotes the relationships between the nodes and attribute-
value clustering, and detection of the relationship between the com-
                                                                                   value clusters with probability p(u | st ). Using both matrices
munities and the attribute-value clusters simultaneously.□
                                                                                   U (t ) and V (t ) , probability p(u, x | st ), which is the existence
                                                                                              (t )
   Solving these sub-problems simultaneously is more beneficial                    of edge eu,x ∈ Et in terms of attribute-value cluster st , is cal-
than evaluating each one independently because, in many cases,                                     (t )     (t )
                                                                                   culated as Uu,st Vx,st . Moreover, probability p(u, x) is derived
communities and attribute-value clusters are mutually correlated.                     Í          (t ) (t )
                                                                                   as st ∈St Uu,st Vx,st . Therefore, when U (t ) , V (t ) minimize loss
Solving the problems simultaneously exploits this correlation,
leading to improved results.                                                       function, U (t ) , V (t ) represent the best approximation of the edges
                                                                                   in the graph.
4    CARNMF – ALGORITHM FOR                                                                                                                          2
     CAR-CLUSTERING                                                                                       arg min       X (t ) − U (t ) (V (t ) )T          (2)
                                                                                                     U (t ),V (t ) ≥0                                F
In this section, we propose an NMF (non-negative matrix factorization)-
                                                                                                                                      (t )
based algorithm, called CARNMF, for CAR-clustering. CARNMF                                             s.t . ∀1 ≤ r ≤ kt , V·,r = 1
                                                                                                                                            1
models communities and attribute-value clusters. Additionally,
                                                                           Loss function for relationship detection. In CARNMF, the
we introduce an auxiliary matrix to maintain the relationship
                                                                        relationships between communities and attribute-value clusters
between the communities and the attribute-value clusters. A
                                                                        of attribute t ∈ T are represented as a matrix R (t ) ∈ Rℓ×kt , where
unified loss function is used to solve the different NMFs in a
                                                                        each row and column corresponds to a community c ∈ C and an
unified manner. It is assumed that the user gives the number ℓ
                                                                        attribute-value cluster st ∈ St , respectively. The cell contains the
of communities and the number kt of clusters for each attribute
t ∈ T.                                                                  probability p(st | c). We assume R (t ) is a linear transformation
                                                                        that maps U ∗ into U (t ) , where U ∗ and U (t ) derived by Equation 1
                                                                        and Equation 2, respectively. Therefore, when R (t ) minimizes
4.1 Matrix representation
                                                                        the loss function, R (t ) represents the relationships between the
We represent a multi-attributed graph by two sorts of matri-            communities and the attribute-value clusters.
ces: an adjacency matrix A ∈ R |V |× |V| and attribute matrices
X (t ) ∈ R |V|× |Xt | for t ∈ T. An element Au,v of A corresponds
                                                          Í                           arg min           U (t ) − U ∗ R (t )
                                                                                                                               2
                                                                                                                                                              (3)
to an edge eu,v = (u, v) ∈ E. Au,v = W(eu,v )/ ei, j ∈E W(ei, j ),                                                              F
                                                                                    (t )   ∗
                                                                                  U ,U , R ≥0  (t )
indicating the joint probability for the presence of edge eu,v .
                                                                                                                                                (t )
                                         (t )
Similarly, for t ∈ T, an element Xu,x in X (t ) corresponds to an                               s.t . ∀1 ≤ p ≤ ℓ, U ·,p      ∗
                                                                                                                                      = 1, Rp, · = 1
                                    (t ) Í
                                                                                                                                    1                 1
        (t )          (t )                                (t )
edge eu,x ∈ Et . Xu,x = Wt (eu,x )/ v,y ∈Et Wt (ev,y ), indicating      Equation 3 can be regarded as an NMF that decomposes the ma-
the joint probability of the presence of edge eu,x .
                                                     (t )               trix of the node-by-attribute value cluster into node-by-community
                                                                        and community-by-attribute value cluster matrices. In other
4.2 Loss Function                                                       words, Equation. 3 indicates the effect of the relationship be-
                                                                        tween nodes and attribute-value clusters against communities.
We achieve CAR-clustering in terms of several NMFs, which cor-
                                                                           Unified loss function. To achieve CAR-clustering, the afore-
respond to the aforementioned sub-problems. To achieve CAR-
                                                                        mentioned three sub-problems must be solved. In this work, we
clustering, we introduce loss functions for the sub-problems fol-
                                                                        attempt to solve them simultaneously by introducing a unified
lowed by a unified loss function.
                                                                        loss function, which is expressed as
   Loss function for community detection. In CARNMF, com-
                                                                                                                                             2
munities C are denoted by a matrix U ∗ ∈ R |V |×ℓ , where each row          L=              arg min                    A − U ∗ (U ∗ )T
and column correspond to a node u ∈ V and a community c ∈ C,                     U ∗, {U (t ), V (t ), R (t ) } t ∈T                         F
respectively. A cell Uu,c represents probability p(u | c). In proba-
                            ∗
                                                                                 Õ                                                                         
                                                                                                                         2                                2
                                                                                                        (t ) (t ) T
bility p(u, v | c), u and v are connected through community c, and            +            X (t )
                                                                                                    − U      (V      )     +   λ  t   U  (t )
                                                                                                                                              − U  ∗ (t )
                                                                                                                                                     R        (4)
                                                                                                                         F                                F
is represented by Uu,c     ∗ U ∗ . Moreover, joint probability p(u, v),
                                                          Í
                                v,c                                              t ∈T
or the existence of edge eu,v ∈ E, is expressed as c ∈C Uu,c   ∗ U∗ .
                                                                  v,c           s.t . ∀1 ≤ r ≤ kt , ∀1 ≤ p ≤ ℓ, ∀t ∈ T,
Therefore, when U minimizes the following loss function, U ∗ is
                       ∗                                                                                 (t )              (t )
                                                                                     ∗
                                                                                 ∥U ·,p   ∥1 = 1, ∥V·,r ∥1 = 1, ∥Rp, · ∥1 = 1
the best approximation of the edges in the graph.
                                                                        where λt for attribute t ∈ T is a user-defined parameter to control
                                                                        the effect of attribute-value clusters for community detection.
                                               2
                    arg min A − U ∗ (U ∗ )T                         (1) Higher λt yields a stronger effect of the attribute-value clusters
                      U ∗ ≥0                   F                        in community detection.
                                               ∗
                          s.t . ∀1 ≤ c ≤ ℓ, U ·,c 1 = 1
                                                                                   4.3     Optimization
where ∥·∥F2 and ∥·∥1 represents the Frobenius norm and the ℓ1                      Similar to the ordinary NMF, the loss function in Equation 4 is
norm, respectively.                                                                not simultaneously convex for all variables. Hence, we consider
   Loss function for attribute-value clustering. In CARNMF,                        the NMF to be a Frobenius norm optimization, where update
attribute-value clusters St of attribute t ∈ T are represented as                  equations are derived based on [14].


                                                                              4
   Considering the Karush-Kuhn-Tucker (KKT) first-order condi-                                                          Algorithm 1 Optimization Algorithm
tions applied to our problem, we derive:
                                                                                                                        Input: A, {X (t ) }t ∈T , {λt }t ∈T , δ
                                                                                                                        Output: U ∗ , {U (t ) , V (t ) , R (t ) }t ∈T
                   U ∗ ≥ 0, U (t ) ≥ 0, V (t ) ≥ 0, R (t ) ≥ 0                                            (5)            1: U ∗ , {U (t ) , V (t ) , R (t ) }t ∈T ← random non-negative init
                                                                                                                         2: ϵ ′ ← maxFloat, ϵ ← 2
                                                                                                                                                                ϵ′
           ∇U ∗ L ≥ 0, ∇U (t ) L ≥ 0, ∇V (t ) L ≥ 0, ∇R (t ) L ≥ 0                                        (6)
                                                                                                                         3: while abs(ϵ ′ − ϵ) ≥ δ do
                       ∗
                    U ⊙ ∇U ∗ L = 0, U                  (t )
                                                              ⊙ ∇U (t ) L = 0,                                                                                     Í
                                                                                                                                                         AT U ∗ +             λ U (t ) (R (t ) )T
                                                                                                                            4:     U ∗ ← U ∗ ⊙ 2U ∗ (U ∗ )T U ∗ +  Í t ∗ (t ) (t ) T
                                                                                                                                                                t ∈T
                                                                                                                                                                      U R (R )
                   V   (t )
                              ⊙ ∇V (t ) L = 0, R         (t )
                                                                ⊙ ∇R (t ) L = 0                           (7)                                                              t ∈T
                                                                                                                            5:     U ∗ ← U ∗ (Q ∗ )−1
   where ⊙ is the element-wise product. From the Karush-Kuhn-                                                               6:     for t ∈ T do
Tucker (KKT) conditions, we derive derivatives corresponding to
                                                                                                                                                               (t ) (t )         ∗ (t )
                                                                                                                            7:        U (t ) ← U (t ) ⊙ U (tX) (VV(t ) )T+λ tU R
                                                                                                                                                                         V (t ) +λ U (t )
the variables:                                                                                                                                                                           t
                                                                                                                                                                 (X (t ) )T U (t )
                                                                                                                            8:        V (t ) ← V (t ) ⊙ (V (t ) )T (U (t ) )T U (t )
                                                                                                                                      U (t ) ← U (t )Q (t )
                                                                                                                                                             −1
            ∇U ∗ L = − 2AT RU ∗ + 2U ∗ (U ∗ )T U ∗                                                                          9:
                       Õ
                     +    λt (−U (t ) (R (t ) )T + U ∗ R (t ) (R (t ) )T )                                (8)            10:          V (t ) ← V (t ) Q (t )
                                                                                                                                                             (U ∗ )T U (t )
                               t ∈T                                                                                      11:       R (t ) ← R (t ) ⊙ (U ∗ )T U ∗ R (t )
           ∇U (t ) L = − X        (t ) (t )
                                        V     + U (t ) (V (t ) )T V (t )                                                                             −1
                                                                                                                         12:       R (t ) ← R (t ) Q R
                            + λt (U (t ) − U ∗ R (t ) )                                                   (9)
                                                                                                                         13:   end for
           ∇V (t ) L = − (X         (t ) T
                                        ) U    (t )
                                                      + (V    (t ) T
                                                                    ) (U   (t ) T
                                                                               ) U      (t )
                                                                                                         (10)            14:   ϵ ′ ← ϵ                                    
                                    ∗ T                   ∗ T                                                            15:   ϵ ← L U ∗ , {U (t ) , V (t ) , R (t ) }t ∈T
                ∇Ri L = − (U ) U             (t )
                                                    + (U ) U R        ∗ (t )
                                                                                                         (11)
                                                                                                                         16: end while
  By substituting the corresponding gradients in Equation 4, we
derive the following update rules:
                                                                                                                        4.4        Complexity Analysis
                               Í
                    AT U ∗ + t ∈T λt U (t ) (R (t ) )T                                                                  Here, we analyze the computational complexity of the proposed
                                  Í                                                                      (12)
          ∗  ∗
         U ←U ⊙
                2U ∗ (U ∗ )T U ∗ + t ∈T U ∗ R (t ) (R (t ) )T                                                           algorithm. The equations in our algorithm have the following
                                                                                                                        complexities:
     U (t ) ← U (t ) ⊙
                                     X (t )V (t ) + λt U ∗ R (t )
                                                                                                         (13)                                                                    Í
                                  U (t ) (V (t ) )T V (t ) + λt U (t )                                                      • Updating U ∗ (Eqs. 12 and 16) needs O(|E|ℓ + |V|ℓ 2 t kt ).
                                                                                                                            • Updating U (Eqs. 13 and 18) and
                                                                                                                                          (t )
                                                                                                                                                                  V (Eqs. 14 and 18)
                                                                                                                                                                      (t )
                                        (X (t ) )T U (t )
     V (t ) ← V (t ) ⊙                                                                                   (14)                  needs O (|V| + |Xt |)kt2 + |Et |kt .
                                  (V (t ) )T (U (t ) )T U (t )                                                                                                                        
                                                                                                                            • Updating R (t ) (Eqs. 15 and 19) needs O |V|(ℓkt + ℓ 2 ) .
                                  (U ∗ )T U (t )                                                                           In summary, the time complexity of our algorithm is follows,
      R (t ) ← R (t ) ⊙                                                                                  (15)
                                 (U ∗ )T U ∗ R (t )                                                                     where iter is the number of outer iterations (lines 3–16 in our
   The aforementioned update rules monotonically decrease Equa-                                                         algorithm).
                                                                                                                                                                                     !
tion 4. However, these variables may violate the probability defi-                                                                Õ                                               
nition (i.e., their sum does not equal one). To satisfy constraints,                                                                         2      2           2
                                                                                                                           O iter       |V|(ℓ kt + kt ) + |Xt |kt + |E|ℓ + |Et |kt     (21)
    ∗ ∥ = 1, ∥V (t ) ∥ = 1 and ∥R (t ) ∥ = 1, the variables are
∥U ·,p                                                                                                                                    t
       1            ·,r 1            p, · 1
normalized immediately after updating. The normalization is
expressed as
                                                                                                                        5        EXPERIMENTAL EVALUATIONS
                                                                                                                        To demonstrate the applicability and effectiveness of CARNMF,
                                                                                                                        we conducted a set of experiments using real-world datasets.
            U ∗ ← U ∗ (Q ∗ )−1                 (16)                  U (t ) ← U (t )Q (t )               (18)           Specifically, the performance of the proposed scheme was com-
                                                                                          R −1                          pared to simple baseline and state-of-the-art methods.
     V   (t )
                ←V   (t )
                            (Q   (t ) −1
                                    )          (17)             R   (t )
                                                                           ←R   (t )
                                                                                       (Q )              (19)
                                                                                                                           The experiments were performed on a PC with an Intel Core i7
                                                                                                                        (3.3 GHz) CPU with 16 GB RAM running Ubuntu14.04. CARNMF
where Q ∗ = Diaдonalize(U ∗ ), Q (t ) = Diaдonize(V (t ) ), and Q R
                                                                                                         (t )
                                                                                                                =       was implemented by Python 2.7.6 with Numpy 1.9.0.
Diaдinalize(R (t ) ).
                                                                                                                        5.1        Datasets
                                                                                                     !
                                                              Õ
                                                              a                      Õ
                                                                                     a                                  We used two datasets: DBLP and arXiv.
   Diaдonalize(Z ∈ Ra×b ) = Diaд                                     Z i,1 · · · ,           Z i,b       (20)                    • DBLP: Digital Bibliography Project1 is a bibliographic data-
                                                              i=1                      i=1
                                                                                                                                   base in the computer science area. DBLP contains publi-
Diaд(·) provides a diagonal matrix where the diagonals are the                                                                     cation information, such as authors and conferences. We
input sequence.                                                                                                                    used a part of the dataset by extracting conferences simi-
   Algorithm 1 shows the optimization algorithm based on the                                                                       lar to [8]. We extracted four research areas: data mining,
aforementioned update rules. Matrix normalization is applied                                                                       databases, machine learning, and information retrieval,
after the updates. Without normalization, each matrix may have                                                                     and five major conferences for each area. Consequently,
significantly different values, leading to inconsistent results. Al-
gorithm 1 describes the order of update rules and normalizations.                                                       1 http://dblp.uni-trier.de/


                                                                                                                    5
   Table 1: Selected conferences on four research areas.                  Each rectangle shows the top contributing nodes in the com-
          DB               DM            ML             IR                munity/cluster, and the edge weights show the strength of the
     SIGMOD, VLDB       KDD, ICDM     NIPS, ICML    SIGIR, ECIR           relationship between the community and the corresponding clus-
       PODS, EDBT       PKDD, SDM     ECML, UAI     JCDL, ECDL            ter. We chose famous researchers in different research domains
         ICDT             PAKDD         COLT           TREC
                                                                          (i.e., Jiawei Han, Michael Stonebraker, and Michael I. Jordan).
      Table 2: Selected journals on four research areas.                      Figure 1(a) show the community and the correlated attribute-
                                math-ph
                                                                          value clusters of Jiawei Han, who is a leading researcher in data
               Communications in Mathematical Physics                     mining and database areas. The results show that (1) he collab-
                   Reviews in Mathematical Physics                        orates with Chinese researchers, (2) he publishes many papers
                    Letters in Mathematical Physics
                   Journal of Mathematical Physics                        related to data mining and database conferences (i.e., KDD, ICDM,
                                nucl-th                                   SDM, PAKDD, and VLDB), and (3) his researches are highly cor-
             Annual Review of Nuclear and Particle Science                related with topics in data mining, such as clustering and classifi-
                Progress in Particle and Nuclear Physics
                 Atomic Data and Nuclear Data Tables                      cation on large graph.
                     Journal of Nuclear Materials                             Similarly, Figure 1(b) shows the result for Michael Stonebraker,
                                astro-ph                                  a renowned database researcher. His community is strongly re-
                         Astronomical Journal
             Annual Review of Astronomy and Astrophysics                  lated to conferences in databases (SIGMOD, VLDB, PODS, EDBT,
                       New Astronomy Reviews                              and ICDT ). Topics such as view management, distance metric, and
                         Space Science Review                             query evaluation are detected. Figure 1(c) shows the result for
                               cond-mat
                        Nature Nanotechnology                             Michael I. Jordan, an expert in machine learning research. This
                    Smart Materials and Structures                        community is strongly related to the conferences of machine
                        Semi Conductor Science                            learning, (NIPS, ICML, UAI, COLT, and ECML) and the topics like
                      Journal of Materials Science
                                                                          learn network, expert model, and prediction.
                                                                              The detected communities and the associated attribute-value
       10,491 papers in 20 conferences (shown in Table 1) were            clusters seem to be reasonable.
       selected.
     • arXiv: arXiv2 is a repository of electronic preprints in           5.3     Accuracy Comparison
       various scientific fields. Similar to above, we chose four         The proposed scheme is compared to a baseline method as well
       research areas: mathematical physics (math-ph), nuclear            as state-of-the-art methods to quantitatively evaluate the perfor-
       (nucl-th), astrophysics (astro-ph), and materials (part of         mance of community detection and attribute-value clustering.
       cond-mat), and four major journals for each area. Conse-           The comparison methods include:
       quently, 12,497 papers in 16 journals (shown in Table 2)
                                                                                • NMF [15]: Baseline approaches that apply NMF for bi-
       were selected.
                                                                                  nary relationships between graph components, including
   Multi-attributed graphs were constructed from the datasets as                  author-term (A-T), author-paper (A-P), author-conference
follows: The nodes correspond to the authors. If two authors co-                  (A-C), term-paper (T-P), and term-conference (T-C)3 .
author a paper, we placed a weighted edge between the authors.                  • LCTA [26]: A probabilistic generative model for commu-
The weighting denotes the number of co-authored papers. Each                      nities, topics of textual attributes, and their relationships.
author has attributes term, paper, and conference/journal, which                • SCI [23]: An NMF based method for detecting commu-
are defined below:                                                                nities as well as their semantic descriptions via node’s
     • term: Each term is regarded as a node. An edge is generated                attribute values.
       between an author and a term if the author uses the term                 • HINMF [16]: A model that clusters objects and attributes
       in the titles of at least one paper. The edge weight denotes               simultaneously and takes the consensus among the binary
       the term frequency for each author. As a preprocessing,                    NMFs. This work is the most similar to our proposal.
       we applied stop-word elimination and stemming.                         Note that, LCTA and SCI deal with a single concatenated
     • paper: Each paper is regarded as a node. An edge is gen-           feature of multiple attributes. Therefore, we prepare concatenated
       erated if the author publishes the paper. The edge weight          feature consisting of term, document and conference/journal, and
       is always 1.0 because each paper can only be published             apply these approaches on the feature.
       once.                                                                  To evaluate the qualities of these methods, we compared the
     • conference/journal: Each conference or journal corresponds         accuracy [26] w.r.t. community and attribute-value clustering
       with a node. An edge is created between an author and              w.r.t. paper and conference/journal. We designed a ground truth
       a conference/journal if the author publishes at least one          to measure the accuracy. To derive the ground truth, each author
       paper at the conference/journal. The edge weight is the            is labeled based on research areas of their papers, in other words,
       total number of publications at the conference/journal.            if the author mostly published papers for the specific area, the
                                                                          author is labeled as that area. Similarly, the labels for confer-
5.2 Results of CAR-clustering                                             ence/journal and paper were manually given by referring to the
Figure 1 shows examples of the detected communities and their             conference categories.
associated attribute-value clusters in DBLP. The number of com-
munities and the number of term clusters were each 50, whereas               Definition 5 (Accuracy). Given a set S of elements, for each
the number of conference clusters and the number paper clusters           element n ∈ S, the true label and the cluster label generated by a
were each 4. The red, blue and gray rectangles correspond to              method are denoted by sn and r n , respectively. Then, the accuracy
communities, term clusters, and conference clusters, respectively.        3 Because NMF assumes the co-occurrences of binary relationships, paper-
2 https://arxiv.org                                                       conference (one-to-one relationship) is excluded.


                                                                      6
                                                                                                CARNMF achieved the best performance for community de-
                                                                    cluster   0.123          tection (author) and attribute-value clustering (paper and confer-
                                                                                             ence/journal) with significant gaps for DBLP dataset (respectively
                                                                    data      0.063
                                                                    base      0.056

                                                                                             11%, 22% and 7%) and for arXiv dataset (respectively 3%, 4% and
                                                                    eﬃci      0.047
                                                          0.044     dimension 0.047


                                    jiawei han    0.223
                                                                                             2%) relative to the comparative methods. In particular, CARNMF
           KDD
           ICDM
                   0.304
                    0.202
                                    xifeng yan    0.049
                                    hong cheng 0.026
                                                                    larg
                                                                    classif
                                                                                 0.092
                                                                                 0.085
                                                                                             has an improved clustering quality compared to NMF by tak-
           SDM      0.096
                            0.999
                                    ke wang    0.018
                                                            0.039
                                                                    graph        0.079       ing the relationships between communities and attribute-value
                                                                                             clusters into account.
           PAKDD 0.094              xiaoxin yin   0.011             scale        0.067
           VLDB  0.073              feida zhu     0.010             attribut     0.051

                                                                                                Table 5 summarizes evaluation of effects from taking multiple
                                    chen chen     0.010


                                                                    approach
                                                                    method
                                                                                 0.244
                                                                                 0.116
                                                                                             attributes into account. The table showcases results where differ-
                                                           0.037    support      0.097       ent combinations of attributes are used, e.g., “(T-C)” means term
                                                                                             and conference attributes were used. This result shows that the
                                                                    structur     0.074
                                                                    select       0.051

                                                                                             proposed method works the best when taking as many attributes
                        (a) Communities of “jiawei han”.
                                                                                             as possible. As expected, the basic tendency is that as the number
                                                                                             of attributes increases, the accuracy increases.
                                                                    system
                                                                    view
                                                                                0.104
                                                                                0.062
                                                                                                Table 3 lists the detected topics from DBLP using CARNMF
                                                          0.046
                                                                    updat
                                                                    mainten
                                                                                0.061
                                                                                0.0545
                                                                                             when the number of topics is set to four. Our method success-
                                                                    manag       0.0488       fully detects the four major research topics. Specifically, Topic
                                    michael stonebraker
                                                                                             1 containing retriev, inform, search, queri and web seems to cor-
           SIGMOD 0.420
           VLDB   0.391
                                       0.123
                                                                    algorithm 0.176
                                                                    process     0.097        respond to information retrieval, and Topic 2 containing mine,
                                                                                             pattern, cluster, graph and frequent correspond to data mining.
                            0.999   samuel madden           0.041 distanc       0.074
           PODS     0.103
                                       0.018
           EDBT     0.064                                           metric      0.0385

                                                                                             Topic3 contains words “learn, network, kernel, bayesian, reinforc”,
                                    lawrence a. rowe
           ICDT     0.020                                           control     0.0367
                                       0.014

                                                                                             which are typical words of machine learning. Topic4 is a topic of
                                                                                             database containing words “query, databas, optim, xml, manag”,
                                                                    queri       0.116
                                                                    evalu       0.051
                                                           0.040
                                                                    web
                                                                    integr
                                                                                0.039
                                                                                0.0369       which are popular topics on database researches.
                                                                    distribut   0.0348


                                                                                                   Table 3: Detected topics via CARNMF from DBLP.
                  (b) Communities of “michael Stonebraker”.

                                                                                                     Topic 1                Topic 2              Topic 3             Topic 4
                                                                    learn       0.113
                                                                    model       0.049
                                                                                              (Information Retrieval)    (Data Mining)      (Machine Learning)     (Database)
                                                                    network     0.042
                                                                                                 retriev    0.068          mine     0.076       learn   0.063      queri    0.046
                                                          0.064     bayesian 0.025
                                                                    decis    0.0234                 trec    0.043        pattern    0.038      model    0.036       data    0.043
                                    michael i. jordan
                                       0.216                                                     inform     0.032          data     0.037    network    0.018    databas    0.043
           NIPS    0.478
                                    satinder p. singh
                                                                    model       0.079              model    0.028        cluster    0.026   algorithm   0.016     optim     0.017
                                       0.020
           ICML    0.217
                            0.999   lawrence k. saul
                                                                    expert
                                                           0.044 glasgow
                                                                                0.057
                                                                                              document      0.027         graph     0.023       infer   0.015       xml     0.015
           UAI     0.210                                                        0.051
           COLT    0.078
                                       0.017
                                                                    terrier     0.042              track    0.024          base     0.021      kernel   0.015    manag      0.015
                                    xuanlong nguyen
                                                                    nonneg      0.037
           ECML 0.017
                                       0.011                                                     search     0.021       frequent    0.019    bayesian   0.014      effici   0.014
                                                                                                   queri    0.021       databas     0.019     process   0.013      base     0.013
                                    jiawei han
                                        0.011
                                                                    process     0.117
                                                                                                    text    0.019          effici   0.017    reinforc   0.012    system     0.012
                                                                    predict     0.080
                                                          0.042
                                                                    dynam       0.068               web     0.017           larg    0.016       decis   0.012     object    0.012
                                                                    state       0.048
                                                                    gener       0.030


                   (c) Communities of “michael i. jordan”.
                                                                                             5.4      Insights on Parameters
Figure 1: Example communities with attribute-value clus-                                     This section discusses the effect of parameter λt for each attribute.
ters. The red, blue and gray rectangles correspond to com-                                   The larger the λt value, the greater the influence of the attribute-
munities, term clusters, and conference clusters, respec-                                    value cluster for t ∈ T is on the community. Therefore, optimal
tively.                                                                                      parameter setting should result in better results. Figures 2 shows
                                                                                             the behavior of the accuracy with different values with respect
                                                                                             to different attributes. For each evaluation, λs (s , t) of the other
is defined as:                                                                               attributes were fixed. In most cases, the accuracy shows a convex
                                         Í                                                   form and the peak is around 10−2 . More importantly, the accuracy
                                            n ∈S δ (sn , map(r n ))
                    Accuracy =
                                                          |S|                                is insensitive to the setting, making tuning easier.
where | · | is the cardinality of a set; δ (x, y) is a delta function                        5.5      Convergence Analysis
which returns 1 if x = y, otherwise 0; and map(r n ) is a mapping
                                                                                             In this section, we experimentally provide convergence analysis
function that maps r n to the equivalent label in the dataset. The
                                                                                             to optimize the proposed loss function in Equation 4. Figures 3(a)
best mapping can be found by Kuhn-Munkres algorithm [13]. □
                                                                                             and (b) show the convergence curve of the loss function for DBLP
   Table 4 summarizes the evaluation results. The number of                                  and arXiv, respectively. In addition, the accuracy of each iteration
communities and the number of attribute-value clusters for each                              is plotted. The black line shows the value of the loss function.
attribute are each four. Each cell shows the mean and the standard                           The red, green, and blue lines show the accuracy of community
deviation of the accuracies for 20 trials. N/A denotes that the                              detection and attribute-value clustering for author, paper, and
method does not support the category. Values in bold indicate a                              conference/journal, respectively. As the number of iterations in-
significant improvement using the Student-t test, where p < 0.05.                            creases, the loss function decreases while the accuracy improves.


                                                                                         7
                                        Table 4: Quality evaluations of community detection and attribute clustering.

                                                                           DBLP dataset                                                                          arXiv dataset
                                                       Author                Paper                        Conference                  Author                        Paper                    Journal
                     NMF(A-T)                     64.02 ± 5.73                 N/A                           N/A                    32.75 ± 2.30                     N/A                     N/A
                     NMF(A-P)                     43.12 ± 5.17             44.58 ± 5.89                      N/A                    34.19 ± 3.55                 33.53 ± 1.93                N/A
                     NMF(A-C)                     75.35 ± 6.85                 N/A                       87.60 ± 1.73               69.05 ± 7.30                     N/A                 67.81 ± 4.09
                     NMF(T-P)                         N/A                  50.02 ± 7.93                      N/A                        N/A                      28.22 ± 0.90                N/A
                     NMF(T-C)                         N/A                      N/A                       69.88 ± 6.68                   N/A                          N/A                 69.06 ± 1.36
                      LCTA                        48.90 ± 7.57             26.13 ± 4.36                  68.50 ± 12.46              40.18 ± 4.31                 33.88 ± 1.82            61.56 ± 9.94
                       SCI                        54.78 ± 8.79             22.31 ± 1.48                  58.20 ± 7.40               38.68 ± 3.56                 35.36 ± 0.91            42.81 ± 3.58
                      HINMF                       68.90 ± 9.08             56.46 ± 3.08                  90.10 ± 12.63              65.41 ± 5.38                 33.24 ± 2.43            61.25 ± 7.81
                     CARNMF                       86.34 ± 2.39             78.19 ± 9.87                  97.20 ± 5.21               72.65 ± 8.11                 42.23 ± 3.18            71.25 ± 2.53

      Table 5: Quality evaluations of community detection and attribute clustering, changing attributes to be used.

                                                                              DBLP dataset                                                                             arXiv dataset
                                                              Author             Paper                         Conference                  Author                         Paper                Conference
                 CARNMF (T)                            43.29 ± 3.05               N/A                              N/A                 43.19 ± 6.44                        N/A                    N/A
                 CARNMF (P)                            46.84 ± 4.15           41.69 ± 5.97                         N/A                 33.23 ± 2.63                    33.49 ± 1.68               N/A
                 CARNMF (C)                            85.11 ± 2.57               N/A                          92.40 ± 8.50            67.41 ± 6.85                        N/A                69.69 ± 6.64
                CARNMF (T-P)                           43.28 ± 4.44           41.33 ± 3.88                         N/A                 35.32 ± 2.04                    35.79 ± 2.28               N/A
                CARNMF (T-C)                           83.67 ± 6.54               N/A                          95.20 ± 1.99            68.93 ± 7.40                        N/A                66.88 ± 4.88
                CARNMF (P-C)                           86.41 ± 1.93           73.30 ± 11.90                    94.10 ± 3.42            69.71 ± 8.85                    40.15 ± 3.13           68.75 ± 5.59
               CARNMF (T-P-C)                          86.34 ± 2.39           78.19 ± 9.87                     97.20 ± 5.21            72.65 ± 8.11                    42.23 ± 3.18           71.25 ± 2.53


                                100                                                             100                                                            100

                                 80                                                              80                                                             80
                  Accuracy(%)


                                                                                  Accuracy(%)


                                                                                                                                                 Accuracy(%)


                                 60                                                              60                                                             60

                                 40         NMF(A-T)                                             40                                                             40
                                            NMF(A-P)                                                       NMF(A-P)                                                        NMF(A-C)
                                 20         NMF(A-C)                                             20        NMF(T-P)                                             20         NMF(T-C)
                                            CARNMF                                                         CARNMF                                                          CARNMF
                                  0                                                               0                                                              0
                                 10−4    10−3   10−2   10−1    100   101    102                  10−4   10−3   10−2   10−1    100    101   102                  10−4    10−3   10−2   10−1   100   101   102
                                                        λt                                                             λt                                                              λt

                                          (a) DBLP: Author                                              (b) DBLP: Document                                             (c) DBLP: Conference

                                100                                                             100                                                            100
                                                                                                                                NMF(A-P)
                                 80                                                              80                             NMF(T-P)                        80
                  Accuracy(%)


                                                                                  Accuracy(%)


                                                                                                                                                 Accuracy(%)


                                                                                                                                CARNMF
                                 60                                                              60                                                             60

                                 40         NMF(A-T)                                             40                                                             40
                                            NMF(A-P)                                                                                                                       NMF(A-C)
                                 20         NMF(A-C)                                             20                                                             20         NMF(T-C)
                                            CARNMF                                                                                                                         CARNMF
                                  0                                                               0                                                              0
                                 10−4    10−3   10−2   10−1    100   101    102                  10−4   10−3   10−2   10−1    100    101   102                  10−4    10−3   10−2   10−1   100   101   102
                                                        λt                                                             λt                                                              λt

                                          (d) arXiv: Author                                             (e) arXiv: Document                                              (f) arXiv: Journal


                                                                       Figure 2: Accuracy for different λt values.


5.6   Efficiency Analysis                                                                                                       Moreover, we examine the running time of our method by
This section analyzes computational efficiency in terms of the                                                               changing the number of nodes in an input graph. Theoretically,
numbers of communities and attribute clusters. When the num-                                                                 as discussed in Section 4.4, the computational complexity is de-
bers are fixed to four as experiments above, the running times of                                                            pendent on the number of vertices, that of edges, and that of
CARNMF on the DBLP (arXiv) dataset are 1.186 ± 0.253s (0.682                                                                 distinct values of each attribute. As most of real-world graphs
± 0.138s). When changing the numbers of communities and term                                                                 are modeled as scale-free networks, edges in a graph are very
clusters to 50, while those of paper and conference remain four,                                                             sparse, therefore, we examine the sensitivity of processing time
the running times increases to 7.471 ± 0.563s (DBLP) and 6.526                                                               on the proposed method in terms of the number of nodes. In this
± 0.172s (arXiv). These values are still reasonable for various                                                              experiment, we selected all of the papers on DBLP, and construct
applications.                                                                                                                the multi-attributed graph as same manner as described in Sec-
                                                                                                                             tion 5.1. We set the number of communities and clusters are four.


                                                                                                                      8
                                                                                                                                                                                                           REFERENCES
                                                                                                                                                                                                            [1] Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing. 2008.
                          0.058                                                               100                                        0.044                                        100
                                                                       Loss function value                                                                     Loss function value
                          0.056
                                                                                              80                                         0.042                                        80
                                                                                                                                                                                                                Mixed membership stochastic blockmodels. Journal of Machine Learning
                                                                                                                                                                                                                Research 9, Sep (2008), 1981–2014.
    Loss function value


                                                                                                                   Loss function value
                                                                                                    Accuracy (%)


                                                                                                                                                                                            Accuracy (%)
                                                                                                                                                                                                            [2] Martin Atzmueller, Stephan Doerfel, and Folke Mitzlaff. 2016. Description-
                          0.054
                                                                                                                                         0.040
                                                                                              60                                                                                      60
                          0.052
                                                                                                                                         0.038
                                                                                                                                                                                                                oriented community detection using exhaustive subgroup discovery. Informa-
                                                                              Author          40                                                                                      40                        tion Sciences 329 (2016), 965–984.
                                                                                                                                                                                                            [3] Michele Berlingerio, Michele Coscia, and Fosca Giannotti. 2011. Finding and
                          0.050
                                                                              Paper                                                      0.036                            Author

                                                                                                                                                                                                                characterizing communities in multidimensional networks. In Advances in
                                                                              Conference      20                                                                          Paper       20
                          0.048
                                                                                                                                                                          Journal
                                                                                                                                         0.034
                                                                                                                                                                                                                Social Networks Analysis and Mining (ASONAM), 2011 International Conference
                                                                                                                                                                                                                on. IEEE, 490–494.
                          0.046                                                                0                                                                                       0
                               0                      50         100          150            200                                             0   20    40       60       80          100

                                                                                                                                                                                                            [4] Brigitte Boden, Stephan Günnemann, Holger Hoffmann, and Thomas Seidl.
                                                            # of iteration                                                                            # of iteration

                                                            (a) DBLP                                                                                  (b) arXiv                                                 2012. Mining coherent subgraphs in multi-layer graphs with edge labels. In
                                                                                                                                                                                                                Proceedings of the 18th ACM SIGKDD international conference on Knowledge
                                                                                                                                                                                                                discovery and data mining. ACM, 1258–1266.
Figure 3: Convergence analysis of the proposed algorithm                                                                                                                                                    [5] Cecile Bothorel, Juan David Cruz, Matteo Magnani, and Barbora Micenkova.
                                                                                                                                                                                                                2015. Clustering attributed graphs: models, measures and methods. Network
to optimize a loss function and the corresponding accu-                                                                                                                                                         Science 3, 3 (2015), 408–444.
racy curve.                                                                                                                                                                                                 [6] Santo Fortunato. 2010. Community detection in graphs. Physics reports 486, 3
                                                                                                                                                                                                                (2010), 75–174.
                                                                                                                                                                                                            [7] Mario Frank, Andreas P Streich, David Basin, and Joachim M Buhmann. 2012.
                                                                                                                                                                                                                Multi-assignment clustering for boolean data. Journal of Machine Learning
                                                      10

                                                       8                                                                                                                                                        Research 13, Feb (2012), 459–489.
                                   running time (s)


                                                                                                                                                                                                            [8] Jing Gao, Wei Fan, Yizhou Sun, and Jiawei Han. 2009. Heterogeneous source
                                                       6
                                                                                                                                                                                                                consensus learning via decision propagation and negotiation. In Proceedings
                                                       4                                                                                                                                                        of the 15th ACM SIGKDD international conference on Knowledge discovery and
                                                                                                                                                                                                                data mining. ACM, 339–348.
                                                       2                                                                                                                                                    [9] Michelle Girvan and Mark EJ Newman. 2002. Community structure in social
                                                       0
                                                                                                                                                                                                                and biological networks. Proceedings of the national academy of sciences 99, 12
                                                        0      10000 20000 30000 40000 50000 60000 70000 80000 90000                                                                                            (2002), 7821–7826.
                                                                                             # of nodes
                                                                                                                                                                                                           [10] Junzo Kamahara, Tomofumi Asakawa, Shinji Shimojo, and Hideo Miyahara.
                                                                                                                                                                                                                2005. A community-based recommendation system to reveal unexpected
                                                                                                                                                                                                                interests. In Multimedia Modelling Conference, 2005. MMM 2005. Proceedings of
Figure 4: Time complexity of CARNMF w.r.t. the number                                                                                                                                                           the 11th International. IEEE, 433–438.
of input nodes                                                                                                                                                                                             [11] Denise B Kandel. 1978. Homophily, selection, and socialization in adolescent
                                                                                                                                                                                                                friendships. American journal of Sociology 84, 2 (1978), 427–436.
                                                                                                                                                                                                           [12] Da Kuang, Chris Ding, and Haesun Park. 2012. Symmetric nonnegative matrix
                                                                                                                                                                                                                factorization for graph clustering. In Proceedings of the 2012 SIAM International
Figure 4 shows that the time complexity of our method is almost                                                                                                                                                 Conference on Data Mining. SIAM, 106–117.
linear to the number of nodes. From the figure, we ensure that                                                                                                                                             [13] Harold W Kuhn. 1955. The Hungarian method for the assignment problem.
the time complexity of our method is linear to the numbers of                                                                                                                                                   Naval research logistics quarterly 2, 1-2 (1955), 83–97.
                                                                                                                                                                                                           [14] Daniel D Lee and H Sebastian Seung. 1999. Learning the parts of objects by
nodes and edges (as shown on Equation21). Therefore, when the                                                                                                                                                   non-negative matrix factorization. Nature 401, 6755, 788–791.
input graph is sparse, our method is highly efficient.                                                                                                                                                     [15] Daniel D Lee and H Sebastian Seung. 2001. Algorithms for non-negative
                                                                                                                                                                                                                matrix factorization. In Advances in neural information processing systems.
                                                                                                                                                                                                                556–562.
6                          CONCLUSION                                                                                                                                                                      [16] Jialu Liu and Jiawei Han. 2013. HINMF: A Matrix Factorization Method for
In this paper we have proposed CAR-clustering, which includes                                                                                                                                                   Clustering in Heterogeneous Information Networks. In Proceedings of the
                                                                                                                                                                                                                international joint conference on artificial intelligence workshop.
community detection, attribute-value clustering, and extraction                                                                                                                                            [17] Peter V Marsden. 1988. Homogeneity in confiding relations. Social networks
of their relationships, for clustering over multi-attributed graphs.                                                                                                                                            10, 1 (1988), 57–76.
                                                                                                                                                                                                           [18] Ioannis Psorakis, Stephen Roberts, Mark Ebden, and Ben Sheldon. 2011. Over-
We have also proposed a novel algorithm CARNMF based on NMF.                                                                                                                                                    lapping community detection using bayesian non-negative matrix factoriza-
CARNMF employs a unified loss function to simultaneously solve                                                                                                                                                  tion. Physical Review E 83, 6 (2011), 066114.
different NMFs. This approach is better than the state-of-the-art                                                                                                                                          [19] Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth. 2004.
                                                                                                                                                                                                                The author-topic model for authors and documents. In Proceedings of the 20th
methods in that it can exploit the correlation between communi-                                                                                                                                                 conference on Uncertainty in artificial intelligence. AUAI Press, 487–494.
ties and attribute-value clusters for enhancing the quality of the                                                                                                                                         [20] Jianbo Shi and Jitendra Malik. 2000. Normalized cuts and image segmentation.
result. Our experiments have demonstrated that CARNMF suc-                                                                                                                                                      IEEE Transactions on pattern analysis and machine intelligence 22, 8 (2000),
                                                                                                                                                                                                                888–905.
cessfully achieves CAR-clustering. CARNMF has detected reason-                                                                                                                                             [21] Yizhou Sun, Yintao Yu, and Jiawei Han. 2009. Ranking-based clustering of
able communities with meaningful semantic descriptions via the                                                                                                                                                  heterogeneous information networks with star network schema. In Proceedings
                                                                                                                                                                                                                of the 15th ACM SIGKDD international conference on Knowledge discovery and
relationship between communities and attribute-value clusters                                                                                                                                                   data mining. ACM, 797–806.
for real-world datasets. These results are useful for many appli-                                                                                                                                          [22] Lei Tang and Huan Liu. 2010. Community detection and mining in social
cations such as node property estimations [7, 9, 24], community-                                                                                                                                                media. Synthesis Lectures on Data Mining and Knowledge Discovery 2, 1 (2010),
                                                                                                                                                                                                                1–137.
wise information recommendations [10], and semantic reasoning                                                                                                                                              [23] Xiao Wang, Di Jin, Xiaochun Cao, Liang Yang, and Weixiong Zhang. 2016.
for nodes/edges [1]. Additionally, CARNMF has achieved higher                                                                                                                                                   Semantic community identification in large attribute networks. In AAAI. 265–
accuracy than comparative methods, including a baseline and                                                                                                                                                     271.
                                                                                                                                                                                                           [24] Jaewon Yang and Jure Leskovec. 2013. Overlapping community detection at
the state-of-the-art methods. Our future work includes several                                                                                                                                                  scale: a nonnegative matrix factorization approach. In Proceedings of the sixth
directions. First, we will extend the proposed method for chrono-                                                                                                                                               ACM international conference on Web search and data mining. ACM, 587–596.
                                                                                                                                                                                                           [25] Jaewon Yang, Julian McAuley, and Jure Leskovec. 2013. Community detection
logical analysis over temporal multi-attributed graphs. Second,                                                                                                                                                 in networks with node attributes. In 2013 IEEE 13th International Conference
we plan to automate the parameter tuning (e.g., the numbers of                                                                                                                                                  on Data Mining. IEEE, 1151–1156.
communities/clusters, λt , etc.).                                                                                                                                                                          [26] Zhijun Yin, Liangliang Cao, Quanquan Gu, and Jiawei Han. 2012. Latent
                                                                                                                                                                                                                community topic analysis: integration of community discovery with topic
                                                                                                                                                                                                                modeling. ACM Transactions on Intelligent Systems and Technology (TIST) 3, 4
ACKNOWLEDGMENT                                                                                                                                                                                                  (2012), 63.
                                                                                                                                                                                                           [27] Haizheng Zhang, Baojun Qiu, C Lee Giles, Henry C Foley, and John Yen. 2007.
This research was partly supported by Japan Agency for Medical                                                                                                                                                  An LDA-based Community Structure Discovery Approach for Large-Scale
Research and Development (AMED).                                                                                                                                                                                Social Networks. ISI 200 (2007).


                                                                                                                                                                                            9