Community Detection and Correlated Attribute Cluster Analysis on Multi-Attributed Graphs
Hiroyoshi Ito†, Takahiro Komamizu‡, Toshiyuki Amagasa‡, and Hiroyuki Kitagawa‡
† Graduate School of Systems and Information Engineering, University of Tsukuba
‡ Center for Computational Sciences, University of Tsukuba
hiro.3188@kde.cs.tsukuba.ac.jp, taka-coma@acm.org, {amagasa,kitagawa}@cs.tsukuba.ac.jp

ABSTRACT
Multi-attributed graphs, in which each node is characterized by multiple types of attributes, are ubiquitous in the real world. Detecting and characterizing communities of nodes could have a significant impact on various applications. Although previous studies have attempted to tackle this task, it remains challenging due to difficulties in integrating graph structures with multiple attributes and due to the presence of noise in the graphs. In this study, we therefore focus on clusters of attribute values and on the strong correlations between communities and attribute-value clusters. The graph clustering methodology adopted in this study involves Community detection, Attribute-value clustering, and deriving Relationships between communities and attribute-value clusters (CAR for short). Based on these concepts, the proposed multi-attributed graph clustering is modeled as CAR-clustering. To achieve CAR-clustering, a novel algorithm named CARNMF is developed based on non-negative matrix factorization (NMF), which detects CAR in a cooperative manner. Results obtained from experiments using real-world datasets show that CARNMF detects communities and attribute-value clusters more accurately than existing comparable methods. Furthermore, the clustering results indicate that CARNMF successfully detects informative communities with meaningful semantic descriptions through the correlations between communities and attribute-value clusters.

1 INTRODUCTION
Community detection is the task of detecting densely connected subgraphs as communities. Nodes in a community tend to share the same or similar properties, a phenomenon called the homophily effect [11, 17]: nodes having similar properties tend to link together. Because diverse applications are derived from the nature of real communities, community detection is important in graph/network analyses. Examples include node property estimation [7, 9, 24], community-wise information recommendation [10], and semantic reasoning for nodes/edges [1].

Moreover, using the attributes in a graph is advantageous both for realizing high-quality community detection and for understanding the characteristics of communities. Multi-attributed graphs are reasonable models of real-world networks such as social networks, co-author networks, and protein-protein interaction networks. In fact, several works have proposed algorithms that employ attribute information (i.e., shared interests or functional behaviors of each community) to detect not only communities but also their semantic meanings [19, 23, 25, 26].

However, community detection and extraction of semantics in multi-attributed graphs remain challenging due to the difficulty of integrating graph structures with multiple attributes of different types. The task involves multiple steps. First, useful information must be extracted from each attribute, because different node attributes describe different aspects. Second, all extracted information must be exploited to enhance community detection by effectively integrating heterogeneous information. Notice that the previous works [19, 23, 25, 26] do not differentiate between multiple attributes; that is, they treat all attributes equally. Moreover, real-world graphs are often incomplete and noisy. That is, some edges or nodes may be missing, or attribute values may be incorrect, leading to inappropriate results.

To overcome these difficulties, we propose a novel clustering scheme based on the following two assumptions:
(1) Relevant attribute values form clusters within each attribute type. This is based on the observation that an attribute reflects a node's interests in a network. Hence, an attribute tends to be associated with a specific group of values related to an interest. For example, in a co-author network where the nodes correspond to authors (researchers), each author typically has specific research interests (e.g., AI, data mining, and databases). Thus, attributes (e.g., paper title and conference) exhibit values biased according to those interests. Consequently, it is possible to identify clusters of attribute values (attribute-value clusters) reflecting a node's interests.
(2) Communities are strongly correlated with attribute-value clusters. This is related to the previous assumption. Consider the example above: the nodes in a community share similar interests (e.g., research interests) and, consequently, similar attribute-value clusters (e.g., research topics and conferences). Conversely, if some nodes (researchers) have similar attribute values, they should share similar interests and can be grouped into the same community.

Exploiting the correlation between communities and multiple attributes should improve the quality of both community detection and attribute-value clustering. Using information from different sources (attributes) to alleviate the effect of noise (e.g., missing values and errors), we perform community detection and attribute-value clustering simultaneously.

Based on the aforementioned ideas, we study a novel clustering scheme for multi-attributed graphs, called CAR-clustering. CAR comprises Community detection, Attribute-value clustering, and deriving Relationships between communities and attribute-value clusters for multi-attributed graphs. Additionally, we develop a novel clustering algorithm called CARNMF, which employs non-negative matrix factorization (NMF).
The contributions of this paper are summarized as follows:
• We propose a novel clustering scheme, CAR-clustering, to address two technical questions. (i) Given a multi-attributed graph, how can community detection and attribute-value clustering be performed for different types of attributes in a cooperative manner? (ii) How should reasonable relationships be determined between communities and attribute-value clusters for each type of attribute?
• We develop a novel algorithm, CARNMF, which achieves CAR-clustering. Specifically, a dedicated loss function is designed to perform multiple NMFs simultaneously.
• We conduct experiments using real-world datasets (the DBLP computer science bibliography and the arXiv physics bibliography). The accuracy of CARNMF with respect to community detection and attribute-value clustering is examined in comparison to other methods. Relative to the comparative methods, CARNMF achieves a better accuracy of up to 11% for community detection and up to 22% for attribute-value clustering. Furthermore, CARNMF detects informative communities and their rich semantic descriptions by correlating multiple types of attribute-value clusters.

© 2018 Copyright held by the owner/author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna, Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

2 RELATED WORK
Community detection in graphs is a topic of current interest in graph analysis and AI research. Existing works for non-attributed graphs can be categorized according to the techniques used: graph separation [9, 20], probabilistic generative models [27], and matrix factorization [12, 18, 24]. [9] defined modularity, which indicates how well separated a community is from other nodes. More comprehensive surveys can be found in [6, 22].

Recently, several works have addressed the problem of detecting communities and their semantic descriptions in node-attributed graphs. [25] proposed CESNA, where communities and their attributes are detected simultaneously in an efficient manner. [23] proposed SCI to detect communities and their semantics using NMF. [19] proposed a probabilistic generative model, the author-topic model, to model communities and related topics. [2] proposed COMODO to detect communities with shared properties using subgroup discovery techniques. Likewise, [26] proposed LCTA, where communities and their topics are modeled separately, and their relationships are then modeled using a probabilistic generative model. A comprehensive survey of these works can be found in [5].

The aforementioned works consider only a single textual attribute, or they handle multiple attributes uniformly without any distinction. In reality, each attribute represents a different aspect of the nodes. In our research, we deal with heterogeneous attributes individually. In addition to community detection, we perform clustering over the attribute values of each attribute, which, in turn, can be used to improve the quality of the detected communities.

Some works have investigated clustering over networks containing different types of nodes and/or edges. [3] studied community detection with characterization in multidimensional networks, where a multidimensional network is a graph consisting of a set of nodes and multiple types of edges. [4] studied subgraph detection in multi-layer graphs with edge labels. In contrast, we assume a different model where each node is characterized by multiple attributes. [21] proposed a ranking-based clustering scheme for multi-typed heterogeneous networks, which include two or more types of nodes. Similarly, [16] proposed an NMF-based method for such networks. These methods differ from ours in that they define a cluster as consisting of all types of nodes. In other words, they cannot handle each attribute in a unique way. In contrast, our work deals with different attributes individually and solves community detection and attribute-value clustering in a unified manner.

3 PROBLEM STATEMENT
In this work, we deal with multi-attributed graphs, where each node is characterized by two or more attributes. Given such a graph, CAR-clustering solves the following three sub-problems, which have previously been studied independently: community detection, attribute-value clustering, and derivation of relationships between communities and attribute-value clusters. Below, we provide the formal definitions necessary to define the clustering scheme.

3.1 Multi-Attributed Graph
A multi-attributed graph G is defined by extending a weighted graph G′ with several attributed graphs Gt, one per attribute t ∈ T. The formal definitions follow.

Definition 1 (Weighted graph). A weighted graph G′ is defined by a triplet ⟨V, E, W⟩, where V is a set of nodes, E (⊆ V × V) is a set of edges, and W : E → R+ is a map of edge weights. □

Definition 2 (Attributed graph). An attributed graph Gt = ⟨V ∪ Xt, Et, Wt⟩ of attribute t ∈ T is a bipartite graph consisting of a set V of nodes, a set Xt of attribute values, a set of edges Et ⊆ V × Xt, and a map of edge weights Wt : Et → R+. □

Definition 3 (Multi-attributed graph). Given a weighted graph G′ = ⟨V, E, W⟩ and a set of attributed graphs {Gt}t∈T where Gt = ⟨V ∪ Xt, Et, Wt⟩, the multi-attributed graph G = ⟨G′, {Gt}t∈T⟩ is the union of these graphs. □

3.2 CAR-clustering
Given a multi-attributed graph, information can be extracted from different perspectives. In this work, we extract communities, attribute-value clusters, and the relationships between them.

Community. For a multi-attributed graph, a set of nodes with the following properties is regarded as a community. (1) Nodes in a community are densely connected with each other and sparsely connected with other nodes. (2) Nodes in a community tend to share common values in distinct attributes. This study assumes that communities can overlap; that is, each node may belong to more than one community. This assumption is reasonable for real applications. Formally, given the number of communities ℓ, node n ∈ V belonging to community c ∈ C is described by the probability distribution p(n | c), where |C| = ℓ.

Attribute-value cluster. For attribute t ∈ T in a multi-attributed graph, similar or highly correlated attribute values can be grouped into attribute-value clusters. Here, we again assume overlapping clusters; that is, each attribute value may belong to more than one cluster. Formally, given the number of clusters kt of attribute t ∈ T, cluster member x ∈ Xt of attribute-value cluster st ∈ St is described by the probability distribution p(x | st), where |St| = kt.

Relationship between a community and an attribute-value cluster. Nodes in a community often share common attribute-value clusters. Detecting such relationships is useful in many applications. Given a community c ∈ C and an attribute-value cluster st ∈ St of attribute t ∈ T, the probability that c is related to st describes the relationship between c and st. In this work, a community may be related to more than one attribute-value cluster. Formally, this is described by the probability distribution p(st | c).
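To make the graph model of Definitions 1-3 concrete, it can be held in a small in-memory structure such as the sketch below. The class and field names are our own illustrative choices, not part of the paper:

```python
from dataclasses import dataclass, field

@dataclass
class WeightedGraph:          # G' = <V, E, W>
    nodes: set                # V
    weights: dict = field(default_factory=dict)  # W: edge (u, v) -> R+; keys form E

@dataclass
class AttributedGraph:        # G_t = <V u X_t, E_t, W_t>, bipartite
    values: set               # X_t
    weights: dict = field(default_factory=dict)  # W_t: (node, value) -> R+

@dataclass
class MultiAttributedGraph:   # G = <G', {G_t}>
    structure: WeightedGraph
    attributes: dict          # attribute type t -> AttributedGraph

# Toy co-author network: two authors, one co-authored paper, a "term" attribute.
g = MultiAttributedGraph(
    structure=WeightedGraph(nodes={"alice", "bob"},
                            weights={("alice", "bob"): 1.0}),
    attributes={"term": AttributedGraph(values={"mining"},
                                        weights={("alice", "mining"): 2.0})},
)
print(len(g.structure.nodes))  # 2
```

The node set V is shared between the weighted graph and every bipartite attributed graph, which is exactly what lets the method couple structural and attribute information later.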
CAR-clustering. CAR-clustering is formally defined by Definition 4.

Definition 4 (CAR-clustering). Given a multi-attributed graph G, CAR-clustering performs community detection, attribute-value clustering, and detection of the relationships between the communities and the attribute-value clusters simultaneously. □

Solving these sub-problems simultaneously is more beneficial than solving each one independently because, in many cases, communities and attribute-value clusters are mutually correlated. Solving the problems simultaneously exploits this correlation, leading to improved results.

4 CARNMF – ALGORITHM FOR CAR-CLUSTERING
In this section, we propose an algorithm based on NMF (non-negative matrix factorization), called CARNMF, for CAR-clustering. CARNMF models communities and attribute-value clusters. Additionally, we introduce an auxiliary matrix to maintain the relationships between the communities and the attribute-value clusters. A unified loss function is used to solve the different NMFs in a unified manner. It is assumed that the user gives the number ℓ of communities and the number kt of clusters for each attribute t ∈ T.

4.1 Matrix Representation
We represent a multi-attributed graph by two sorts of matrices: an adjacency matrix A ∈ R^{|V|×|V|} and attribute matrices X(t) ∈ R^{|V|×|Xt|} for t ∈ T. An element A_{u,v} of A corresponds to an edge e_{u,v} = (u, v) ∈ E: A_{u,v} = W(e_{u,v}) / Σ_{e_{i,j}∈E} W(e_{i,j}), indicating the joint probability of the presence of edge e_{u,v}. Similarly, for t ∈ T, an element X(t)_{u,x} of X(t) corresponds to an edge e(t)_{u,x} ∈ Et: X(t)_{u,x} = Wt(e(t)_{u,x}) / Σ_{e(t)_{v,y}∈Et} Wt(e(t)_{v,y}), indicating the joint probability of the presence of edge e(t)_{u,x}.

4.2 Loss Function
We achieve CAR-clustering in terms of several NMFs, which correspond to the aforementioned sub-problems. We first introduce loss functions for the sub-problems, followed by a unified loss function.

Loss function for community detection. In CARNMF, the communities C are denoted by a matrix U* ∈ R^{|V|×ℓ}, where each row and column correspond to a node u ∈ V and a community c ∈ C, respectively. A cell U*_{u,c} represents the probability p(u | c). The probability p(u, v | c) that u and v are connected through community c is represented by U*_{u,c} U*_{v,c}. Moreover, the joint probability p(u, v) of the existence of edge e_{u,v} ∈ E is expressed as Σ_{c∈C} U*_{u,c} U*_{v,c}. Therefore, when U* minimizes the following loss function, U* gives the best approximation of the edges in the graph:

    arg min_{U* ≥ 0} ||A − U*(U*)^T||_F^2    s.t. ∀ 1 ≤ c ≤ ℓ: ||U*_{·,c}||_1 = 1        (1)

where ||·||_F^2 and ||·||_1 denote the squared Frobenius norm and the ℓ1 norm, respectively.

Loss function for attribute-value clustering. In CARNMF, the attribute-value clusters St of attribute t ∈ T are represented as a matrix V(t) ∈ R^{|Xt|×kt}, where each row and column correspond to an attribute value x ∈ Xt and an attribute-value cluster st ∈ St, respectively. A cell V(t)_{x,st} represents the probability p(x | st). To derive V(t) from X(t), we introduce a matrix U(t) ∈ R^{|V|×kt}, which denotes the relationships between the nodes and the attribute-value clusters with probability p(u | st). Using both matrices U(t) and V(t), the probability p(u, x | st) of the existence of edge e(t)_{u,x} ∈ Et in terms of attribute-value cluster st is calculated as U(t)_{u,st} V(t)_{x,st}. Moreover, the probability p(u, x) is derived as Σ_{st∈St} U(t)_{u,st} V(t)_{x,st}. Therefore, when U(t) and V(t) minimize the following loss function, they give the best approximation of the edges in the attributed graph:

    arg min_{U(t),V(t) ≥ 0} ||X(t) − U(t)(V(t))^T||_F^2    s.t. ∀ 1 ≤ r ≤ kt: ||V(t)_{·,r}||_1 = 1        (2)

Loss function for relationship detection. In CARNMF, the relationships between communities and attribute-value clusters of attribute t ∈ T are represented as a matrix R(t) ∈ R^{ℓ×kt}, where each row and column correspond to a community c ∈ C and an attribute-value cluster st ∈ St, respectively. A cell contains the probability p(st | c). We assume R(t) is a linear transformation that maps U* to U(t), where U* and U(t) are derived by Equations 1 and 2, respectively. Therefore, when R(t) minimizes the following loss function, R(t) represents the relationships between the communities and the attribute-value clusters:

    arg min_{U(t),U*,R(t) ≥ 0} ||U(t) − U* R(t)||_F^2    s.t. ∀ 1 ≤ p ≤ ℓ: ||U*_{·,p}||_1 = 1, ||R(t)_{p,·}||_1 = 1        (3)

Equation 3 can be regarded as an NMF that decomposes the node-by-attribute-value-cluster matrix into a node-by-community matrix and a community-by-attribute-value-cluster matrix. In other words, Equation 3 captures the effect of the relationships between nodes and attribute-value clusters on the communities.

Unified loss function. To achieve CAR-clustering, the three sub-problems above must be solved. We solve them simultaneously by introducing a unified loss function:

    L = arg min_{U*, {U(t),V(t),R(t)}t∈T} ||A − U*(U*)^T||_F^2
        + Σ_{t∈T} ( ||X(t) − U(t)(V(t))^T||_F^2 + λt ||U(t) − U* R(t)||_F^2 )        (4)
    s.t. ∀ 1 ≤ r ≤ kt, ∀ 1 ≤ p ≤ ℓ, ∀ t ∈ T:
        ||U*_{·,p}||_1 = 1, ||V(t)_{·,r}||_1 = 1, ||R(t)_{p,·}||_1 = 1

where λt, for attribute t ∈ T, is a user-defined parameter controlling the effect of the attribute-value clusters on community detection: the higher λt, the stronger the effect of the attribute-value clusters of t in community detection.

4.3 Optimization
As with ordinary NMF, the loss function in Equation 4 is not simultaneously convex in all variables. Hence, we treat the problem as a Frobenius-norm optimization, for which multiplicative update equations are derived following [14].
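For concreteness, the value of the unified loss in Equation 4 (for given factor matrices, ignoring the simplex constraints, which the algorithm restores by normalization) can be written directly in NumPy. This is a sketch with our own variable names mirroring the paper's notation:

```python
import numpy as np

def unified_loss(A, U_star, Xs, Us, Vs, Rs, lams):
    """Value of Equation 4 for given factor matrices.

    A: |V| x |V| adjacency matrix; U_star: |V| x l community matrix.
    Xs/Us/Vs/Rs/lams: per-attribute dicts of X(t), U(t), V(t), R(t), lambda_t.
    """
    # Structural term: ||A - U* (U*)^T||_F^2
    loss = np.linalg.norm(A - U_star @ U_star.T, "fro") ** 2
    for t in Xs:
        # Attribute term: ||X(t) - U(t) (V(t))^T||_F^2
        loss += np.linalg.norm(Xs[t] - Us[t] @ Vs[t].T, "fro") ** 2
        # Relationship term: lambda_t ||U(t) - U* R(t)||_F^2
        loss += lams[t] * np.linalg.norm(Us[t] - U_star @ Rs[t], "fro") ** 2
    return loss
```

A useful sanity check is that the loss is exactly zero when A, X(t), and U(t) are built from the factors themselves.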
Considering the Karush-Kuhn-Tucker (KKT) first-order conditions applied to our problem, we require:

    U* ≥ 0, U(t) ≥ 0, V(t) ≥ 0, R(t) ≥ 0        (5)
    ∇_{U*}L ≥ 0, ∇_{U(t)}L ≥ 0, ∇_{V(t)}L ≥ 0, ∇_{R(t)}L ≥ 0        (6)
    U* ⊙ ∇_{U*}L = 0, U(t) ⊙ ∇_{U(t)}L = 0, V(t) ⊙ ∇_{V(t)}L = 0, R(t) ⊙ ∇_{R(t)}L = 0        (7)

where ⊙ is the element-wise product. The gradients of L with respect to the variables are:

    ∇_{U*}L = −2A^T U* + 2U*(U*)^T U* + Σ_{t∈T} λt (−U(t)(R(t))^T + U* R(t)(R(t))^T)        (8)
    ∇_{U(t)}L = −X(t)V(t) + U(t)(V(t))^T V(t) + λt (U(t) − U* R(t))        (9)
    ∇_{V(t)}L = −(X(t))^T U(t) + V(t)(U(t))^T U(t)        (10)
    ∇_{R(t)}L = −(U*)^T U(t) + (U*)^T U* R(t)        (11)

Substituting the gradients into the KKT conditions yields the following multiplicative update rules, where the fraction denotes element-wise division:

    U* ← U* ⊙ (A^T U* + Σ_{t∈T} λt U(t)(R(t))^T) / (2U*(U*)^T U* + Σ_{t∈T} λt U* R(t)(R(t))^T)        (12)
    U(t) ← U(t) ⊙ (X(t)V(t) + λt U* R(t)) / (U(t)(V(t))^T V(t) + λt U(t))        (13)
    V(t) ← V(t) ⊙ ((X(t))^T U(t)) / (V(t)(U(t))^T U(t))        (14)
    R(t) ← R(t) ⊙ ((U*)^T U(t)) / ((U*)^T U* R(t))        (15)

The above update rules monotonically decrease Equation 4. However, the variables may come to violate the probability definition (i.e., their sums may not equal one). To satisfy the constraints ||U*_{·,p}||_1 = 1, ||V(t)_{·,r}||_1 = 1, and ||R(t)_{p,·}||_1 = 1, the variables are normalized immediately after updating:

    U* ← U*(Q*)^{-1}        (16)
    V(t) ← V(t)(Q(t))^{-1}        (17)
    U(t) ← U(t)Q(t)        (18)
    R(t) ← R(t)(Q_R(t))^{-1}        (19)

where Q* = Diagonalize(U*), Q(t) = Diagonalize(V(t)), and Q_R(t) = Diagonalize(R(t)), with

    Diagonalize(Z ∈ R^{a×b}) = Diag(Σ_{i=1}^{a} Z_{i,1}, …, Σ_{i=1}^{a} Z_{i,b})        (20)

and Diag(·) producing a diagonal matrix whose diagonal entries are the input sequence.

Algorithm 1 Optimization Algorithm
Input: A, {X(t)}t∈T, {λt}t∈T, δ
Output: U*, {U(t), V(t), R(t)}t∈T
 1: initialize U*, {U(t), V(t), R(t)}t∈T with random non-negative values
 2: ϵ′ ← maxFloat, ϵ ← ϵ′/2
 3: while abs(ϵ′ − ϵ) ≥ δ do
 4:   update U* by Equation 12
 5:   U* ← U*(Q*)^{-1}
 6:   for t ∈ T do
 7:     update U(t) by Equation 13
 8:     update V(t) by Equation 14
 9:     U(t) ← U(t)Q(t)
10:     V(t) ← V(t)(Q(t))^{-1}
11:     update R(t) by Equation 15
12:     R(t) ← R(t)(Q_R(t))^{-1}
13:   end for
14:   ϵ′ ← ϵ
15:   ϵ ← L(U*, {U(t), V(t), R(t)}t∈T)
16: end while

Algorithm 1 shows the optimization algorithm based on the above update rules, including the order of updates and normalizations. Normalization is applied immediately after each update; without it, the matrices could take significantly different scales, leading to inconsistent results.

4.4 Complexity Analysis
Here, we analyze the computational complexity of the proposed algorithm. The updates have the following complexities:
• Updating U* (Eqs. 12 and 16) takes O(|E|ℓ + |V|ℓ² + |V|ℓ Σ_{t∈T} kt).
• Updating U(t) (Eqs. 13 and 18) and V(t) (Eqs. 14 and 17) takes O((|V| + |Xt|)kt² + |Et|kt).
• Updating R(t) (Eqs. 15 and 19) takes O(|V|(ℓkt + ℓ²)).
In summary, the time complexity of the algorithm is

    O( iter · Σ_{t∈T} ( |V|(ℓ² + ℓkt + kt²) + |Xt|kt² + |E|ℓ + |Et|kt ) )        (21)

where iter is the number of outer iterations (lines 3-16 of Algorithm 1).
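As a sketch of how one pass of Algorithm 1 looks in NumPy, the multiplicative updates and normalizations (Equations 12-19) for a single attribute might be implemented as follows. The small epsilon guarding against division by zero, and all names, are our additions, not part of the paper:

```python
import numpy as np

EPS = 1e-12  # guard against division by zero (our addition)

def carnmf_step(A, X, U_star, U, V, R, lam):
    """One pass of the multiplicative updates (Eqs. 12-15) followed by
    normalization (Eqs. 16-19), for a single attribute t with weight lam."""
    # Eq. 12: community matrix update, then Eq. 16 (columns sum to 1).
    num = A.T @ U_star + lam * (U @ R.T)
    den = 2 * U_star @ (U_star.T @ U_star) + lam * U_star @ (R @ R.T) + EPS
    U_star = U_star * num / den
    U_star = U_star / (U_star.sum(axis=0, keepdims=True) + EPS)

    # Eq. 13: node-by-attribute-cluster matrix update.
    U = U * (X @ V + lam * (U_star @ R)) / (U @ (V.T @ V) + lam * U + EPS)
    # Eq. 14: attribute-value cluster matrix update.
    V = V * (X.T @ U) / (V @ (U.T @ U) + EPS)
    q = V.sum(axis=0, keepdims=True) + EPS  # diagonal of Q(t)
    U = U * q          # Eq. 18: rescale U(t) by Q(t)
    V = V / q          # Eq. 17: normalize V(t) columns

    # Eq. 15: relationship matrix update, then Eq. 19 normalization.
    R = R * (U_star.T @ U) / (U_star.T @ U_star @ R + EPS)
    R = R / (R.sum(axis=0, keepdims=True) + EPS)
    return U_star, U, V, R
```

An outer loop would repeat this step for every attribute t and stop once the change of the loss (Equation 4) falls below δ, as in lines 3-16 of Algorithm 1.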
used a part of the dataset by extracting conferences simi- Algorithm 1 shows the optimization algorithm based on the lar to [8]. We extracted four research areas: data mining, aforementioned update rules. Matrix normalization is applied databases, machine learning, and information retrieval, after the updates. Without normalization, each matrix may have and five major conferences for each area. Consequently, significantly different values, leading to inconsistent results. Al- gorithm 1 describes the order of update rules and normalizations. 1 http://dblp.uni-trier.de/ 5 Table 1: Selected conferences on four research areas. Each rectangle shows the top contributing nodes in the com- DB DM ML IR munity/cluster, and the edge weights show the strength of the SIGMOD, VLDB KDD, ICDM NIPS, ICML SIGIR, ECIR relationship between the community and the corresponding clus- PODS, EDBT PKDD, SDM ECML, UAI JCDL, ECDL ter. We chose famous researchers in different research domains ICDT PAKDD COLT TREC (i.e., Jiawei Han, Michael Stonebraker, and Michael I. Jordan). Table 2: Selected journals on four research areas. Figure 1(a) show the community and the correlated attribute- math-ph value clusters of Jiawei Han, who is a leading researcher in data Communications in Mathematical Physics mining and database areas. The results show that (1) he collab- Reviews in Mathematical Physics orates with Chinese researchers, (2) he publishes many papers Letters in Mathematical Physics Journal of Mathematical Physics related to data mining and database conferences (i.e., KDD, ICDM, nucl-th SDM, PAKDD, and VLDB), and (3) his researches are highly cor- Annual Review of Nuclear and Particle Science related with topics in data mining, such as clustering and classifi- Progress in Particle and Nuclear Physics Atomic Data and Nuclear Data Tables cation on large graph. Journal of Nuclear Materials Similarly, Figure 1(b) shows the result for Michael Stonebraker, astro-ph a renowned database researcher. 
His community is strongly re- Astronomical Journal Annual Review of Astronomy and Astrophysics lated to conferences in databases (SIGMOD, VLDB, PODS, EDBT, New Astronomy Reviews and ICDT ). Topics such as view management, distance metric, and Space Science Review query evaluation are detected. Figure 1(c) shows the result for cond-mat Nature Nanotechnology Michael I. Jordan, an expert in machine learning research. This Smart Materials and Structures community is strongly related to the conferences of machine Semi Conductor Science learning, (NIPS, ICML, UAI, COLT, and ECML) and the topics like Journal of Materials Science learn network, expert model, and prediction. The detected communities and the associated attribute-value 10,491 papers in 20 conferences (shown in Table 1) were clusters seem to be reasonable. selected. • arXiv: arXiv2 is a repository of electronic preprints in 5.3 Accuracy Comparison various scientific fields. Similar to above, we chose four The proposed scheme is compared to a baseline method as well research areas: mathematical physics (math-ph), nuclear as state-of-the-art methods to quantitatively evaluate the perfor- (nucl-th), astrophysics (astro-ph), and materials (part of mance of community detection and attribute-value clustering. cond-mat), and four major journals for each area. Conse- The comparison methods include: quently, 12,497 papers in 16 journals (shown in Table 2) • NMF [15]: Baseline approaches that apply NMF for bi- were selected. nary relationships between graph components, including Multi-attributed graphs were constructed from the datasets as author-term (A-T), author-paper (A-P), author-conference follows: The nodes correspond to the authors. If two authors co- (A-C), term-paper (T-P), and term-conference (T-C)3 . author a paper, we placed a weighted edge between the authors. • LCTA [26]: A probabilistic generative model for commu- The weighting denotes the number of co-authored papers. 
Each nities, topics of textual attributes, and their relationships. author has attributes term, paper, and conference/journal, which • SCI [23]: An NMF based method for detecting commu- are defined below: nities as well as their semantic descriptions via node’s • term: Each term is regarded as a node. An edge is generated attribute values. between an author and a term if the author uses the term • HINMF [16]: A model that clusters objects and attributes in the titles of at least one paper. The edge weight denotes simultaneously and takes the consensus among the binary the term frequency for each author. As a preprocessing, NMFs. This work is the most similar to our proposal. we applied stop-word elimination and stemming. Note that, LCTA and SCI deal with a single concatenated • paper: Each paper is regarded as a node. An edge is gen- feature of multiple attributes. Therefore, we prepare concatenated erated if the author publishes the paper. The edge weight feature consisting of term, document and conference/journal, and is always 1.0 because each paper can only be published apply these approaches on the feature. once. To evaluate the qualities of these methods, we compared the • conference/journal: Each conference or journal corresponds accuracy [26] w.r.t. community and attribute-value clustering with a node. An edge is created between an author and w.r.t. paper and conference/journal. We designed a ground truth a conference/journal if the author publishes at least one to measure the accuracy. To derive the ground truth, each author paper at the conference/journal. The edge weight is the is labeled based on research areas of their papers, in other words, total number of publications at the conference/journal. if the author mostly published papers for the specific area, the author is labeled as that area. 
Similarly, the labels for confer- 5.2 Results of CAR-clustering ence/journal and paper were manually given by referring to the Figure 1 shows examples of the detected communities and their conference categories. associated attribute-value clusters in DBLP. The number of com- munities and the number of term clusters were each 50, whereas Definition 5 (Accuracy). Given a set S of elements, for each the number of conference clusters and the number paper clusters element n ∈ S, the true label and the cluster label generated by a were each 4. The red, blue and gray rectangles correspond to method are denoted by sn and r n , respectively. Then, the accuracy communities, term clusters, and conference clusters, respectively. 3 Because NMF assumes the co-occurrences of binary relationships, paper- 2 https://arxiv.org conference (one-to-one relationship) is excluded. 6 CARNMF achieved the best performance for community de- cluster 0.123 tection (author) and attribute-value clustering (paper and confer- ence/journal) with significant gaps for DBLP dataset (respectively data 0.063 base 0.056 11%, 22% and 7%) and for arXiv dataset (respectively 3%, 4% and effici 0.047 0.044 dimension 0.047 jiawei han 0.223 2%) relative to the comparative methods. In particular, CARNMF KDD ICDM 0.304 0.202 xifeng yan 0.049 hong cheng 0.026 larg classif 0.092 0.085 has an improved clustering quality compared to NMF by tak- SDM 0.096 0.999 ke wang 0.018 0.039 graph 0.079 ing the relationships between communities and attribute-value clusters into account. PAKDD 0.094 xiaoxin yin 0.011 scale 0.067 VLDB 0.073 feida zhu 0.010 attribut 0.051 Table 5 summarizes evaluation of effects from taking multiple chen chen 0.010 approach method 0.244 0.116 attributes into account. The table showcases results where differ- 0.037 support 0.097 ent combinations of attributes are used, e.g., “(T-C)” means term and conference attributes were used. 
This result shows that the structur 0.074 select 0.051 proposed method works the best when taking as many attributes (a) Communities of “jiawei han”. as possible. As expected, the basic tendency is that as the number of attributes increases, the accuracy increases. system view 0.104 0.062 Table 3 lists the detected topics from DBLP using CARNMF 0.046 updat mainten 0.061 0.0545 when the number of topics is set to four. Our method success- manag 0.0488 fully detects the four major research topics. Specifically, Topic michael stonebraker 1 containing retriev, inform, search, queri and web seems to cor- SIGMOD 0.420 VLDB 0.391 0.123 algorithm 0.176 process 0.097 respond to information retrieval, and Topic 2 containing mine, pattern, cluster, graph and frequent correspond to data mining. 0.999 samuel madden 0.041 distanc 0.074 PODS 0.103 0.018 EDBT 0.064 metric 0.0385 Topic3 contains words “learn, network, kernel, bayesian, reinforc”, lawrence a. rowe ICDT 0.020 control 0.0367 0.014 which are typical words of machine learning. Topic4 is a topic of database containing words “query, databas, optim, xml, manag”, queri 0.116 evalu 0.051 0.040 web integr 0.039 0.0369 which are popular topics on database researches. distribut 0.0348 Table 3: Detected topics via CARNMF from DBLP. (b) Communities of “michael Stonebraker”. Topic 1 Topic 2 Topic 3 Topic 4 learn 0.113 model 0.049 (Information Retrieval) (Data Mining) (Machine Learning) (Database) network 0.042 retriev 0.068 mine 0.076 learn 0.063 queri 0.046 0.064 bayesian 0.025 decis 0.0234 trec 0.043 pattern 0.038 model 0.036 data 0.043 michael i. jordan 0.216 inform 0.032 data 0.037 network 0.018 databas 0.043 NIPS 0.478 satinder p. singh model 0.079 model 0.028 cluster 0.026 algorithm 0.016 optim 0.017 0.020 ICML 0.217 0.999 lawrence k. 
Figure 1: Example communities with attribute-value clusters. The red, blue, and gray rectangles correspond to communities, term clusters, and conference clusters, respectively. (a) Communities of "jiawei han". (b) Communities of "michael stonebraker". (c) Communities of "michael i. jordan".

Definition 5 (Accuracy). Given a set S of elements, for each element n ∈ S, the true label and the cluster label generated by a method are denoted by s_n and r_n, respectively. Then, the accuracy is defined as:

    Accuracy = Σ_{n ∈ S} δ(s_n, map(r_n)) / |S|

where |·| is the cardinality of a set; δ(x, y) is a delta function that returns 1 if x = y and 0 otherwise; and map(r_n) is a mapping function that maps r_n to the equivalent label in the dataset. The best mapping can be found by the Kuhn–Munkres algorithm [13]. □

5.4 Insights on Parameters
This section discusses the effect of the parameter λt for each attribute. The larger the λt value, the greater the influence of the attribute-value cluster for t ∈ T on the community. Therefore, an optimal parameter setting should yield better results. Figure 2 shows the behavior of the accuracy for different λt values with respect to different attributes. For each evaluation, the parameters λs (s ≠ t) of the other attributes were fixed. In most cases, the accuracy shows a convex form with a peak around 10^-2. More importantly, the accuracy is insensitive to the setting, making tuning easier.
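The accuracy of Definition 5 requires the best one-to-one mapping between cluster labels and true labels. A minimal sketch, using exhaustive search over label permutations in place of the Kuhn–Munkres algorithm [13] (adequate only for the small label sets used here; the toy labels are illustrative):

```python
# Sketch of the accuracy measure in Definition 5: cluster labels are mapped
# to true labels by the best one-to-one matching. Exhaustive search over
# permutations stands in for the Kuhn-Munkres algorithm [13]; it assumes
# there are no more cluster labels than true labels.
from itertools import permutations

def clustering_accuracy(true_labels, cluster_labels):
    true_ids = sorted(set(true_labels))
    clus_ids = sorted(set(cluster_labels))
    best = 0
    for perm in permutations(true_ids, len(clus_ids)):
        mapping = dict(zip(clus_ids, perm))            # map(r_n)
        hits = sum(1 for s, r in zip(true_labels, cluster_labels)
                   if s == mapping[r])                 # delta(s_n, map(r_n))
        best = max(best, hits)
    return best / len(true_labels)                     # divide by |S|

truth    = ["db", "db", "ml", "ml", "ir", "ir"]
clusters = [0, 0, 1, 2, 2, 2]
print(clustering_accuracy(truth, clusters))            # best mapping hits 5 of 6
```

For larger numbers of clusters, the permutation search becomes infeasible and the Kuhn–Munkres algorithm, which runs in polynomial time, is the appropriate choice.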
5.5 Convergence Analysis
In this section, we experimentally analyze the convergence of optimizing the proposed loss function in Equation 4. Figures 3(a) and (b) show the convergence curves of the loss function for DBLP and arXiv, respectively. In addition, the accuracy at each iteration is plotted. The black line shows the value of the loss function. The red, green, and blue lines show the accuracy of community detection and attribute-value clustering for author, paper, and conference/journal, respectively. As the number of iterations increases, the loss function decreases while the accuracy improves.

Table 4 summarizes the evaluation results. The numbers of communities and attribute-value clusters for each attribute are each four. Each cell shows the mean and standard deviation of the accuracies over 20 trials. N/A denotes that the method does not support the category. Values in bold indicate a significant improvement under Student's t-test with p < 0.05.

Table 4: Quality evaluations of community detection and attribute clustering.

                 DBLP dataset                                  arXiv dataset
                 Author         Paper          Conference      Author         Paper          Journal
NMF(A-T)         64.02 ± 5.73   N/A            N/A             32.75 ± 2.30   N/A            N/A
NMF(A-P)         43.12 ± 5.17   44.58 ± 5.89   N/A             34.19 ± 3.55   33.53 ± 1.93   N/A
NMF(A-C)         75.35 ± 6.85   N/A            87.60 ± 1.73    69.05 ± 7.30   N/A            67.81 ± 4.09
NMF(T-P)         N/A            50.02 ± 7.93   N/A             N/A            28.22 ± 0.90   N/A
NMF(T-C)         N/A            N/A            69.88 ± 6.68    N/A            N/A            69.06 ± 1.36
LCTA             48.90 ± 7.57   26.13 ± 4.36   68.50 ± 12.46   40.18 ± 4.31   33.88 ± 1.82   61.56 ± 9.94
SCI              54.78 ± 8.79   22.31 ± 1.48   58.20 ± 7.40    38.68 ± 3.56   35.36 ± 0.91   42.81 ± 3.58
HINMF            68.90 ± 9.08   56.46 ± 3.08   90.10 ± 12.63   65.41 ± 5.38   33.24 ± 2.43   61.25 ± 7.81
CARNMF           86.34 ± 2.39   78.19 ± 9.87   97.20 ± 5.21    72.65 ± 8.11   42.23 ± 3.18   71.25 ± 2.53

Table 5: Quality evaluations of community detection and attribute clustering, changing the attributes to be used.

                 DBLP dataset                                  arXiv dataset
                 Author         Paper          Conference      Author         Paper          Journal
CARNMF (T)       43.29 ± 3.05   N/A            N/A             43.19 ± 6.44   N/A            N/A
CARNMF (P)       46.84 ± 4.15   41.69 ± 5.97   N/A             33.23 ± 2.63   33.49 ± 1.68   N/A
CARNMF (C)       85.11 ± 2.57   N/A            92.40 ± 8.50    67.41 ± 6.85   N/A            69.69 ± 6.64
CARNMF (T-P)     43.28 ± 4.44   41.33 ± 3.88   N/A             35.32 ± 2.04   35.79 ± 2.28   N/A
CARNMF (T-C)     83.67 ± 6.54   N/A            95.20 ± 1.99    68.93 ± 7.40   N/A            66.88 ± 4.88
CARNMF (P-C)     86.41 ± 1.93   73.30 ± 11.90  94.10 ± 3.42    69.71 ± 8.85   40.15 ± 3.13   68.75 ± 5.59
CARNMF (T-P-C)   86.34 ± 2.39   78.19 ± 9.87   97.20 ± 5.21    72.65 ± 8.11   42.23 ± 3.18   71.25 ± 2.53

Figure 2: Accuracy for different λt values. (a) DBLP: Author. (b) DBLP: Document. (c) DBLP: Conference. (d) arXiv: Author. (e) arXiv: Document. (f) arXiv: Journal.

5.6 Efficiency Analysis
This section analyzes computational efficiency in terms of the numbers of communities and attribute clusters. When the numbers are fixed to four, as in the experiments above, the running times of CARNMF on the DBLP (arXiv) dataset are 1.186 ± 0.253 s (0.682 ± 0.138 s).
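The monotone behavior reported in Section 5.5 is the hallmark of multiplicative-update NMF. A minimal sketch of this effect with plain Frobenius NMF and the Lee–Seung updates [15] on random data; this is a toy stand-in, not the CARNMF update rules, and the matrix sizes are arbitrary:

```python
# Toy illustration of the monotone convergence discussed in Section 5.5:
# plain Frobenius-loss NMF with Lee-Seung multiplicative updates [15].
# The small epsilon guards against division by zero.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((30, 20))            # toy non-negative data matrix
k = 4                               # number of latent factors
W = rng.random((30, k)) + 1e-3
H = rng.random((k, 20)) + 1e-3

losses = []
for _ in range(50):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-9)   # multiplicative updates keep
    W *= (X @ H.T) / (W @ H @ H.T + 1e-9)   # the factors non-negative
    losses.append(np.linalg.norm(X - W @ H) ** 2)

# The loss sequence is (numerically) non-increasing, as in Figure 3.
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In CARNMF, analogous updates are applied jointly to the community and attribute-cluster factors under the unified loss, which is why the curves in Figure 3 decrease in the same monotone fashion.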
When changing the numbers of communities and term clusters to 50, while those of paper and conference clusters remain four, the running times increase to 7.471 ± 0.563 s (DBLP) and 6.526 ± 0.172 s (arXiv). These values are still reasonable for various applications.

Moreover, we examine the running time of our method while changing the number of nodes in the input graph. Theoretically, as discussed in Section 4.4, the computational complexity depends on the number of vertices, the number of edges, and the number of distinct values of each attribute. Because most real-world graphs are modeled as scale-free networks, their edges are very sparse; we therefore examine the sensitivity of the processing time to the number of nodes. In this experiment, we selected all of the papers in DBLP and constructed the multi-attributed graph in the same manner as described in Section 5.1, setting the numbers of communities and clusters to four.

Figure 3: Convergence analysis of the proposed algorithm to optimize a loss function and the corresponding accuracy curve. (a) DBLP. (b) arXiv.

Figure 4: Time complexity of CARNMF w.r.t. the number of input nodes.

Figure 4 shows that the running time of our method is almost linear in the number of nodes. From the figure, we confirm that the time complexity of our method is linear in the numbers of nodes and edges (as shown in Equation 21). Therefore, when the input graph is sparse, our method is highly efficient.

6 CONCLUSION
In this paper, we have proposed CAR-clustering, which comprises community detection, attribute-value clustering, and the extraction of their relationships, for clustering over multi-attributed graphs. We have also proposed a novel algorithm, CARNMF, based on NMF. CARNMF employs a unified loss function to simultaneously solve different NMFs. This approach improves on the state-of-the-art methods in that it can exploit the correlation between communities and attribute-value clusters to enhance the quality of the result. Our experiments have demonstrated that CARNMF successfully achieves CAR-clustering. CARNMF has detected reasonable communities with meaningful semantic descriptions via the relationships between communities and attribute-value clusters on real-world datasets. These results are useful for many applications, such as node property estimation [7, 9, 24], community-wise information recommendation [10], and semantic reasoning for nodes/edges [1]. Additionally, CARNMF has achieved higher accuracy than the comparative methods, including a baseline and state-of-the-art methods. Our future work includes several directions. First, we will extend the proposed method to chronological analysis over temporal multi-attributed graphs. Second, we plan to automate parameter tuning (e.g., the numbers of communities/clusters, λt, etc.).

ACKNOWLEDGMENT
This research was partly supported by the Japan Agency for Medical Research and Development (AMED).

REFERENCES
[1] Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. 2008. Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9 (2008), 1981–2014.
[2] Martin Atzmueller, Stephan Doerfel, and Folke Mitzlaff. 2016. Description-oriented community detection using exhaustive subgroup discovery. Information Sciences 329 (2016), 965–984.
[3] Michele Berlingerio, Michele Coscia, and Fosca Giannotti. 2011. Finding and characterizing communities in multidimensional networks. In Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 490–494.
[4] Brigitte Boden, Stephan Günnemann, Holger Hoffmann, and Thomas Seidl. 2012. Mining coherent subgraphs in multi-layer graphs with edge labels. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1258–1266.
[5] Cecile Bothorel, Juan David Cruz, Matteo Magnani, and Barbora Micenkova. 2015. Clustering attributed graphs: models, measures and methods. Network Science 3, 3 (2015), 408–444.
[6] Santo Fortunato. 2010. Community detection in graphs. Physics Reports 486, 3 (2010), 75–174.
[7] Mario Frank, Andreas P. Streich, David Basin, and Joachim M. Buhmann. 2012. Multi-assignment clustering for boolean data. Journal of Machine Learning Research 13 (2012), 459–489.
[8] Jing Gao, Wei Fan, Yizhou Sun, and Jiawei Han. 2009. Heterogeneous source consensus learning via decision propagation and negotiation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 339–348.
[9] Michelle Girvan and Mark E. J. Newman. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 12 (2002), 7821–7826.
[10] Junzo Kamahara, Tomofumi Asakawa, Shinji Shimojo, and Hideo Miyahara. 2005. A community-based recommendation system to reveal unexpected interests. In Proceedings of the 11th International Multimedia Modelling Conference (MMM 2005). IEEE, 433–438.
[11] Denise B. Kandel. 1978. Homophily, selection, and socialization in adolescent friendships. American Journal of Sociology 84, 2 (1978), 427–436.
[12] Da Kuang, Chris Ding, and Haesun Park. 2012. Symmetric nonnegative matrix factorization for graph clustering. In Proceedings of the 2012 SIAM International Conference on Data Mining. SIAM, 106–117.
[13] Harold W. Kuhn. 1955. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2, 1–2 (1955), 83–97.
[14] Daniel D. Lee and H. Sebastian Seung. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755 (1999), 788–791.
[15] Daniel D. Lee and H. Sebastian Seung. 2001. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems. 556–562.
[16] Jialu Liu and Jiawei Han. 2013. HINMF: a matrix factorization method for clustering in heterogeneous information networks. In Proceedings of the International Joint Conference on Artificial Intelligence Workshop.
[17] Peter V. Marsden. 1988. Homogeneity in confiding relations. Social Networks 10, 1 (1988), 57–76.
[18] Ioannis Psorakis, Stephen Roberts, Mark Ebden, and Ben Sheldon. 2011. Overlapping community detection using Bayesian non-negative matrix factorization. Physical Review E 83, 6 (2011), 066114.
[19] Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth. 2004. The author-topic model for authors and documents. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 487–494.
[20] Jianbo Shi and Jitendra Malik. 2000. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 8 (2000), 888–905.
[21] Yizhou Sun, Yintao Yu, and Jiawei Han. 2009. Ranking-based clustering of heterogeneous information networks with star network schema. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 797–806.
[22] Lei Tang and Huan Liu. 2010. Community detection and mining in social media. Synthesis Lectures on Data Mining and Knowledge Discovery 2, 1 (2010), 1–137.
[23] Xiao Wang, Di Jin, Xiaochun Cao, Liang Yang, and Weixiong Zhang. 2016. Semantic community identification in large attribute networks. In AAAI. 265–271.
[24] Jaewon Yang and Jure Leskovec. 2013. Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM, 587–596.
[25] Jaewon Yang, Julian McAuley, and Jure Leskovec. 2013. Community detection in networks with node attributes. In 2013 IEEE 13th International Conference on Data Mining. IEEE, 1151–1156.
[26] Zhijun Yin, Liangliang Cao, Quanquan Gu, and Jiawei Han. 2012. Latent community topic analysis: integration of community discovery with topic modeling. ACM Transactions on Intelligent Systems and Technology (TIST) 3, 4 (2012), 63.
[27] Haizheng Zhang, Baojun Qiu, C. Lee Giles, Henry C. Foley, and John Yen. 2007. An LDA-based community structure discovery approach for large-scale social networks. In ISI 2007.