Spectral Co-Clustering for Dynamic Bipartite Graphs

Spectral Co-Clustering for Dynamic Bipartite Graphs DerekGreene derek.greene@ucd.ie School of Computer Science & Informatics University College Dublin PádraigCunningham padraig.cunningham@ucd.ie School of Computer Science & Informatics University College Dublin Spectral Co-Clustering for Dynamic Bipartite Graphs 255DD0B60537E84DE41A1AB949212C61 GROBID - A machine learning software for extracting information from scholarly documents

A common task in many domains with a temporal aspect involves identifying and tracking clusters over time. Often dynamic data will have a feature-based representation. In some cases, a direct mapping will exist for both objects and features over time. But in many scenarios, smaller subsets of objects or features alone will persist across successive time periods. To address this issue, we propose a dynamic spectral co-clustering method for simultaneously clustering objects and features over time, as represented by successive bipartite graphs. We evaluate the method on a benchmark text corpus and Web 2.0 bookmarking data.

Introduction

In many domains, where the data has a temporal aspect, it will be useful to analyse the formation and evolution of patterns in the data over time. For instance, researchers may be interested in tracking evolving communities of social network users, such as clusters of frequently interacting authors in the blogosphere, or circles of users with shared interests on social media sites. In the case of online news sources, producing large volumes of articles on a daily basis, it will often be useful to chart the development of individual news stories over time.

For many of these problems it may be of interest to simultaneously identify clusters of both data objects and features. This task, often referred to as coclustering, has been formulated as the problem of partitioning a bipartite graph, where the two types of nodes correspond to objects and features [1]. However, to the best of our knowledge, this work has been limited to static applications, where temporal information is unavailable or has been disregarded.

A popular recent approach to the problem of clustering dynamic data has been to use an "offline" strategy, where the dynamic data is divided into discrete time steps. Sets of step clusters are identified on the individual time steps using a suitable clustering algorithm, and these step clusters are associated with one another over successive time steps [2]. However, clusters may change considerably between time steps. This can be problematic, both for the purpose of matching clusters between time steps, and for supporting users to follow and understand how groups are changing over time. To address this problem, both current and historic information can be incorporated into the objective of the Fig. 1. A dynamic co-clustering scenario where 2 clusters appear in 2 time steps. Note that a subset of both objects (bookmarks) and features (tags) persists across time.

clustering process [3]. Benefits of this approach include increasing the smoothness of transitions between clusterings over time, and improving cluster quality by incorporating historic information to reduce the effects of noisy data.

A number of additional considerations arise when tracking dynamic data represented in feature spaces. Notably, a set of objects or features will not always persist in the data across steps. In general, three different scenarios are possible:

1. Data objects alone persist across time steps. For instance, in bibliographic networks, papers are only published at a single point in time, whereas authors will generally be present in the network over an extended period of time. 2. Features alone persist across time. In a news collection, articles will appear once, whereas terms may continue to appear as topics extend over time. 3. Both objects and features persist across time. For example, in the case of Web 2.0 tagging portals, both the individual tags and the objects being tagged (e.g. bookmarks, images) will appear in multiple time steps. A simple example with just two clusters is shown in Figure 1.

Here we consider the problem of tracking nodes in multiple related dynamic bipartite graphs. In Section 3 we describe the main contribution of this papera dynamic spectral co-clustering algorithm for simultaneously grouping objects and features over time, in any of the above scenarios. This algorithm takes into account both information from the current time step, together with historic information from the previous step. In our evaluations in Section 4 we show that the proposed algorithm works both in the case where features alone persist over time, and when objects and features persist. These evaluations are performed on a labelled benchmark news corpus and Web 2.0 tagging data.

Related Work

Co-clustering

In certain problems it may be useful to perform co-clustering, where both objects and features are assigned to groups simultaneously. One approach to the co-clustering problem is to view it as the task of partitioning a weighted bipartite graph. Dhillon [1] proposed a spectral approach to approximate the optimal normalised cut of a bipartite graph, which was applied for document clustering. This involved computing a truncated singular value decomposition (SVD) of a suitably normalised term-document matrix, constructing an embedding of both terms and documents, and applying k-means to this embedding to produce a simultaneous k-way partitioning of both documents and terms. Mirzal & Furukawa [4] provided a further theoretical grounding for spectral co-clustering, demonstrating that simultaneous row and column clustering is equivalent to solving the separate row and column clustering problems.

Dynamic Clustering

The general problem of identifying clusters in dynamic data has been studied by a number of authors. Early work on the unsupervised analysis of temporal data focused on the problems of topic tracking and event detection in document collections [5]. More recently, Chakrabarti et al. [3] proposed a general framework for "evolutionary clustering", where both current and historic information was incorporated into the objective of the clustering process. The authors used this to formulate dynamic variants of common agglomerative and partitional clustering algorithms. In the latter case, related clusters were tracked over time by matching similar centroids across time steps. Two evolutionary versions of spectral partitioning for classical (unipartite) graphs were proposed by Chi et al. [6]. The first version (PCQ) involved applying spectral clustering to produce a partition that also accurately clusters historic data. The second version (PCM) involved measuring historic quality based on the chi-square distance between current and previous partition memberships.

The application of dynamic clustering methods has been particularly prevalent in the realm of social network analysis, where the goal is to identify communities of users in dynamic networks. Palla et al. [7] proposed an extension of the popular CFinder algorithm to identify community-centric evolution events in dynamic graphs, based on an offline strategy. This extension involved applying community detection to composite graphs constructed from pairs of consecutive time step graphs. Another life-cycle model was proposed in [2], where the dynamic community finding approach was formulated as a graph colouring problem. The authors proposed a heuristic solution to this problem, by greedily matching pairs of node sets between time steps. The problem of clustering data over time has also been considered in the temporal analysis domain. Kalnis et al. [8] described a density-based clustering approach where clusters persist over time, despite continuous changes in cluster memberships. This corresponds closely to the "assembly line" dynamic clustering scenario described in [2].

Methods

Problem Definition

We represent a dynamic feature-based dataset as a set of l bipartite graphs {G 1 , . . . , G l }. Each step graph G t consists of two sets of nodes, representing the n t data objects, and m t features present in the data at time t. Edges exist only between nodes of different types, corresponding to non-zero feature values. We can conveniently represent each step graph using a feature-object matrix A t of size m t × n t .

In the offline formulation of the dynamic co-clustering problem, the overall goal is to identify a set of dynamic clusters of objects and features, which appear in the data across one or more time steps. We refer to step clusters that are identified on individual step graphs, which represent specific observations of dynamic clusters at a given point in time. The formulation therefore has two key requirements: a suitable clustering algorithm to cluster individual time step graphs (ideally in a way that incorporates historic information), and an approach to track these clusters across time steps. While our primary focus here is on the former aspect, in Section 3.3 we also briefly discuss the latter aspect.

Dynamic Spectral Co-clustering

We now introduce a dynamic co-clustering algorithm that considers both historic information from the previous time step, and the internal quality of the clustering in the current time step. The algorithm consists of three phases: bipartite spectral embedding, cluster initialisation, and a cluster assignment phase. Spectral embedding. Following normalised cut optimisation via spectral coclustering described in [1], for a given time step feature-object matrix A t , we construct the degree-normalised matrix Ât = D 1

− 1 2 A t D 2 − 1 2

, where D 1 and D 2 are diagonal column and row degree matrices respectively. We then apply SVD to Ât , computing the leading left and right singular vectors corresponding to the largest singular values. Following the choice made by many authors in the spectral clustering literature, we use k t dimensions corresponding to the expected number of clusters. Although the issue of selecting the number of clusters is not discussed in this paper, one potential approach is to choose k t based on the eigengap method [9]. The truncated SVD yields matrices U kt and V kt . A unified embedding of size (m t + n t ) × k t is constructed by normalising and stacking the truncated factors as follows:

Z t = D 1 −1/2 U kt D 2 −1/2 V kt(1)

Prior to clustering, the rows of Z t are subsequently re-normalised to have unit length, as proposed for spectral partitioning in [9]. This process provides us with a k t -dimensional embedding of all nodes of both types in G t .

Cluster initialisation. At time t = 1, we have no historic information. Therefore to seed the clustering process, we use a variant of orthogonal initialisation as proposed by Ng et al. [9] for spectral graph partitioning. This operates using a "farthest-first" strategy as follows. The first cluster centroid is chosen to be the mean vector of the rows in Z t . We then repeatedly select the next centroid to be the row in Z t that is closest to being 90 • from those that have been previously selected. This process continues until k t centroids have been chosen.

For each time step t > 1, we initialise using clusters from the previous time step. A simple approach is to map the clusters generated on the embedding for time t − 1 to Z t . However, as noted previously, not all features and objects will persist between time steps. To produce an initial clustering at time t, we identify the intersection of the sets of nodes present in the graphs G t−1 and G t . The clusters containing these are mapped to the embedding Z t , and we compute the resulting centroids. If less than k t centroids are produced, the remaining centroids are chosen from the rows of Z t using orthogonal selection as above. We can then predict memberships for each unassigned row z i of Z t , using a simple nearest centroid classifier to maximise the similarity:

max C∈Ct z T i µ c (2)

where µ c is the centroid of cluster C c . This classification procedure yields a predicted clustering for rows in Z t (i.e. a co-clustering of all objects and features present at time t), which we denote P t .

Cluster assignment. To recover a clustering from Z t , we apply a constrained version of k-means clustering to the rows of the embedding, which takes into account both the internal quality of the current partition, and agreement with the predicted partition P t . We distinguish the latter from the membership preservation objective described in [6] -here we use predicted memberships for missing objects and features missing from the previous step.

As a measure of current cluster quality, we use vector-centroid similarities as in Eqn. 2. Historical quality is calculated based on the quantity pred(P t , C t ), which denotes the degree to which the predicted cluster assignments in P t agree with those in the current clustering C t . To quantify this agreement, we use a variant of pairwise prediction strength [10]:

pred(P t , C t ) = C∈Ct 1 |C| (|C| − 1) (zi,zj )∈C co(z i , z j )(3)

where co(z i , z j ) = 1 if both rows were predicted to be coassigned in P t , or c(z i , z j ) = 0 otherwise.

To combine both sources of information, the clustering objective then becomes a weighted combination of two objectives:

J(C t ) = (1 − α) • k c=1 zi∈Cc z T i µ c + α • (pred(P t , C t ))(4)

This type of aggregation approach has been widely used for combining sources of information, such as in dynamic clustering [3] and semi-supervised learning [11].

1. Build spectral embedding -Construct the normalised feature-object matrix Ât = D1 − 1 2 AD2 − 1 2 . -Compute Zt from the truncated SVD of Ât according to Eqn. 1.

-Normalise the rows of Zt to unit length.

Initialisation and prediction

-If t = 1, apply orthogonal initialisation to select a set of kt representative centroids from the representations of the objects in the embedded space. -For t > 1, recompute the kt−1 centroids based on last clustering but including only the embedding of the relevant set of objects/features in the current space.

-If not all rows of the embedding have been assigned, apply nearest centroid classification to compute the predicted clustering Pt.

Compute clustering

-Apply constrained k-means to rows in Zt, initialised by centroids from Pt.

Fig. 2. Dynamic spectral co-clustering process, as applied for each time step t.

The parameter α ∈ [0, 1] controls the balance between the influence of historical information and the information present in the current spectral embedding. A higher value of α allows information from the previous time step to have a greater influence, yielding a smoother transition between clusterings at successive time steps. Naturally at time t = 1, the right-hand term in Eqn. 4 will be zero. Eqn. 4 can be viewed as the standard spherical k-means objective [12], augmented by a constraint reward term. We can find a local solution for this problem by using an approach analogous to the semi-supervised PCKMeans algorithm proposed by Basu et al. [11] for clustering with pairwise constraints. Specifically, we apply an iterative k-means-like assignment process, re-assigning each row vector z i from Z t to maximise:

max C∈Ct (1 − α) • z T i µ c + α • pred(z i , C)(5)

where the quantity pred(z i , C) represents the degree to which the predicted assignment for the row z i in P t agrees with the assignment of z i to cluster C. This is given by the proportion of rows in C that were co-assigned with z i in P t :

pred(z i , C) = 1 |C| (|C| − 1) (zi,zj )∈C co(z i , z j )(6)

Once the algorithm has converged to a local solution, C t provides us with a k-way partitioning of all nodes in the graph G t (i.e. features and objects). An overview of the complete co-clustering process is shown in Figure 2.

Tracking Clusters Over Time

In the previous section we proposed an approach for co-clustering individual time step graphs. The second aspect of the offline approach to dynamic clustering involves identifying dynamic clusters composed from clusters associated across time steps. We suggest that previous frameworks for tracking evolving dynamic communities [2,13] can be readily adapted to the dynamic bipartite case. In brief, we construct a set of dynamic cluster timelines, each consisting of a set of clusters identified at different time steps and ordered by time. At each step in the dynamic co-clustering process, we match the predicted clusters (corresponding to clusters from the previous time step) with the actual output of the co-clustering process outlined in Figure 2. Matches are made based on the step cluster memberships for subsets of objects and/or features persisting between pairs of consecutive steps. This matching process will result in a set of dynamic clusters persisting across multiple steps.

Evaluation

Benchmark Evaluation

To evaluate the performance of the algorithm proposed in Section 3.2, we required an annotated dataset with temporal information. For this purpose we consider the bipartite document clustering problem, and use a subset of the widely-used Reuters RCV1 corpus [14]. The RCV1-5topic dataset1 consists of 10,116 news articles covering a seven month period. Each article is annotated with a single ground truth topical label: health, religion, science, sport, weather. These topics are present across the entire time period of the corpus. We considered a number of different time step durations to split the seven month period -one month, a fortnight, and one week -yielding 7, 14, and 28 step graphs respectively. Naturally for this type of data, a subset of features (terms) will persist across time, while objects (documents) appear in only one time step. Our evaluations focused on the performance of the dynamic spectral coclustering algorithm on each time step graph in the RCV1-5topic dataset, using a range of values α ∈ [0.1, 0.5] for the balance parameter. As a baseline competitor, we used multi-partition spectral co-clustering as proposed by Dhillon [1]. To provide a fair comparison, we use orthogonal initialisation for both algorithms, and set the number of clusters k t at time t to the number of ground truth topics.

Temporal smoothness. One of the primary motivations for dynamic coclustering is to increase smoothness in the transitions between time step clusterings. To quantify the degree to which the proposed algorithm can enforce temporal smoothness, we measure the agreement between successive clusterings in terms of their normalised mutual information (NMI) [15]. Note that NMI values were calculated only over the terms common to each pair of consecutive time steps -documents are not considered as they do not persist.

Figure 3 shows a comparison of agreement values for the three different time window sizes. Dynamic co-clustering leads to a higher level of agreement than standard spectral co-clustering for all three time window sizes. The effect becomes significantly more pronounced as α increases. This is to be expected, as increasing the parameter leads to a higher weighting for the historic information in Eqn. 4. For α ≥ 0.5, the resulting co-clusterings are often almost identical to the predicted co-clustering P t , with the constrained k-means process converging to a solution after 2-5 iterations.

Clustering accuracy. To quantify algorithm accuracy, we calculated the NMI between clusterings and the relevant annotated document label information for each time step. Figure 4 illustrates a comparison of the accuracy achieved by traditional spectral co-clustering and dynamic co-clustering on the RCV1-5topic dataset for the three different time step sizes. We observed that, for monthly and fortnightly time steps, the accuracy achieved by dynamic co-clustering was not significantly higher. However, for the weekly case, there was a noticeable increase in accuracy. In the case of α = 0.5, dynamic co-clustering lead to higher accuracy on 21 out of 28 of the weekly graphs. These results could appear surprising given the increases in temporal smoothness demonstrated Figure 3. However, on closer inspection, it is apparent that there is a strong concept drift effect in the data, as the composition of topics changes over seven months. Therefore, for longer time periods, there is a greater change in the clusters identified in successive time periods. In such cases we expect historic information to be less useful. For the shorter weekly time windows, where there is less scope for drift between steps, we expect the use of historic information to improve accuracy. These results highlight the importance of selecting an appropriate time step size for offline dynamic clustering.

Evaluation on Web 2.0 Data

For the second phase of our evaluation, we applied the proposed co-clustering algorithm to a Web 2.0 data exploration problem. Unlike the RCV1 data, subsets of both objects (bookmarks) and features (tags) persist over time. We use a subset of the most recent data from a collection harvested by Görlitz et al. [16] from the Del.icio.us web bookmarking portal. The subset covers the 2000 top tags and 5000 top bookmarks across an eleven month period from January-November 2006. We divided this period into 44 weekly time steps, and for each time step we constructed a bipartite graph -the nodes represent tags and bookmarks, and the edges between them denote the number of times each bookmark was assigned a given tag during the time step. On average, each graph contained approximately 3750 bookmarks and 1760 tags. For each time step, we applied dynamic co-clustering for k t = 20 to identify high-level topical clusters. Figure 5 illustrates the agreement between both tag and bookmark clusterings identified by dynamic co-clustering for a balance parameter range α ∈ [0.1, 0.5]. As with the RCV1-topic data, an increase in the value of α leads to clusters that are considerably more similar to those produced in the previous step, yielding smoother transitions between both feature and object clusters across time. In the extreme case of α = 0.5, there is effectively no change between the predicted memberships and the final output of the co-clustering algorithm.

A number of authors (e.g. [17]) have suggested analysing the stability or "loyalty" of object member-cluster memberships across time. In the bipartite case, we can quantify this for both objects and features -we suggest the latter can be used to generate meaningful labels for dynamic clusters. For the tagged data, we score the fraction of time steps at which a tag is assigned to a given dynamic cluster. Over a sufficiently large number of steps, for each dynamic cluster we can produce a robust ranking of tags based on their respective membership stability scores. Examining the range of α parameters, we found the trade-off afforded by α = 0.1 lead to the most interpretable label sets. In Table 1 we show the resulting descriptive labels selected for the dynamic clusters that exhibited the highest average tag membership stability, together with a suggested topic based on the tags. These descriptions highlight a range of general areas of interest covering sites frequently bookmarked by users during 2006.

Conclusion

In this work, we have described a spectral co-clustering algorithm for simultaneously clustering both objects and features in dynamic feature-based data, represented as a sequence of bipartite graphs. The co-clustering algorithm incorporates both current and historic information into the clustering process. A key aspect of the approach is that it is applicable in domains where objects or features alone persist across time steps. In applications on both dynamic text and bookmark tagging data, the proposed approach was successful in identifying coherent clusters, while also ensuring a consistent transition between clusterings in successive time steps.

Fig. 3 .3Fig. 3. Comparison of agreement (in terms of NMI) between successive feature clusterings, generated by spectral co-clustering and dynamic co-clustering (α ∈ [0.1, 0.5]), on the RCV1-5topic dataset for monthly, fortnightly, and weekly time steps.

Fig. 4 .4Fig. 4. Comparison of accuracy (in terms of NMI) for document clusterings generated by spectral co-clustering and dynamic co-clustering (α ∈ [0.1, 0.5]), on the RCV1-5topic dataset for monthly, fortnightly, and weekly time steps.

Fig. 5 .5Fig. 5. Agreement between successive object and feature clusterings, identified by dynamic co-clustering (α ∈ [0.1, 0.5]), on the Del.icio.us dataset across 44 weeks.

Table 1 .1Top 10 tags for 5 most stable clusters (in terms of tag memberships over time) identified on the Del.icio.us dataset by dynamic co-clustering (α = 0.1).TopicTop 10 TagsITshortcuts, tweaks, opensource, security, troubleshooting, system,livecd, keyboard, sysadmin, sshEducationacademic, school, mathematics, education, spanish, grammar,elearning, learning, math, slangMusic & Videopodcasts, mp3blog, youtube, television, movie, bittorrent, divx,torrent, p2p, nprNews & Medianewspapers, culture, opinion, society, iraq, news, journalism, envi-ronment, activism, politicalWeb Browsingexplorer, thunderbird, browser, opera, firefox, extensions, mozilla,greasemonkey, computing, plugin

Datasets for this paper are available at http://mlg.ucd.ie/datasets/dynak.html

Acknowledgments. This work is supported by Science Foundation Ireland Grant No. 08/SRC/I140 (Clique: Graph & Network Analysis Cluster)

Co-clustering documents and words using bipartite spectral graph partitioning ISDhillon Proc null 2001 A framework for community identification in dynamic social networks CTantipathananandh TBerger-Wolf DKempe Proc. 13th International conference on Knowledge Discovery and Data mining (KDD '07) 13th International conference on Knowledge Discovery and Data mining (KDD '07) 2007 Evolutionary clustering DChakrabarti RKumar ATomkins Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining 12th Int. Conf. on Knowledge Discovery and Data Mining 2006 Eigenvectors for clustering: Unipartite, bipartite, and directed graph cases AMirzal MFurukawa 2010 arXiv A study of retrospective and on-line event detection YYang TPierce JCarbonell Proc. 21st International ACM SIGIR Conference on Research and development in information retrieval 21st International ACM SIGIR Conference on Research and development in information retrieval 1998 Evolutionary spectral clustering by incorporating temporal smoothness YChi XSong DZhou KHino BTseng Proc. 13th SIGKDD Int. Conf. on Knowledge Discovery and Data Mining 13th SIGKDD Int. Conf. on Knowledge Discovery and Data Mining 2007 Quantifying social group evolution GPalla ABarabási TVicsek Nature 446 664 7136. 2007 On discovering moving clusters in spatiotemporal data PKalnis NMamoulis SBakiras Proc. SSTD 2005. 2005 On Spectral Clustering: Analysis and an Algorithm ANg MJordan YWeiss Advances in Neural Information Processing 14 2 2001 Cluster validation by prediction strength RTibshirani GWalther DBotstein PBrown 2001 Dept. Statistics, Stanford University Technical report Active semi-supervision for pairwise constrained clustering SBasu ABanerjee RMooney Proc. SIAM Int. Conf. on Data Mining SIAM Int. Conf. on Data Mining 2004 Concept decompositions for large sparse text data using clustering ISDhillon DSModha Machine Learning 42 1-2 January 2001 Tracking the evolution of communities in dynamic social networks DGreene DDoyle PCunningham Proc. International Conference on Advances in Social Networks Analysis and Mining (ASONAM'10) International Conference on Advances in Social Networks Analysis and Mining (ASONAM'10) 2010 RCV1: A New Benchmark Collection for Text Categorization Research DDLewis YYang TGRose FLi JMLR 5 2004 Cluster ensembles -a knowledge reuse framework for combining multiple partitions AStrehl JGhosh JMLR 3 December 2002 Pints: Peer-to-peer infrastructure for tagging systems OGörlitz SSizov SStaab Proc. 7th International Workshop on Peer-to-Peer Systems 7th International Workshop on Peer-to-Peer Systems 2008 A framework for analysis of dynamic social networks TBerger-Wolf JSaia Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining 12th Int. Conf. on Knowledge Discovery and Data Mining 2006