Automatic Time-Series Clustering via Network Inference

Kohei Obata, Yasuko Matsubara, Koki Kawabata and Yasushi Sakurai
SANKEN, Osaka University
obata88@sanken.osaka-u.ac.jp (K. Obata); yasuko@sanken.osaka-u.ac.jp (Y. Matsubara); koki@sanken.osaka-u.ac.jp (K. Kawabata); yasushi@sanken.osaka-u.ac.jp (Y. Sakurai)

VLDB'22 PhD Workshop, September 5, 2022, Sydney, Australia

Abstract
Given a collection of multidimensional time-series that contains an unknown type and number of network structures between variables, how efficiently can we find typical patterns and their points of variation? How can we interpret important relationships in the obtained patterns? In this paper, we propose a new method of model-based clustering, which we call network clustering via graphical lasso (NGL). Our method has the following properties: (a) Interpretable: it provides interpretable network structures and cluster assignments for the data; (b) Automatic: it determines the optimal cut points and the number of clusters automatically; (c) Accurate: it provides reliable clustering performance thanks to the automated algorithm. We evaluate our NGL algorithm on both real and synthetic datasets, obtaining interpretable network structure results and outperforming state-of-the-art baselines in terms of accuracy.

Keywords
time-series, network structure, graphical lasso

1. Introduction
Many applications generate time-series data, including those from automobiles [1], biology, social networks, and finance. In most cases, these data are multidimensional, and it is important to find typical patterns, each of which has a specific network structure. In practice, real-life data contain multiple distinct patterns that differ in their network structures. For example, automobile sensor data from a driving session can be composed of some basic actions and some abrupt actions (i.e., going straight, turning right, turning left, slowing down, sudden braking, sudden turning). The network structure is equal to the graph structure: sensors can be represented as nodes, and sensor interactions can be represented as edges. For a turning action, lateral acceleration and steering angle may share an edge, and for a braking action, brake pedal stroke and longitudinal acceleration may share an edge.

In this paper, we focus on finding network structures automatically from multidimensional time-series data. Understanding the structure of these networks is useful because it allows us to devise models of sensor interaction, which can be used to analyze such behaviours as fuel-efficient driving. However, there are many network structures in the data, which change over time, and it is difficult to find a meaningful segmentation point since no one knows how the data change. Moreover, the number of clusters should be selected automatically, both to find abrupt changes and to allow an extension to online learning, because in most cases we do not know the optimal number of clusters in advance.

In this paper, we propose an automatic algorithm, called network clustering via graphical lasso (NGL), which enables us to summarize multidimensional time-series into meaningful patterns efficiently, based on the graphical lasso problem. Intuitively, the problem we wish to solve is as follows.

Informal Problem 1. Given a large collection of multidimensional time-series data with underlying network structures X, find a compact description of X, which consists of:

1. a set of segments and their cut points
2. a set of segment groups (i.e., clusters) of similar network structures
3. the optimal number of clusters

Contrast with Competitors. We compare NGL with existing methods from the viewpoint of network inference. Network estimation with time-series information has been studied as a method for analyzing economic data and biological signal data because of the high interpretability of its graphical model [2].
Graphical lasso [3, 4] is a network estimation method that provides an interpretable sparse inverse covariance matrix thanks to its ℓ1-norm penalty. Time-varying graphical lasso (TVGL) [5] is a network estimation method that also takes time information into account. Although TVGL can find change points by comparing the network structure before and after a change, it cannot find clusters. Toeplitz inverse covariance-based clustering (TICC) [6] and the time adaptive Gaussian model (TAGM) [7] are clustering methods based on network structure. TICC uses Markov random fields (MRF) and Toeplitz matrices to capture the inherent relationships among variables. TAGM is a fusion of a hidden Markov model (HMM) and a Gaussian mixture model (GMM). These methods find clusters depending on the network structure of each subsequence, which makes the clusters interpretable and allows us to discover patterns that traditional clustering methods are unable to find. Both incorporate a graphical lasso to capture the interactions between variables, but both require the number of clusters to be specified as prior information. Consequently, only our approach satisfies the need for interpretability while finding the optimal number of clusters automatically.

Contributions. The main contribution of this work is the concept and design of NGL, which has the following desirable properties:

1. Interpretable: NGL provides the underlying graphical structures and cluster assignments in data, which help us to interpret important relationships between variables.

2. Automatic: We formulate data encoding schemes to reveal distinct patterns/clusters, each of which captures the network structure. The proposed method requires no parameter tuning to specify the number of clusters.

3. Accurate: NGL is a simple yet powerful algorithm for time-series segmentation using a graphical lasso, which outperforms state-of-the-art competitors in terms of accuracy.

2. Preliminary
In this paper we investigate automatic network-structure clustering for large multidimensional time-series data. We describe a few key concepts and background materials in this section.

2.1. Problem definition
Consider a set of p-dimensional time-series data consisting of T sequential bundle observations, X = {x_1, x_2, ..., x_T}, where there are |x_i| ≥ 1 different observations at each time i. Here x_i ∈ R^p is the i-th multidimensional bundle observation, and each bundle observation vector is sampled from one of K multivariate normal distributions, x_i ∼ N(0, Σ_k). The network structure is equal to the graph structure: given x_i ∼ N(0, Σ_k), each variable represents a node, and the covariance matrix Σ_k defines the edges. Our goal is to find the cluster assignments of these T bundle observations into K clusters F = {F_1, F_2, ..., F_K}, where F_k is the cluster assignment set of X_k ⊂ X (s.t. k = 1, 2, ..., K), represented by a set of K matrices, i.e., Θ = {θ_1, θ_2, ..., θ_K}. Therefore, letting M_k = {θ_k, F_k} be a compact description for X_k in the k-th cluster, the full parameter set that we want to estimate consists of M = {M_1, M_2, ..., M_K} and the number of clusters K.

2.2. Graphical lasso problem
We first consider the static inference that estimates θ_i. The optimization problem is written as follows:

\[
\underset{\theta_i \in S^{p}_{++}}{\text{minimize}} \; -ll(x_i, \theta_i) + \lambda \lVert \theta_i \rVert_{od,1},
\qquad
ll(x_i, \theta_i) = |x_i| \left( \log \det \theta_i - \mathrm{Tr}(S_i \theta_i) \right),
\]

where S_i = (1/|x_i|) Σ_{j=1}^{|x_i|} x_j x_j^T is the empirical covariance, the x_j are the individual samples, S^p_{++} is the space of positive definite matrices, λ determines the sparsity level of the network, and ‖·‖_{od,1} is the off-diagonal ℓ1-norm. This is a convex optimization problem, which imposes the ℓ1-norm restriction.
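For concreteness, the sketch below estimates a single static network with scikit-learn's GraphicalLasso, whose alpha parameter plays the role of λ and which likewise applies the ℓ1 penalty to the off-diagonal entries. The random data and the penalty value are illustrative only; the paper itself optimizes this objective through TVGL, described next.

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))           # |x_i| = 100 samples in R^5
model = GraphicalLasso(alpha=0.1).fit(X)    # alpha corresponds to lambda
theta_hat = model.precision_                # sparse inverse covariance theta_i
print(np.round(theta_hat, 2))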
2.3. TVGL problem
To infer a time-varying sequence of networks, TVGL [5] extends the above approach; it is designed to infer a set of inverse covariance matrices Θ by solving the problem below:

\[
\underset{\theta_i \in S^{p}_{++}}{\text{minimize}} \; \sum_{i=1}^{T} \left( -ll(x_i, \theta_i) + \lambda \lVert \theta_i \rVert_{od,1} \right) + \beta \sum_{i=2}^{T} \psi(\theta_i - \theta_{i-1}),
\]

where β determines how strongly correlated neighboring covariance estimations should be. The penalty function ψ encourages similarity between θ_i and θ_{i−1}, and different choices of ψ enforce different restrictions on the time-varying similarity. This problem is solved by employing the alternating direction method of multipliers (ADMM) [8], a well-established method for solving convex optimization problems; for more details, please see, e.g., [5]. Although TVGL can find a change point by comparing θ_i and θ_{i−1}, it cannot find clusters at the same time. Throughout this paper, our method uses TVGL to optimize the graphical lasso problem.
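The sketch below only evaluates the TVGL objective for a candidate sequence of precision matrices, taking ψ to be the element-wise ℓ1 penalty (one of the choices discussed in [5]); the actual minimization is carried out with ADMM [8] and is omitted here. All names are ours.

import numpy as np

def tvgl_objective(thetas, covs, counts, lam, beta):
    # Objective value for precision matrices thetas[i], empirical
    # covariances covs[i], and sample counts counts[i] = |x_i|.
    cost = 0.0
    for i, (theta, S, n) in enumerate(zip(thetas, covs, counts)):
        ll = n * (np.linalg.slogdet(theta)[1] - np.trace(S @ theta))
        offdiag = np.abs(theta).sum() - np.abs(np.diag(theta)).sum()
        cost += -ll + lam * offdiag            # -ll + lambda * ||theta_i||_od,1
        if i > 0:                              # beta * psi(theta_i - theta_{i-1})
            cost += beta * np.abs(theta - thetas[i - 1]).sum()
    return cost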
3. Algorithms
The previous section described how to estimate a set of inverse covariance matrices Θ. Now the questions are (a) how to describe the model, (b) how to find optimal cut points, and (c) how to assign segments to optimal clusters. There are three main ideas behind our model:

1. Model description cost: We use the minimum description length (MDL) principle as a model selection criterion for choosing between alternative segmentation and cluster descriptions. We propose a novel cost function to estimate the description cost of the graphical lasso model.

2. CutPointSearch: We modify the generic bottom-up algorithm [9] to enhance its ability to handle time-series data. Initially we adopt short segments and iteratively merge each adjacent pair that satisfies the cost restriction.

3. NGL: We use the EM algorithm to cluster the segments obtained by CutPointSearch while determining the optimal number of clusters automatically.

3.1. Model description cost
The MDL principle explains the model in a parsimonious way by calculating the required number of bits; it thus follows the assumption that the more we can compress the data, the better we can generalize its underlying structures. In a nutshell, we want to find the minimum number of graphical lasso models needed to express the data. The goodness of a model M can be described as follows:

<X; M> = <M> + <X|M>,  (1)

where <M> is the cost of describing the model M, and <X|M> is the cost of describing the data X given the model M.

Model Coding Cost. The description complexity of model M is the sum of the following elements: the number of clusters K requires log*(K) bits;¹ the total number of observations of each cluster requires Σ_{k=1}^{K} log*(|F_k|); the mean value of each cluster, which has size p × 1, requires Σ_{k=1}^{K} (p × c_F); and the inverse covariance matrix of each cluster, which has size p × p, requires Σ_{k=1}^{K} |θ_k|_{≠0} (2 log(p) + c_F) + log*(|θ_k|_{≠0}), where |·|_{≠0} denotes the number of non-zero elements in a matrix and c_F is the floating point cost.²

Data Coding Cost. Given a model M, the encoding cost of the data X using Huffman coding [10] is computed by <X|M> = Σ_{k=1}^{K} −ll(X_k, θ_k). Our next goal is to find the best model M that minimizes Equation (1).

¹ Here, log* is the universal code length for integers.
² We used 4 × 8 bits in our setting.
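As a concrete illustration, the model coding cost <M> can be computed as follows. This is a minimal sketch under our own naming: c_f is fixed to 4 × 8 = 32 bits as in the paper, and log_star implements Rissanen's universal code length for integers.

import numpy as np

def log_star(x):
    # Universal code length for an integer x >= 1 (Rissanen's log*).
    cost = np.log2(2.865064)
    v = np.log2(x)
    while v > 0:
        cost += v
        v = np.log2(v)
    return cost

def model_cost(thetas, assignments, p, c_f=32):
    # <M> = #clusters + cluster sizes + mean vectors + sparse precisions.
    K = len(thetas)
    cost = log_star(K)
    cost += sum(log_star(len(f)) for f in assignments)  # |F_k| per cluster
    cost += K * p * c_f                                 # one p x 1 mean per cluster
    for theta in thetas:
        nnz = max(np.count_nonzero(theta), 1)           # |theta_k|_{!=0}
        cost += nnz * (2 * np.log2(p) + c_f) + log_star(nnz)
    return cost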
3.2. Automatic cut point detection
So far, we have described how we calculate the MDL cost for our model. The next question is how to find optimal cut points that minimize the MDL cost efficiently; there are still numerous candidates to merge when summarizing similar subsequences into a compact model, so we modify the bottom-up algorithm to prevent a pattern explosion. We answer this question in two steps, MergeSegment and CutPointSearch.

3.2.1. MergeSegment (inner loop)
Assuming that neighboring segments tend to belong to the same cluster, we update cut points through MergeSegment. We consider given cut points cp = {c_0, c_1, ..., c_m}, a set of inverse covariance matrices for each segment Θ_S = {θ_{,c_0}, θ_{c_0,c_1}, ..., θ_{c_m,}}, where the number of segments is m + 1, and the sets of inverse covariance matrices for the segments delimited by only the even- or odd-numbered cut points, Θ_E = {θ_{,c_0}, θ_{c_0,c_2}, ...} and Θ_O = {θ_{,c_1}, θ_{c_1,c_3}, ...}. X_{c_i,c_j}, M_{c_i,c_j}, and θ_{c_i,c_j} denote the data, model, and inverse covariance matrix from cut point c_i to c_j. Our goal is to determine whether a segment should be merged with its neighboring segment. As shown in Figure 1, we have three candidates for the updated cut points: (a) Solo keeps all three segments separated, while (b) Left and (c) Right have two segments, in which one side is merged. We compare the MDL costs of these three cases using Equation (1), (a) vs. (b) vs. (c), and select the cut points that minimize the local MDL cost. For example, if (b) has the lowest cost, c_{i+2} is added to the updated cut points; if (a) has the lowest cost, there is no change from the previous cut points. We iterate this process throughout the whole sequence.

Figure 1: Illustration of the three candidates. We compare the MDL costs of these candidates.

3.2.2. CutPointSearch (outer loop)
This algorithm finds the optimal cut points. We are given the bundle X and initial cut points cp; the user decides the interval of the initial cut points. Since TVGL enforces time-varying similarity between neighboring networks, we calculate Θ_S, Θ_O, and Θ_E using the TVGL optimization method. After obtaining each Θ, we run the MergeSegment algorithm to update the cut points, and we iterate this process until the cut points are stable.

3.3. Automatic clustering: NGL
Now we have optimal cut points, which means that there is a limited number of segments, each with enough samples to estimate its network structure. Next, we assign segments to clusters and find the optimal number of clusters. As Algorithm 1 shows, we use the EM algorithm to classify each segment. For each iteration we vary K = 1, 2, 3, ... and minimize the function below:

\[
\arg\min \sum_{k=1}^{K} -ll(X_k, \theta_k) + \lambda \lVert \theta_k \rVert_{od,1}. \tag{2}
\]

In the E-step, we assign each segment to the optimal cluster, so that the log likelihood is maximized. In the M-step, we calculate the θ_k value of each cluster using a normal graphical lasso optimization algorithm. We increase K as long as doing so decreases the cost function, and stop once the cost function increases.

Algorithm 1 NGL(X, cp)
1: Input: Bundle X, initial cut point set cp
2: Output: Cluster parameters Θ and cluster assignments F
3: cp_opt = CutPointSearch(X, cp); K = 1;
4: while improving the total cost <X; M> do
5:   Θ = ModelInitialization(cp_opt, K);
6:   repeat
7:     F = AssignToCluster(X, Θ, cp_opt); /* E-step, Equation (2) */
8:     Θ = GraphicalLasso(X, F); /* M-step */
9:   until convergence;
10:  Compute <X; M>; // M = {Θ, F}
11:  K = K + 1;
12: end while
13: return M = {Θ, F};
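To make the E-step concrete, the sketch below assigns each segment to the cluster whose network maximizes the segment's Gaussian log likelihood, which is the role AssignToCluster plays in Algorithm 1. This is our own illustrative rendering, not the authors' code; segment means are assumed to be zero, as in the problem definition.

import numpy as np

def assign_to_cluster(segments, thetas):
    # E-step: for each segment (an n x p array), pick the cluster k that
    # maximizes ll = n (log det theta_k - Tr(S theta_k)).
    labels = []
    for seg in segments:
        n = seg.shape[0]
        S = seg.T @ seg / n                    # empirical covariance (zero mean)
        lls = [n * (np.linalg.slogdet(th)[1] - np.trace(S @ th)) for th in thetas]
        labels.append(int(np.argmax(lls)))
    return labels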
4. Experiments
We evaluate our method on both synthetic and real datasets.

4.1. Accuracy on synthetic data
In this section, we demonstrate the accuracy of NGL on synthetic data, for which there are clear ground truth networks against which to test the accuracy.

Experimental Setup. We randomly generate synthetic multidimensional data in R^5 that follow a multivariate normal distribution X ∼ N(0, θ^{−1}). Each of the K clusters has a mean of 0, so that the clustering result is based entirely on the structure of the data. For each cluster, we generate a random ground truth inverse covariance as follows [11] (see the sketch after this list):

1. Set A ∈ R^{5×5} equal to the adjacency matrix of an Erdős-Rényi directed random graph, where every edge has a 20% chance of being selected.

2. For every selected edge in A, set A_{ij} ∼ Uniform([−0.6, −0.3] ∪ [0.3, 0.6]). We enforce a symmetry constraint whereby every A_{ij} = A_{ji}.

3. Let c = λ_min(A) be the smallest eigenvalue of A, and set θ = A + (0.1 + |c|)I, where I is the identity matrix.
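The generator above can be written in a few lines; this is our own sketch of steps 1-3, with illustrative names and a helper for sampling one segment.

import numpy as np

def random_precision(p=5, edge_prob=0.2, rng=None):
    # Steps 1-3: Erdos-Renyi support, symmetric weights with magnitude in
    # [0.3, 0.6] and random sign, then a shift to make theta positive definite.
    rng = rng or np.random.default_rng(0)
    A = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            if rng.random() < edge_prob:
                A[i, j] = A[j, i] = rng.uniform(0.3, 0.6) * rng.choice([-1.0, 1.0])
    c = np.linalg.eigvalsh(A).min()            # c = lambda_min(A)
    return A + (0.1 + abs(c)) * np.eye(p)      # theta = A + (0.1 + |c|) I

def sample_segment(theta, n=100, rng=None):
    # Draw n observations from N(0, theta^{-1}).
    rng = rng or np.random.default_rng(0)
    cov = np.linalg.inv(theta)
    return rng.multivariate_normal(np.zeros(theta.shape[0]), cov, size=n)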
We run our experiments on four different temporal sequences: "1,2,1", "1,2,3,2,1", "1,2,3,4,1,2,3,4", and "1,2,2,1,3,3,3,1". We generate each dataset 10 times and report the mean and standard deviation of the macro-F1 score.

Baseline Methods. We compare our method with three state-of-the-art methods and one ablation. TICC [6] and TAGM [7] take network structure into account; since both need the number of clusters to be specified, we gave the true number of clusters to these two methods only. AutoPlait [12] is a multi-level HMM-based automatic method for time-series clustering. NGL no-cps is our NGL method without CutPointSearch. Our method, including NGL no-cps, requires us to specify initial cut points; we set the interval at every 5 points throughout the synthetic experiments.

Clustering Accuracy. We set each segment in each of the examples to have 100 observations in R^5 (for example, "1,2,1" has a total of 300 observations). Table 1 shows the clustering accuracy as macro-F1 scores for each dataset. As shown, NGL significantly outperforms the baselines, consistently achieving the highest accuracy and the lowest standard deviation. AutoPlait does not consider network structure, so it does not find any clusters. Although we gave the true number of clusters K to TICC and TAGM, the average accuracy of our method is more than 10% higher. The NGL no-cps result shows that finding large segments with CutPointSearch is what groups adjacent observations into the same cluster.

Table 1: Macro-F1 score of clustering accuracy for four different temporal sequences, comparing NGL with state-of-the-art methods (higher is better).

Model           | NGL         | TAGM (KDD'21) | TICC (KDD'18) | AutoPlait (SIGMOD'14) | NGL no-cps
1,2,1           | 0.93 ± 0.05 | 0.83 ± 0.25   | 0.85 ± 0.26   | 0.67                  | 0.62 ± 0.13
1,2,3,2,1       | 0.96 ± 0.03 | 0.74 ± 0.21   | 0.89 ± 0.18   | 0.40                  | 0.66 ± 0.15
1,2,3,4,1,2,3,4 | 0.94 ± 0.03 | 0.78 ± 0.26   | 0.82 ± 0.21   | 0.25                  | 0.66 ± 0.11
1,2,2,1,3,3,3,1 | 0.93 ± 0.05 | 0.89 ± 0.17   | 0.83 ± 0.26   | 0.38                  | 0.62 ± 0.07

Effect of Total Number of Samples. We next focus on the number of samples each method requires to find clusters accurately. We take the "1,2,3,4,1,2,3,4" example and vary the number of samples. As shown in Figure 2, our method outperforms the baselines for almost all segment lengths, and it maintains a consistently high average even for relatively small segment lengths, because our CutPointSearch algorithm correctly finds cut points even when the sample size is small.

Figure 2: Plot of clustering accuracy (macro-F1 score) vs. number of samples for NGL and two other state-of-the-art methods.

4.2. Case study
Here, we show that NGL provides interpretable results on real-world financial data. In general, stock, bond, and currency prices are correlated. By examining historical financial data, we can infer a financial network structure that reveals the relationships between them. We use hourly currency exchange rate data³ for AUD/USD, EUR/USD, GBP/USD, and USD/CAD from 2005 to 2018. Assuming that the underlying network structure is consistent within a week, we normalized the data for each week. We also set the initial cut points at one week to capture the weekly correlation trend. The top of Figure 3 shows the clustering result obtained with NGL. During the global financial crisis (from mid-2007 to early 2009), we found that the network structure changed. There are also abrupt changes during 2016/5/16 ∼ 2016/6/5; the bottom of Figure 3 shows how the correlation changed during this period. As we can see, the correlations related to the United Kingdom changed significantly. This was in response to the United Kingdom European Union membership referendum on 2016/6/23, which may have caused public concern.

³ https://github.com/FutureSharks/financial-data

Figure 3: Clustering result of NGL using currency datasets.

5. Conclusion and Future work
In this paper, we presented NGL, an interpretable clustering algorithm. We focused on the problem of the interpretable clustering of multidimensional time-series data with underlying network structures. Our proposed NGL indeed exhibits all the desirable properties: it is Interpretable, Automatic, and Accurate.

In future work, we will focus on the following direction: online learning. In several situations, network inference needs to operate in an online fashion, and to the best of our knowledge, no study has dealt with online clustering based on network structure. In this context, we will develop an extension of our methods by utilizing the sliding window and bottom-up (SWAB) algorithm [9].

Acknowledgments
The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. This work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research Number JP20H00585, JP21H03446, JP22K17896, NICT 21481014, MIC/SCOPE 192107004, JST-AIP JPMJCR21U4, and ERCA-Environment Research and Technology Development Fund JPMEERF20201R02.

References
[1] C. Miyajima, Y. Nishiwaki, K. Ozawa, T. Wakita, K. Itou, K. Takeda, F. Itakura, Driver modeling based on driving behavior and its evaluation in driver identification, Proceedings of the IEEE 95 (2007) 427-437.
[2] F. Tomasi, V. Tozzo, A. Barla, Temporal pattern detection in time-varying graphical models, in: ICPR, 2021, pp. 4481-4488. doi:10.1109/ICPR48806.2021.9413203.
[3] J. Friedman, T. Hastie, R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9 (2008) 432-441.
[4] F. Tomasi, V. Tozzo, S. Salzo, A. Verri, Latent variable time-varying network inference, in: KDD, 2018, pp. 2338-2346. doi:10.1145/3219819.3220121.
[5] D. Hallac, Y. Park, S. P. Boyd, J. Leskovec, Network inference via the time-varying graphical lasso, in: KDD, 2017, pp. 205-213. doi:10.1145/3097983.3098037.
[6] D. Hallac, S. Vare, S. P. Boyd, J. Leskovec, Toeplitz inverse covariance-based clustering of multivariate time series data, in: KDD, 2017, pp. 215-223. doi:10.1145/3097983.3098060.
[7] V. Tozzo, F. Ciech, D. Garbarino, A. Verri, Statistical models coupling allows for complex local multivariate time series analysis, in: KDD, 2021, pp. 1593-1603. doi:10.1145/3447548.3467362.
[8] S. P. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (2011) 1-122. doi:10.1561/2200000016.
[9] E. J. Keogh, S. Chu, D. M. Hart, M. J. Pazzani, An online algorithm for segmenting time series, in: ICDM, 2001, pp. 289-296. doi:10.1109/ICDM.2001.989531.
[10] C. Böhm, C. Faloutsos, J.-Y. Pan, C. Plant, RIC: Parameter-free noise-robust clustering, TKDD 1 (2007) 10-es.
[11] K. Mohan, P. London, M. Fazel, D. Witten, S.-I. Lee, Node-based learning of multiple Gaussian graphical models, J. Mach. Learn. Res. 15 (2014) 445-488.
[12] Y. Matsubara, Y. Sakurai, C. Faloutsos, AutoPlait: Automatic mining of co-evolving time sequences, in: SIGMOD, 2014, pp. 193-204. doi:10.1145/2588555.2588556.