CEUR Workshop Proceedings, Vol-3186, paper 6. https://ceur-ws.org/Vol-3186/paper_6.pdf
Automatic Time-Series Clustering via Network Inference
Kohei Obata1 , Yasuko Matsubara1 , Koki Kawabata1 and Yasushi Sakurai1
1 SANKEN, Osaka University


Abstract
Given a collection of multidimensional time-series that contains an unknown type and number of network structures between variables, how efficiently can we find typical patterns and their points of variation? How can we interpret important relationships with the obtained patterns? In this paper, we propose a new method of model-based clustering, which we call network clustering via graphical lasso (NGL). Our method has the following properties: (a) Interpretable: it provides interpretable network structures and cluster assignments for the data; (b) Automatic: it determines the optimal cut points and the number of clusters automatically; (c) Accurate: it provides reliable clustering performance thanks to the automated algorithm. We evaluate our NGL algorithm on both real and synthetic datasets, obtaining interpretable network structure results and outperforming state-of-the-art baselines in terms of accuracy.

Keywords
time-series, network structure, graphical lasso



VLDB’22, Sep 05–9, 2022, Sydney, Australia
obata88@sanken.osaka-u.ac.jp (K. Obata); yasuko@sanken.osaka-u.ac.jp (Y. Matsubara); koki@sanken.osaka-u.ac.jp (K. Kawabata); yasushi@sanken.osaka-u.ac.jp (Y. Sakurai)
Β© 2022 Proceedings of the VLDB 2022 PhD Workshop, September 5, 2022, Sydney, Australia. Copyright (C) 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.


1. Introduction

Many applications generate time-series data, including those used in automobiles [1], biology, social networks, and finance. In most cases, these data are multidimensional, and it is important to find typical patterns, each of which has a specific network structure. In practice, real-life data have multiple distinct patterns, which differentiate their network structures. For example, automobile sensor data from a driving session can be composed of some basic actions and some abrupt actions (i.e., going straight, turning right, turning left, slowing down, sudden braking, sudden turning). The network structure is equal to the graph structure: sensors can be represented as nodes, and sensor interactions can be represented as edges. For a turning action, lateral acceleration and steering angle may share an edge; for a braking action, brake pedal stroke and longitudinal acceleration may share an edge.

In this paper, we focus on finding a network structure automatically from multidimensional time-series data. Understanding the structure of these networks is useful because it allows us to devise models of sensor interaction, which can be used to analyze such behaviours as fuel-efficient driving. However, there are many network structures in the data, which change over time, and it is difficult to find a meaningful segmentation point since no one knows how the data change. Moreover, the number of clusters should be selected automatically, both to find abrupt changes and for an extension to online learning, because in most cases we do not know the optimal number of clusters in advance.

In this paper, we propose an automatic algorithm, called network clustering via graphical lasso (NGL), which enables us to summarize multidimensional time-series into meaningful patterns efficiently based on the graphical lasso problem. Intuitively, the problem we wish to solve is as follows.

Informal Problem 1. Given a large collection of multidimensional time-series data with underlying network structures 𝑋, find a compact description of 𝑋, which consists of:

    1. a set of segments and their cut points
    2. a set of segment groups (i.e., clusters) of similar network structures
    3. the optimal number of clusters

Contrast with Competitors. We compare NGL with existing methods from the viewpoint of network inference. Network estimation with time-series information has been studied as a method for analyzing economic data and biological signal data because of the high interpretability of its graphical model [2]. Graphical lasso [3, 4] is a network estimation method that provides an interpretable sparse inverse covariance matrix due to the β„“1-norm. Time-varying graphical lasso (TVGL) [5] is a network estimation method that takes time information into account. Although this method can find change points by comparing the network structure before and after a change, it cannot find clusters. Toeplitz inverse covariance-based clustering (TICC) [6] and the time adaptive Gaussian model (TAGM) [7] are clustering methods based on network structure. TICC uses Markov random fields (MRFs) and Toeplitz matrices to capture the inherent relationships among variables. TAGM is a fusion of a hidden Markov model (HMM) and a Gaussian mixture model (GMM). These methods find clusters depending on the network structure of each subsequence. This provides clusters with interpretability and allows us to discover patterns that other traditional clustering methods are unable to find. Both incorporate a graphical lasso to capture the interaction between variables, but both require the number of clusters to be specified as prior information. Consequently, only our approach satisfies the need for interpretability and finds the optimal number of clusters automatically.

Contributions. The main contribution of this work is the concept and design of NGL, which has the following desirable properties:

    1. Interpretable: NGL provides the underlying graphical structures and cluster assignments in data, which help us to interpret important relationships between variables.
    2. Automatic: We formulate data encoding schemes to reveal distinct patterns/clusters, each of which captures the network structure. The proposed method requires no parameter tuning to specify the number of clusters.
    3. Accurate: NGL is a simple yet powerful algorithm for time-series segmentation using a graphical lasso, which outperforms state-of-the-art competitors in terms of accuracy.


2. Preliminary

In this paper we investigate automatic network structure clustering for large multidimensional time-series data. We describe a few key concepts and background materials in this section.

2.1. Problem definition

Consider a set of 𝑝-dimensional time-series data consisting of 𝑇 sequential bundle observations, 𝑋 = {π‘₯1, π‘₯2, ..., π‘₯𝑇}, where there are |π‘₯𝑖| β‰₯ 1 different observations at each time 𝑖. Here, π‘₯𝑖 ∈ R𝑝 is the 𝑖-th multidimensional bundle observation, and each bundle observation vector is sampled from one of 𝐾 multivariate normal distributions, π‘₯𝑖 ∼ 𝑁(0, Ξ£π‘˜). The network structure is equal to the graph structure: given π‘₯𝑖 ∼ 𝑁(0, Ξ£π‘˜), each variable represents a node, and the covariance matrix Ξ£π‘˜ forms the edges. Our goal is to find the cluster assignments of these 𝑇 bundle observations into 𝐾 clusters β„± = {β„±1, β„±2, ..., ℱ𝐾}, where β„±π‘˜ is the cluster assignment set of π‘‹π‘˜ βŠ‚ 𝑋 (𝑠.𝑑. π‘˜ = 1, 2, ..., 𝐾), represented by a set of 𝐾 matrices, i.e., Θ = {πœƒ1, πœƒ2, ..., πœƒπΎ}. Therefore, letting π‘€π‘˜ = {πœƒπ‘˜, β„±π‘˜} be a compact description for π‘‹π‘˜ in the π‘˜-th cluster, the full parameter set that we want to estimate consists of 𝑀 = {𝑀1, 𝑀2, ..., 𝑀𝐾} and the number of clusters 𝐾.

2.2. Graphical lasso problem

We first consider the static inference that estimates πœƒπ‘–. The optimization problem is written as follows:

    minimize_{πœƒπ‘– ∈ 𝑆^𝑝_{++}}   βˆ’π‘™π‘™(π‘₯𝑖, πœƒπ‘–) + πœ†||πœƒπ‘–||_{od,1},
    𝑙𝑙(π‘₯𝑖, πœƒπ‘–) = |π‘₯𝑖| (log det πœƒπ‘– βˆ’ Tr(𝑆𝑖 πœƒπ‘–)),

where 𝑆𝑖 is the empirical covariance (1/|π‘₯𝑖|) Ξ£_{𝑗=1}^{|π‘₯𝑖|} π‘₯𝑗 π‘₯𝑗^𝑇, the π‘₯𝑗 are the different samples, 𝑆^𝑝_{++} is the space of positive definite matrices, πœ† determines the sparsity level of the network, and ||Β·||_{od,1} is the off-diagonal β„“1-norm. This is a convex optimization problem, which imposes the β„“1-norm restriction.

2.3. TVGL problem

To infer a time-varying sequence of networks, TVGL [5] extends the above approach and is designed to infer a set of inverse covariance matrices Θ. TVGL solves the problem below:

    minimize_{πœƒπ‘– ∈ 𝑆^𝑝_{++}}   Ξ£_{𝑖=1}^{𝑇} [ βˆ’π‘™π‘™(π‘₯𝑖, πœƒπ‘–) + πœ†||πœƒπ‘–||_{od,1} ] + Ξ² Ξ£_{𝑖=2}^{𝑇} πœ“(πœƒπ‘– βˆ’ πœƒ_{π‘–βˆ’1}),

where Ξ² determines how strongly correlated neighboring covariance estimations should be. The penalty function πœ“ encourages similarity between πœƒπ‘– and πœƒ_{π‘–βˆ’1}, and different choices of πœ“ enforce different restrictions on the time-varying similarity. This problem is solved by employing the alternating direction method of multipliers (ADMM) [8], a well-established method for solving convex optimization problems; for more details, see, e.g., [5]. Although TVGL can find a change point by comparing πœƒπ‘– and πœƒ_{π‘–βˆ’1}, it cannot simultaneously find clusters. Throughout this paper, our method uses TVGL to optimize the graphical lasso problem.
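As a concrete illustration of the static problem in Section 2.2, the sketch below estimates a sparse inverse covariance with scikit-learn's GraphicalLasso. This is a minimal stand-in, not the paper's implementation: the toy data, seed, and the value of πœ† (sklearn's `alpha`) are our assumptions, and sklearn's penalty convention may differ in details from the off-diagonal norm above.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Toy segment: 200 samples of a zero-mean 5-dimensional Gaussian (p = 5)
# whose true precision matrix has a single off-diagonal edge (0, 1).
true_prec = np.eye(5)
true_prec[0, 1] = true_prec[1, 0] = 0.4
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(true_prec), size=200)

# alpha plays the role of lambda: larger values give a sparser network.
model = GraphicalLasso(alpha=0.1).fit(X)
theta = model.precision_  # estimated inverse covariance (the theta_i above)
print(np.round(theta, 2))
```

Here `precision_` plays the role of πœƒπ‘–: a zero entry means no edge between the corresponding pair of variables.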
3. Algorithms

The previous section described how to estimate a set of inverse covariance matrices Θ. Now the questions are (a) how to describe the model, (b) how to find optimal cut points, and (c) how to assign segments to optimal clusters. There are three main ideas behind our model:

    1. Model description cost: We use the minimum description length (MDL) principle as a model selection criterion for choosing between alternative segmentation and cluster descriptions. We propose a novel cost function to estimate the description cost of the graphical lasso model.
    2. CutPointSearch: We modify the generic bottom-up algorithm [9] to enhance its ability to handle time-series data. Initially we adopt short segments and iteratively merge each adjacent pair that satisfies the cost restriction.
    3. NGL: We use the EM algorithm to cluster the segments obtained by CutPointSearch while determining the optimal number of clusters automatically.

3.1. Model description cost

The MDL principle explains the model in a parsimonious way by calculating the required number of bits. Thus, it follows the assumption that the more we can compress the data, the more we can generalize its underlying structures. In a nutshell, we want to find the minimum number of graphical lasso models needed to express the data. The goodness of the model 𝑀 can be described as follows:

    < 𝑋; 𝑀 > = < 𝑀 > + < 𝑋|𝑀 >,    (1)

where < 𝑀 > is the cost of describing the model 𝑀, and < 𝑋|𝑀 > represents the cost of describing the data 𝑋 given the model 𝑀.

Model Coding Cost. The description complexity of model 𝑀 is the sum of the following elements. The number of clusters 𝐾 requires log*(𝐾).ΒΉ The total number of observations of each cluster requires Ξ£_{π‘˜=1}^{𝐾} log*(|β„±π‘˜|). The mean value of each cluster, which has size 𝑝 Γ— 1, requires Ξ£_{π‘˜=1}^{𝐾} (𝑝 Γ— 𝑐𝐹). The inverse covariance matrix of each cluster, which has size 𝑝 Γ— 𝑝, requires Ξ£_{π‘˜=1}^{𝐾} |πœƒπ‘˜|_{β‰ 0} (2 log(𝑝) + 𝑐𝐹) + log*(|πœƒπ‘˜|_{β‰ 0}), where |Β·|_{β‰ 0} denotes the number of non-zero elements in a matrix and 𝑐𝐹 is the floating point cost.Β²

Data Coding Cost. Given a model 𝑀, the encoding cost of the data 𝑋 using Huffman coding [10] is computed by < 𝑋|𝑀 > = Ξ£_{π‘˜=1}^{𝐾} βˆ’π‘™π‘™(π‘‹π‘˜, πœƒπ‘˜). Our next goal is to find the best model 𝑀 that minimizes Equation (1).

3.2. Automatic cut point detection

So far, we have described how we calculate the MDL cost for our model. The next question is how to find optimal cut points that minimize the MDL cost efficiently; we still have numerous merge candidates with which to summarize similar subsequences into a compact model, and thus we modify the bottom-up algorithm to prevent a pattern explosion. We answer this question in two steps, MergeSegment and CutPointSearch.

3.2.1. MergeSegment (inner loop)

Assuming that neighboring segments tend to belong to the same cluster, we update cut points through MergeSegment. We consider given cut points 𝑐𝑝 = {𝑐0, 𝑐1, ..., π‘π‘š} and a set of inverse covariance matrices for each segment Ξ˜π‘† = {πœƒ_{Β·,𝑐0}, πœƒ_{𝑐0,𝑐1}, ..., πœƒ_{π‘π‘š,Β·}}, where the number of segments is π‘š + 1. We also form the sets of inverse covariance matrices obtained with only the even- or odd-numbered cut points, Θ𝐸 = {πœƒ_{Β·,𝑐0}, πœƒ_{𝑐0,𝑐2}, ...} and Ξ˜π‘‚ = {πœƒ_{Β·,𝑐1}, πœƒ_{𝑐1,𝑐3}, ...}. 𝑋_{𝑐𝑖,𝑐𝑗}, 𝑀_{𝑐𝑖,𝑐𝑗}, and πœƒ_{𝑐𝑖,𝑐𝑗} are the data, model, and inverse covariance matrix from cut point 𝑐𝑖 to 𝑐𝑗. Our goal is to determine whether a segment should be merged with its neighboring segment. As shown in Figure 1, we have three candidates as updated cut points: (a) Solo keeps all three segments separated, while (b) Left and (c) Right have two segments in which one side is merged. We compare the MDL costs of these three cases, (a) vs. (b) vs. (c), using Equation (1), and select the cut points that minimize the local MDL cost. For example, if (b) has the lowest cost, 𝑐_{𝑖+2} is added to the updated cut points; if (a) has the lowest cost, there is no change from the previous cut points. We iterate this process throughout the whole sequence.

3.2.2. CutPointSearch (outer loop)

This algorithm finds the optimal cut points. We are now given the bundle 𝑋 and initial cut points 𝑐𝑝; the user decides the interval of the initial cut points. Since TVGL enforces time-varying similarity between neighboring networks, we calculate Ξ˜π‘†, Ξ˜π‘‚, and Θ𝐸 using the TVGL optimization method. After obtaining each Θ, we run the MergeSegment algorithm to update the cut points. We iterate this process until the cut points are stable.

3.3. Automatic clustering: NGL

Now we have optimal cut points, which means that there are a limited number of segments, each of which has enough samples with which to estimate the network structure. Next, we assign segments to clusters and find the optimal number of clusters. As Algorithm 1 shows, we use the EM algorithm to classify each segment. For each iteration we vary 𝐾 = 1, 2, 3, ..., and minimize the function below:

    arg min Ξ£_{π‘˜=1}^{𝐾} βˆ’π‘™π‘™(π‘‹π‘˜, πœƒπ‘˜) + πœ†||πœƒπ‘˜||_{od,1},    (2)

In the E-step, we assign each segment to the optimal cluster so that the log likelihood is maximized. In the M-step, we calculate the πœƒπ‘˜ value of each cluster using a normal graphical lasso optimization algorithm. We increase 𝐾 in this way until the cost function stops improving.

ΒΉ Here, log* is the universal code length for integers.
Β² We used 4 Γ— 8 bits in our setting.
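To make the Solo/Left/Right comparison of Section 3.2.1 concrete, here is a hypothetical sketch. The cost function is a toy stand-in for Equation (1) (negative log-likelihood plus a crude bit count for non-zero entries, not the full coding cost of Section 3.1), and the `fit` callback is a placeholder estimator, not the paper's TVGL step.

```python
import numpy as np

def nll(X_seg, theta):
    # Negative Gaussian log-likelihood -ll(x, theta) from Section 2.2
    # (zero-mean, up to constants).
    n = len(X_seg)
    S = X_seg.T @ X_seg / n
    sign, logdet = np.linalg.slogdet(theta)
    return -n * (logdet - np.trace(S @ theta))

def mdl_cost(X_seg, theta, c_float=32):
    # Toy stand-in for Equation (1): data coding cost plus a crude model
    # cost for the non-zero entries of theta.
    return nll(X_seg, theta) + c_float * np.count_nonzero(theta)

def best_merge(X, c0, c1, c2, c3, fit):
    # Figure 1: (a) Solo keeps [c0,c1), [c1,c2), [c2,c3) separate;
    # (b) Left merges the first two; (c) Right merges the last two.
    solo = sum(mdl_cost(X[a:b], fit(X[a:b]))
               for a, b in [(c0, c1), (c1, c2), (c2, c3)])
    left = mdl_cost(X[c0:c2], fit(X[c0:c2])) + mdl_cost(X[c2:c3], fit(X[c2:c3]))
    right = mdl_cost(X[c0:c1], fit(X[c0:c1])) + mdl_cost(X[c1:c3], fit(X[c1:c3]))
    return min([("solo", solo), ("left", left), ("right", right)],
               key=lambda t: t[1])[0]

# Usage with a naive ridge-regularized estimator as the `fit` placeholder:
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 5))
fit = lambda seg: np.linalg.inv(seg.T @ seg / len(seg) + 0.1 * np.eye(5))
print(best_merge(X, 0, 100, 200, 300, fit))
```

Because merging halves the number of stored precision matrices, homogeneous data such as this i.i.d. example tends to favor a merge, which is exactly the pressure the MDL cost is meant to exert.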
Figure 1: Illustration of the three candidates. We compare the MDL costs of these candidates.

Algorithm 1 NGL(𝑋, 𝑐𝑝)
 1: Input: bundle 𝑋, initial cut point set 𝑐𝑝
 2: Output: cluster parameters Θ and cluster assignments β„±
 3: π‘π‘π‘œπ‘π‘‘ = CutPointSearch(𝑋, 𝑐𝑝); 𝐾 = 1;
 4: while improving the total cost < 𝑋; 𝑀 > do
 5:   Θ = ModelInitialization(π‘π‘π‘œπ‘π‘‘, 𝐾);
 6:   repeat
 7:     β„± = AssignToCluster(𝑋, Θ, π‘π‘π‘œπ‘π‘‘); /* E-step, Equation (2) */
 8:     Θ = GraphicalLasso(𝑋, β„±); /* M-step */
 9:   until convergence;
10:   Compute < 𝑋; 𝑀 >; // 𝑀 = {Θ, β„±}
11:   𝐾 = 𝐾 + 1;
12: end while
13: return 𝑀 = {Θ, β„±};


4. Experiments

We evaluate our method on both synthetic and real datasets.

4.1. Accuracy on synthetic data

In this section, we demonstrate the accuracy of NGL on synthetic data. We do so because there are clear ground truth networks against which to test the accuracy.

Experimental Setup. We randomly generate synthetic multidimensional data in R5, which follow a multivariate normal distribution 𝑋 ∼ 𝑁(0, πœƒβ»ΒΉ). Each of the 𝐾 clusters has a mean of 0βƒ—, so that the clustering result is based entirely on the structure of the data. For each cluster, we generate a random ground truth inverse covariance as follows [11]:

    1. Set 𝐴 ∈ R5Γ—5 equal to the adjacency matrix of an ErdΕ‘s-RΓ©nyi directed random graph, where every edge has a 20% chance of being selected.
    2. For every selected edge in 𝐴, set 𝐴𝑖𝑗 ∼ Uniform([βˆ’0.6, βˆ’0.3] βˆͺ [0.3, 0.6]). We enforce a symmetry constraint whereby every 𝐴𝑖𝑗 = 𝐴𝑗𝑖.
    3. Let 𝑐 = πœ†min(𝐴) be the smallest eigenvalue of 𝐴, and set πœƒ = 𝐴 + (0.1 + |𝑐|)𝐼, where 𝐼 is an identity matrix.

We run our experiments on four different temporal sequences: "1,2,1", "1,2,3,2,1", "1,2,3,4,1,2,3,4", and "1,2,2,1,3,3,3,1". We generate each dataset 10 times and report the mean and standard deviation of the macro-F1 score.

Baseline Methods. We compare our method with three state-of-the-art methods and one ablation method. TICC [6] and TAGM [7] take network structure into account; since both methods need the number of clusters to be specified, we gave the true number of clusters only to these methods. AutoPlait [12] is a multi-level HMM-based automatic method for time-series clustering. NGL no-cps is our NGL method without CutPointSearch. Our method, including NGL no-cps, requires us to specify initial cut points; we set the interval at every 5 points throughout the synthetic experiments.

Clustering Accuracy. We set each segment in each of the examples to have 100 observations in R5 (for example, "1,2,1" has a total of 300 observations). Table 1 shows the clustering accuracy as macro-F1 scores for each dataset. As shown, NGL significantly outperforms the baselines, consistently achieving the highest accuracy and the lowest standard deviation. AutoPlait does not consider network structure, so it does not find any clusters. Although we gave the true number of clusters 𝐾 to TICC and TAGM, the average accuracy of our method is more than 10% higher. The NGL no-cps result shows that finding large segments with CutPointSearch is essential for grouping adjacent observations into the same cluster.

Effect of Total Number of Samples. We next focus on the number of samples each method requires to find clusters accurately. We take the "1,2,3,4,1,2,3,4" example and vary the number of samples. As shown in Figure 2, our method outperforms the baselines for almost all segment lengths and maintains a consistently high average, even for relatively small segment lengths. This is because our CutPointSearch algorithm correctly finds cut points even when the sample size is small.

4.2. Case study

Here, we show that NGL provides an interpretable result on real-world financial data. In general, stock, bond, and currency prices are correlated. By examining historical financial data, we can infer a financial network structure that reveals the relationships between them. We use hourly currency exchange rate dataΒ³ for AUD/USD, EUR/USD, GBP/USD, and USD/CAD from 2005 to 2018. Assuming that the underlying network structure is consistent within a week, we normalized the data for each week. We also set the initial cut points at one-week intervals to capture the weekly correlation trend. The top of Figure 3 shows the clustering result obtained with NGL. During the global financial crisis (from mid-2007 to early 2009), we found that the network structure changed. There are also abrupt changes during 2016/5/16 ∼ 2016/6/5; the bottom of Figure 3 shows how the correlation changed during this period.

Β³ https://github.com/FutureSharks/financial-data
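The ground-truth generation procedure of Section 4.1 (steps 1–3) translates almost directly into NumPy. The function name and seeds below are illustrative, not from the paper.

```python
import numpy as np

def random_ground_truth_precision(p=5, edge_prob=0.2, seed=0):
    """Steps 1-3 of Section 4.1: Erdos-Renyi edges, symmetric random
    weights in [-0.6, -0.3] U [0.3, 0.6], then a diagonal shift of
    (0.1 + |lambda_min|) to make the matrix positive definite."""
    rng = np.random.default_rng(seed)
    A = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            if rng.random() < edge_prob:            # 20% chance per edge
                w = rng.uniform(0.3, 0.6) * rng.choice([-1, 1])
                A[i, j] = A[j, i] = w               # symmetry: A_ij = A_ji
    c = np.linalg.eigvalsh(A).min()                 # smallest eigenvalue
    return A + (0.1 + abs(c)) * np.eye(p)           # theta = A + (0.1+|c|)I

theta = random_ground_truth_precision()
# One 100-observation segment drawn from N(0, theta^-1), as in Section 4.1.
X = np.random.default_rng(1).multivariate_normal(
    np.zeros(5), np.linalg.inv(theta), size=100)
```

The shift in step 3 guarantees the smallest eigenvalue of πœƒ is at least 0.1, so πœƒ is always a valid precision matrix regardless of which edges were sampled.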
Table 1
Macro-F1 score of clustering accuracy for four different temporal sequences, comparing NGL with state-of-the-art methods (higher is better).

    Model              NGL           TAGM (KDD'21)   TICC (KDD'17)   AutoPlait (SIGMOD'14)   NGL no-cps
    1,2,1              0.93 Β± 0.05   0.83 Β± 0.25     0.85 Β± 0.26     0.67                    0.62 Β± 0.13
    1,2,3,2,1          0.96 Β± 0.03   0.74 Β± 0.21     0.89 Β± 0.18     0.40                    0.66 Β± 0.15
    1,2,3,4,1,2,3,4    0.94 Β± 0.03   0.78 Β± 0.26     0.82 Β± 0.21     0.25                    0.66 Β± 0.11
    1,2,2,1,3,3,3,1    0.93 Β± 0.05   0.89 Β± 0.17     0.83 Β± 0.26     0.38                    0.62 Β± 0.07

Figure 2: Plot of clustering accuracy (macro-F1 score) vs. number of samples for NGL and two other state-of-the-art methods.

Figure 3: Clustering result of NGL using currency datasets.

As we can see, a correlation related to the United Kingdom changed significantly. This was in response to the United Kingdom European Union membership referendum on 2016/6/23, which may have caused public concern.


5. Conclusion and Future work

In this paper, we presented NGL, an interpretable clustering algorithm. We focused on the problem of the interpretable clustering of multidimensional time-series data with underlying network structures. Our proposed NGL indeed exhibits all the desirable properties: it is Interpretable, Automatic, and Accurate.

In future work, we will focus on the following direction: online learning. In several situations, network inference needs to operate in an online fashion, and to the best of our knowledge, no study has dealt with online clustering based on network structure. In this context, we will develop an extension of our methods by utilizing the sliding window and bottom-up (SWAB) algorithm [9].


Acknowledgments

The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. This work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research Number JP20H00585, JP21H03446, JP22K17896, NICT 21481014, MIC/SCOPE 192107004, JST-AIP JPMJCR21U4, and ERCA-Environment Research and Technology Development Fund JPMEERF20201R02.


References

 [1] C. Miyajima, Y. Nishiwaki, K. Ozawa, T. Wakita, K. Itou, K. Takeda, F. Itakura, Driver modeling based on driving behavior and its evaluation in driver identification, Proceedings of the IEEE 95 (2007) 427–437.
 [2] F. Tomasi, V. Tozzo, A. Barla, Temporal pattern detection in time-varying graphical models, in: ICPR, 2021, pp. 4481–4488. doi:10.1109/ICPR48806.2021.9413203.
 [3] J. Friedman, T. Hastie, R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9 (2008) 432–441.
 [4] F. Tomasi, V. Tozzo, S. Salzo, A. Verri, Latent variable time-varying network inference, in: KDD, 2018, pp. 2338–2346. doi:10.1145/3219819.3220121.
 [5] D. Hallac, Y. Park, S. P. Boyd, J. Leskovec, Network inference via the time-varying graphical lasso, in: KDD, 2017, pp. 205–213. doi:10.1145/3097983.3098037.
 [6] D. Hallac, S. Vare, S. P. Boyd, J. Leskovec, Toeplitz inverse covariance-based clustering of multivariate time series data, in: KDD, 2017, pp. 215–223. doi:10.1145/3097983.3098060.
 [7] V. Tozzo, F. Ciech, D. Garbarino, A. Verri, Statistical models coupling allows for complex local multivariate time series analysis, in: KDD, 2021, pp. 1593–1603. doi:10.1145/3447548.3467362.
 [8] S. P. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (2011) 1–122. doi:10.1561/2200000016.
 [9] E. J. Keogh, S. Chu, D. M. Hart, M. J. Pazzani, An online algorithm for segmenting time series, in: ICDM, 2001, pp. 289–296. doi:10.1109/ICDM.2001.989531.
[10] C. BΓΆhm, C. Faloutsos, J.-Y. Pan, C. Plant, RIC: Parameter-free noise-robust clustering, TKDD 1 (2007) 10–es.
[11] K. Mohan, P. London, M. Fazel, D. Witten, S.-I. Lee, Node-based learning of multiple Gaussian graphical models, J. Mach. Learn. Res. 15 (2014) 445–488.
[12] Y. Matsubara, Y. Sakurai, C. Faloutsos, AutoPlait: Automatic mining of co-evolving time sequences, in: SIGMOD, 2014, pp. 193–204. doi:10.1145/2588555.2588556.