
Progressive and Iterative Approaches for Time Series Averaging

Saeid Soheily-Khah

Ahlame Douzal-Chouakria

Eric Gaussier

Eric.Gaussier@image.fr, Université Grenoble Alpes, CNRS - LIG/AMA, France

2015

Averaging a set of time series is a major topic for many temporal data mining tasks, such as summarization, prototype extraction or clustering. Time series averaging must deal with the tricky multiple temporal alignment problem, which is still a challenging issue in various domains. This work compares the major progressive and iterative time series averaging methods under dynamic time warping (dtw).

1 Introduction

Time series centroid estimation is a major issue for many temporal data analysis and mining tasks, such as summarization, temporal prototype extraction or clustering. Estimating the centroid of a set of time series under time warp must deal with the tricky multiple temporal alignment problem [1–4]. Temporal warping alignment of time series has been an active research topic in many scientific disciplines. To estimate the centroid of two time series under a temporal metric such as dynamic time warping (dtw) [5–7], one standard way is to embed the time series into a new Euclidean space defined by their temporal warping alignment. In this space, the centroid can be estimated as the average of the linked elements. The problem becomes more complex when there are more than two time series, as one then needs to determine a multiple alignment that simultaneously links all the time series on their commonly shared similar elements.
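To make the two-series case concrete, here is a minimal Python/NumPy sketch (our illustration, not code from the cited works). It computes a dtw alignment with a squared-difference local cost, which is an assumption, and returns one averaged element per linked pair. The names `dtw_path` and `pairwise_average` are hypothetical.

```python
import numpy as np


def dtw_path(x, y):
    """DTW between 1-D series x and y with a squared-difference local cost.

    Returns (total_cost, path), where path lists the linked index pairs (i, j)
    from (0, 0) to (len(x) - 1, len(y) - 1).
    """
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost with an infinite border
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
    return float(acc[n, m]), path[::-1]


def pairwise_average(x, y):
    """Centroid of two series: the mean of each pair of elements linked by DTW."""
    _, path = dtw_path(x, y)
    return np.array([(x[i] + y[j]) / 2.0 for i, j in path])


x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])  # same shape, delayed by one step
print(pairwise_average(x, y))  # -> [0. 0. 1. 2. 1. 0.]
```

The averaged series has one element per linked pair, so its length can exceed that of both inputs. This is the length growth that affects pairwise averaging schemes.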

A first way to determine a multiple alignment is to search, by dynamic programming, for the optimal path within an n-dimensional grid that crosses the n time series. The complexity of this approach prevents its use: it constitutes an NP-complete problem with a complexity of O(T^n), which increases exponentially with the number of time series n and the time series length T. A second way, which characterizes progressive approaches, progressively combines pairwise time series centroids to estimate the global one. Progressive approaches may suffer from the propagation of early errors through the successive pairwise centroid combinations. The third approach is iterative: it works similarly to the progressive approach, but mainly reduces error propagation by repeatedly refining the barycenter and realigning it to the initial time series. In general, the main progressive and iterative approaches are heuristic and limited to the dynamic time warping metric; they provide an estimate of the barycenter without any guarantee of an optimal solution.
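As a quick worked illustration of this exponential growth (illustrative figures, not taken from the experiments), consider series of length T = 100. The n-dimensional grid then contains T^n cells:

$$100^{2} = 10^{4}\ (n = 2), \qquad 100^{5} = 10^{10}\ (n = 5), \qquad 100^{10} = 10^{20}\ (n = 10)$$

Exhaustive dynamic programming is therefore out of reach even for a handful of series.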

The main contribution of this work is to present the major progressive and iterative approaches for time series centroid estimation and their characteristics, together with an extensive comparison of these methods on real and synthetic datasets. To the best of our knowledge, such a study has not been conducted before.

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

The rest of this paper is organized as follows. Section 2 reviews the different approaches, Section 3 presents the experiments and discusses the results obtained, and Section 4 concludes the paper.

2 Progressive and iterative approaches

The progressive and iterative methods for averaging a set of time series are mostly derived from multiple sequence alignment methods, and address the tricky multiple temporal alignment problem. In the following, we review the major progressive and iterative approaches for time series averaging under time warp.

Gupta et al. [8] used dtw for sequence alignment to average a set of time series. Their method, called "NonLinear Alignment and Averaging Filters" (nlaaf), follows a tournament scheme. In its simplest ordering, the sequences are averaged pairwise, so that N/2 average sequences are created at the first step. These N/2 sequences are in turn averaged pairwise into N/4 sequences, and so on, until a single sequence is obtained. The pairwise averaging is thus applied N − 1 times. nlaaf places each element of the average of two sequences at the center of each association created by dtw. Its main drawback is the growth of the resulting length, because each use of the averaging method can almost double the length of the average sequence. That is why nlaaf is generally used together with a process that reduces the length of the average, which leads to a loss of information and thus to an unsatisfactory approximation. Additionally, the average strongly depends on the order in which the time series are combined, so different orders give different average sequences.
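The tournament scheme can be sketched in Python/NumPy as follows. This is a minimal illustration built on our own assumptions, not the implementation of [8]: the local cost is a squared difference, an odd sequence left over in a round is carried unchanged to the next round, and no length-reduction step is applied, so the growth of the average stays visible. `dtw_path`, `pairwise_average` and `nlaaf` are hypothetical names.

```python
import numpy as np


def dtw_path(x, y):
    """DTW with squared-difference local cost; returns (cost, linked index pairs)."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    i, j, path = n, m, []
    while i > 0 and j > 0:  # backtrack, preferring the diagonal on ties
        path.append((i - 1, j - 1))
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
    return float(acc[n, m]), path[::-1]


def pairwise_average(x, y):
    """Place one averaged element at the center of each DTW association."""
    _, path = dtw_path(x, y)
    return np.array([(x[i] + y[j]) / 2.0 for i, j in path])


def nlaaf(series):
    """Tournament averaging: N sequences -> N/2 -> N/4 ... -> 1.

    Returns the final average and the number of pairwise averagings (N - 1).
    """
    level = [np.asarray(s, dtype=float) for s in series]
    n_merges = 0
    while len(level) > 1:
        nxt = [pairwise_average(level[k], level[k + 1])
               for k in range(0, len(level) - 1, 2)]
        n_merges += len(nxt)
        if len(level) % 2 == 1:  # odd leftover is carried to the next round
            nxt.append(level[-1])
        level = nxt
    return level[0], n_merges


base = np.concatenate([np.zeros(8), [1.0, 2.0, 1.0], np.zeros(8)])
series = [np.roll(base, s) for s in range(4)]  # four shifted copies of a bump
avg, n_merges = nlaaf(series)
print(n_merges, len(base), len(avg))  # first value is 3 (N - 1 pairwise averagings)
```

Reordering `series` can change the result, which illustrates the order dependency mentioned above.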

To avoid the bias induced by random selection, Niennattrakul et al. [11, 12] proposed a shape averaging framework called "Prioritized Shape Averaging" (psa). It uses hierarchical clustering together with a new dtw averaging function, "Scaled Dynamic Time Warping", which can stretch some parts of the warping path so that the result is closer to the time series with the larger weight. Hierarchical clustering serves as a heuristic to set the averaging priority. Although this hierarchical averaging method aims to prevent order dependency, the length of the average sequences remains a problem. Local averaging strategies such as nlaaf or psa may let an initial approximation error propagate throughout the averaging process. If the averaging process has to be repeated, the effects may dramatically alter the quality of the result. This is why a global approach is desirable, in which sequences are averaged all together, with no sensitivity to the order in which they are considered.
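A simplified sketch of priority-based averaging follows, under loudly stated assumptions. Plain dtw replaces the Scaled Dynamic Time Warping of [11, 12], which is not reproduced here. A greedy agglomerative loop that always merges the two closest sequences stands in for the hierarchical clustering. Each merged sequence carries a weight equal to the number of original series it represents, so the weighted mean of linked elements leans toward the heavier sequence. All function names are hypothetical.

```python
import numpy as np


def dtw_path(x, y):
    """DTW with squared-difference local cost; returns (cost, linked index pairs)."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    i, j, path = n, m, []
    while i > 0 and j > 0:  # backtrack, preferring the diagonal on ties
        path.append((i - 1, j - 1))
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
    return float(acc[n, m]), path[::-1]


def weighted_pair_average(x, wx, y, wy):
    """Weighted mean of DTW-linked elements; the heavier sequence pulls harder."""
    _, path = dtw_path(x, y)
    return np.array([(wx * x[i] + wy * y[j]) / (wx + wy) for i, j in path])


def prioritized_average(series):
    """Repeatedly merge the two closest sequences (under DTW) until one remains."""
    pool = [(np.asarray(s, dtype=float), 1) for s in series]  # (sequence, weight)
    while len(pool) > 1:
        pairs = [(a, b) for a in range(len(pool)) for b in range(a + 1, len(pool))]
        a, b = min(pairs, key=lambda p: dtw_path(pool[p[0]][0], pool[p[1]][0])[0])
        (x, wx), (y, wy) = pool[a], pool[b]
        merged = (weighted_pair_average(x, wx, y, wy), wx + wy)
        pool = [p for k, p in enumerate(pool) if k not in (a, b)] + [merged]
    return pool[0][0]


base = np.concatenate([np.zeros(6), [1.0, 3.0, 1.0], np.zeros(6)])
series = [base, np.roll(base, 1), np.roll(base, 5)]
print(len(base), len(prioritized_average(series)))
```

The merge order no longer depends on the input order (apart from ties), but the length of the average still grows with each merge, as noted above for psa.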

A direct way to estimate the centroid was proposed by Abdulla et al. [1]: the "Cross-Words Reference Template" (cwrt), which uses the medoid as the reference time series. First, the time series medoid is selected. All the time series are then described in the representation space defined by the reference medoid: each sequence is aligned by dtw to the medoid, and the average is computed by averaging the time-aligned time series at each point. Petitjean et al. [3] proposed a global averaging method, called "Dtw Barycenter Averaging" (dba), which iteratively refines an initial average sequence in order to minimize its distance to the averaged sequences. In summary, dba is a global approach that averages a set of sequences all together under temporal warping.
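The medoid-based cwrt average and the dba refinement can be sketched together in Python/NumPy. In this minimal sketch, which is our illustration and not the code of [1] or [3], a single pass plays the role of cwrt: it aligns every series to the medoid and averages the elements linked to each medoid point. Repeating that pass from the current average gives a dba-style iteration. Starting dba from the medoid, the squared-difference local cost, the fixed number of iterations and all function names are assumptions.

```python
import numpy as np


def dtw_path(x, y):
    """DTW with squared-difference local cost; returns (cost, linked index pairs)."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    i, j, path = n, m, []
    while i > 0 and j > 0:  # backtrack, preferring the diagonal on ties
        path.append((i - 1, j - 1))
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
    return float(acc[n, m]), path[::-1]


def medoid(series):
    """Series minimizing the sum of DTW costs to all the others."""
    costs = [sum(dtw_path(s, t)[0] for t in series) for s in series]
    return np.asarray(series[int(np.argmin(costs))], dtype=float)


def align_and_average(center, series):
    """Align every series to `center`, then average the elements linked to each
    center point. The result keeps the length of `center`."""
    sums = np.zeros(len(center))
    counts = np.zeros(len(center))
    for s in series:
        _, path = dtw_path(center, s)
        for i, j in path:
            sums[i] += s[j]
            counts[i] += 1  # every center index appears at least once on the path
    return sums / counts


def cwrt(series):
    return align_and_average(medoid(series), series)  # one pass around the medoid


def dba(series, n_iter=10):
    center = medoid(series)
    for _ in range(n_iter):
        center = align_and_average(center, series)  # realign, then re-average
    return center


def inertia(center, series):
    return sum(dtw_path(center, s)[0] for s in series)


t = np.linspace(0.0, 2.0 * np.pi, 20)
series = [np.sin(t + phase) for phase in (0.0, 0.3, 0.6, 0.9)]
m = medoid(series)
# Inertia around the medoid, after one pass (cwrt-like), after ten passes (dba-like).
print(inertia(m, series), inertia(cwrt(series), series), inertia(dba(series), series))
```

For fixed alignments, the per-point mean minimizes the summed squared cost, and realigning can only lower each dtw cost. In this sketch the inertia therefore never increases from one pass to the next, and the centroid keeps the length of the medoid.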

All these methods are heuristic. Although they offer no guarantee of an optimal solution, the approximations they provide are accurate, particularly for time series that behave similarly within the set. However, these approaches may fail, mainly for time series with similar global behavior and local temporal differences, since a local rather than a global averaging process is then needed.

3 Experimental study

The experiments compare the above approaches on the classes of time series that make up various datasets. These datasets fall into two categories. The first category consists of time series with similar global behavior within the classes. In the second category, time series may have distinct global behavior while sharing local characteristics [9]. For the comparison, we evaluate the induced inertia reduction rate and the required run time, and we also compare the obtained centroids qualitatively through visualization. In the following, we first describe the datasets used, then specify the validation process and discuss the results obtained.

3.1 Data description

The experiments are first carried out on four well-known public datasets: cbf, cc, digits and character traj. [10]. These data define a favorable case for the averaging task, as time series behave similarly within the classes. We then consider more complex datasets: bme¹, umd¹, spiral [4], noised spiral¹ and consseason [10]. They are composed of time series that behave differently within the same classes while sharing several local characteristics. Table 1 indicates for each dataset: the number of classes it includes (Nb. Class), the number of instances (Nb. TS), the number of attributes (Nb. Att), the time series length (TS length) and the global or local nature of similarity within the classes (Type).

3.2 Validation process

The four methods described in Section 2, nlaaf, psa, cwrt and dba, are compared. Their performance is evaluated through the centroid estimation of each class of the datasets described above. In particular, the efficiency of each approach is measured through: a) the reduction rate of the inertia criterion, the initial inertia being evaluated around the time series medoid (the series that minimizes the distances to the rest of the time series); and b) the space and time complexity. The results reported hereafter are averaged over a bootstrap process with 10 repetitions. Finally, for all reported results, the best one, when significantly different from the rest (t-test at 1% risk), is indicated in bold.

¹ http://ama.liglab.fr/ douzal/data

Inertia reduction rate. The time series averaging approaches are used to estimate the centroid of each class described above, and the inertia with respect to these centroids is then measured. The lower the inertia, the more representative the extracted centroid. Table 2 gives the obtained inertia reduction rates, averaged per dataset:

$$\mathrm{irr} = 1 - \frac{\sum_{i=1}^{N} D(x_i, c)}{\sum_{i=1}^{N} D(x_i, m)}$$

where x_1, ..., x_N is the set of time series, D the metric, c the estimated centroid and m the initial medoid. Table 2 shows that dba provides the highest irr for most datasets. Negative rates indicate an inertia increase.

Time and space complexity. In Table 3, the studied approaches are compared with respect to their space and time complexity. The results, averaged per dataset, show that dba is the fastest method in almost all cases and psa the slowest. The cwrt approach is not comparable to the other methods, as it directly computes a Euclidean distance on the time series once the initial dtw matrix has been evaluated. Note that for nlaaf and psa the centroid lengths are very large, which makes these approaches unusable for long time series. The centroid lengths for the remaining methods are equal to the length of the initial medoid. The higher run times observed for nlaaf and psa are mainly explained by the progressive increase of the centroid length during the pairwise combination process.

From Table 2, we can see that dba and psa lead to the highest inertia reduction rates, with dba reaching the best scores (in bold) for almost all datasets. However, the rates are significantly lower for some challenging datasets. Finally, cwrt has the lowest inertia reduction rates, and the negative rates observed for cwrt indicate an inertia increase. As expected, dba, which iteratively optimizes an inertia criterion, generally reaches higher values than the non-iterative methods (nlaaf, psa and cwrt).
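The inertia reduction rate can be computed as in the following Python/NumPy sketch. D is taken here to be the dtw cost with a squared-difference local cost; this is an assumption, since the exact form of the metric is not restated in this section. `irr` and the helper names are hypothetical. Positive values mean the centroid reduces the inertia relative to the medoid, and negative values mean it increases it.

```python
import numpy as np


def dtw_cost(x, y):
    """DTW cost between 1-D series x and y (squared-difference local cost)."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = (x[i - 1] - y[j - 1]) ** 2 + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return float(acc[n, m])


def inertia(center, series):
    """Sum of D(x_i, center) over all the series."""
    return sum(dtw_cost(s, center) for s in series)


def medoid(series):
    """Series of the set with the lowest inertia around itself."""
    return min(series, key=lambda s: inertia(s, series))


def irr(centroid, medoid_, series):
    """irr = 1 - sum_i D(x_i, c) / sum_i D(x_i, m); undefined if the medoid inertia is 0."""
    return 1.0 - inertia(centroid, series) / inertia(medoid_, series)


t = np.linspace(0.0, 2.0 * np.pi, 20)
series = [np.sin(t + phase) for phase in (0.0, 0.3, 0.6, 0.9)]
m = medoid(series)
print(irr(m, m, series))        # -> 0.0: the medoid itself gives no reduction
print(irr(m + 1.0, m, series))  # negative: an offset centroid increases the inertia
```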

From Table 3, the results show that dba is the fastest method and psa the slowest. For nlaaf and psa, the estimated centroids have a drastically large dimension (i.e., a length of around 10^4), which makes these approaches unusable for large time series datasets. The nlaaf and psa methods are highly time-consuming, largely because of the progressive increase of the centroid length during the pairwise combination process. The centroid lengths for the remaining methods are equal to the length of the initial medoid (Table 3). Finally, psa is much slower than nlaaf because it performs hierarchical clustering on the whole set of time series.

We finally visualize some of the centroids obtained by the different methods to compare their shape to that of the time series they represent. Figures 1 and 2 display the centroids obtained by these methods for the class "funnel" of cbf and the class "cyclic" of cc, respectively. For global datasets, almost all the approaches obtain centroids that are more or less similar to the initial time series. However, the centroids from nlaaf and psa are generally less representative.

4 Conclusion

The dtw is among the most frequently used metrics for time series in several domains, such as signal processing, temporal data analysis and mining, or machine learning. However, time series clustering approaches are generally limited to k-medoids, which avoids both time series averaging under dtw and the tricky multiple temporal alignment problem. The present study compares the major progressive and iterative time series averaging approaches under dynamic time warping. The experimental validation uses global datasets, in which time series share similar behaviors within classes. It also uses more complex datasets, whose time series share only local characteristics and are multidimensional and noisy.
The quantitative evaluation, based on an inertia criterion and on time and space complexity, shows the effectiveness of the dba approach. So does the qualitative one, which visualizes the centroids obtained by the different methods. In particular, dba, which iteratively optimizes an inertia criterion, not only reaches higher values than the non-iterative methods (nlaaf, psa and cwrt) but also provides fast time series averaging for both global and local datasets.

References

1. Abdulla, W.H., Chow, D., Sin, G.: Cross-words reference template for DTW-based speech recognition systems. In: Proc. TENCON, Vol. 2, pp. 1576–1579 (2003)

2. Hautamäki, V., Nykänen, P., Fränti, P.: Time-series clustering by approximate prototypes. In: 19th International Conference on Pattern Recognition (2008)

3. Petitjean, F., Ketterlin, A., Gançarski, P.: A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, Vol. 44, pp. 678–693 (2011)

4. Zhou, F., De la Torre, F.: Generalized time warping for multi-modal alignment of human motion. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1282–1289 (2012)

5. Kruskal, J.B., Liberman, M.: The symmetric time warping algorithm: From continuous to discrete. Addison-Wesley, Time Warps Journal (1983)

6. Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 26, pp. 43–49 (1978)

7. Sankoff, D., Kruskal, J.B.: Time warps, string edits, and macromolecules: the theory and practice of sequence comparison. Cambridge University Press, Addison-Wesley (1983)

8. Gupta, L., Molfese, Tammana, Simos, P.: Nonlinear alignment and averaging for estimating the evoked potential. IEEE Transactions on Biomedical Engineering, Vol. 43, No. 4, pp. 348–356 (1996)

9. Frambourg, Douzal-Chouakria, A., Gaussier, E.: Learning multiple temporal matching for time series classification. In: Advances in Intelligent Data Analysis XII, pp. 198–209. Springer, Berlin Heidelberg (2013)

10. UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/

11. Niennattrakul, Ratanamahatana, C.: On clustering multimedia time series data using k-means and dynamic time warping. In: International Conference on Multimedia and Ubiquitous Engineering (MUE'07), IEEE, pp. 733–738 (2007)

12. Niennattrakul, Ratanamahatana, C.: Shape averaging under time warping. In: 6th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), IEEE, Vol. 2, pp. 626–629 (May 2009)