Impact of Warping vs Smoothing for Time Series Similarity

Frank Höppner

Ostfalia University of Applied Sciences, Dept. of Computer Science, D-38302 Wolfenbüttel, Germany

Introduction. When dealing with time series, applying a smoothing filter (to remove random fluctuations and better recognise the relevant structure) is usually one of the first steps. In the literature on time series similarity measures, however, the impact of smoothing is not considered explicitly or systematically, despite extensive experiments in, e.g., [2]. Instead, complex similarity measures such as dynamic time warping (DTW) are frequently applied; they implicitly deal with noise, but mainly address temporal dilation and translation effects. So far, it therefore remains unclear to what extent the good performance of DTW is due to its smoothing or to its warping capabilities.

Optimal Filter. In this work we consider a simple Euclidean distance applied to preprocessed (smoothed) time series. Since no single similarity measure is likely to fit all problem types (or data sets), choosing an appropriate filter lets us adapt to the problem at hand. The filter is determined automatically from a training set of classified series, such that distances between series of the same (different) class are minimised (maximised). The resulting similarity measure is then tested in cross-validated 1-NN classification on various data sets (as in [2]) and compared against the performance of DTW. Taking Euclidean distance without any preprocessing as the baseline, it turns out that for many data sets a substantial fraction of the performance improvement obtained with DTW is also obtained by choosing the appropriate filter. In some cases, the performance is even better than with DTW, because a filter is a versatile tool: for some problems it may be advantageous to distinguish time series by their derivative rather than by their original values, and in such cases a filter that estimates the derivative may be obtained. For further details, the reader is referred to [1].
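To make the setting concrete, the following Python sketch filters labelled series with an FIR kernel, compares them with plain Euclidean distance, and evaluates leave-one-out 1-NN accuracy alongside an unconstrained DTW baseline. It is an illustration under stated assumptions, not the method of [1]: the filter is picked by a simple grid search over a few hypothetical candidate kernels (`CANDIDATES`), scored by the ratio of mean same-class to mean different-class distance, and the optimisation actually used in [1] is not reproduced here. All function names (`apply_filter`, `separation_score`, `choose_filter`, `loo_1nn_accuracy`) and the toy data are hypothetical, and leave-one-out evaluation stands in for the cross-validation used in the experiments.

```python
import math
from itertools import combinations

# Hypothetical sketch: grid search over a few candidate FIR kernels,
# NOT the filter optimisation from [1].
CANDIDATES = {
    "identity": [1.0],                # no preprocessing (Euclidean baseline)
    "moving_average_3": [1 / 3] * 3,  # light smoothing
    "moving_average_5": [1 / 5] * 5,  # stronger smoothing
    "difference": [1.0, -1.0],        # x[t+1] - x[t], a crude derivative estimate
}


def apply_filter(x, kernel):
    """Convolve x with an FIR kernel in 'valid' mode (no padding).

    The output has len(x) - len(kernel) + 1 points, so series of equal
    length remain comparable after filtering.
    """
    k = len(kernel)
    return [sum(kernel[j] * x[i + k - 1 - j] for j in range(k))
            for i in range(len(x) - k + 1)]


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def dtw(a, b):
    """Unconstrained dynamic time warping distance (squared point costs)."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return math.sqrt(d[n][m])


def separation_score(series, labels, kernel):
    """Mean same-class distance divided by mean different-class distance.

    Lower is better. Assumes at least one same-class and one
    different-class pair, and that different-class series do not all
    coincide after filtering.
    """
    filtered = [apply_filter(x, kernel) for x in series]
    same, diff = [], []
    for i, j in combinations(range(len(series)), 2):
        dist = euclidean(filtered[i], filtered[j])
        (same if labels[i] == labels[j] else diff).append(dist)
    return (sum(same) / len(same)) / (sum(diff) / len(diff))


def choose_filter(series, labels, candidates=CANDIDATES):
    """Return the name of the candidate kernel with the lowest score."""
    return min(candidates,
               key=lambda name: separation_score(series, labels, candidates[name]))


def loo_1nn_accuracy(series, labels, dist):
    """Leave-one-out 1-NN accuracy under the distance function `dist`."""
    hits = 0
    for i, x in enumerate(series):
        nearest = min((j for j in range(len(series)) if j != i),
                      key=lambda j: dist(x, series[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(series)


if __name__ == "__main__":
    # Toy data: the classes differ in slope, but offsets vary within each class.
    series = [[c + t for t in range(8)] for c in (0, 10, 20)] + \
             [[c - t for t in range(8)] for c in (5, 15, 25)]
    labels = ["up"] * 3 + ["down"] * 3
    name = choose_filter(series, labels)
    filtered = [apply_filter(x, CANDIDATES[name]) for x in series]
    print("chosen filter:", name)
    print("raw Euclidean:", loo_1nn_accuracy(series, labels, euclidean))
    print("DTW:", loo_1nn_accuracy(series, labels, dtw))
    print("filtered Euclidean:", loo_1nn_accuracy(filtered, labels, euclidean))
```

On this toy data the classes differ only in slope while offsets vary within each class, so raw Euclidean distance is dominated by the offsets. The difference kernel maps every series of a class to the same constant sequence and is therefore selected, mirroring the derivative case mentioned above.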

[1] F. Höppner. Optimal filtering for time series classification. In Proc. 16th Int. Conf. Intelligent Data Engineering and Automated Learning, 2015.

[2] X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, and E. Keogh. Experimental comparison of representation methods and distance measures for time series data. Data Mining and Knowledge Discovery, 26(2):275–309, Feb. 2012.

Copyright © 2015 by the paper's authors. Copying permitted only for private and academic purposes. In: R. Bergmann, S. Görg, G. Müller (Eds.): Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB. Trier, Germany, 7.-9. October 2015, published at http://ceur-ws.org