=Paper=
{{Paper
|id=Vol-2771/paper31
|storemode=property
|title= Using Navigable Small Worlds to Speed Up Time-Series Classification
|pdfUrl=https://ceur-ws.org/Vol-2771/AICS2020_paper_31.pdf
|volume=Vol-2771
|authors=Vivek Mahato,Pádraig Cunningham
|dblpUrl=https://dblp.org/rec/conf/aics/MahatoC20
}}
== Using Navigable Small Worlds to Speed Up Time-Series Classification==
Vivek Mahato and Pádraig Cunningham
School of Computer Science, University College Dublin, Dublin 4, Ireland
vivek.mahato@ucdconnect.ie, padraig.cunningham@ucd.ie

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Abstract. It is not easy to apply standard Machine Learning methods to time-series classification because of the nature of the data. k-Nearest Neighbour (k-NN) can be applied because effective distance measures for time-series are available (e.g. Dynamic Time Warping (DTW)). However, these effective measures tend to be computationally expensive: DTW is O(t²) in the length t of the time-series. This is a problem because the resultant retrieval time is O(nt²), where n is the number of training samples. Methods to improve k-NN retrieval performance exist, but constraints apply: some methods require that the distance measure is a proper metric; others are only effective in low-dimensional spaces. In this paper, we evaluate the effectiveness of Navigable Small Worlds (NSW) for time-series classification using k-NN. NSW organises training samples into a small-world network for efficient retrieval. We show that this strategy can significantly speed up time-series classification at the cost of some loss of accuracy.

Keywords: Navigable Small Worlds, Time Series Classification, Nearest Neighbour
1 Introduction
Time-series classification is a particular challenge for supervised machine learning because the data is not in a feature-vector format. A comprehensive evaluation of ML methods for time-series classification [3] shows that k-NN is very effective provided a suitable distance measure can be identified. However, k-NN is not a panacea, because effective distance measures for time-series data are computationally expensive. The methods that exist to speed up k-NN retrieval (both exact and approximate) have constraints attached. Speedup methods typically require that the distance measure is a proper metric, and some time-series measures (e.g. DTW) do not meet the triangle inequality requirement to be a metric. Also, the curse of dimensionality cannot be avoided; it is hard to speed up k-NN retrieval in high-dimensional spaces and preserve accuracy.
In this paper, we assess the potential of using Navigable Small Worlds for k-NN in time-series classification. With NSW, the data is organised into a small-world graph that facilitates efficient retrieval. Small-world networks have the special characteristic that any pair of nodes is likely to be linked by a small number of hops. This property is due to the presence of hubs in the network: high-degree nodes that serve as a routing backbone and keep shortest-path distances short. With NSW, this small-world characteristic arises naturally from the way the network is constructed (see details in section 4.1).
NSW can thus be used as a data-structure to provide approximate k-NN retrieval (see Figure 3). While most of the research on NSW for approximate nearest neighbour retrieval is based on proper metrics, the authors do claim that it may also be effective in non-metric spaces [11]. In this paper, we evaluate three time-series distance measures in combination with NSW: DTW, which is not a metric; Time-Warp Edit Distance (TWED), which is; and Euclidean distance, which serves as a baseline. The results in section 5 show that NSW, in combination with DTW, can reduce retrieval time by a factor of four. The price paid for this is a reduction in accuracy of roughly 2% to 3%.
In the next section, we provide an overview of k-NN and time-series classification, illustrating the computational challenges. In section 3, we review relevant research on k-NN speedup. In section 4, we introduce NSW and show how it can be used for k-NN retrieval. Our results on using NSW to speed up k-NN time-series classification are presented in section 5.
2 k-NN and Time-Series Classification
k-NN works by finding the nearest neighbours to a query q in the training set D and then assigning a class to q based on these neighbours, typically by majority voting. If the data is numeric, D is made up of samples of the form x = {x_1, ..., x_j, ..., x_m}. For each x ∈ D, the Euclidean distance between q and x is:

d(q, x) = \sqrt{\sum_{i=1}^{m} (q_i - x_i)^2}    (1)
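For reference, Equation (1) is a one-liner in numpy; the arrays below are illustrative values, not data from the paper:

```python
import numpy as np

# Two equal-length feature vectors / time-series (toy values).
q = np.array([1.0, 2.0, 3.0, 2.0])
x = np.array([1.5, 2.5, 2.5, 2.0])

# Equation (1): Euclidean distance between q and x.
d = np.sqrt(np.sum((q - x) ** 2))   # equivalently: np.linalg.norm(q - x)
print(d)
```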
If two time-series are the same length, then Euclidean distance can be used
as a distance measure. However, as we see in Figure 1, if the two time-series are
a little out of phase, then the Euclidean distance can be misleading. The vertical
lines in the plot on the left indicate this. So there has been extensive research
on similarity/distance measures for time-series [10]. In this paper, we consider
three measures:
– DTW allows the time axis to be warped to enable more flexible matching [8]. It is O(t²) in the length of the time-series and it is not a proper metric.
– TWED supports inexact matching of time-series by counting edit operations, as is done in string matching [12]. It is O(t²) in the length of the time-series but is a proper metric. As such, it is a good candidate to incorporate into an NSW retrieval framework.
– Euclidean distance is included as a baseline. It is O(t) and is, of course, a proper metric.
Fig. 1: On the left, we have two time-series that are clearly similar, but the Euclidean distance between them is large (vertical lines in the graph). The image on the right shows the non-linear mapping by DTW between these two time-series (produced using tslearn [14]).
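As an aside, the DTW alignment illustrated in Figure 1 can be reproduced with tslearn [14]; a minimal sketch on synthetic, slightly out-of-phase sine waves (not the paper's data):

```python
import numpy as np
from tslearn.metrics import dtw, dtw_path

# Two similar but slightly out-of-phase series.
t = np.linspace(0, 2 * np.pi, 100)
s1 = np.sin(t)
s2 = np.sin(t - 0.5)

print("Euclidean:", np.linalg.norm(s1 - s2))   # inflated by the phase shift
print("DTW:      ", dtw(s1, s2))               # much smaller after warping

path, dist = dtw_path(s1, s2)                  # the non-linear mapping in Figure 1
print("first aligned index pairs:", path[:5])
```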
Table 1: 3D printing datasets
Dataset Train Test Total
PLarge 1,794 1,793 3,587
PSmall 1,384 1,382 2,766
2.1 Time-Series Classification in Manufacturing
In our research, we are particularly interested in the use of time-series classification for process monitoring in manufacturing. Specifically, we are concerned with using time-series classification to detect processing anomalies in metal 3D printing [9]. It is normal to monitor the 3D printing process using sensors. These sensors may monitor conditions in the 3D printing chamber, or they may focus on the melt-pool itself. To evaluate NSW for time-series classification speedup, we consider two datasets collected from an Aconity MINI 3D printer (https://aconity3d.com/products/aconitymini/).
The details of the two datasets are summarised in Table 1. Each data sample is a time-series of melt-pool temperature recorded by a pyrometer. The classification task is to detect pores in the part being manufactured. Such pores are rare but typically show up as dips in the temperature time-series. The longest time-series in the datasets contains 300 time points and the shortest 101. When calculating distances, the shorter series is post-padded with zeroes to bring it to the length of the longer one, and a trailing zero is added to the longer series to allow matching with this padding.
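A minimal sketch of one possible reading of this padding scheme (illustrative code, not the authors' implementation):

```python
import numpy as np

def pad_pair(a, b):
    """One possible reading of the padding scheme: the shorter series is
    post-padded with zeroes up to the length of the longer one, and a
    trailing zero is appended to the longer series (and to the padded
    shorter one) so that the padding has something to match against."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    short = np.pad(short, (0, len(long_) - len(short)))  # zero-pad to same length
    short = np.pad(short, (0, 1))                        # trailing zero
    long_ = np.pad(long_, (0, 1))                        # trailing zero
    return short, long_

s, l = pad_pair(np.array([3.1, 3.3, 3.0]), np.array([3.2, 3.4, 3.1, 2.9, 3.0]))
print(s)  # [3.1 3.3 3.  0.  0.  0. ]
print(l)  # [3.2 3.4 3.1 2.9 3.  0. ]
```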
To compare the time complexity of the three distance measures, we computed the average retrieval time over 10 random queries in a k-NN setting using the PLarge dataset. The average retrieval time per query for each measure (with its best parameters) is as follows:
– Euclidean (numpy, https://pypi.org/project/numpy/): 0.02 s
– DTW (tslearn, https://pypi.org/project/tslearn/): 1.46 s
– TWED (https://en.wikipedia.org/wiki/Time_Warp_Edit_Distance): 10.16 s

As expected, Euclidean distance is the fastest by two orders of magnitude, DTW is second, and TWED is the slowest of the three.
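Timings of this kind can be reproduced with a brute-force retrieval loop such as the sketch below; the data here is synthetic, so the figures it produces will differ from those above, which were obtained on PLarge with tuned parameters:

```python
import time
import numpy as np
from tslearn.metrics import dtw

def retrieval_time(distance, queries, train_set):
    """Average time to compute the distance from each query to every
    training series, i.e. one brute-force k-NN retrieval per query."""
    start = time.perf_counter()
    for q in queries:
        _ = sorted(distance(q, x) for x in train_set)   # O(n t^2) for DTW
    return (time.perf_counter() - start) / len(queries)

rng = np.random.default_rng(1)
train = [rng.normal(size=300) for _ in range(200)]      # synthetic stand-in data
queries = [rng.normal(size=300) for _ in range(10)]

euclid = lambda a, b: float(np.linalg.norm(a - b))
print("Euclidean:", retrieval_time(euclid, queries, train), "s/query")
print("DTW:      ", retrieval_time(dtw, queries, train), "s/query")
```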
3 k-NN Speedup Strategies
These baseline calculations illustrate the run-time problems with k-NN. The performance with Euclidean distance might be acceptable, but the DTW and TWED figures are not adequate for real-time analysis. This is a recognised problem with k-NN and has been the subject of extensive research, which can be categorised into exact methods and approximate methods. Exact methods guarantee that they return the same neighbours as a brute-force search; approximate methods do not meet this requirement.
Exact Methods The two most popular alternatives to brute-force search for exact k-NN are Ball Trees and Kd-Trees [4], but their applicability is constrained. Kd-Trees can only be used with feature-vector data and are thus not applicable to time-series data. Ball Trees require a distance measure that is a proper metric. Both strategies cease to be effective in high-dimensional spaces [1, 13].
Approximate Methods Efficient k-NN retrieval is an important consideration, particularly in internet applications such as image or music retrieval. As such, the focus has shifted to approximate nearest neighbour (ANN) methods [6, 2]. Approximate methods may be adequate where the objective is information retrieval rather than classification: for nearest neighbour search over collections with tens or even hundreds of thousands of objects, perfect recall is not essential. ANN strategies can be divided into three broad categories:
– Random Projection Trees (RPT). The main drawback with Kd-Trees
is that search through the tree entails a lot of backtracking to ensure that
the nearest neighbours are retrieved. RPT does not backtrack; instead, the
search is repeated multiple times and the initial candidates identified each
time are assessed to find good nearest neighbours [7].
– Locality Sensitive Hashing (LSH) maps similar items into the same
‘buckets’ with high probability. The strategy is to use several variants of
LSH algorithms to retrieve a candidate set of nearest neighbours. Then the
distance metric can be applied to these candidates to find nearest neighbours
that will be near-optimal [5].
Table 2: Network Statistics
Network Radius Diameter Avg. Hops
k-NN 5 9 3.18
NSW 3 7 2.76
Fig. 2: The in-degree distribution of (a) an NSW network and (b) a k-NN network on the PLarge dataset.
– Neighbourhood-Based Methods index the data by building a proximity graph in which data points are linked to their neighbours. Retrieval entails navigating this proximity graph to find nearest neighbours for a query. NSW is such a method, but with the special characteristic that the graph is a small-world graph (see section 4). A recent evaluation by Li et al. [6] shows that these methods are competitive and, because they can be combined with DTW and TWED, this is the ANN strategy we consider.
4 NSW & k-NN
NSW is a neighbourhood-based method for ANN that constructs a proximity
graph that has small-world characteristics. This is illustrated in Figure 2 and
Table 2. We compare an NSW graph with a standard k-NN graph. The k-NN
graph is constructed by connecting each node to its f = 233 nearest neighbours
(parameter setting details are in section 4.2). The NSW graph is constructed
using the strategy detailed in section 4.1. The statistics in Table 2 show that the
NSW graph is more compact than the k-NN graph; this is due to the high-degree
hub nodes on the right in Figure 2(a).
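The statistics in Table 2 (radius, diameter, average hops) can be computed from a proximity graph with standard graph tooling. Below is a minimal sketch on small synthetic feature-vector data with Euclidean distance, so that it runs quickly; it is not the paper's setup (in particular, f = 233 and the PLarge data are not used):

```python
import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))           # 200 synthetic samples

k = 10                                   # neighbours per node
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nn.kneighbors(X)                # first neighbour of each point is itself

G = nx.Graph()
G.add_nodes_from(range(len(X)))
for i, row in enumerate(idx):
    for j in row[1:]:                    # skip the self-link
        G.add_edge(i, int(j))

H = G.subgraph(max(nx.connected_components(G), key=len))  # largest component
print("radius:   ", nx.radius(H))
print("diameter: ", nx.diameter(H))
print("avg. hops:", nx.average_shortest_path_length(H))
```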
4.1 NSW in Operation
NSW uses a small-world network that can be represented by a graph G(V, E), where the vertices in set V are the stored elements and the edges in set E are the connections linking them to other vertices. The NSW structure is constructed by inserting elements into the network consecutively [11]. For each insertion of an element, the algorithm first searches for a set of its nearest neighbours already present in the network (see Algorithm 1).

Fig. 3: Navigable Small World (NSW): the nearest neighbours to the query would be found from the entry point along the path shown in red.
Algorithm 1 Insertion Algorithm
procedure NN_Insert(element, f, m)    ▷ m multi-searches to find f neighbours to connect with element
    neighbours ← NN_Search(element, f, m)
    for neighbour in neighbours do
        element.connect(neighbour)
        neighbour.connect(element)
    Store the new and updated elements in the network
NSW employs a variant of greedy search as its core k-NN routine to fetch approximate neighbours of an object. First, a random node is selected from the network as an entry point (see Figure 3), and the search then progresses by traversing the network from one element to its unvisited nearest neighbouring element. The exploration continues within the neighbourhood of the nearest objects, in a greedy style, until it can no longer improve the currently known k nearest neighbours (the stop condition) or has no unvisited objects left to explore (see Algorithm 2). On identifying a set of neighbours, the algorithm inserts the element into the network by connecting it to its neighbours and vice versa. This incremental strategy has the important side-effect that long-distance links are created early in the construction process, because at that stage an element's true nearest neighbours are not yet present in the network. This is what gives rise to the small-world characteristics.
We used ValueSortedDict from the sortedcollections package (https://pypi.org/project/sortedcollections/) on PyPI as the data-structure for storage and retrieval of neighbours in our implementation of NSW. This significantly improves performance because network traversal requires a lot of sorting.
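The convenience of ValueSortedDict here is that keys (nodes) iterate in order of their values (distances to the query), so the current best candidate is always first; a tiny demonstration, assuming sortedcollections is installed:

```python
from sortedcollections import ValueSortedDict

candidates = ValueSortedDict()
candidates["node_a"] = 0.92   # value = distance to the query
candidates["node_b"] = 0.17
candidates["node_c"] = 0.55

best = next(iter(candidates))   # keys iterate in order of value
print(best, candidates[best])   # node_b 0.17
```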
Algorithm 2 Search Algorithm
procedure NN_Search(query, k, m)    ▷ m searches to find k neighbours of a query
    Set visited_set    ▷ maintains the set of visited nodes
    ValueSortedDict candidates, result
    for iteration ≤ m do
        v_ep ← random entry point    ▷ fetch a random node from the network
        visited_set, candidates ← v_ep
        ValueSortedDict temp_result
        while True do
            candidate ← nearest_candidate(candidates)    ▷ from the candidates pool
            if stop_condition = False and candidate has neighbours then
                for neighbour in candidate.neighbours do
                    if neighbour not in visited_set then
                        cost ← distance(query, neighbour)
                        visited_set, candidates, temp_result ← neighbour
            else
                break while loop
        result ← temp_result    ▷ result is updated with temp_result
    Return k nearest neighbours from result
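To make the two algorithms concrete, the following is a compact, simplified Python sketch of NSW insertion and search. It is an illustrative reimplementation of Algorithms 1 and 2, not the authors' code: the class and its details (such as accumulating one result pool across the m searches) are our simplifications, and `distance` is any callable such as a DTW or TWED function.

```python
import random
from sortedcollections import ValueSortedDict

class NSW:
    """Simplified Navigable Small World index (illustrative sketch only)."""

    def __init__(self, distance, f=10, m=5):
        self.distance = distance   # distance(a, b) -> float, e.g. a DTW function
        self.f = f                 # max neighbours linked per inserted element
        self.m = m                 # number of multi-searches
        self.nodes = []            # stored elements
        self.edges = []            # edges[i]: set of neighbour indices of node i

    def search(self, query, k, m=None):
        """Algorithm 2 (simplified): greedy multi-search for approximate
        k nearest neighbours; returns (node_index, distance) pairs."""
        if not self.nodes:
            return []
        result, visited = ValueSortedDict(), set()
        for _ in range(m or self.m):
            entry = random.randrange(len(self.nodes))       # random entry point
            candidates = ValueSortedDict()
            candidates[entry] = self.distance(query, self.nodes[entry])
            visited.add(entry)
            while candidates:
                c = next(iter(candidates))                  # nearest candidate
                c_dist = candidates.pop(c)
                # Stop condition: the nearest remaining candidate cannot
                # improve on the k-th best distance found so far.
                if len(result) >= k and c_dist > result[list(result)[k - 1]]:
                    break
                result[c] = c_dist
                for nb in self.edges[c]:                    # expand the candidate
                    if nb not in visited:
                        visited.add(nb)
                        candidates[nb] = self.distance(query, self.nodes[nb])
        return [(key, result[key]) for key in list(result)[:k]]

    def insert(self, element):
        """Algorithm 1: connect the new element to its f approximate
        nearest neighbours (links are bidirectional "friend" edges)."""
        neighbours = self.search(element, self.f)
        idx = len(self.nodes)
        self.nodes.append(element)
        self.edges.append(set())
        for nb, _ in neighbours:
            self.edges[idx].add(nb)
            self.edges[nb].add(idx)
```

A k-NN classifier can then be built on top of this index by inserting all training series and taking a majority vote over the labels of the k neighbours returned by search.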
Two parameters, f and m, dictate the performance of the NSW algorithm. f determines the maximum number of “friends” or neighbours a vertex can have in the network, and m dictates the number of multi-searches. m is directly proportional to the recall of the system: a high enough m yields perfect recall, but the search then approaches an exhaustive search. Hence, there is a trade-off between classification accuracy and the speed-up accomplished by the model, governed by these parameters.
4.2 NSW Performance
Before applying NSW to time-series data, we present a baseline analysis on feature-vector data. For this, we selected six public classification datasets from the KEEL repository (https://sci2s.ugr.es/keel/datasets.php). As previously mentioned, selecting good values for f and m is essential, owing to the trade-off between the accuracy and the time complexity of the NSW model. The aim is to obtain high accuracy while keeping f and m small for quicker retrieval. This is achieved through a grid search, as shown in Figure 4. For each pair of values, the model undergoes 5×5-fold cross-validation over the training set of a dataset, with accuracy as the scoring method. The resulting heatmaps illustrate the effect of f and m on the accuracy of the model.
Fig. 4: These heatmaps show the impact of the f and m parameters on accuracy when using NSW, for (a) the newthyroid dataset and (b) the pima dataset. The selected parameters are highlighted in green.
Figure 4 shows heatmaps for two datasets. It is clear that higher values of f and m give better accuracy, but it is possible to find lower values that achieve high accuracy while delivering a significant speedup.
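As an illustration of this selection procedure, the sketch below computes the cross-validated accuracy grid behind a heatmap such as Figure 4. It reuses the illustrative NSW class sketched in section 4.1, assumes X and y are numpy arrays, and the wrapper class and function names are ours, not the paper's:

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.metrics import accuracy_score

# Thin majority-vote classifier around the illustrative NSW index from section 4.1.
class NSWClassifier:
    def __init__(self, f, m, k=3, distance=lambda a, b: float(np.linalg.norm(a - b))):
        self.f, self.m, self.k, self.distance = f, m, k, distance

    def fit(self, X, y):
        self.index, self.y = NSW(self.distance, f=self.f, m=self.m), y
        for x in X:
            self.index.insert(x)
        return self

    def predict(self, X):
        votes = lambda q: Counter(self.y[i] for i, _ in self.index.search(q, self.k))
        return np.array([votes(q).most_common(1)[0][0] for q in X])

def grid_search_nsw(X, y, f_grid, m_grid):
    """5x5-fold cross-validated accuracy for each (f, m) pair, as in Figure 4."""
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
    scores = np.zeros((len(f_grid), len(m_grid)))
    for i, f in enumerate(f_grid):
        for j, m in enumerate(m_grid):
            accs = []
            for tr, va in cv.split(X, y):
                model = NSWClassifier(f=f, m=m).fit(X[tr], y[tr])
                accs.append(accuracy_score(y[va], model.predict(X[va])))
            scores[i, j] = np.mean(accs)
    return scores          # one heatmap cell per (f, m) pair
```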
The parameter selection process is mostly automated, with these heatmaps allowing the user to select an optimal pair of parameter values for the dataset. The final evaluation of the model is on the held-out test set of the dataset, using the selected pair as its parameter values. The evaluation results on the six non-time-series datasets are shown in Figure 5. Over the six datasets, the processing time is reduced to 42% of that of exact k-NN on average. There is no reduction in classification accuracy for three of the datasets, and the average reduction in accuracy is ≈ 1.3%. These results show NSW to be a promising strategy for achieving accuracy comparable to full k-NN considerably more quickly.
Fig. 5: k-NN vs. NSW with Euclidean distance over six non-time-series datasets. The bars represent the accuracy of each model, and the red ×'s denote the speed-up (expressed as a percentage of the total time taken by the k-NN model).
5 NSW in Time-Series Classification
Building on the preliminary analysis in section 4.2, we extend the evaluation of the k-NN and NSW models to classification tasks on the time-series datasets (PLarge and PSmall). The final results reported in Figure 6 are on the hold-out test sets described in Table 1. The f and m parameters were tuned using 5×5-fold cross-validation on the training set. The value of k for exact k-NN was set in the same manner, and the same k value was used with NSW.
Fig. 6: k-NN vs. NSW, evaluated on classification accuracy and time complexity, over (a) the PLarge dataset and (b) the PSmall dataset. The “speed-up” gained by NSW is the percentage of the total time taken by the k-NN counterpart.
The main conclusions from this evaluation are as follows:
1. Even though TWED is a proper metric, DTW is still the clear winner, achieving the highest accuracy both with exact k-NN and with NSW.
2. Using NSW, the processing time is reduced to 23.5% of that of exact k-NN with DTW for PLarge and to 35% for PSmall. The speedup is more significant for the larger dataset.
3. NSW achieves accuracy comparable to that of exact k-NN. The drop in accuracy using NSW with the DTW measure is 2.7% on PLarge and 4.2% on PSmall.
In summary, the speedup is significant – it seems reasonable to expect better
than a four-fold speedup on large datasets. However, the impact on accuracy
would probably not be acceptable in many situations.
6 Conclusions & Future Work
In this paper, we address the challenge of improving the run-time performance
of time-series classification using k-NN. Because effective distance measures for
time-series are computationally expensive, we explore the potential of NSW for
speeding up k-NN using DTW and TWED. TWED is included because it is a
proper metric, whereas DTW is not. Despite its lack of metric pedigree, DTW
performs best. NSW delivers significant speedup but at the cost of classification
accuracy.
So it seems that NSW does not allow us to avoid the curse of dimensionality.
We cannot speed up retrieval without paying the price in terms of accuracy.
In future work, we will explore measures to quantify the intrinsic dimension
of time-series data to see if speedup strategies might work for datasets that have
a lower intrinsic dimension. Furthermore, these methods need to be evaluated
in a realistic context where time-series containing pores would be very rare.
Acknowledgements
This publication has resulted from research supported in part by a grant from
Science Foundation Ireland (SFI) under Grant Number 16/RC/3872 and is co-
funded under the European Regional Development Fund.
References
1. Arya, S., Mount, D.M., Narayan, O.: Accounting for boundary effects in nearest-
neighbor searching. Discrete & Computational Geometry 16(2), 155–176 (1996)
2. Aumüller, M., Bernhardsson, E., Faithfull, A.: Ann-benchmarks: A benchmarking
tool for approximate nearest neighbor algorithms. In: International Conference on
Similarity Search and Applications. pp. 34–49. Springer (2017)
3. Bagnall, A., Bostrom, A., Large, J., Lines, J.: The great time series classification
bake off: An experimental evaluation of recently proposed algorithms. extended
version (2016)
4. Cunningham, P., Delany, S.J.: k-nearest neighbour classifiers 2nd edition (with
python examples). arXiv preprint arXiv:2004.04523 (2020)
5. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the
curse of dimensionality. In: Proceedings of the thirtieth annual ACM symposium
on Theory of computing. pp. 604–613 (1998)
6. Li, W., Zhang, Y., Sun, Y., Wang, W., Li, M., Zhang, W., Lin, X.: Approximate nearest neighbor search on high dimensional data-experiments, analyses, and improvement. IEEE Transactions on Knowledge and Data Engineering (2019)
7. Liu, T., Moore, A.W., Yang, K., Gray, A.G.: An investigation of practical approximate nearest neighbor algorithms. In: Advances in neural information processing systems. pp. 825–832 (2005)
8. Mahato, V., Johnston, W., Cunningham, P.: Scoring performance on the y-balance
test. In: International Conference on Case-Based Reasoning. pp. 281–296. Springer
(2019)
9. Mahato, V., Obeidi, M.A., Brabazon, D., Cunningham, P.: An evaluation of classification methods for 3d printing time-series data. arXiv preprint arXiv:2005.09052 (2020)
10. Mahato, V., O’Reilly, M., Cunningham, P.: A comparison of k-nn methods for
time series classification and regression. In: 26th Irish Conference on Artificial
Intelligence and Cognitive Science (AICS) (2018)
11. Malkov, Y., Ponomarenko, A., Logvinov, A., Krylov, V.: Approximate nearest
neighbor algorithm based on navigable small world graphs. Information Systems
45, 61–68 (2014)
12. Marteau, P.F.: Time warp edit distance with stiffness adjustment for time series
matching. IEEE transactions on pattern analysis and machine intelligence 31(2),
306–318 (2008)
13. Ponomarenko, A., Malkov, Y., Logvinov, A., Krylov, V.: Approximate nearest
neighbor search small world approach. In: International Conference on Information
and Communication Technologies & Applications. vol. 17 (2011)
14. Tavenard, R., Faouzi, J., Vandewiele, G., Divo, F., Androz, G., Holtz, C., Payne,
M., Yurchak, R., Rußwurm, M., Kolar, K., et al.: Tslearn, a machine learning
toolkit for time series data. Journal of Machine Learning Research 21(118), 1–6
(2020)