<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Temporal and Frequential Metric Learning for Time Series kNN Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cao-Tri Do</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ahlame Douzal-Chouakria</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sylvain Marié</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michèle Rombaut</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff2">
          <institution>Schneider Electric</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>GIPSA-Lab, University of Grenoble Alpes</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIG, University of Grenoble Alpes</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <abstract>
        <p>This work proposes a temporal and frequential metric learning framework for time series nearest neighbor classification. For that, time series are embedded into a pairwise space where a combination function is learned based on a maximum margin optimization process. A wide range of experiments is conducted to evaluate the ability of the learned metric on time series kNN classification.</p>
      </abstract>
      <kwd-group>
        <kwd>Metric learning</kwd>
        <kwd>Time series</kwd>
        <kwd>kNN</kwd>
        <kwd>Classification</kwd>
        <kwd>Spectral metrics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Due to their temporal and frequential nature, time series constitute complex
data for standard machine learning approaches [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To
classify such challenging data, distance features must be used that bring time
series of identical classes closer together and separate those of different classes. Temporal data
may be compared on their values. The most frequently used value-based
metrics are the Euclidean distance and Dynamic Time Warping (dtw), which copes
with delays [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Time series can also be compared on their dynamics and frequential
characteristics [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. Promising approaches aim to learn a Mahalanobis
distance or a kernel function for a specific classifier [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. Other works investigate the
representation paradigm by representing objects in a dissimilarity space, where
dissimilarity combinations and metric learning are studied [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. The idea
in this paper is to combine basic metrics into a discriminative one for a kNN
classifier. In the context of metric learning driven
by nearest neighbors (Weinberger &amp; Saul [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), we extend the work of Do et al.
in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to temporal and frequential characteristics. The main idea is to embed
pairs of time series in a space whose dimensions are basic temporal and
frequential metrics, and to learn a combination function in that space through a large margin
optimization process.
      </p>
      <p>The main contributions of the paper are to a) propose a new temporal and
frequential metric learning framework for time series nearest neighbor
classification, b) learn a combination metric involving amplitude, behavior and
frequential characteristics, and c) conduct large-scale experiments to study the ability of
the learned metric. The rest of the paper is organized as follows. Section 2 briefly
recalls the major metrics for time series. In Section 3, we present the proposed
metric learning approach. Finally, Section 4 presents the experiments conducted
and discusses the results obtained.</p>
      <p>Copyright © 2015 for this paper by its authors. Copying permitted for private and academic
purposes.</p>
    </sec>
    <sec id="sec-2">
      <title>Time series metrics</title>
      <p>
        Let xi = (xi1, ..., xiT ) and xj = (xj1, ..., xjT ) be two time series of length
T. Time series metrics fall within at least three main categories. The first one
concerns value-based metrics, where time series are compared according to their
values regardless of their behaviors. Among these metrics are the Euclidean
distance (dE ), the Minkowski distance and the Mahalanobis distance [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We
recall the formula of dE :
      </p>
      <p>dE (xi, xj ) = √( ∑_{t=1}^{T} (xit − xjt)² )    (1)</p>
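As an illustration, dE can be computed in a few lines of NumPy (this sketch is ours, not code from the paper):

```python
import numpy as np

def d_euclidean(xi, xj):
    """Value-based metric d_E: pointwise Euclidean distance between two series."""
    xi, xj = np.asarray(xi, dtype=float), np.asarray(xj, dtype=float)
    return float(np.sqrt(np.sum((xi - xj) ** 2)))
```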
      <p>
        The second category relies on metrics in the spectral representation. In some
applications, time series may be similar because they share the same frequency
characteristics. For that, time series xi are first transformed into their Fourier
representation x~i = [x~i1, ..., x~iF ], where x~if is a complex number (i.e. a Fourier
component), with F = T/2 + 1 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Then, one may use the Euclidean distance
(dFFT ) between the moduli of the complex numbers x~if , noted |x~if |:
      </p>
      <p>dFFT (xi, xj ) = √( ∑_{f=1}^{F} (|x~if | − |x~jf |)² )    (2)</p>
      <p>
        Note that time series with similar frequential characteristics may have distinct
global behaviors. Thus, to compare time series based on their behavior, a third
category of metrics is used. Many applications refer to the Pearson correlation
or its generalization, the temporal correlation coefficient [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], defined as:
      </p>
      <p>Cortr(xi, xj ) = ∑_{t,t′} (xit − xit′ )(xjt − xjt′ ) / ( √(∑_{t,t′} (xit − xit′ )²) √(∑_{t,t′} (xjt − xjt′ )²) )    (3)</p>
      <p>
        where |t − t′| ≤ r, r ∈ [1, ..., T − 1] being a parameter that can be learned or fixed
a priori. The optimal value of r depends on the noise level. For r = T − 1, Eq. 3 leads
to the Pearson correlation. As Cortr is a similarity measure, it is transformed
into a dissimilarity measure: dCortr (xi, xj ) = (1/2)(1 − Cortr(xi, xj )).
      </p>
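A NumPy sketch of dFFT and dCortr as defined above (our illustration, not the authors' code; `numpy.fft.rfft` returns the one-sided spectrum with F = T/2 + 1 components):

```python
import numpy as np

def d_fft(xi, xj):
    """d_FFT: Euclidean distance between the moduli of the Fourier components."""
    mi = np.abs(np.fft.rfft(np.asarray(xi, dtype=float)))
    mj = np.abs(np.fft.rfft(np.asarray(xj, dtype=float)))
    return float(np.sqrt(np.sum((mi - mj) ** 2)))

def cort_r(xi, xj, r):
    """Temporal correlation Cort_r over all pairs (t, t') with |t - t'| <= r."""
    xi, xj = np.asarray(xi, dtype=float), np.asarray(xj, dtype=float)
    T = len(xi)
    num = den_i = den_j = 0.0
    for t in range(T):
        for tp in range(t + 1, min(t + r + 1, T)):
            di, dj = xi[t] - xi[tp], xj[t] - xj[tp]
            num += di * dj
            den_i += di ** 2
            den_j += dj ** 2
    return num / (np.sqrt(den_i) * np.sqrt(den_j))

def d_cort(xi, xj, r):
    """Dissimilarity version: d_Cort_r = (1/2) * (1 - Cort_r)."""
    return 0.5 * (1.0 - cort_r(xi, xj, r))
```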
      <p>
        Temporal and frequential metric learning for a large
margin kNN.
Let X = {xi, yi} (i = 1, ..., N ) be a set of N static vector samples, xi ∈ Rp, p being the
number of descriptive features and yi the class labels. Weinberger &amp; Saul
proposed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] an approach to learn a dissimilarity metric D for a large margin
kNN. It is based on two intuitions: first, each training sample xi should have the
same label yi as its k nearest neighbors; second, training samples with different
labels should be widely separated. For this, they introduced the concept of targets
for each training sample xi. The target neighbors of xi, noted j ⇝ i, are the k closest
xj of the same class (yj = yi). The target neighborhood is defined with respect
to an initial metric. The aim is to learn a metric D that pulls the targets closer and
pushes away samples of a different class.
      </p>
      <p>
        Let d1, ..., dh, ..., dp be p given dissimilarity metrics that allow pairs of
samples to be compared. The computation of a metric always involves a pair of samples.
Therefore, we use the pairwise representation introduced in Do et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In
this space, a vector xij represents a pair of samples (xi, xj ) described by the p
basic metrics dh: xij = [d1(xi, xj ), ..., dp(xi, xj )]T . If xij = 0 then xj is
identical to xi according to all metrics dh. A combination function D of the metrics
dh can be seen as a function in this space. We propose in the following to use a
linear combination of the dh: Dw(xi, xj ) = ∑h wh · dh(xi, xj ). Its pairwise notation
is Dw(xij ) = wT · xij . To ensure that Dw is a valid metric, we set wh ≥ 0 for all
h = 1, ..., p. The main steps of the proposed approach to learn the metric, detailed
hereafter, can be summarized as follows:
      </p>
    </sec>
    <sec id="sec-3">
      <title>1. Embed each pair (xi; xj ) into the pairwise space Rp.</title>
      <p>2. Scale the data within the pairwise space.
3. Dene for each xi its targets.
4. Scale the neighborhood of each xi.
5. Learn the combined metric Dw.</p>
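The pairwise embedding and the linear combination Dw above can be sketched as follows (an illustrative implementation of ours; the metrics passed in are placeholders, not the paper's exact code):

```python
import numpy as np

def pairwise_embedding(xi, xj, metrics):
    """Embed the pair (xi, xj) as x_ij = [d_1(xi, xj), ..., d_p(xi, xj)]."""
    return np.array([d(xi, xj) for d in metrics])

def D_w(x_ij, w):
    """Linear combination D_w(x_ij) = w^T . x_ij, with all w_h >= 0."""
    return float(np.dot(w, x_ij))
```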
      <p>Data scaling. This operation is performed to scale the data within the pairwise
space and ensure comparable ranges for the p basic metrics dh. In our experiments,
we use dissimilarity measures with values in [0, +∞[. Therefore, we propose to
Z-normalize their log distributions.</p>
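The proposed log Z-normalization can be sketched as follows (the `eps` guard against log(0) is our addition, not stated in the paper):

```python
import numpy as np

def scale_pairwise(X_pairs, eps=1e-12):
    """Z-normalize the log of each pairwise dimension.

    X_pairs: array (n_pairs, p) of dissimilarity values in [0, +inf[.
    eps avoids log(0) for identical pairs (hypothetical safeguard).
    """
    logX = np.log(np.asarray(X_pairs, dtype=float) + eps)
    mu, sigma = logX.mean(axis=0), logX.std(axis=0)
    return (logX - mu) / sigma
```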
      <p>Target set. For each xi, we define its target neighbors as the k nearest neighbors
xj (j ⇝ i) of the same class according to an initial metric. In this paper, we
choose the L2-norm of the pairwise space as the initial metric (√(∑h dh²)). Other
metrics could be chosen. We emphasize that target neighbors are fixed a priori
(at the first step) and do not change during the learning process.</p>
      <p>Neighborhood scaling. In real datasets, local neighborhoods can have very
different scales. To make the target neighborhood spreads comparable, we propose,
for each xi, to scale its neighborhood vectors xij such that the L2-norm of the
farthest target is 1.</p>
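Target selection under the L2-norm initial metric and the neighborhood scaling step can be sketched together (our illustration; `pair_vecs` is assumed to hold the pairwise vectors xij of one sample xi restricted to same-class neighbors):

```python
import numpy as np

def targets_and_scale(pair_vecs, k):
    """Select the k targets of xi and scale its neighborhood.

    pair_vecs: array (n, p) of pairwise vectors x_ij for same-class xj.
    Returns (scaled pairwise vectors, indices of the k targets).
    """
    norms = np.linalg.norm(pair_vecs, axis=1)  # initial metric: L2 in pairwise space
    idx = np.argsort(norms)[:k]                # k nearest same-class neighbors
    scale = norms[idx].max()                   # L2-norm of the farthest target
    return pair_vecs / scale, idx
```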
      <p>Learning the combined metric Dw. Let {xij , yij } (i, j = 1, ..., N ) be the training set with
yij = −1 if yj = yi and +1 otherwise. Learning Dw for a large margin kNN
classifier can be formalized as the following optimization problem:</p>
      <p>min over w, ξ of  ∑_{i, j⇝i} Dw(xij )  +  C ∑_{i, j⇝i, l} ξijl    (4)
subject to, for all i, all j ⇝ i, and all l with yl ≠ yi:
Dw(xil ) − Dw(xij ) ≥ 1 − ξijl ,
ξijl ≥ 0,
wh ≥ 0 ∀h = 1, ..., p,
where the first term of the objective is the "pull" term and the second penalizes margin violations ("push").</p>
      <p>Note that the "pull" term ∑_{i, j⇝i} Dw(xij ) = ∑_{i, j⇝i} wT · xij = N·k·wT · x̄ij is a
Mahalanobis norm weighted by the average target sample x̄ij . Therefore, it behaves
like a L1-norm in the optimization problem. The problem is very similar to a
C-SVM classification problem. When C is infinite, we have a "strict" problem:
the solver will try to find a direction in the pairwise space for which only targets
are in the close neighborhood of each xi, with a maximum margin of 1/||w||2.
Let xtest be a new sample to classify and xtest,i (i = 1, ..., N ) the corresponding
vectors in the pairwise embedding space. After normalizing the xtest,i according
to the Data Scaling step, xtest is classified based on a standard kNN with Dw.</p>
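The paper solves Eq. 4 exactly with an LP solver (Cplex); purely as a rough, hypothetical alternative, the same pull/push objective can be approached with a projected subgradient on the hinge losses (the step size and iteration count below are arbitrary choices of ours):

```python
import numpy as np

def learn_w(pull_pairs, triplets, C=1.0, lr=0.05, n_iter=500):
    """Hypothetical projected-subgradient sketch of Eq. 4 (not the Cplex solver).

    pull_pairs: array (m, p) of target pairwise vectors x_ij.
    triplets: list of (x_ij, x_il) with y_l != y_i (impostor constraints).
    Minimizes sum w.x_ij + C * sum hinge(1 - (w.x_il - w.x_ij)) s.t. w >= 0.
    """
    w = np.ones(pull_pairs.shape[1])
    for _ in range(n_iter):
        grad = pull_pairs.sum(axis=0)                      # "pull" term subgradient
        for x_ij, x_il in triplets:
            if np.dot(w, x_il) - np.dot(w, x_ij) < 1.0:    # margin violated: "push"
                grad += C * (x_ij - x_il)
        w = np.maximum(w - lr * grad, 0.0)                 # project onto w >= 0
    return w
```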
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
        In this section, we compare kNN classifier performances for several metrics on
reference time series datasets [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14">11-14</xref>
        ] described in Table 1. To compare with the
reference results in [
        <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
        ], the experiments are conducted with the same
protocols as in Do et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]: k is set to 1; train and test sets are given a priori. Due to
the current format used to store the data, small datasets with short time series were
retained, and the experiments are conducted in a single run.
      </p>
      </p>
      <p>
        In this experimentation, we consider the basic metrics dE , dFFT and dCortr ; then
we learn a combined metric Dw according to the procedure described in Section 3.
First, two basic temporal metrics are considered in D2 (dE and dCortr ), as in Do
et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Second, we consider a combination of temporal and frequential
metrics in D3 (dE , dCortr and dFFT ). The Cplex library [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] has been used to solve
the optimization problem in Eq. 4. We learn the optimal parameter values of
these metrics by minimizing a leave-one-out cross-validation criterion.
As the training dataset sizes are small, we propose a hierarchical error criterion:
1. Minimize the kNN error rate.
2. Minimize dintra/dinter if several parameter values reach the minimum kNN error,
where dintra and dinter stand respectively for the mean of all intraclass and
interclass distances according to the metric at hand. Table 2 gives the ranges of
the grid search considered for the parameters. In the following, we consider only
the raw series and do not align them, e.g. using a dtw algorithm. For
all reported results (Table 3), the best one is indexed with a star and the ones
not significantly different from the best one (Z-test at 1% risk) are in bold [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
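The dintra/dinter tie-breaking criterion above can be sketched as follows (our illustration, assuming a precomputed pairwise dissimilarity matrix D):

```python
import numpy as np

def intra_inter_ratio(D, y):
    """Mean intraclass distance divided by mean interclass distance.

    D: (N, N) symmetric dissimilarity matrix under the metric at hand.
    y: class labels. A lower ratio means better class separation.
    """
    y = np.asarray(y)
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(y), dtype=bool)
    d_intra = D[same & off_diag].mean()   # same class, excluding self-pairs
    d_inter = D[~same].mean()             # different classes
    return d_intra / d_inter
```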
      <p>
        From Table 3, we can see that the temporal metrics dE and dCortr alone
perform better or worse than one another depending on the dataset. Using a frequential
metric alone such as dFFT brings significant improvements for some datasets
(SonyAIBO, GunPoint, PowerCons, ECG5Days). It can be observed that one
basic metric is sufficient on some databases (MoteStrain, GunPoint, PowerCons,
ECG5Days). In the other cases, learning a combination of these basic metrics reaches
the same performance on most datasets or achieves better results (UMD). The
new approach makes it possible to extend combination functions to many metrics
without having to cope with additional parameters in the grid search and without
testing every basic metric alone to retain the best one. It also extends the
work done in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] from a single distance to multiple distances. Adding metrics such
as dFFT improves the performance on some datasets (SonyAIBO, GunPoint,
UMD, FaceFour, Lighting-2, Lighting-7) compared to considering only temporal metrics
(dE , dCortr ). However, it does not always improve the results (GunPoint,
PowerCons, ECG5Days). This might be because our framework is
sensitive to the choice of the initial metric (L2-norm), or because some steps of
the algorithm should be improved to make the combination better.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>For nearest neighbor time series classification, we propose to learn a metric as a
combination of temporal and frequential metrics based on a large margin
optimization process. The learned metric shows good performance in the conducted
experiments. For future work, we are considering several improvements. First,
the choice of the initial metric is crucial. It has been set here as the L2-norm
of the pairwise space, but a different metric could provide better target sets.
Alternatively, an iterative procedure (reusing Dw to generate new target sets
and learn Dw again) could be another solution. Second, we note that the
L1-norm on the "pull" term leads to sparsity. Changing it into a L2-norm could
allow for non-sparse solutions and also extend the approach to non-linear metric
combination functions thanks to the kernel trick. Finally, we could extend this
framework to multivariate, regression or clustering problems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>T.C.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <article-title>A review on time series data mining</article-title>
          ,
          <source>Engineering Applications of Artificial Intelligence</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>H.</given-names>
            <surname>Sakoe</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chiba</surname>
          </string-name>
          ,
          <article-title>Dynamic Programming Algorithm Optimization for Spoken Word Recognition</article-title>
          ,
          <source>IEEE Transactions on Acoustics, Speech, and Signal Processing</source>
          ,
          <year>1978</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>H.</given-names>
            <surname>Ding</surname>
          </string-name>
          , G. Trajcevski, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Scheuermann</surname>
          </string-name>
          ,
          <article-title>Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures</article-title>
          , in
          <source>VLDB</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Douzal-Chouakria</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Amblard</surname>
          </string-name>
          ,
          <article-title>Classification trees for time series</article-title>
          ,
          <source>Pattern Recognition journal</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.</given-names>
            <surname>Lhermitte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Verbesselt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.W.</given-names>
            <surname>Verstraeten</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Coppin</surname>
          </string-name>
          ,
          <article-title>A comparison of time series similarity measures for classification and change detection of ecosystem dynamics</article-title>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>K.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Saul</surname>
          </string-name>
          ,
          <article-title>Distance Metric Learning for Large Margin Nearest Neighbor Classification</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Gönen</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Alpaydin</surname>
          </string-name>
          ,
          <article-title>Multiple kernel learning algorithms</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          , vol.
          <volume>12</volume>
          , pp.
          <fpage>2211</fpage>
          -
          <lpage>2268</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>R.</given-names>
            <surname>Duin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bicego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Orozco-Alzate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Loog</surname>
          </string-name>
          ,
          <article-title>Metric Learning in Dissimilarity Space for Improved Nearest Neighbor Performance</article-title>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ibba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Duin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A study on combining sets of differently measured dissimilarities</article-title>
          ,
          <source>in Proceedings - International Conference on Pattern Recognition</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>3360</fpage>
          -
          <lpage>3363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>C.</given-names>
            <surname>Do</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Douzal-Chouakria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marié</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Rombaut</surname>
          </string-name>
          ,
          <article-title>Multiple Metric Learning for large margin kNN Classification of time series</article-title>
          ,
          <source>EUSIPCO</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>E.</given-names>
            <surname>Keogh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Ratanamahatana</surname>
          </string-name>
          ,
          <article-title>The UCR Time Series Classification/Clustering Homepage</article-title>
          (www.cs.ucr.edu/eamonn/time_series_data/),
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>K.</given-names>
            <surname>Bache</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lichman</surname>
          </string-name>
          ,
          <article-title>UCI Machine Learning Repository</article-title>
          (http://archive.ics.uci.edu/ml),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <source>LIG-AMA Machine Learning Datasets Repository</source>
          (http://ama.liglab.fr/resourcestools/datasets/),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>C. Frambourg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Douzal-Chouakria</surname>
            ,
            <given-names>and E.</given-names>
          </string-name>
          <string-name>
            <surname>Gaussier</surname>
          </string-name>
          ,
          <article-title>Learning Multiple Temporal Matching for Time Series Classication, Advances in Intelligent Data Analysis XII,</article-title>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. IBM Cplex (http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/).</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. T. Dietterich,
          <article-title>Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms</article-title>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>