<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Effective and Efficient Variable-Length Data Series Analytics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michele Linardi Supervised by: Themis Palpanas</string-name>
          <email>michele.linardi@parisdescartes.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LIPADE, Universit e ́ de Paris</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the last twenty years, data series similarity search has emerged as a fundamental operation at the core of several analysis tasks and applications related to data series collections. Many solutions to di↵erent mining problems work by means of similarity search. In this regard, all the proposed solutions require the prior knowledge of the series length on which similarity search is performed. In several cases, the choice of the length is critical and sensibly influences the quality of the expected outcome. Unfortunately, the obvious brute-force solution, which provides an outcome for all lengths within a given range is computationally untenable. In this Ph.D. work, we present the first solutions that inherently support scalable and variable-length similarity search in data series, applied to sequence/subsequences matching, motif and discord discovery problems. The experimental results show that our approaches are up to orders of magnitude faster than the alternatives. They also demonstrate that we can remove the unrealistic constraint of performing analytics using a predefined length, leading to more intuitive and actionable results, which would have otherwise been missed.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Data series (i.e., ordered sequences of points) are one
of the most common data types1, present in almost
every scientific and social domain (such as meteorology,
astronomy, chemistry, medicine, neuroscience, finance,
agriculture, entomology, sociology, smart cities, marketing,
operation health monitoring, human action recognition and
others) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        Once the data series have been collected, the domain
experts face the arduous tasks of processing and analyzing
them [
        <xref ref-type="bibr" rid="ref30 ref6">30, 6</xref>
        ] in order to gain insights, e.g., by identifying
similar patterns, and performing classification, or clustering. A
core operation that is part of all these analysis tasks is
similarity search, which has attracted lots of attention because
of its importance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Nevertheless, all existing scalable and
index-based similarity search techniques are restricted in
that they only support queries of a fixed length, and they
1If the dimension that imposes the ordering of the sequence
is time then we talk about time series. Though, a series
can also be defined over other measures (e.g., angle in radial
profiles in astronomy, mass in mass spectroscopy in physics,
position in genome sequences in biology, etc.). We use the
terms data series, time series, and sequence interchangeably.
Proceedings of the VLDB 2019 PhD Workshop, August 26th, 2019. Los
Angeles, California. Copyright (C) 2019 for this paper by its authors.
Copying permitted for private and academic purposes.
require that this length is chosen at index construction [
        <xref ref-type="bibr" rid="ref1 ref10 ref11 ref21 ref24 ref25 ref26 ref28 ref29">10,
24, 1, 25, 28, 29, 26, 21, 11</xref>
        ]. The same observation holds
for techniques proposed to discover motifs [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and discords
(i.e., anomalous subsequences) [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]: they all assume a fixed
sequence length, which has to be predefined.
      </p>
      <p>
        Evidently, this is a constraint that penalizes the flexibility
needed by analysts, who often times need to analyze
patterns of slightly di↵erent lengths (within a given data series
collection) [
        <xref ref-type="bibr" rid="ref16 ref17 ref5 ref7 ref8">7, 8, 5, 17, 16</xref>
        ]. For example, in the
SENTINEL2 mission data, oceanographers are interested in searching
for similar coral bleaching patterns2 of di↵erent lengths; at
Airbus3 engineers need to perform similarity search queries
for patterns of variable length when studying aircraft
takeo↵s and landings [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]; and in neuroscience, analysts need to
search in Electroencephalogram (EEG) recordings for Cyclic
Alternating Patterns (CAP) of di↵erent lengths (duration),
in order to get insights about brain activity during sleep [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>In our work, we focus on three core problems that are
based on similarity search: subsequence matching, and
motif and discord discovery, organized under the ULISSE and
MAD methods:</p>
      <p>
        1. ULISSE (ULtra compact Index for variable-length
Similarity SEarch in data series) is the first indexing technique
that supports variable-length subsequence matching for non
Z-normalized and Z-normalized data series [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">15, 13, 14</xref>
        ].
      </p>
      <p>
        2. MAD (Motif and Discord discovery framework)
implements two novel algorithms for variable-length motif and
discord discovery in large data series [
        <xref ref-type="bibr" rid="ref16 ref17 ref4">17, 4, 16</xref>
        ].
2.
      </p>
    </sec>
    <sec id="sec-2">
      <title>VARIABLE-LENGTH ANALYTICS</title>
      <p>
        In this section, we describe our proposed approaches to
the aforementioned problems. In the next part we describe
the notions and the elements used in our solutions.
Preliminaries. Let a data series D = d1,...,d|D| be a
sequence of numbers di 2 R, where i 2 N represents the
position in D. We denote the length, or size of the data series
D with |D|. The subsequence Ds,`=ds,...,ds+` 1 of length
`, is a contiguous subset of ` points of D starting at o↵set s,
where 1  s  | D| and 1  `  | D|. A subsequence is itself
a data series. A data series collection, C, is a set of data
series. We say that a data series D is Z-normalized, denoted
Dn, when its mean µ is 0 and its standard deviation is
1. Z-normalization is an essential operation in several
applications, because it allows similarity search irrespective of
shifting and scaling [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The Piecewise Aggregate
Approximation (PAA) of a data series D, P AA(D) = {p1, ..., pw},
2http://www.esa.int/Our_Activities/Observing_the_Earth
3http://www.airbus.com/
represents D in a w-dimensional space by means of w
realvalued segments of length s, where the value of each segment
is the mean of the corresponding values of D [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. We denote
the first k dimensions of P AA(D), (k  w), as P AA(D)1,..,k.
      </p>
      <p>
        The iSAX representation of a data series D, denoted by
iSAX(D, w, |alphabet|), is the representation of P AA(D) by
w discrete coecients, drawn from an alphabet of
cardinality |alphabet| [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The main idea of the iSAX
representation, is that the real-value space may be segmented by
|alphabet| 1 breakpoints in |alphabet| regions, which are
labeled by discrete symbols (e.g., with |alphabet| = 4 the
available labels may be {00, 01, 10, 11}).
2.1
      </p>
    </sec>
    <sec id="sec-3">
      <title>Subsequence Matching</title>
      <p>The subsequence matching problem is defined as follows:
Given a data series collection C = {D1, ..., DC }, a
series length range [`min, `max], a query data series Q, where
`min  | Q|  `max, and k 2 N, we want to find the
set R = {Doi,`|Di 2 C ^ ` = |Q| ^ (` + o 1)  | Di|},
where |R| = k. We require that 8 Doi,` 2 R @Doi00,`0 s.t.
dist(Doi00,`0 , Q) &lt; dist(Doi,`, Q), where `0 = |Q|, (`0 +o0 1) 
|Di0 | and Di0 2 C. We informally call R, the k nearest
neighbors set of Q. Given two generic series of the same length,
namely D and D0 the function dist(D, D0) can be Euclidean
Distance or Dynamic Time Warping.</p>
      <p>Variable Length Subsequences. In a data series, when
we consider contiguous and overlapping subsequences of
different lengths within the range [`min, `max],we expect the
outcome as a bunch of similar series, whose di↵erences are
a↵ected by the misalignment and the di↵erent number of
points. Given a data series D, and a subsequence length
range [`min, `max], we define the master series as the
subsequences of the form Di,min(|D| i+1,`max), for each i such
that 1  i  | D| (`min 1), where 1  `min  `max  | D|.</p>
      <p>We observe that for any master series of the form Di,`0 , we
have that P AA(Di,`0 )1,..,k = P AA(Di,`00 )1,..,k holds for each
`00 such that `00 `min, `00  `0  `max and `0, `00%k = 0.</p>
      <p>Therefore, by computing only the P AA of the master
series in D, we are able to represent the P AA prefix of any
subsequence of D. When we zero-align the P AA summaries of
the master series, we compute the minimum and maximum
P AA values (over all the subsequences) for each segment:
this forms what we call an Envelope. (When the length of a
master series is not a multiple of the P AA segment length,
we compute the P AA coecients of the longest prefix that is
multiple of a segment.) We call containment area the space
in between the segments that define the Envelope.
PAA Envelope. We formalize the concept of the Envelope,
introducing a new series representation. We denote by L and
U the P AA coecients, which delimit the lower and upper
parts, respectively, of a containment area. Furthermore, we
introduce a parameter , which permits to select the number
of master series we represent by the Envelope. We refer to it
using the following signature: paaEN V[D,`min,`max,a,,s ] =
[L, U ]. It delimits the containment area generated by the
P AA coecients of the master series.</p>
      <p>Indexing the Envelopes. Given a paaEN V , we
can translate its P AA extremes into the
corresponding iSAX representation: uEN VpaaENV[D,`min,`max,a,,s ] =
[iSAX(L), iSAX(U )], where iSAX(L) (iSAX(U )) is the
vector of the minimum (maximum) P AA coecients of all
the segments corresponding to the subsequences of D. The
iTem )sce200</p>
      <p>S
ryeu I(/O150
tcQ isk 100
a d
xvgEA +PCU 500
(a)
(b)
query answering disk i\o
query answering cpu</p>
      <p>Indexing (disk i\o + cpu time)
γ = (% of (lmax - lmin))</p>
      <p>Envelope uEN V represents the principal building block of
the ULISSE Index. In details, ULISSE is a tree structure,
where each internal node stores the Envelope uEN V
representing all the sequences in the subtree rooted at that node.
Leaf nodes contain several Envelopes, which by construction
have the same iSAX(L). On the contrary, their iSAX(U )
varies, since it get updated with every new insertion in the
node. Each Envelope in leaf nodes point the the represented
sequences in the original data series collection.</p>
      <p>Approximate Subsequence Matching. Subsequence
matching performed on ULISSE index relies on the
mindistULiSSE() lower bounding function to prune the
search space. This allows to navigate the tree in order,
visiting first the most promising nodes. As soon as a leaf node
is discovered, we can load the raw data series pointed by
the Envelopes in the leaf. Each time we compute the true
Euclidean or DTW distance between the series in a leaf, the
best-so-far distance (bsf) is updated, along with a vector
containing the k best matches, where k refers to the k
nearest neighbors. Since priority is given to the most promising
nodes, we can terminate our visit, when at the end of a leaf
visit the k bsf’s have not improved.</p>
      <p>Exact Subsequence Matching. Note that the
approximate search described above may not visit leaves that
contain answers better than the approximate answers already
identified, and therefore, it will fail to produce exact, correct
results.</p>
      <p>The exact nearest neighbor search algorithm we propose
finds the k sequences with the absolute smallest distances
to the query. In this case, the search algorithm may visit
several leaves: the process stops after it has either visited,
or pruned (when the lower bounding distance to the node is
greater than the bsf) all the nodes of the index, guaranteeing
the correctness of the results.</p>
      <p>
        Experiments. To evaluate ULISSE, we used synthetic and
real data (but in the interest of space we only report results
with the synthetic data). We record the average CPU time,
disk I/O (time to fetch data from disk (Total time - CPU
time)), for 100 queries, extracted from the datasets with the
addition of Gaussian noise. We compare ULISSE with UCR
suite [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] the non index-based state-of-the-art technique for
answering similarity search queries. Concerning the
competitor indexing techniques, the state-of-the-art is the
Compact Multi Resolution Index [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] CMRI.
      </p>
      <p>In Figure 1, we present results for subsequence matching
queries on ULISSE when we vary , ranging from 0 to its
maximum value in this dataset, i.e., `max `min. In
Figure 1, we report the results concerning non Z-normalized
series. We observe that grouping contiguous and overlapping
subsequences under the same summarization (Envelope) by
increasing , a↵ects positively the performance of index
construction, as well as query answering.</p>
      <p>The latter may seem counterintuitive, since inserting more
master series into a single Envelope is likely to generate large
containment areas, which are not tight representations of the
data series. On the other hand, it leads to an overall number
of Envelope that is several orders of magnitude smaller than
the one for = 0%, where only a single master series is
represented by each Envelope.
2.2</p>
    </sec>
    <sec id="sec-4">
      <title>Motif and Discord Discovery</title>
      <p>Motif and Discord are data mining primitives that
represent frequent and rare (anomalous) patterns, respectively.
Given a data series D, they are defined as follows:
• Data series motif: Da,` and Db,` is a motif pair i↵
dist(Da,`, Db,`)  dist(Di,`, Dj,`) 8 i, j 2 [1, 2, ..., |D|
` + 1], where a 6= b and i 6= j, and dist is a function
that computes the z-normalized Euclidean distance
between the input subsequences.
• Data series discord: We call the k subsequences of
D, with the k largest distances to their mth Nearest
Neighbor (according the Euclidean distance), the
Topk mth discords.</p>
      <p>Variable length motif and discord discovery. We
provide solutions to the following problems:
• Variable-Length Motif Discovery: Given a data series
D and a subsequence length-range [`min, ..., `max], we
want to find the data series motif pairs of all lengths
in [`min, ..., `max], occurring in D.
• Variable-Length Top-k mth Discord Discovery: Given
a data series D, a subsequence length-range
[`min, ..., `max] and the parameters a, b 2 N+ we want
to enumerate the Top-k mth discords for each k 2
{1, .., a} and each m 2 { 1, .., b}, and for all lengths
in [`min, ..., `max], occurring in D.</p>
      <p>
        Fixed length motif and discord discovery. The
stateof-the art algorithm for fixed length motif and discord
discovery [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] requires the user to define the length of the
desired motif or discord. This mining operation is supported
by computation of the Matrix profile, which is a meta data
series storing the z-normalized Euclidean distance between
each subsequence and its nearest neighbor. The Matrix
profile does not only derive the motif, but also ranks and filters
out the other pairs, giving also a convenient and graphical
representation of their occurrences and proximity.
Unfortunately, this technique comes with an important shortcoming:
it does not provide an e↵ective solution for trying several
different motif lengths. Therefore, the analyst is forced to run
the algorithm using all possible lengths in a range of interest,
and rank the various motifs discovered, picking eventually
the patterns that contain the desired insight. Clearly, this
possibility is not optimal for at least two reasons: the
scalability, since finding motif of one fixed length takes O(|D|2)
time, and also because it does not provide an e↵ective way
to compare motifs of di↵erent lengths.
      </p>
      <p>MAD Framework. Our framework for Variable Length
Motif and Discord Discovery (MAD) works by applying an
incremental computing strategy, which aims to prune
unnecessary distance computations for larger motif and discord
lengths. Hence, given a data series D, we compute the
Matrix profile using the smallest subsequence length, namely
`min, within a specified input range [`min, `max]. The key
idea of our approach is to minimize the work that needs
to be done for succeeding subsequence lengths (`min + 1,
`min + 2, . . ., `max).</p>
      <p>
        Matrix Profile Computation. We start the computation
of the Matrix profile, considering all the contiguous
subsequences of length `min, computing for each one the Distance
profile in O(|D|) time. This latter is a vector that contains
the z-normalized Euclidean distances between a fixed
subsequence and all the other in D (excluding trivial matches).
Lower Bound Subsequences of Di↵erent Length. We
moreover introduce a new lower bounding distance [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ],
which lower bounds (is always smaller than) the true
Euclidean distances between subsequences longer than `min.
We initially compute this lower bound using the true
Euclidean distances computation of subsequences with length
`min. For the larger lengths, we update the lower bound,
considering only the variation generated by the trailing
points in the longer subsequences. This measure enjoys an
important property: if we rank the subsequences according
to this measure, the same rank will be preserved along all the
lower bound updates for the subsequences of greater length.
We exploit this property, in order to prune computations.
Pruning the Search Space. Once we compute motif and
discords, with length greater than `min, instead of
computing from scratch each distance profile, we update the true
distances (in constant time) of the subsequences that have
the p smallest lower bounding distances (computed in the
previous step). These distances form what we call partial
distance profile. In each partial distance profile, we also
update the lower bound. After this operation, we may have two
cases: if in a new computed distance profile the minimum
true distance (minDist ) is shorter than the maximum lower
bound (maxLB ), we know that no distances among those
not computed can be smaller than minDist. In this case, a
partial distance profile becomes a valid distance profile. On
the other hand, when maxLB is smaller than minDist, this
latter is not guaranteed to be the nearest neighbor distance.
For discord discovery, we need to test this condition for the
m smallest true distances in the partial distance profile. In
this case a valid (partial) distance profile must contain the
true mth best match distances, which are smaller than, or
equal to maxLB.
      </p>
      <p>Exact Motif and Discord Discovery. Once the partial
distance profiles are computed, we pick the absolute smallest
lower bounding value from all the non-valid distance profiles,
namely minLBAbs (if any). Therefore, the global minimum
(true) distance of all the valid (partial) distance profiles,
which is smaller than minLBAbs is guaranteed to be the
distance between the motif pair subsequences.
Symmetrically, we consider the valid (partial) distance profiles to find
the true mth best match distances, which are the greatest
nearest neighbor distances that are larger than maxLBAbs.
This latter is the largest lower bounding distance of the
nonvalid distance profiles.</p>
      <p>In the motif discovery task, if no nearest neighbor
distance is smaller than minLBAbs, we recompute only the
distance profiles that have the maxLB distance smaller than
the smallest true distance computed.</p>
      <p>On the other hand, for discord discovery, if no true
nearest neighbor distances are found we need to iterate the non
100 150 200 400 600</p>
      <p>Subsequence length range
100</p>
      <p>MAD (Discord discovery)</p>
      <p>GrammarViz</p>
      <p>Time out after 48h
60
50
40
30
20
10
0 1 3 5 7 10 1 3 5 7 10 1 3 5 7 10 1 3 5 7 10
(b) EMG ASTRO
k (Top-k 1st discords)</p>
      <p>Dataset
0 1 3 5 7 10 1 3 5 7 10 1 3 5 7 10 1 3 5 7 10
(a) EMG ASTRO
m (Top-1 mth discords)</p>
      <p>Dataset
valid (partial) distance profiles, which contain the maxLB
distance greater than the largest mth best match distance.</p>
      <p>We keep extracting in this manner the motif and the
discord subsequences of each length, until `max.</p>
      <p>
        Motif Discovery Experimental Evaluation. To
benchmark the MAD framework, we used several di↵erent
real datasets. Concerning the motif discovery problem,
the competitors we considered are: QUICKMOTIF [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
STOMP [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and MOEN [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. We report in Figure 2 a
sample of the experiments we conducted (detailed
experimental results on several datasets are reported elsewhere [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]).
Here, we show the results of MAD, which finds motifs in
di↵erent real datasets. In the plots, we report the total
execution time varying motif length ranges. From this
experiment, we observe that VALMOD maintains a good and
stable performance across datasets and parameter settings,
quickly producing results, even in cases where the
competitors do not terminate within a reasonable amount of time.
Discord Discovery Experimental Evaluation. We
identify two state-of-the-art competitors to compare to our
approach, the Motif And Discord (MAD) framework. The
first one, DAD (Disk aware discord discovery) [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ],
implements an algorithm suitable to enumerate the fixed-length
T op 1 mth discords. The second approach,
GrammarViz [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], is the most recent technique, which discovers Top-k
1st discords. In Figure 3.(a) we report the results of T op 1
mth discord discovery, varying m. We note that MAD
gracefully scales over the number of discords to enumerate and
is up to one order of magnitude faster than DAD. In
Figure 3.(b), we show the result of T op k 1st discords
discovery. Once again, MAD scales better over the number
of discovered discords, as its execution time remains almost
constant. A di↵erent trend is observed for GrammarViz,
whose performance significantly deteriorates as k increases.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. CONCLUSIONS</title>
      <p>Even though much e↵ort has been dedicated for
developping techniques for data series analytics, existing solutions
for subsequence matching, motif and discord discovery are
limited to fixed length queries/results. In this Ph.D. work,
we propose the first scalable solutions to the variable-length
version of these problems: ULISSE is the first index that
supports variable-length subsequence matching over both</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Camerra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shieh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Keogh</surname>
          </string-name>
          .
          <source>isax 2</source>
          .
          <article-title>0: Indexing and mining one billion time series</article-title>
          .
          <source>In ICDM</source>
          <year>2010</year>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Echihabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zoumpatianos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Benbrahim</surname>
          </string-name>
          .
          <article-title>The lernaean hydra of data series similarity search: An experimental evaluation of the state of the art</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>12</volume>
          (
          <issue>2</issue>
          ),
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>C. M. Y</surname>
          </string-name>
          . et al.
          <article-title>Matrix profile I: all pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets</article-title>
          . In ICDM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>M. L.</surname>
          </string-name>
          et al.
          <article-title>Matrix profile goes mad: Variable-length motif and discord discovery in data series</article-title>
          .
          <source>In Under Submission</source>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>T. R.</surname>
          </string-name>
          et al.
          <article-title>Searching and mining trillions of time series subsequences under dynamic time warping</article-title>
          .
          <source>In SIGKDD</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gogolou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tsandilas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Bezerianos</surname>
          </string-name>
          .
          <article-title>Progressive similarity search on time series data</article-title>
          . In BigVis, in conjunction with EDBT/ICDT,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kadiyala</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Shiri</surname>
          </string-name>
          .
          <article-title>A compact multi-resolution index for variable length queries in time series databases</article-title>
          .
          <source>KAIS</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kahveci</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          .
          <article-title>Variable length queries for time series data</article-title>
          .
          <source>In ICDEF</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Keogh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          .
          <article-title>Dimensionality reduction for fast similarity search in large time series databases</article-title>
          .
          <source>KAIS</source>
          ,
          <volume>3</volume>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Keogh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. B.</given-names>
            <surname>Zordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gunopulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Cardle</surname>
          </string-name>
          .
          <article-title>Indexing large human-motion databases</article-title>
          .
          <source>In VLDB</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kondylakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zoumpatianos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          .
          <article-title>Coconut: A scalable bottom-up approach for building data series indexes</article-title>
          .
          <source>In PVLDB</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. H. U</given-names>
            ,
            <surname>M. L. Yiu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gong</surname>
          </string-name>
          .
          <article-title>Quick-motif: An ecient and scalable framework for exact motif discovery</article-title>
          ICDE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Linardi</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          .
          <article-title>Scalable data series subsequence matching with ulisse</article-title>
          .
          <source>In Under Submission</source>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Linardi</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          . ULISSE:
          <article-title>ULtra compact Index for Variable-Length Similarity SEarch in Data Series</article-title>
          .
          <source>In ICDE</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Linardi</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          . Scalable, variable
          <article-title>-length similarity search in data series: The ULISSE approach</article-title>
          . PVLDB,
          <volume>11</volume>
          (
          <issue>13</issue>
          ):
          <fpage>2236</fpage>
          -
          <lpage>2248</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Linardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Keogh</surname>
          </string-name>
          . VALMOD:
          <article-title>A suite for easy and exact detection of variable length motifs in data series</article-title>
          .
          <source>In SIGMOD Conference</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Linardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Keogh</surname>
          </string-name>
          .
          <article-title>Matrix profile X: Valmod - scalable discovery of variable-length motifs in data series</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mueen</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Chavoshi</surname>
          </string-name>
          .
          <article-title>Enumeration of time series motifs of all lengths</article-title>
          .
          <source>Knowl. Inf. Syst.</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>A. G. H.</surname>
          </string-name>
          of Operational Intelligence Department Airbus. Personal communication.,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          .
          <article-title>Data series management: The road to big sequence analytics</article-title>
          .
          <source>SIGMOD Rec</source>
          .,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fatourou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          . Paris:
          <article-title>The next destination for fast data series indexing and query answering</article-title>
          .
          <source>In IEEE Big Data</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Parrino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Terzano</surname>
          </string-name>
          .
          <article-title>Automatic detection of cyclic alternating pattern (cap) sequences in sleep: preliminary results</article-title>
          .
          <source>Clinical Neurophysiology</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>P.</given-names>
            <surname>Senin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Oates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gandhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Boedihardjo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Frankenstein</surname>
          </string-name>
          .
          <article-title>Time series anomaly discovery with grammar-based compression</article-title>
          .
          <source>In EDBT</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shieh</surname>
          </string-name>
          and
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Keogh</surname>
          </string-name>
          .
          <article-title>isax: indexing and mining terabyte sized time series</article-title>
          .
          <source>In KDD</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>A dataadaptive and dynamic segmentation index for whole matching on time series</article-title>
          . PVLDB,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Yagoubi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Akbarinia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Masseglia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          . Dpisax:
          <article-title>Massively distributed partitioned isax</article-title>
          .
          <source>In ICDM</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>D.</given-names>
            <surname>Yankov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Keogh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Rebbapragada</surname>
          </string-name>
          .
          <article-title>Disk aware discord discovery: finding unusual time series in terabyte sized datasets</article-title>
          .
          <source>Knowl. Inf. Syst.</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zoumpatianos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Idreos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          . RINSE:
          <article-title>interactive data series exploration with ADS+</article-title>
          .
          <source>PVLDB</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zoumpatianos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Idreos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          .
          <article-title>ADS: the adaptive data series index</article-title>
          . VLDB J.,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zoumpatianos</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Palpanas</surname>
          </string-name>
          .
          <article-title>Data series management: Fulfilling the need for big sequence analytics</article-title>
          .
          <source>In ICDE</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>