Unsupervised Subsequence Anomaly Detection in Large Sequences

Paul Boniol
Supervised by: Themis Palpanas, Mohammed Meftah, Emmanuel Remy
EDF R&D, LIPADE, Université de Paris
paul.boniol@edf.fr

Proceedings of the VLDB 2020 PhD Workshop, August 31st, 2020, Tokyo, Japan. Copyright (C) 2020 for this paper by its authors. Copying permitted for private and academic purposes.

ABSTRACT
Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches that have been proposed so far in the literature have severe limitations: they either require prior domain knowledge that is used to design the anomaly discovery algorithms, or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this Ph.D. work, we address these problems, and present unsupervised methods suitable for domain-agnostic subsequence anomaly detection. We explore two possible ways to represent the normal behavior of a long data series that lead to fast and accurate identification of abnormal subsequences. These normal representations are either based on subsequences (using a data structure called the normal model), or on graphs, taking advantage of graph properties to encode the normal transitions between neighboring subsequences of a long series. The experimental results, on a large set of synthetic and real datasets, demonstrate that the proposed approaches correctly identify single and recurrent anomalies of various types, without any prior knowledge of the characteristics of these anomalies. Our approaches outperform several competing approaches in accuracy by a large margin, while being up to orders of magnitude faster.

1. INTRODUCTION
Time series(1) anomaly detection is a crucial problem with applications in a wide range of domains. Examples of such applications can be found in manufacturing, astronomy, engineering, and other domains [10, 11]. This implies a real need by relevant applications for developing methods that can accurately and efficiently achieve this goal.

(1) A time series, or data series, or sequence, is an ordered sequence of real-valued points. If the dimension that imposes the ordering is time, then we talk about time series, but it could also be mass, angle, position, etc. We will use the terms time series, data series, and sequence interchangeably.

Anomaly detection is a well-studied task [2, 13, 16, 8] that can be tackled by either examining single values, or sequences of points. In the specific context of sequences, which is the focus of this paper, we are interested in identifying anomalous subsequences [16, 12, 8], which are not single abnormal values, but rather an abnormal sequence of values. In real-world applications, this distinction becomes crucial: in certain cases, even though every individual point may be normal, the trend exhibited by the sequence of these same values may be anomalous. Evidently, failing to identify such situations could lead to severe problems that are only detected when it is too late.

Some existing techniques explicitly look for a set of predetermined types of anomalies [1]. These techniques have been specifically designed to operate in a particular setting, require domain expertise, and cannot generalize. Other techniques identify as anomalies the subsequences with the largest distances to their nearest neighbors (termed discords) [16, 12]. The assumption is that the most distant subsequence is completely isolated from the "normal" subsequences.

However, this definition fails in the case where an anomaly repeats itself (approximately the same). In this situation, anomalies will have other anomalies as close neighbors, and will not be identified as discords. In order to remedy this situation, the m-th discord approach has been proposed [15], which takes into account the multiplicity m of the anomalous subsequences that are similar to one another, and marks as anomalies all the subsequences in the same group. However, this approach assumes that we know the cardinality of the anomalies, which is not true in practice (otherwise, we need to try several different m values, drastically increasing the execution time). Furthermore, the majority of the previous approaches require prior knowledge of the anomaly length, and their performance deteriorates significantly when the correct length value is not used.

Our work addresses the aforementioned problems. We propose two approaches that aim to build a normal representation of the data series, which enables the identification of anomalous subsequences. The two approaches use different data structures to represent the normal behavior of the sequences; they are summarized below:

• NormA [3, 4], a normal-model based subsequence anomaly detection method for large data series, in which the normal data structure is a set of subsequences, each paired with a weight. The higher those weights are, the more "normal" their paired subsequences are.

• Series2Graph [5, 6], an unsupervised graph-based approach for subsequence anomaly detection in large data series, in which the data structure is modeled as a directed graph. Nodes of this graph can be seen as subsequences of the data series.
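For illustration only, the following minimal sketch (hypothetical Python containers; the names and fields are ours, not taken from the NormA or Series2Graph implementations) shows the shape of the two data structures: a set of weighted subsequences, and a directed graph with weighted edges.

```python
# Minimal illustrative containers (hypothetical names, not the authors' code).
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import numpy as np


@dataclass
class NormalModel:
    """NormA-style normal model: recurring subsequences paired with weights."""
    subsequences: List[np.ndarray]  # each of length l_NM
    weights: List[float]            # higher weight = more "normal" behavior


@dataclass
class TransitionGraph:
    """Series2Graph-style directed graph over abstract subsequence states."""
    nodes: List[int]
    # (u, v) -> number of times state v immediately followed state u in T
    edge_weights: Dict[Tuple[int, int], int] = field(default_factory=dict)
```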
2. NORMAL BEHAVIOR
In this section, we describe our proposed approaches to the aforementioned problems. We first describe the notions and elements used in our solutions.

We begin by introducing some formal notations useful for the rest of the paper. A data series T ∈ R^n is a sequence of real-valued numbers T_i ∈ R, T = [T_1, T_2, ..., T_n], where n = |T| is the length of T, and T_i is the i-th point of T. We are typically interested in local regions of the data series, known as subsequences. A subsequence T_{i,ℓ} ∈ R^ℓ of a data series T is a continuous subset of the values of T of length ℓ starting at position i. Formally, T_{i,ℓ} = [T_i, T_{i+1}, ..., T_{i+ℓ-1}]. Given two sequences, A and B, of the same length ℓ, we can calculate their Z-normalized Euclidean distance, dist, as follows:

dist(A, B) = \sqrt{\sum_{i=1}^{\ell} \left( \frac{A_i - \mu_A}{\sigma_A} - \frac{B_i - \mu_B}{\sigma_B} \right)^2},

where μ and σ represent respectively the mean and standard deviation of the sequences. For the remainder of this paper, we will simply use the term distance.

Given a subsequence T_{i,ℓ}, we say that its m-th Nearest Neighbor (m-th NN) is T_{j,ℓ} if T_{j,ℓ} has the m-th shortest distance to T_{i,ℓ} among all the subsequences of length ℓ in T, excluding trivial matches; a trivial match of T_{i,ℓ} is a subsequence T_{a,ℓ}, where |i - a| < ℓ/2 (i.e., the two subsequences overlap by more than half their length). Since we are interested in subsequence anomalies, we first define the set of all subsequences of length ℓ in a given data series T: T_ℓ = {T_{i,ℓ} | ∀i. 0 ≤ i ≤ |T| - ℓ + 1}. In general, we assume that T_ℓ contains both normal and anomalous subsequences.
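To make the above notation concrete, a minimal Python sketch of the Z-normalized Euclidean distance and of the subsequence set T_ℓ could look as follows (illustrative code; the function names are ours).

```python
import numpy as np


def z_normalize(x: np.ndarray) -> np.ndarray:
    """Subtract the mean and divide by the standard deviation."""
    return (x - np.mean(x)) / np.std(x)


def dist(a: np.ndarray, b: np.ndarray) -> float:
    """Z-normalized Euclidean distance between two equal-length sequences."""
    return float(np.linalg.norm(z_normalize(a) - z_normalize(b)))


def all_subsequences(t: np.ndarray, ell: int) -> list:
    """The set T_ell of all contiguous subsequences of length ell in T."""
    return [t[i:i + ell] for i in range(len(t) - ell + 1)]
```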
We then need a way to characterize normal behavior. For the purpose of this paper, we abstractly define the normal behavior of the data series as follows:

Definition 1 (Normal Behavior, NB). Given a data series T, NB is a model that represents the normal (i.e., not anomalous) trends and patterns in T.

The above definition is not precise on purpose: it allows several interpretations, which can lead to different kinds of models. Nevertheless, subsequence anomalies can then be defined in a uniform way: anomalies are the subsequences that have the largest distances to the expected, normal behavior NB (or whose distance is above a set threshold). Both of the proposed approaches define their own data structure to represent NB, as well as their own distance measure.

2.1 The NormA Approach
We describe a new approach for subsequence anomaly detection. We propose a formalization for NB, called the Normal Model, denoted NM, and defined as follows [3]: NM is a set of sequences, NM = {(NM^0, w_0), (NM^1, w_1), ..., (NM^n, w_n)}, where NM^i is a subsequence of length ℓ_NM (the same for all NM^i) that corresponds to a recurring behavior in the data series T, and w_i is its normality score (as we explain later, the higher this score is, the more usual the behavior represented by NM^i is). In other words, this model averages (with proper weights) the different recurrent behaviors observed in the data, such that the normal behaviors of the data series will be represented in the normal model, while unusual behaviors will not (or will have a very low weight).

Figure 1: Illustration of the two subsequence anomaly definitions of the proposed approaches: (a) NormA, where d is defined as the distance to the Normal Model NM = {(NM^0, w_0), ..., (NM^n, w_n)}, and B is the weighted barycenter of NM; (b) Series2Graph, where the data series is represented as the graph G_ℓ, and the path to reach the red dots has smaller weight than those to reach the blue dots.

Figure 1(a) is an illustration of a Normal Model. As depicted, the Normal Model NM is a weighted combination of a set of subsequences (points within the dotted circles). The combination of these subsequences and their related weights returns distances d_i, d_j, d_k that are high enough to be differentiated from the normal points/subsequences. These distances can be seen as the distance between subsequences and a weighted barycenter B (in green) that represents NM. Note that we do not actually compute this barycenter; we illustrate it in Figure 1(a) for visualization purposes.

We choose ℓ_NM > ℓ in order to make sure that we do not miss useful subsequences, i.e., subsequences with a large overlap with an anomalous subsequence. For instance, for a given subsequence of length ℓ, a normal model of length ℓ_NM = 2ℓ will also contain the subsequences overlapping with the first and last half of the anomalous subsequence. We can now define the abnormal subsequences as follows [3].

Definition 2 (Subsequence Anomaly). Assume a data series T, the set T_ℓ of all its subsequences of length ℓ, and the Normal Model NM of T. Then, the subsequence T_{j,ℓ} ∈ T_ℓ with anomaly score, i.e., distance to NM,

d_j = \sum_{i} w_i \cdot \min_{x \in [0, \ell_{NM} - \ell]} dist(T_{j,\ell}, (NM^i)_{x,\ell}),

is an anomaly if d_j is in the Top-k largest distances among all subsequences in T_ℓ, or d_j > ε, where ε ∈ R_{>0} is a threshold.

Note that the only essential input parameter is the length ℓ of the anomaly (which is also one of the inputs in all relevant algorithms in the literature [12, 16, 7, 9, 8]). The parameter k (or ε) is not essential, as long as the algorithm can rank the anomalies. We stress that in practice, experts start by examining the most anomalous pattern, and then move down in the ranked list, since there is (in general) no rigid threshold separating anomalous from non-anomalous behavior [2].

As we mentioned above and will detail later on, we choose to define NM as a set of sequences that summarizes normality in T, by representing the average behavior of a set of normal sequences. Intuitively, NM is the set of data series that tries to minimize the sum of Z-normalized Euclidean distances between itself and some of the subsequences in T. Last but not least, we need to compute NM in an unsupervised way, i.e., without having normal/abnormal labels for the subsequences in T_ℓ.

Observe that this definition of NM implies the following challenge: even though NM summarizes the normal behavior only, it needs to be computed based on T, which may include (several) anomalies. We address this challenge by taking advantage of the fact that anomalies are a minority class.
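A direct, simplified reading of Definition 2 is sketched below (our own illustration, not the NormA implementation): the score of a query subsequence is the weighted sum, over the normal-model subsequences, of its minimum distance to any of their length-ℓ windows; trivial-match handling is omitted for brevity.

```python
import numpy as np


def z_dist(a: np.ndarray, b: np.ndarray) -> float:
    """Z-normalized Euclidean distance (same as dist in the previous sketch)."""
    zn = lambda x: (x - x.mean()) / x.std()
    return float(np.linalg.norm(zn(a) - zn(b)))


def anomaly_score(query: np.ndarray, normal_model: list) -> float:
    """d_j of Definition 2: weighted sum, over the normal-model subsequences,
    of the minimum distance between the query and their length-ell windows.

    `normal_model` is a list of (nm_subsequence, weight) pairs, each
    nm_subsequence of length l_NM >= len(query).
    """
    ell = len(query)
    score = 0.0
    for nm, weight in normal_model:
        windows = (nm[x:x + ell] for x in range(len(nm) - ell + 1))
        score += weight * min(z_dist(query, w) for w in windows)
    return score


def top_k_anomalies(t: np.ndarray, normal_model: list, ell: int, k: int) -> list:
    """Rank the k subsequences of T with the largest distances to NM
    (overlapping/trivial matches are not filtered in this sketch)."""
    scores = [(i, anomaly_score(t[i:i + ell], normal_model))
              for i in range(len(t) - ell + 1)]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:k]
```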
2.1.1 NormA Framework
We now briefly describe NormA [3], our unsupervised solution to the subsequence anomaly detection problem.

[Computing the Normal Model] Recall that NM should capture (summarize) the normal behavior of the data. This may not be very hard to do for a sequence T that does not contain any anomalous subsequences. In practice though, we want to apply the NormA approach in an unsupervised way on any sequence, which may contain several anomalies. We compute NM in three steps. First, we extract the subsequences that can serve as candidates for building NM. These candidates are either randomly selected from T (NormA-smpl), or correspond to motifs. Then, we group these subsequences according to their similarity into a set of clusters C (we use hierarchical clustering and Minimum Description Length to identify the right number of clusters). The last step consists of scoring each cluster, and selecting the cluster that best represents normal behavior. Formally, for a given cluster c ∈ C, we set a weight w_c = Norm(c, C) with the following formula:

Norm(c, C) = \frac{Frequency(c)^2 \times Coverage(c)}{\sum_{x \in C} dist(Center(c), Center(x))},

where Frequency(c) is the number of subsequences in c, and Coverage(c) is the time interval between the first and the last occurrence of a subsequence in c. We thus set NM = {(NM^0, w_0), (NM^1, w_1), ..., (NM^n, w_n)}, with NM^i the centroid of the i-th cluster in C.

[Normal Model Based Anomaly Detection] Intuitively, the anomalous subsequences of the long series T are the ones that are far away from NM. We thus compute the sequence S = [d_0, d_1, ..., d_{|T|-ℓ_NM}], with d_j the distance of the subsequence T_{j,ℓ} to the Normal Model NM, as defined in Definition 2. These distances correspond to the degree of abnormality: the larger the distance to NM is, the more abnormal the subsequence is. We then extract the k subsequences of length ℓ with the highest distances to NM, and rank them according to their distances. Alternatively, we can extract all subsequences with distance larger than a threshold.
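The cluster-scoring formula above can be read as in the following sketch (illustrative; the Cluster container is hypothetical): populous clusters that cover a long time span and whose centers lie close to the other cluster centers receive high weights, while small, isolated clusters (likely anomalies) receive low weights.

```python
import numpy as np
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical container for a cluster of candidate subsequences."""
    center: np.ndarray         # centroid subsequence (length l_NM)
    members: List[np.ndarray]  # subsequences assigned to this cluster
    positions: List[int]       # starting offsets of those subsequences in T


def z_dist(a: np.ndarray, b: np.ndarray) -> float:
    zn = lambda x: (x - x.mean()) / x.std()
    return float(np.linalg.norm(zn(a) - zn(b)))


def cluster_weight(c: Cluster, clusters: List[Cluster]) -> float:
    """Norm(c, C): Frequency(c)^2 * Coverage(c), divided by the sum of the
    distances between the center of c and the other cluster centers."""
    frequency = len(c.members)
    coverage = max(c.positions) - min(c.positions)  # first-to-last occurrence span
    spread = sum(z_dist(c.center, x.center) for x in clusters if x is not c)
    return (frequency ** 2) * coverage / spread if spread > 0 else 0.0
```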
2.2 The Series2Graph Approach
We now present an alternative data structure to represent the normal behavior of the subsequences of the data series. The previous normal model was a set of subsequences that aimed to store both normal and abnormal subsequences in the same set, associated with weights that rank them based on their normality. One can argue that ordering information is missing from this data structure representation. We thus formulate an approach for subsequence anomaly detection based on the representation of the data series as a graph, in which edges encode the ordering information [5]. Figure 1(b) illustrates this graph data structure.

We first define several basic elements related to graphs. We define a Node Set N as a set of unique integers. Given a Node Set N, an Edge Set E is then a set composed of tuples (x_i, x_j), where x_i, x_j ∈ N; w(x_i, x_j) is the weight of that edge. Given a Node Set N and an Edge Set E (pairs of nodes in N), a graph G is an ordered pair G = (N, E). A directed graph, or digraph, G is an ordered pair G = (N, E) where N is a Node Set, and E is an ordered Edge Set.

We now provide a new formulation for subsequence anomaly detection. The idea is that a data series is transformed into a sequence of abstract states (corresponding to different subsequence patterns), represented by nodes N in a directed graph G(N, E), where the edges E encode the number of times one state occurred after another. Under this formulation, paths in the graph composed of high-weight edges and high-degree nodes correspond to normal behavior. Then, the normality of a data series is defined as follows [5].

Definition 3 (θ-Normality). Let a node set be defined as N = {N_1, N_2, ..., N_m}. Let also a data series T be represented as a sequence of nodes ⟨N^(1), N^(2), ..., N^(n)⟩ with ∀i ∈ [0, n], N^(i) ∈ N and m ≤ n. The θ-Normality of T is the subgraph G_ν^θ(N_ν, E_ν) of G(N, E) with E = {(N^(i), N^(i+1))}_{i ∈ [0, n-1]}, such that N_ν ⊂ N and ∀(N^(i), N^(i+1)) ∈ E_ν, w((N^(i), N^(i+1))) · (deg(N^(i)) - 1) ≥ θ.

Using θ-Normality subgraphs naturally leads to a ranking of subsequences based on their "normality". For practical reasons, this ranking can be transformed into a score, where each rank can be seen as a threshold in that score. We used such a score in GraphAn to detect abnormal subsequences.

Note that, given the existence of graph G, the above definitions imply a way of identifying the anomalous subsequences. The problem is now how to construct this graph.

2.2.1 Series2Graph Framework
We now briefly describe Series2Graph [5, 6], our unsupervised solution to the subsequence anomaly detection problem. For a given data series T, the overall Series2Graph process is divided into four main steps as follows:

[Subsequence Embedding] We project all the subsequences (of a given length ℓ) of T into a three-dimensional space that corresponds to the three most important components of Principal Component Analysis (PCA). This space is subsequently transformed into a two-dimensional space, composed of two components r_y, r_z corresponding to two orthogonal vectors of v_ref, an axis composed of every flat sequence (a · 1_{ℓ-λ} with a ∈ R). In the latter space, shape similarity is preserved [5].

[Node Creation] We then create a node for each one of the densest parts of the above two-dimensional space. The space is first discretized by a set of radii I_ψ of angle ψ. We then estimate the density of subsequences along each one of these radii using Gaussian kernels. The maximal values of the estimated densities are assigned to nodes and form our Node Set N. These nodes summarize all major patterns of length ℓ that occurred in T [5].

[Edge Creation] We then retrieve all transitions between pairs of subsequences represented by two different nodes: each transition corresponds to a pair of subsequences, where one occurs immediately after the other in the input data series T. We represent transitions with an edge between the corresponding nodes, and we thus form the Edge Set E. The edge weights are set to the number of times the corresponding pair of subsequences was observed in T. Finally, we build our graph G_ℓ(N, E).

[Subsequence Scoring] We compute the normality (or anomaly) score of a subsequence of length ℓ_q ≥ ℓ (within or outside of T), based on the previously computed edges/nodes and their weights/degrees. Formally, for a subsequence T_{i,ℓ} of T, represented by a path Pth = ⟨N^(i), N^(i+1), ..., N^(i+ℓ_q)⟩ in G_ℓ(N, E), the normality score is defined as:

Norm(Pth) = \frac{\sum_{j=i}^{i+\ell_q-1} w(N^{(j)}, N^{(j+1)}) \cdot (deg(N^{(j)}) - 1)}{\ell_q},

where w(e) and deg(n) are the weight of edge e and the degree of node n, respectively.
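Given the graph built in the previous steps, the scoring formula can be computed as in the sketch below (our own illustration over a plain dictionary-based graph; edge weights and node degrees are assumed to have been extracted already).

```python
from typing import Dict, List, Tuple


def path_normality(path: List[int],
                   edge_weight: Dict[Tuple[int, int], int],
                   degree: Dict[int, int]) -> float:
    """Norm(Pth): average of w(N_j, N_j+1) * (deg(N_j) - 1) over the edges of
    the path crossed by a subsequence; low values point to rarely taken,
    likely anomalous transitions."""
    n_edges = len(path) - 1  # a path over l_q + 1 nodes has l_q edges
    total = sum(edge_weight.get((path[j], path[j + 1]), 0) * (degree[path[j]] - 1)
                for j in range(n_edges))
    return total / n_edges
```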
2.3 Experimental Analysis
In our preliminary experimental evaluations [3, 5], we used 21 synthetic and real datasets from diverse domains, and measured Precision@k (i.e., accuracy in the Top-k answers) and execution time. (Due to lack of space, we only report here aggregated results.)

Figure 2: Comparison of six state-of-the-art methods to Series2Graph and NormA: (a) Precision@k with a critical diagram over 21 datasets; (b) execution time vs. varying number of points; (c) execution time vs. varying anomaly length.

After rejecting the null hypothesis using the Friedman test, we used pairwise Post-Hoc Analysis with a Wilcoxon signed-rank test (α = 0.05) [14], and we depict the results in Figure 2(a). The results show that the NormA and Series2Graph approaches are the overall winners, performing statistically significantly better than the state-of-the-art algorithms from both the data series and the multidimensional data communities. (Even though Series2Graph seems to be slightly better than both variants of NormA, this difference is not statistically significant.)

Moreover, Figures 2(b) and (c) demonstrate that NormA-smpl and Series2Graph are considerably faster than the other methods (note the log-scale y-axis). In particular, NormA-smpl is between 1 and 3 orders of magnitude faster than the rest, as we increase the length of the input sequence, or the length of the anomalies.

We note that further experiments are needed in order to compare our proposed methods in detail.
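For reference, Precision@k over labeled data can be computed as in this sketch (a hypothetical evaluation helper, not necessarily the exact protocol used in [3, 5]): a top-k detection counts as a hit if it overlaps one of the labeled anomaly intervals.

```python
from typing import List, Tuple


def precision_at_k(ranked_positions: List[int],
                   anomaly_intervals: List[Tuple[int, int]],
                   ell: int, k: int) -> float:
    """Fraction of the top-k detected subsequences overlapping a labeled anomaly."""
    hits = 0
    for pos in ranked_positions[:k]:
        if any(pos < end and pos + ell > start for start, end in anomaly_intervals):
            hits += 1
    return hits / k
```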
3. CONCLUSIONS AND FUTURE WORK
Our studies on NormA and Series2Graph, described above, constitute a first attempt to formalize the problem of Normal Behavior representation in the context of data series. The two proposed approaches describe data structures based on subsequence sets and graphs, respectively. The experimental results show that both NormA and Series2Graph outperform the current state-of-the-art methods over a large set of diverse datasets. This indicates that the proposed approach of using a Normal Behavior model for the subsequence anomaly detection problem is very promising. In our current work, we focus on the following directions.

[Normal Behavior Model] We will study in depth different models for Normal Behavior, and perform analytical and experimental comparisons among them. The goal will be to understand the theoretical properties, as well as the practical limitations, of each one, so as to make informed decisions about the best model to use.

[Multivariate Data Series] Given that many domains monitor, process, and analyze multivariate data series, we will work on algorithms that inherently consider together all variables of the series. The challenging problems are how to create a multivariate Normal Behavior model and corresponding data structure, and how to define and formalize suitable distance measures for this case.

[Online Operation] In several situations, subsequence anomaly detection applications need to operate in an online fashion. In this context, we will develop extensions of our methods for real-time applications. We will analyze the impact of changes in data characteristics on the Normal Behavior models, and explore solutions for updating the Normal Behavior data structures in real time.

Acknowledgments
Work partially supported by EDF R&D and the ANRT French program.

References
[1] D. Abboud, M. Elbadaoui, W. Smith, and R. Randall. Advanced bearing diagnostics: A comparative study of two powerful approaches. MSSP, 114, 2019.
[2] V. Barnet and T. Lewis. Outliers in Statistical Data. John Wiley and Sons, Inc., 1994.
[3] P. Boniol, M. Linardi, F. Roncallo, and T. Palpanas. Automated anomaly detection in large sequences. In ICDE, 2020.
[4] P. Boniol, M. Linardi, F. Roncallo, and T. Palpanas. SAD: An unsupervised system for subsequence anomaly detection. In ICDE, 2020.
[5] P. Boniol and T. Palpanas. Series2Graph: Graph-based subsequence anomaly detection for time series. PVLDB, 13(11), 2020.
[6] P. Boniol, T. Palpanas, M. Meftah, and E. Remy. GraphAn: Graph-based subsequence anomaly detection. PVLDB, 13(11), 2020.
[7] Y. Bu, O. T. Leung, A. W. Fu, E. J. Keogh, J. Pei, and S. Meshkin. WAT: Finding top-k discords in time series database. In SIAM, 2007.
[8] M. Linardi, Y. Zhu, T. Palpanas, and E. J. Keogh. Matrix Profile Goes MAD: Variable-length motif and discord discovery in data series. DAMI, 2020.
[9] W. Luo and M. Gallagher. Faster and parameter-free discord search in quasi-periodic time series. In Advances in Knowledge Discovery and Data Mining, 2011.
[10] T. Palpanas. Data series management: The road to big sequence analytics. SIGMOD Record, 44(2):47-52, 2015.
[11] T. Palpanas and V. Beckmann. Report on the First and Second Interdisciplinary Time Series Analysis Workshop (ITISA). SIGMOD Record, 48(3), 2019.
[12] P. Senin, J. Lin, X. Wang, T. Oates, S. Gandhi, A. P. Boedihardjo, C. Chen, and S. Frankenstein. Time series anomaly discovery with grammar-based compression. In EDBT, 2015.
[13] S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos. Online outlier detection in sensor data using non-parametric models. In VLDB, pages 187-198, 2006.
[14] F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80-83, 1945.
[15] D. Yankov, E. J. Keogh, and U. Rebbapragada. Disk aware discord discovery: Finding unusual time series in terabyte sized datasets. Knowl. Inf. Syst., 17(2):241-262, 2008.
[16] C. M. Yeh, Y. Zhu, L. Ulanova, N. Begum, Y. Ding, H. A. Dau, D. F. Silva, A. Mueen, and E. J. Keogh. Matrix Profile I: All pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets. In ICDM, pages 1317-1322, 2016.