<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Anomaly Detection in Electrocardiogram Readings with Stacked LSTM Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Markus Thill</string-name>
          <email>markus.thill@th-koeln.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sina Däubener</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang Konen</string-name>
          <email>wolfgang.konen@th-koeln.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Bäck</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leiden University</institution>
          ,
          <addr-line>LIACS, 2333 CA Leiden</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TH Köln - Cologne University of Applied Sciences</institution>
          ,
          <addr-line>51643 Gummersbach</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Real-world anomaly detection for time series is still a challenging task. This is especially true for periodic or quasi-periodic time series, since automated approaches have to learn long-term correlations before they are able to detect anomalies. Electrocardiography (ECG) time series, a prominent real-world example of quasi-periodic signals, are investigated in this work. Anomaly detection algorithms often have the additional goal of identifying anomalies in an unsupervised manner. In this paper we present an unsupervised time series anomaly detection algorithm. It learns with recurrent Long Short-Term Memory (LSTM) networks to predict the normal time series behavior. The prediction error on several prediction horizons is used to build a statistical model of normal behavior. We propose new methods that are essential for a successful model-building process and for a high signal-to-noise ratio. We apply our method to the well-known MIT-BIH ECG data set and present first results. We obtain a good recall of anomalies while having a very low false-positive rate (FPR) in a fully unsupervised procedure. We also compare with state-of-the-art anomaly detectors (NuPic, ADVec).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Anomaly detection in time series is of increasing
importance in many application areas, e.g. health care [
        <xref ref-type="bibr" rid="ref4 ref6">6, 4</xref>
        ],
sensor networks [
        <xref ref-type="bibr" rid="ref14 ref20">20, 14</xref>
        ] or predictive maintenance [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Anomaly detection is a very active research area (for an
overview see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), yet it is not easy to come up with a
general definition of an anomaly. The notion of an anomaly
greatly depends on the application area and on
characteristics of the time series in question. To learn the
characteristics of nominal and anomalous behavior from data
it is often necessary to apply sophisticated methods from
natural computing (evolutionary neural networks [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and
immune systems [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]). While some anomalies are simple
to detect (e.g. a sudden spike in a relatively constant
signal may be detected by simple threshold heuristics), other
anomalies are subtle and more complex to detect. This is
especially the case for periodic or quasi-periodic time
series where the anomaly may be a time-shifted peak, a peak
with a different form or other patterns which only emerge
from a long-range analysis of the signal.
      </p>
      <p>
        Electrocardiography (ECG) time series constitute a
prominent real-world example of quasi-periodic signals.
Anomaly detection in ECG readings plays an important
role in health care and medical diagnosis. There exist
well-maintained databases, e.g. the MIT-BIH database [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
where a large body of data is annotated with numerous
types of anomalies which are characterized and verified
by medical experts. Automated anomaly detection in such
ECG data is still a challenging topic, because the
deviations from nominal behavior are often subtle and require
long-range analysis. Furthermore, there are considerable
signal variations from patient to patient or even within an
ECG time series.
      </p>
      <p>
        Long short-term memory (LSTM) networks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which
are a special form of recurrent neural networks (RNN) and
thus belong to the class of deep learning methods, have
proven to be particularly useful in learning sequences with
long-range dependencies. They avoid the vanishing
gradient problem [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and are more stable and better
scalable [
        <xref ref-type="bibr" rid="ref10">10</xref>
          ] than other RNN architectures. LSTMs have
successfully advanced the state-of-the-art in many
application areas such as language modeling and translation, acoustic
modeling of speech, analysis of audio data, handwriting
recognition and others [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We will use stacked LSTMs
as the building block for our ECG time series prediction.
      </p>
      <p>
        It is the purpose of the present paper to investigate
whether anomaly detection in quasi-periodic ECG time
series can be trained in an unsupervised manner, hence,
without the usage of the anomaly class labels. We will
describe in Sec. 2 an LSTM prediction model which is
trained to predict over multiple horizons and is applied to
time series containing nominal and also rare anomalous
data. We observe multidimensional error vectors (one
vector for each point in time) and fit a multivariate Gaussian
distribution to them. Based on the Mahalanobis distance
we can assign to each point in time a probability of
being anomalous. Sec. 3 describes our experimental setup
and the MIT-BIH Arrhythmia Database used in our
experiments. Sec. 4 presents and discusses our results, while
Sec. 5 concludes.</p>
        <p>Anomaly detection in general has been done with
methods from machine learning [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and more precisely from
natural computing: Han and Cho [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and other works
cited therein use evolutionary approaches in optimizing
neural networks for the task of intrusion detection. Kieu
et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] use deep learning (LSTM, autoencoder) for
anomaly detection. [
        <xref ref-type="bibr" rid="ref27">27</xref>
          ] uses a multi-resolution wavelet-based
approach for unsupervised anomaly detection.
Stibor et al. [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] describe an immune-system approach to
anomaly detection: They tackle a problem prevalent in
anomaly detection (and relevant also for the ECG case):
Often only nominal data are available during training,
nevertheless the model should later detect anomalies as well.
This task, known as one-class classification or negative
selection, is solved in [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] with an immune-system
approach. In our case we have an unknown, small number
of anomalies embedded in nominal data. We describe in
Sec. 2.5 a statistical test to find anomalies in an
unsupervised, threshold-free manner.
      </p>
      <p>
        Much work is devoted to anomaly detection in ECG
readings: Several authors use multi-resolution wavelet-based
techniques [
        <xref ref-type="bibr" rid="ref23 ref27">23, 27</xref>
        ]. A novelty-search approach on
ECG data is taken in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] in order to perform unsupervised
anomaly classification. Sivaraks et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] use motif
discovery for robust anomaly detection.
      </p>
      <p>
        The works of Malhotra [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and Chauhan [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are
closest to our present approach. They describe nicely the
general idea of LSTM-based prediction and its application
to ECG, motor sensor or power-consumption time series.
But the big drawback of [
        <xref ref-type="bibr" rid="ref19 ref4">19, 4</xref>
        ] is that they need a manual
and supervised separation into up to six data sets: training,
validation and test sets, which are further subdivided into
nominal &amp; anomalous subsets. This means that for a real-world
application the ECG data for a new person would need
to undergo an expert anomaly classification prior to any
training. This will be highly impractical in most
application scenarios. Our method instead aims at using the whole
body of data for a person to train the LSTMs, without the
necessity of supervised anomaly information.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <sec id="sec-2-1">
        <title>LSTM for Time Series Prediction</title>
        <p>The learning task is formulated as a time series
forecasting problem. Hence, we attempt to train a model which is
able to predict future values in the time series. The
intuition behind this approach is that the usual quasi-periodic
patterns in the ECG time series should be predictable with
only small errors, while abnormal behavior should lead to
large deviations in the predictions. Although the presented
methodology is only applied to ECG data in this paper, it
is sufficiently general to be applied to other (predictable)
time series as well.</p>
        <p>Data Preparation Consider a d-dimensional time series
of length T. In a first step, it is often recommendable to
scale or normalize the individual dimensions of the time
series. In our setup, each dimension of each ECG
signal is scaled into the range [−1, 1]. The training and test
samples are generated by extracting sub-sequences of
suitable length from the original time series. This is done by
sliding a window of length W with a lag of 1 over the
time series and collecting the windowed data in a tensor
D ∈ R^(T′×W×d_in), where T′ is the number of sub-sequences.
Usually, one would select all d dimensions of the time
series, so that d_in = d. Then, the first 80% of the samples
of D are selected to form the training data X_train. The
remaining 20% are used as a test set X_test to later
compute an unbiased error estimate. While the inputs are
d_in-dimensional, the output targets for each time step have
dimension m, since one can select multiple (m) prediction
horizons for one time series. Technically, it is also
possible to predict several time series dimensions with d_out ≤ d
simultaneously in one model; however, we found in our
experiments that the results do not improve in this case
(for the investigated ECG time series data). The targets
y_t ∈ R^m are future values of the selected signal at times
t + h_i for i ∈ {1, …, m}, where the horizons are specified
in H = (h_1, h_2, …, h_m). Since we follow a many-to-many
time series prediction approach, where the algorithm
performs a prediction at each instance of time t, the tensor
containing the target signals has the shape R^(T′×W×m) with
T′ = T − W − max(H) + 1. T′ is the same for the
input and output tensor. As before, the first 80% of the targets
are used for training (Y_train) and the remaining targets for
the test set (Y_test).</p>
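        <p>The windowing scheme just described can be sketched in a few lines (a minimal NumPy illustration under our own naming, not the paper’s implementation; only the first channel is used as the predicted signal, matching the MLII-only setup of Sec. 3.1):</p>

```python
import numpy as np

def make_windows(x, W, H):
    """Slide a window of length W (lag 1) over a (T, d) time series x.

    Returns the input tensor D of shape (T', W, d) and the target tensor
    Y of shape (T', W, m): at window position i and in-window step j, the
    m targets are the future values x[i + j + h, 0] for the horizons h in H
    (only the first channel is predicted)."""
    T = x.shape[0]
    Tp = T - W - max(H) + 1                     # T' = T - W - max(H) + 1
    D = np.stack([x[i:i + W] for i in range(Tp)])
    Y = np.stack([[[x[i + j + h, 0] for h in H]
                   for j in range(W)] for i in range(Tp)])
    return D, Y

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))                   # toy 2-channel signal
D, Y = make_windows(x, W=32, H=(1, 3, 5))
print(D.shape, Y.shape)                         # (164, 32, 2) (164, 32, 3)
```

        <p>The first 80% of the T′ windows would then form X_train/Y_train, the rest X_test/Y_test.</p>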
        <p>
          Model Architecture and Training A stacked LSTM
architecture [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] with L = 2 layers is used to learn the
prediction task. Each layer consists of u = 64 units. A dense
output layer with m units and a linear activation generates the
predictions for the specified prediction horizons in H. The
net is trained with the sub-sequences of length W taken in
mini-batches of size B from the training inputs Xtrain and
targets Ytrain. 10% of the training data are held out for the
validation set. The LSTM model is trained by using the
Adam optimizer [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] to minimize the mean-squared-error
(MSE) loss. Other loss functions, such as log-cosh
(logarithm of the hyperbolic cosine) and MAE (mean absolute
error) were tested as well and produced similar results for
our data. Early stopping is applied to prevent overfitting
of the model and to reduce the overall time required for
the training process. For this purpose the MSE on the
validation set is tracked. For most of the investigated time
series, 10-20 epochs are sufficient to reach a minimum of
the validation error.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Modelling the Residuals with a multivariate Gaussian Distribution</title>
        <p>[Figure 1: density of the prediction errors in one dimension over the
error range −0.10 to 0.10, uncorrected (top) and corrected (bottom) for
outliers.]</p>
      </sec>
      <sec id="sec-2-3">
        <title>Computing the Prediction Errors for the Full Time Series</title>
        <p>After the LSTM prediction model is trained, the
whole time series X ∈ R^(T×d_in) of length T is passed through
the model and a tensor Ŷ of shape R^(T×m) is predicted.
Then, the prediction errors E = Y − Ŷ are calculated,
where Y contains the true target values for the horizons
in H. Now, each row i of the matrix E represents an error
vector e_i ∈ R^m.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Removing Outliers from the Prediction Errors</title>
        <p>We noticed in our initial experiments that it was not possible to
find good Gaussian fits for the individual dimensions of
the prediction errors in E. An example for this is shown in
Figure 1, upper part. This was due to the fact that the tails
of the error distributions contained many outliers, which
significantly distorted the estimated fit. Hence, we decided
to remove the outliers in the tails of each dimension (only
during the Gaussian modeling phase). We could find good
solutions by discarding the upper and lower 3% quantile
in each dimension. In our experiments we observed that
this approach usually removes slightly more than 20% of
the data records.</p>
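        <p>The trimming step can be sketched as follows (an illustrative NumPy snippet; the 3% quantile is the value from the text, the synthetic error matrix is our own):</p>

```python
import numpy as np

def trim_tails(E, q=0.03):
    """Drop every row of the error matrix E (shape (n, m)) that lies below
    the q-quantile or above the (1 - q)-quantile in ANY dimension.
    This cleaning is used only while fitting the Gaussian model."""
    lo = np.quantile(E, q, axis=0)
    hi = np.quantile(E, 1.0 - q, axis=0)
    keep = np.all((E >= lo) & (E <= hi), axis=1)
    return E[keep]

rng = np.random.default_rng(1)
E = rng.normal(size=(10000, 4))
E_clean = trim_tails(E)
# With m = 4 roughly independent dimensions, clearly more than 2 * 3% of
# the rows are removed, in line with the "slightly more than 20%" above.
print(len(E_clean) / len(E))
```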
      </sec>
      <sec id="sec-2-5">
        <title>Estimating a multivariate Gaussian Distribution</title>
        <p>After removing the outliers from the prediction errors E, the
errors are roughly Gaussian distributed and the parameters
of a multivariate Gaussian distribution can be estimated.
For this purpose the covariance matrix S and mean
vector m̄ are computed for the cleaned matrix E. Then, the
squared Mahalanobis distance</p>
        <p>M(e_i) = (e_i − m̄)ᵀ S⁻¹ (e_i − m̄)   (1)</p>
        <p>to the mean vector m̄ is determined for each error vector e_i
in E.</p>
        <p>For most data points in the time series the corresponding
Mahalanobis distance will be comparably small, since they
are located close to the mean of the distribution. On the
other hand, unusual patterns in the error vectors e_i – such as
large errors in one or more dimensions – will result in large
values of the Mahalanobis distance. Therefore, the
Mahalanobis distance can be used as an indicator of anomalous
behavior in the time series signal. In our LSTM-AD
algorithm, points with a Mahalanobis distance larger than
a specified anomaly threshold will be flagged as
anomalous. Figure 2 shows exemplarily the Mahalanobis
distance over time for a selected ECG signal. Depending on
the choice of threshold, more or fewer points will be
classified as anomalous. If the threshold is set too small, the
algorithm will likely produce many false detections. If the
threshold is chosen too large, many anomalies might be
missed. Ideally, a threshold can be found which allows
identifying all anomalies without any false detections.
However, in practice, one usually has to trade off true and false
detections and select the threshold according to one’s own
requirements.</p>
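        <p>The distance of Eq. (1) can be sketched as follows (an illustrative NumPy snippet on a synthetic error matrix; the function name and the planted outlier are our own):</p>

```python
import numpy as np

def mahalanobis_sq(E, E_fit):
    """Squared Mahalanobis distance, Eq. (1), of every row of E with
    respect to the Gaussian (mean, covariance S) fitted on E_fit
    (the outlier-cleaned errors)."""
    m_vec = E_fit.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(E_fit, rowvar=False))
    diff = E - m_vec
    return np.einsum('ij,jk,ik->i', diff, S_inv, diff)

rng = np.random.default_rng(2)
E = rng.normal(size=(5000, 3))
E[42] += 8.0                       # plant one unusual error vector
M = mahalanobis_sq(E, E)
# the planted point attains by far the largest distance
print(int(M.argmax()))
```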
      </sec>
      <sec id="sec-2-6">
        <title>Window-Based Error Correction</title>
        <p>Initially, we could not obtain very good results when
running our algorithm on several example ECG time series.
The Mahalanobis distance signal was rather noisy and
could not be used to distinguish between nominal and
abnormal patterns in the data. In Figure 2 (top) this problem
is visualized for one example time series. Further
investigation showed that the predictions of the LSTM network
were good in general, but not sufficiently accurate near the
heart beat peaks, where the prediction had been slightly
shifted (up to 10 time steps) forwards or backwards
compared to the real signal. We could identify the
quasi-periodic character of most ECG signals (small but
ubiquitous frequency changes in the heart beat) as the main
source of this problem. The following solution
is proposed: In order to address the variability in
the frequency of the signal, small corrections in the
predictions of the individual horizons are permitted. For each
output dimension k ∈ {1, …, m} the target value y_t,k is
compared to the neighboring predictions ŷ ∈ Ŷ^(win)_t,k, and the
prediction with the smallest absolute error is effectively
taken:</p>
        <p>ŷ_t,k ← argmin_{ŷ ∈ Ŷ^(win)_t,k} |y_t,k − ŷ|   (2)</p>
        <p>Ŷ^(win)_t,k = [ŷ_{t−c_k, k}, …, ŷ_{t, k}, …, ŷ_{t+c_k, k}]   (3)</p>
        <p>We found that reasonable results can be achieved with
window parameters c_k up to a length of 10, depending on
the prediction horizon h_k:
c_k = min(h_k, 10).</p>
        <p>The window-based error correction is applied right after
the LSTM prediction of Ŷ and before the prediction errors
E are computed in Sec. 2.2.</p>
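        <p>The correction step can be sketched as follows (an illustrative NumPy snippet on a synthetic, phase-shifted signal; not the paper’s implementation):</p>

```python
import numpy as np

def window_correct(Y, Y_hat, H, c_max=10):
    """Window-based error correction: for each horizon k, replace the
    prediction at time t by the neighboring prediction within +/- c_k
    steps (c_k = min(h_k, c_max)) that is closest to the target y_{t,k}.
    No anomaly labels are involved at any point."""
    T, m = Y.shape
    Y_corr = Y_hat.copy()
    for k in range(m):
        c = min(H[k], c_max)
        for t in range(T):
            win = Y_hat[max(0, t - c):min(T, t + c + 1), k]
            Y_corr[t, k] = win[np.argmin(np.abs(Y[t, k] - win))]
    return Y_corr

t = np.arange(200.0)
Y = np.sin(0.3 * t)[:, None]                 # "true" quasi-periodic target
Y_hat = np.sin(0.3 * (t - 2.0))[:, None]     # prediction shifted by 2 steps
Y_corr = window_correct(Y, Y_hat, H=(5,))
print(np.abs(Y - Y_hat).mean() > 0.2, np.abs(Y - Y_corr).mean() < 0.05)
```

        <p>On this toy signal the mean absolute error collapses nearly to zero after the correction, mimicking the effect of small frequency shifts near the heart beat peaks.</p>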
        <p>Although this approach corrects the predictions of the
LSTM network explicitly with the true target values Y of
the time series, it is not supervised in the sense that no
anomaly labels are presented to the algorithm at any time
of the training and correction process. As will be shown in
Sec. 4, we could significantly improve the performance of
LSTM-AD utilizing this correction step.
</p>
      </sec>
      <sec id="sec-2-7">
        <title>Threshold-free Anomaly Detection</title>
        <p>
          For determining which points are anomalous, we extend
Rosner’s outlier test [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] to a multivariate scenario. This
means that we want to test the hypothesis:
        </p>
        <p>H0 : There are no anomalies in the data.</p>
        <p>Ha : There are up to n percent of anomalies in the data.</p>
        <p>We assume that the residuals of each output k with k ∈
{1, …, m} are approximately normally distributed. Based on
these residuals we iteratively calculate a multivariate
normal distribution with mean m̄ and covariance matrix S. In
a next step we calculate the deviation of an observed
vector from this underlying multivariate normal
distribution. We do so by calculating the squared Mahalanobis
distance according to Eq. (1), which can be assumed to be
χ²-distributed with m degrees of freedom.</p>
        <p>The proposed algorithm selects the largest value of
the Mahalanobis distance, M_j = max_i M(e_i), and evaluates
whether this value is above the 1 − a quantile of the χ²_m
distribution. We refer to this quantile for ease of
notation as χ²_m(1 − a). If so, the observation is considered to
be an anomaly. A row window around the anomaly index
is then deleted from the data set and the algorithm
proceeds with calculating the multivariate normal distribution
based on the reduced data set. The procedure stops when
at most n% of the data were labeled as anomalies or if
M_j ≤ χ²_m(1 − a).</p>
        <p>The procedure is summarized in Algorithm 1.</p>
        <p>Algorithm 1 Extended Rosner test
1: procedure TEST(n, a, D)   ▷ n: maximal percentage of anomalies;
D ∈ R^(n×m): set of residuals; 1 − a: confidence level
2:   N ← ⌈n · n⌉
3:   for j = 1, 2, …, N do
4:     m̄ ← MEAN(D); S ← COV(D)
5:     M_j ← max_i M(e_i)   ▷ M acc. to Eq. (1) for all e_i ∈ D
6:     if M_j ≥ χ²_m(1 − a) then
7:       D ← D \ {a window around the index of M_j}
8:     end if
9:   end for
10:  Set q to the last index where M_j &gt; χ²_m(1 − a)
11:  return {M_1, M_2, …, M_q} and corresp. row indices
12: end procedure</p>
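        <p>Algorithm 1 can be sketched as follows (an illustrative snippet, assuming SciPy for the χ² quantile; the window size and the planted anomalies are our own toy choices, not values from the experiments):</p>

```python
import numpy as np
from scipy.stats import chi2   # chi-squared quantile (assumes SciPy)

def extended_rosner(D, n_frac=0.05, alpha=0.01, win=50):
    """Sketch of the multivariate extension of Rosner's outlier test.

    D: (n, m) residual matrix; n_frac: maximal fraction of anomalies;
    alpha: significance level; win: number of rows deleted around each
    detection. Returns (row index, distance) pairs of detections."""
    m = D.shape[1]
    thresh = chi2.ppf(1.0 - alpha, df=m)     # chi^2_m(1 - alpha)
    idx = np.arange(len(D))
    D = D.copy()
    detections = []
    for _ in range(int(np.ceil(n_frac * len(D)))):
        if len(D) <= m + 1:
            break
        mu = D.mean(axis=0)
        S_inv = np.linalg.inv(np.cov(D, rowvar=False))
        diff = D - mu
        M = np.einsum('ij,jk,ik->i', diff, S_inv, diff)
        j = int(M.argmax())
        if M[j] <= thresh:                   # largest value is inconspicuous
            break
        detections.append((int(idx[j]), float(M[j])))
        keep = np.abs(np.arange(len(D)) - j) > win // 2
        D, idx = D[keep], idx[keep]
    return detections

rng = np.random.default_rng(3)
D = rng.normal(size=(3000, 3))
D[500] += 10.0                               # plant two anomalies
D[2500] += 9.0
found = [i for i, _ in extended_rosner(D)]
print(found[:2])                             # the planted rows come first
```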
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimental Setup</title>
      <sec id="sec-3-1">
        <title>The MIT-BIH Arrhythmia Database</title>
        <p>
          For our experiments we use the MIT-BIH Arrhythmia
database [
          <xref ref-type="bibr" rid="ref21 ref22 ref9">9, 21, 22</xref>
          ], which contains two-channel
electrocardiogram (ECG) signals of 48 patients of the Beth
Israel Hospital (BIH) Arrhythmia Laboratory. Each
recording has a length of approximately half an hour, which
corresponds to 650 000 data points each. The two
channels recorded are the modified limb lead II (MLII) and a
modified lower lead V5. For our LSTM-AD algorithm,
both the MLII and V5 signal will be used as inputs of
the LSTM model and only the MLII signal is predicted
for the specified horizons.¹ The individual signals have a
quasi-periodic behavior, with differing heart-beat patterns
for each subject. (¹We also performed experiments where both signals,
MLII and V5, are predicted, but could not observe any noticeable
improvement.)
        </p>
        <p>
          There are a multitude of different events in all ECG time
series, which were labelled by human experts. The whole
list of heart beat annotations can be viewed at [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. For
this initial investigation, which presents first results of our
unsupervised approach, we decided to limit ourselves to all
time series with 50 or fewer events. Overall, 13 time
series are considered for our experiments. The selected
time series contain 130 anomalous events from 6 anomaly
classes, which are listed in Table 1. Since only one point
is labelled for each anomaly, we place an anomaly
window of length 600 around each event, which roughly
corresponds to the length of one heart beat before and after
the labeled point. A more detailed database description
can be found in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Parameterization of the Algorithms</title>
        <p>All algorithms compared in this work require a set of
parameters, which are – if not mentioned otherwise – fixed
for all experiments. For each algorithm an anomaly
threshold can be set, which specifies the sensitivity of the
algorithm towards anomalies and which trades off false
detections (false positives) and missed anomalies (false
negatives). This threshold is usually set according to the
requirements of the anomaly detection task – allowing either
a higher precision or a higher recall.</p>
        <p>
          LSTM-AD We implemented our proposed algorithm
using the Keras framework [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] with a TensorFlow [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
backend. The parameters of the algorithm are
summarized in Table 2. Most of the parameters are related to the
stacked LSTM network. We did not systematically tune
the parameters.
        </p>
        <p>
          ADVec Twitter’s ADVec algorithm [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] is a time series
anomaly detection algorithm which is based on the
generalized ESD test and other robust statistical approaches.
There are mainly two parameters which have to be
provided: The first parameter a represents the level of
statistical significance with which to accept or reject anomalies.
Although we did not tune this parameter extensively, we
found a = 0.05 to deliver the best results. The
second parameter maxanoms specifies the maximum number
of anomalies that the algorithm will detect, as a percentage
of the data. This parameter is used as the anomaly threshold.
        </p>
        <p>[Table 2: parameter settings of the LSTM-AD algorithm, including the
prediction horizons H = (1, 3, …, 47, 49), the mini-batch size B = 2048,
the ADAM optimizer and the input dimension d_in = 2.]</p>
        <p>
          NuPic Numenta’s anomaly detection algorithm [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] has
a large set of parameters which have to be set. Although
the parameters can be tuned with an internal swarming
tool [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we decided to use the standard parameter
settings recommended in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], since the time-expensive
tuning process is not feasible for the data considered in this
work. NuPic outputs an anomaly likelihood in the interval [0, 1] for each
time series point, which is suitably
thresholded to control the sensitivity of the algorithm.
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>Performance Measures and Results</title>
        <p>In order to evaluate and compare the performance of our
proposed LSTM-AD algorithm and the two other
algorithms, several common performance quantities are used
in this paper. Similarly to ordinary classification tasks,
a confusion matrix can be constructed for each time
series, containing the numbers of true positives (TP),
false positives (FP), false negatives (FN) and true negatives (TN).
TP indicates the number of correctly identified anomalies,
whereby only one detection within the anomaly window
(Sec. 3.1) is counted. All false detections outside any
anomaly window are considered as false positives (FP),
and each anomaly window missed by an algorithm is
counted as a false negative (FN). All other points are
considered as true negatives (TN). From the confusion
matrix, the well-known metrics precision p = TP/(TP + FP) and
recall r = TP/(TP + FN) (the true-positive rate, TPR) can be derived,
as well as the false-positive rate FPR = FP/(FP + TN), the positive
likelihood ratio PLR = TPR/FPR, and the F1-score F1 = 2pr/(p + r).</p>
        <p>Firstly, we confirm that our LSTM models do not overfit,
since the training and test set errors are nearly the same for all
time series. The medians of all training and test
errors are 2.91·10⁻³ and 3.43·10⁻³, respectively. The first
80% of each time series is used as training set and the
remaining 20% subsequently as test set.</p>
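        <p>The measures above can be written down directly (a sketch with the standard formulas; the window-based counting of TP/FP from Sec. 3.1 is not reproduced here, and the example numbers are invented):</p>

```python
def metrics(TP, FP, FN, TN):
    """Precision, recall (TPR), FPR, positive likelihood ratio and
    F1-score from the entries of a confusion matrix."""
    p = TP / (TP + FP)
    r = TP / (TP + FN)           # recall = true-positive rate (TPR)
    fpr = FP / (FP + TN)         # false-positive rate
    plr = r / fpr                # positive likelihood ratio = TPR / FPR
    f1 = 2 * p * r / (p + r)
    return p, r, fpr, plr, f1

# e.g. 8 of 10 anomaly windows found, 2 false detections on ~10^5 points:
p, r, fpr, plr, f1 = metrics(TP=8, FP=2, FN=2, TN=99988)
print(round(p, 2), round(r, 2), round(f1, 2))   # 0.8 0.8 0.8
```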
        <p>The anomaly results for the ECG readings are
summarized in Table 3 and Table 4. In Table 3, the anomaly
threshold for the Mahalanobis distance was tuned
individually to maximize the F1-score for each ECG signal. As
seen in the table, the Mahalanobis distance is generally
a good indicator for separating nominal from anomalous
behavior in the heartbeat signals if a suitable threshold is
known. For all time series a recall value of 0.5 or larger
can be observed, and with one exception, the F1-score also
exceeds the value 0.5. On average, an F1-score of
approximately 0.81 can be achieved for all time series. Note that
all FPRs are smaller than 3·10⁻⁵.</p>
        <p>Since the anomaly threshold is used to trade off
false positives and false negatives (precision and recall), one can
vary the threshold in a certain range and collect the results
for different values. This is done in Figure 3, which also
shows the results for different thresholds for ADVec and
NuPic. It has to be noted that ADVec accounts only for
fixed-length seasonalities; it is not built for quasi-periodic
signals as they occur in ECG readings, so it is quite
understandable that it shows only low performance here.</p>
        <p>Table 4 shows the results which are obtained when no
prior knowledge about the anomaly threshold is assumed.
Instead, the threshold is calculated automatically and
incrementally by our approach from Sec. 2.5 which is
inspired by Rosner’s ESD test. The average precision and
F1 are lower than in Table 3 since the false positives (FP)
are higher.</p>
        <p>In Figure 4, two excerpts of time series No. 13 (left)
and No. 10 (right) are shown with the detections of our LSTM-AD
algorithm, NuPic and ADVec. In
both examples it can be seen that LSTM-AD detects all
indicated anomalies, while NuPic and ADVec only detect
two and one anomaly, respectively. Additionally, the other
two algorithms produce several false positives.</p>
        <p>The importance of our proposed window-based error
correction method (Sec. 2.4) is illustrated in Figure 2 for
ECG signal No. 13: If no window-based error correction
is applied, the obtained Mahalanobis distance cannot be
suitably used to distinguish between nominal and
anomalous patterns in the displayed ECG data. Only after
applying our approach is a better signal-to-noise ratio
established, which allows the anomalies to be perfectly
separated from the nominal points. For most of the investigated ECG
readings we found that the window-based error correction
significantly improves the signal-to-noise ratio in the
Mahalanobis distance. Table 5 shows that the average F1-score
increases from F1 = 0.50 without window-based error
correction to F1 = 0.81 with it.</p>
        <p>In Table 6, various measures are listed for the individual
anomaly classes. The anomaly types a, F and x can all be
detected by LSTM-AD. Also for the anomaly class V a
high recall can be achieved. However, the two remaining
types appear to be hard to detect for our algorithm.
</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion &amp; Future Work</title>
      <p>We have presented a fully unsupervised method to detect
anomalies in ECG readings. This method relies on an
accurate LSTM predictor to learn the nominal behavior of
the ECG for several prediction horizons. By learning the
error distribution between predicted and perceived values,
a multivariate normal error model for the nominal data is
built. When applying the model, anomalous events have a
high probability of being detected through an unusually high
Mahalanobis distance.</p>
      <p>Our method is unsupervised in the sense that no
anomaly class labels are needed for training the algorithm.
In fact, it is not even necessary that anomalous events are
present at all in the training data, i.e. our algorithm can
operate as a one-class classifier. We checked this by
repeating the experiment leading to Table 3, but this time
removing all data around anomalies during LSTM
training. When using the trained model as anomaly detector
on all data, it worked as accurately as in Table 3, with the mean
F1-score now being F1 = 0.83.</p>
      <p>We achieve these high precision,
recall and F1 values for the ECG readings (on average higher than 80%, see
Table 3) if we tune the final threshold for the Mahalanobis
distance such that F1 is maximized. Admittedly, this last
step is not unsupervised, since we calculate the confusion
matrix based on the true anomaly labels.</p>
      <p>The alternative unsupervised case based on Rosner’s
test (Sec. 2.5 and Table 4) is weaker in terms of precision
and recall. This may be due to the fact that the current
error data do not fulfill the assumption of being normally
distributed, so the assumption of a χ²-distribution is
violated as well. As a result, the χ²-criterion yields no
useful thresholds.</p>
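      <p>For reference, if the nominal squared Mahalanobis distances did follow a χ²-distribution with d degrees of freedom (d being the number of prediction horizons), an unsupervised threshold could be read off a χ² quantile. A minimal sketch with illustrative parameters (d and the false-alarm rate alpha are our assumptions, not values from the paper):</p>

```python
import numpy as np
from scipy.stats import chi2

d = 4          # number of prediction horizons (illustrative choice)
alpha = 1e-4   # target nominal false-alarm rate (our assumption)

# Quantile of the chi^2(d) distribution: an unsupervised threshold
# that a fraction alpha of truly chi^2-distributed scores exceeds.
threshold = chi2.ppf(1.0 - alpha, df=d)

# Sanity check on synthetic chi^2(d) "nominal" scores.
rng = np.random.default_rng(1)
nominal_scores = rng.chisquare(d, size=100_000)
print(np.mean(nominal_scores > threshold))  # close to alpha
```

      <p>When the real error data deviate from normality, as observed above, the empirical exceedance rate of such a threshold no longer matches alpha, which is exactly the failure mode described for the χ²-criterion.</p>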
      <p>Note that the measure ‘Mahalanobis distance’ has the
same discriminative power in both cases; it is only that the
final threshold is not adjusted optimally for the individual
time series in the alternative case. From the practitioner’s
perspective, it may be acceptable to start with a non-optimal
threshold and adjust it in a human-in-the-loop approach.
However, a fully unsupervised high-quality method would be
preferable.</p>
      <p>We have shown that the window-based error correction
is essential to achieve a Mahalanobis distance graph where
the anomaly cases clearly stand out (Fig. 2 and Table 5).</p>
      <p>Our LSTM-AD algorithm outperformed two
state-of-the-art anomaly detection algorithms (NuPic and
ADVec) on the investigated ECG readings, achieving higher
precision and recall over a large range of anomaly
thresholds.</p>
      <p>In this work we have presented first results of an
unsupervised anomaly detector suitable for ECG readings or
other quasi-periodic signals. The results are encouraging,
but there is still room for improvement. Possible future
work includes: 1) improving the modeling step such that the
nominal error distribution comes closer to a Gaussian shape
and hence the nominal Mahalanobis distance closer to a
χ²-distribution, so that the unsupervised extended Rosner
test can be expected to work better; 2) to this end,
addressing the problem of non-stationarity in the ECG
readings, e.g. by applying suitable preprocessing steps to
reduce the effect of signal quality changes; 3) enriching the
model with multi-resolution approaches to span larger
prediction horizons on a coarser scale.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abadi</surname>
          </string-name>
          et al.
          <article-title>Tensorflow: A system for large-scale machine learning</article-title>
          .
          <source>In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation</source>
          , OSDI'
          <volume>16</volume>
          , pages
          <fpage>265</fpage>
          -
          <lpage>283</lpage>
          , Berkeley, CA, USA,
          <year>2016</year>
          . USENIX Association.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          .
          <article-title>Running swarms</article-title>
          . http://nupic.docs.numenta.org/0.6.0/guide-swarming.html, May
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Chandola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <article-title>Anomaly detection: A survey</article-title>
          .
          <source>ACM computing surveys (CSUR)</source>
          ,
          <volume>41</volume>
          (
          <issue>3</issue>
          ):
          <fpage>15</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chauhan</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Vig</surname>
          </string-name>
          .
          <article-title>Anomaly detection in ECG time signals via deep long short-term memory networks</article-title>
          .
          <source>In Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . IEEE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chollet</surname>
          </string-name>
          et al. Keras. https://keras.io,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Chuah</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Fu</surname>
          </string-name>
          .
          <article-title>ECG anomaly detection via time series analysis</article-title>
          .
          <source>In International Symposium on Parallel and Distributed Processing and Applications</source>
          , pages
          <fpage>123</fpage>
          -
          <lpage>135</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Sanz-Bobi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>del Pico</surname>
          </string-name>
          .
          <article-title>SIMAP: Intelligent system for predictive maintenance: Application to the health condition monitoring of a windturbine gearbox</article-title>
          .
          <source>Computers in Industry</source>
          ,
          <volume>57</volume>
          (
          <issue>6</issue>
          ):
          <fpage>552</fpage>
          -
          <lpage>568</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>George</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Hawkins</surname>
          </string-name>
          .
          <article-title>Towards a mathematical theory of cortical micro-circuits</article-title>
          .
          <source>PLoS Comput Biol</source>
          ,
          <volume>5</volume>
          (
          <issue>10</issue>
          ):
          <fpage>e1000532</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Goldberger</surname>
          </string-name>
          et al.
          <article-title>Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals</article-title>
          .
          <source>Circulation</source>
          ,
          <volume>101</volume>
          (
          <issue>23</issue>
          ):
          <fpage>e215</fpage>
          -
          <lpage>e220</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Greff</surname>
          </string-name>
          et al.
          <article-title>LSTM: A search space odyssey</article-title>
          .
          <source>IEEE transactions on neural networks and learning systems</source>
          ,
          <volume>28</volume>
          (
          <issue>10</issue>
          ):
          <fpage>2222</fpage>
          -
          <lpage>2232</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.-J.</given-names>
            <surname>Han</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.-B.</given-names>
            <surname>Cho</surname>
          </string-name>
          .
          <article-title>Evolutionary neural networks for anomaly detection based on the behavior of a program</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)</source>
          ,
          <volume>36</volume>
          (
          <issue>3</issue>
          ):
          <fpage>559</fpage>
          -
          <lpage>570</lpage>
          ,
          <year>June 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          .
          <article-title>The vanishing gradient problem during learning recurrent neural nets and problem solutions</article-title>
          .
          <source>International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems</source>
          ,
          <volume>6</volume>
          (
          <issue>02</issue>
          ):
          <fpage>107</fpage>
          -
          <lpage>116</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R. U.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Hossain</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Andersson</surname>
          </string-name>
          .
          <article-title>A novel anomaly detection algorithm for sensor data under uncertainty</article-title>
          .
          <source>Soft Computing</source>
          ,
          <volume>22</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1623</fpage>
          -
          <lpage>1639</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Jensen</surname>
          </string-name>
          .
          <article-title>Outlier detection for multidimensional time series using deep neural networks</article-title>
          .
          <source>In 2018 19th IEEE International Conference on Mobile Data Management (MDM)</source>
          , pages
          <fpage>125</fpage>
          -
          <lpage>134</lpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavin</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          .
          <article-title>Evaluating real-time anomaly detection algorithms - the Numenta anomaly benchmark</article-title>
          .
          <source>In IEEE Conference on Machine Learning and Applications (ICMLA2015)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Lemos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tierra-Criollo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Caminhas</surname>
          </string-name>
          .
          <article-title>ECG anomalies identification using a time series novelty detection technique</article-title>
          .
          <source>In IV Latin American Congress on Biomedical Engineering</source>
          <year>2007</year>
          ,
          <article-title>Bioengineering Solutions for Latin America Health</article-title>
          , pages
          <fpage>65</fpage>
          -
          <lpage>68</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Malhotra</surname>
          </string-name>
          et al.
          <article-title>Long short term memory networks for anomaly detection in time series</article-title>
          .
          <source>In 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)</source>
          , pages
          <fpage>89</fpage>
          -
          <lpage>94</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.</given-names>
            <surname>Malhotra</surname>
          </string-name>
          et al.
          <article-title>LSTM-based encoder-decoder for multi-sensor anomaly detection</article-title>
          .
          <source>arXiv preprint arXiv:1607.00148</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Moody</surname>
          </string-name>
          and R. G. Mark.
          <article-title>PhysioNet: The MIT-BIH Arrhythmia Database</article-title>
          . https://www.physionet.org/physiobank/database/mitdb/,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Moody</surname>
          </string-name>
          and R. G. Mark.
          <article-title>The impact of the MIT-BIH Arrhythmia Database</article-title>
          .
          <source>IEEE Engineering in Medicine and Biology Magazine</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Rai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trivedi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Shukla</surname>
          </string-name>
          .
          <article-title>ECG signal processing for abnormalities detection using multi-resolution wavelet transform and artificial neural network classifier</article-title>
          .
          <source>Measurement</source>
          ,
          <volume>46</volume>
          (
          <issue>9</issue>
          ):
          <fpage>3238</fpage>
          -
          <lpage>3246</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>B.</given-names>
            <surname>Rosner</surname>
          </string-name>
          .
          <article-title>Percentage points for a generalized ESD many-outlier procedure</article-title>
          .
          <source>Technometrics</source>
          ,
          <volume>25</volume>
          (
          <issue>2</issue>
          ):
          <fpage>165</fpage>
          -
          <lpage>172</lpage>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>H.</given-names>
            <surname>Sivaraks</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Ratanamahatana</surname>
          </string-name>
          .
          <article-title>Robust and accurate anomaly detection in ECG artifacts using time series motif discovery</article-title>
          .
          <source>Computational and mathematical methods in medicine</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Stibor</surname>
          </string-name>
          et al.
          <article-title>Is negative selection appropriate for anomaly detection?</article-title>
          <source>In Proceedings of the 7th annual conference on Genetic and evolutionary computation</source>
          , pages
          <fpage>321</fpage>
          -
          <lpage>328</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>M.</given-names>
            <surname>Thill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Konen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Bäck</surname>
          </string-name>
          .
          <article-title>Time series anomaly detection with discrete wavelet transforms and maximum likelihood estimation</article-title>
          .
          In O. Valenzuela, I. Rojas, et al., editors,
          <source>International Conference on Time Series (ITISE)</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>O.</given-names>
            <surname>Vallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hochenbaum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kejariwal</surname>
          </string-name>
          .
          <article-title>A novel technique for long-term anomaly detection in the cloud</article-title>
          .
          <source>In 6th USENIX Workshop on Hot Topics in Cloud Computing</source>
          , Philadelphia, PA,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>