Active Learning for LSTM-autoencoder-based Anomaly Detection in Electrocardiogram Readings

Tomáš Šabata¹ and Martin Holeňa²

¹ Faculty of Information Technology, Czech Technical University in Prague, Prague, Czech Republic, tomas.sabata@fit.cvut.cz
² Institute of Computer Science of the Czech Academy of Sciences, Prague, Czech Republic, martin@cs.cas.cz

Keywords: Active Learning, Anomaly Detection, LSTM-Autoencoder, Time Series

© 2020 for this paper by its authors. Use permitted under CC BY 4.0.

1 Introduction

Recently, the amount of generated time-series data has been increasing rapidly in many areas such as healthcare, security and meteorology. However, it is very rare for those time series to be annotated. For this reason, unsupervised machine learning techniques such as anomaly detection are often used with such data. Many unsupervised algorithms for anomaly detection exist, ranging from simple statistical techniques such as the moving average or ARIMA to complex deep learning algorithms such as the LSTM-autoencoder. For an overview of recent algorithms, we refer the reader to [2,1]. The difficulties of the unsupervised approach are defining an anomaly score that correctly represents how anomalous a time series is, and setting a threshold on that score that distinguishes normal from anomalous data. Supervised anomaly detection, on the other hand, requires the expensive involvement of a human expert. An additional problem of supervised anomaly detection is the usually very low ratio of anomalies, which yields highly imbalanced data.

In this extended abstract, we propose an active learning extension for an anomaly detector based on an LSTM-autoencoder. It performs active learning using various classification algorithms and addresses data imbalance with over-sampling and under-sampling techniques. We are currently testing it on the ECG5000 dataset from the UCR time series classification archive [3].

2 Active learning for LSTM-autoencoder-based anomaly detection

The LSTM-autoencoder [9] is nowadays increasingly used to detect anomalies in time-series data [11,5,4]. The algorithm aims to learn the identity function. It consists of two parts, an encoder and a decoder. The encoder compresses the input representation of the data into a low-dimensional latent representation (usually called the code), from which the decoder reconstructs the original input. The model parameters are found by minimizing the reconstruction error. Samples are then considered anomalous if their reconstruction error is higher than a selected threshold.

Although the LSTM-autoencoder works well for time series with complicated patterns, setting the anomaly score threshold can be very complicated without labelled data. Furthermore, with a higher ratio of anomalies present in the training data, a simple setting of the threshold might produce many false positives and false negatives. Therefore, we incorporated active learning into building an anomaly detector that uses the code of a previously trained LSTM-autoencoder. The most closely related research has been done by Pimentel et al. [8], who proposed to use a classifier (logistic regression) on the latent layer of an autoencoder together with the anomaly score. In contrast, we experiment with the latent layer of a recurrent autoencoder without the anomaly score and propose to use resampling techniques.

First, an LSTM autoencoder is trained on unlabelled data.
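For concreteness, the following is a minimal sketch of such a sequence-to-sequence LSTM autoencoder, assuming PyTorch; it is an illustration, not the authors' implementation, and the class name, layer sizes and the mean-squared reconstruction error used as the anomaly score are our own choices.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Encode a sequence into a fixed-size code, then reconstruct it."""

    def __init__(self, n_features: int, code_size: int):
        super().__init__()
        self.encoder = nn.LSTM(n_features, code_size, batch_first=True)
        self.decoder = nn.LSTM(code_size, code_size, batch_first=True)
        self.output_layer = nn.Linear(code_size, n_features)

    def forward(self, x):                       # x: (batch, seq_len, n_features)
        _, (h_n, _) = self.encoder(x)           # final hidden state of the encoder
        code = h_n[-1]                          # (batch, code_size) latent representation
        repeated = code.unsqueeze(1).repeat(1, x.size(1), 1)
        decoded, _ = self.decoder(repeated)     # feed the code at every time step
        return self.output_layer(decoded), code

def anomaly_scores(model, x):
    """Reconstruction error per sequence, used as the anomaly score."""
    with torch.no_grad():
        reconstruction, _ = model(x)
        return ((reconstruction - x) ** 2).mean(dim=(1, 2))
```

The model is trained by minimizing the reconstruction error on the unlabelled sequences; the code returned by the encoder is the representation later fed to the classifier.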
An initial anomaly detection threshold on the reconstruction error distribution is selected. As the initial value of the threshold, we use the mean plus three times the standard deviation of the reconstruction error. Every sample with a reconstruction error above the threshold is labelled as an anomaly, and the same number of samples below the threshold are labelled as normal. Furthermore, instead of the original time-series data, we use their representation in the code layer. This provides us with an artificially created, balanced dataset and converts the anomaly detection into a binary classification task. A classifier is trained on the created dataset and an active learning loop starts. In each iteration, a resampler is used to balance the updated labelled dataset. The resampler can be either an under-sampling or an over-sampling algorithm. The classifier is fitted using the resampled data. The uncertainty sampling (US) active learning framework [7] is used to select the instances that should be labelled by an oracle. We use margin US, i.e. we select the instances with the smallest difference between the likelihoods of the anomalous and the normal class. Anomaly detection is then based on the predictions of that classifier instead of on the anomaly threshold. The pseudo-code for the algorithm is shown in Algorithm 1.

Algorithm 1: Active learning for LSTM-autoencoder anomaly detection
Input: U: unlabelled data set of sequences
       θ: anomaly score threshold
       φ(·): query strategy utility function
begin
    train LSTM autoencoder on data set U
    // calculate anomaly scores
    a_i = |x_i − decoder(encoder(x_i))|, x_i ∈ U
    sort a in descending order and sort x correspondingly
    i = 0, L = ∅
    // add anomalous data samples to the labelled dataset
    while a_i > θ do
        L = L ∪ {⟨encoder(x_i), 1⟩}
        U = U \ {x_i}
        i = i + 1
    end
    // add normal data samples to the labelled dataset
    for j = i to 2i do
        L = L ∪ {⟨encoder(x_j), 0⟩}
        U = U \ {x_j}
    end
    while stopping criterion is not met do
        R = resampler(L)
        train binary classifier m on R
        // find the most informative sequence in U and ask for its label
        x* = argmax_{x ∈ U} φ(x)
        y* = query(x*)
        L = L ∪ {⟨encoder(x*), y*⟩}
        U = U \ {x*}
    end
end

3 Experiment and Results

The proposed algorithm was evaluated on a benchmark time-series dataset with electrocardiogram readings [3]. Each heart-beat record in the dataset is labelled with one of 5 classes, where the last three classes are very rare and we consider them anomalies. The dataset was split into training, validation and testing parts in the ratio 70:15:15. The validation dataset was used to find the hyperparameters of the LSTM-autoencoder. The autoencoder achieving the lowest F1 score in anomaly detection on the validation dataset was chosen. The final architecture of the encoder consists of two LSTM cells. The cells have one hidden layer, with 48 neurons in the first cell and 24 neurons in the second cell. The hidden state of the last cell is copied and used as the input of the first cell of the decoder. The decoder has a mirrored architecture.

We experimented with 5 classifiers: logistic regression, decision tree, Gaussian naive Bayes, k-nearest neighbours and support vector machines. We experimented with 11 under-sampling and 4 over-sampling techniques taken from the imbalanced-learn Python toolbox [6]. In the presented results, we report the 5 under-sampling techniques that performed best on average.
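To make the initialization from Section 2 concrete before turning to the results, the sketch below builds the initial labelled set from the code-layer representations, using the mean plus three standard deviations of the reconstruction error as the threshold. It is an illustration under our reading of Algorithm 1, not the authors' code; the function and variable names are hypothetical.

```python
import numpy as np

def build_initial_labelled_set(codes, errors):
    """codes:  (n_samples, code_size) latent representations of the sequences
       errors: (n_samples,) reconstruction errors of the same sequences"""
    # initial anomaly threshold: mean + 3 * standard deviation of the error
    threshold = errors.mean() + 3 * errors.std()
    order = np.argsort(errors)[::-1]                # indices sorted by descending error
    n_anomalous = int((errors > threshold).sum())
    anomalous = order[:n_anomalous]                 # above the threshold -> label 1
    normal = order[n_anomalous:2 * n_anomalous]     # same number just below -> label 0
    X = np.concatenate([codes[anomalous], codes[normal]])
    y = np.concatenate([np.ones(n_anomalous), np.zeros(n_anomalous)])
    return X, y, np.concatenate([anomalous, normal]), threshold
```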
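A single iteration of the subsequent active learning loop could then look as follows, with a resampler from imbalanced-learn and a scikit-learn classifier standing in for the combinations listed above. Margin uncertainty sampling is implemented directly on the predicted class probabilities; the `oracle` callback that returns a human-provided label is hypothetical.

```python
import numpy as np
from imblearn.under_sampling import RepeatedEditedNearestNeighbours
from sklearn.svm import SVC

def margin_query(classifier, pool_codes):
    """Pick the pool sample with the smallest difference between the
    predicted probabilities of the anomalous and the normal class."""
    proba = classifier.predict_proba(pool_codes)
    return int(np.argmin(np.abs(proba[:, 1] - proba[:, 0])))

def active_learning_step(X_labelled, y_labelled, pool_codes, oracle):
    # balance the labelled set before fitting the classifier
    resampler = RepeatedEditedNearestNeighbours()
    X_res, y_res = resampler.fit_resample(X_labelled, y_labelled)

    classifier = SVC(probability=True)       # any of the five classifiers can be used
    classifier.fit(X_res, y_res)

    # query the oracle for the label of the most informative sequence
    idx = margin_query(classifier, pool_codes)
    label = oracle(idx)

    X_labelled = np.vstack([X_labelled, pool_codes[idx]])
    y_labelled = np.append(y_labelled, label)
    pool_codes = np.delete(pool_codes, idx, axis=0)
    return classifier, X_labelled, y_labelled, pool_codes
```

Anomalies in new data are then predicted by the last fitted classifier rather than by thresholding the reconstruction error.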
In the experiment, we compared the fully unsupervised approach, in which anomalies are detected using a chosen threshold, with our extension, with respect to the F1 score. Figure 1 shows how actively asking for annotations can improve unsupervised anomaly detection with an LSTM-autoencoder (red dashed line). The best results were achieved with repeated edited nearest neighbours [10] and a k-nearest neighbours classifier. However, using an SVM as the base classifier yielded more stable performance. Moreover, a classifier fed with labels outperformed the LSTM-autoencoder both with the initially set anomaly score threshold (red dashed line) and with the best possible anomaly score threshold (green dashed line). The source code is available in a GitHub repository: https://github.com/tsabata/active_anomaly_detection.

Fig. 1: F1 score in the active learning loop. The figure contains 5 models (support vector machines (SVM), k-nearest neighbours (KNN), logistic regression (LogReg), decision tree (DT) and Gaussian naive Bayes (GNB)) and 9 resampling techniques; panel (a) shows under-sampling techniques and panel (b) shows over-sampling techniques (F1 versus active learning iteration). The grey area represents the standard deviation. The red dashed line represents the performance of the LSTM-autoencoder anomaly detector without active learning and with the initial setting of the anomaly score threshold, the green dashed line represents its performance with the best setting of the anomaly score threshold, and the blue dashed line represents the best performance achieved in the last iteration of the active learning loop.

4 Conclusion

We presented an active learning extension of LSTM-autoencoder-based anomaly detection for time-series data. An experiment on the ECG5000 dataset has shown that the proposed method is able to boost the performance of the model significantly with only approximately 200 labelled samples. Next, we plan to experiment with variational LSTM-autoencoders and to pay attention to the interpretability of the detected anomalies.

Acknowledgements

The work has been supported by the grant 18-18080S of the Czech Science Foundation (GAČR).

References

1. Ahmad, S., Lavin, A., Purdy, S., Agha, Z.: Unsupervised real-time anomaly detection for streaming data. Neurocomputing 262, 134–147 (2017)
2. Cook, A., Mısırlı, G., Fan, Z.: Anomaly detection for IoT time-series data: A survey. IEEE Internet of Things Journal (2019)
3. Dau, H.A., Keogh, E., Kamgar, K., Yeh, C.C.M., Zhu, Y., Gharghabi, S., Ratanamahatana, C.A., Yanping, Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G., Hexagon-ML: The UCR time series classification archive (October 2018), https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
4. Ergen, T., Mirza, A.H., Kozat, S.S.: Unsupervised and semi-supervised anomaly detection with LSTM neural networks. arXiv preprint arXiv:1710.09207 (2017)
5. Guo, Y., Liao, W., Wang, Q., Yu, L., Ji, T., Li, P.: Multidimensional time series anomaly detection: A GRU-based Gaussian mixture variational autoencoder approach. In: Asian Conference on Machine Learning. pp. 97–112 (2018)
6. Lemaître, G., Nogueira, F., Aridas, C.K.: Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. The Journal of Machine Learning Research 18(1), 559–563 (2017)
7. Lewis, D.D., Gale, W.A.: A sequential algorithm for training text classifiers. In: SIGIR'94. pp. 3–12. Springer (1994)
8. Pimentel, T., Monteiro, M., Viana, J., Veloso, A., Ziviani, N.: A generalized active learning approach for unsupervised anomaly detection. stat 1050, 23 (2018)
9. Srivastava, N., Mansimov, E., Salakhutdinov, R.: Unsupervised learning of video representations using LSTMs. In: International Conference on Machine Learning. pp. 843–852 (2015)
10. Tomek, I.: An experiment with the edited nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics 6, 448–452 (1976)
11. Zhang, C., Chen, Y.: Time series anomaly detection with variational autoencoders. arXiv preprint arXiv:1907.01702 (2019)