Gaussian Processes for Anomaly Description in Production Environments

Christian Beecks (University of Münster and Fraunhofer Institute for Applied Information Technology FIT, Germany, christian.beecks@uni-muenster.de), Kjeld Willy Schmidt (University of Münster, Germany, kjeld.schmidt@uni-muenster.de), Fabian Berns (University of Münster, Germany, fabian.berns@uni-muenster.de), Alexander Grass (Fraunhofer Institute for Applied Information Technology FIT, Germany, alexander.grass@fit.fraunhofer.de)

First International Workshop on Data Science for Industry 4.0. Copyright ©2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. Published in the Workshop Proceedings of the EDBT/ICDT 2019 Joint Conference (March 26, 2019, Lisbon, Portugal) on CEUR-WS.org.

ABSTRACT
Concomitant with the rapid spread of cyber-physical systems and the advancement of technologies from the Internet of Things, many modern production environments are characterized by vast amounts of sensor data generated throughout different stages of production processes. In this paper, we propose a novel method for discovering the inherent structures of anomalies arising in IoT sensor data. Our idea consists in modeling and describing anomalies by means of kernel expressions, which are combinations of well-known kernels. The results of our empirical analysis show that our proposal is suitable for modeling differently structured anomalies. Moreover, the results indicate that Gaussian processes provide a powerful tool for future algorithmic investigations of IoT sensor data.

1 INTRODUCTION
Concomitant with the rapid spread of cyber-physical systems and the advancement of technologies from the Internet of Things (IoT), many modern production environments are characterized by vast amounts of sensor data generated throughout different stages of production processes. These sensor data streams are often considered valuable information sources with high economic potential and are characterized by high volume, velocity, and variety. Their data-driven value for optimizing and fine-tuning industrial production processes is indisputable. Monitoring sensor data from complex production processes in order to detect outliers or low-performing production behavior caused by undesired drifts and trends, which we summarize as anomalies, is a challenging task. Not only due to the massive amount of sensor data but also due to the different types of anomalies, which are potentially unknown in advance, manual or automatic inspection systems are frequently supported by anomaly detection algorithms. While recent years have witnessed the development of various anomaly detection algorithms, cf. the work of Renaudie et al. [21] for a recent performance evaluation in an industrial context, comparatively little effort has been devoted to investigating the inherent structure of an anomaly.

In this paper, we thus propose a novel method to discover the inherent structure of an anomaly. Our idea consists in modeling and describing anomalies by means of kernel expressions, which are combinations of well-known kernels. By fitting kernel expressions to the corresponding sensor data, we are able to decompose the inherent structure of an anomaly and to describe its individual behavior, such as linearity and periodicity, in natural language. For this purpose, we make use of Gaussian processes [20] and the Compositional Kernel Search model [11]. We carry out our analysis on the recently proposed IoT dataset [5], a real-world Industry 4.0 dataset which has been collected within the EU project MONSOON (www.spire2030.eu/monsoon). To sum up, we make the following contributions:

• We propose a machine-learning-based method to model anomalies and to describe their inherent components.
• We enrich the MONSOON IoT dataset with a novel ground truth derived from domain experts in order to further stimulate research on anomaly detection algorithms for this real-world dataset.

The paper is structured as follows. In Section 2, we outline related work. In Section 3, we briefly introduce Gaussian processes and their application to adapting kernel expressions to sensor data. The preliminary results of our proposed method are reported and discussed in Section 4, before we conclude with an outlook on future research directions in Section 5.

2 RELATED WORK
Strongly related to our approach are anomaly detection algorithms. There is a plethora of these algorithms, including Z-Score [10], Mahalanobis Distance-Based Empirical Covariance Estimation [18] [9], Mahalanobis Distance-Based Robust Covariance Estimation [22] [9], Subspace-based PCA Anomaly Detector [9], One-Class SVM [23] [18] [9] [12], Isolation Forest (I-Forest) [16] [18], Gaussian Mixture Model [18] [9] [19], Deep Auto-Encoder [8], Local Outlier Factor [7] [18] [9] [1], Least Squares Anomaly Detector [24], GADPL [14], and k-nearest Neighbour [13] [1] [12].

While these algorithms are all possible options for anomaly detection, as shown in different surveys such as [13], [19], and [9], they are not directly suited for describing the inherent structure of anomalies, which is the major focus of this paper. We choose Gaussian processes for anomaly description due to their capability to not only gather statistical indicators but also to extract the very characteristics of specific anomalous behavior from the data [20].

For describing these characteristics, Lloyd et al. [17] have proposed the Automatic Bayesian Covariance Discovery System that adapts the Compositional Kernel Search Algorithm [11] by
adding intuitive natural-language descriptions of the function classes described by their models. In [15], these models are expanded to discover kernel structures which are able to explain multiple time series at once. In this work, we make use of these algorithms in order to describe the inherent structures of anomalies, as shown in the following section.

Figure 1: An example of the MONSOON IoT dataset with three anomalies.

3 GAUSSIAN PROCESSES
In this section, we describe the analysis of anomalies in sensor data via Gaussian processes. To this end, we assume the sensor data to be univariate (it is noteworthy that this approach also applies to multivariate data) and an anomaly A to be a finite subsequence of timestamp-value pairs A = {(t_i, v_i) | i = 1, ..., n} with timestamps t_i ∈ T and values v_i ∈ R.

As we do not know in advance the number of values and the distances between individual timestamps, we can also think of an anomaly A as a mathematical function A : T → R, which assigns to every timestamp t ∈ T a real-valued value v(t) ∈ R. By considering the individual values v(t) to be random variables following a Gaussian distribution, we can formalize the Gaussian process as

    v(t) ~ GP(m(t), k(t, t')),

where m(t) = E[v(t)] is the mean function and k(t, t') = E[(v(t) − m(t)) · (v(t') − m(t'))] is the covariance function k : T × T → R. In other words, a Gaussian process is a stochastic process over random variables, where every subset of random variables from the Gaussian process follows a normal distribution. The distribution of the Gaussian process is the joint distribution of all of these random variables, and it is thus a probability distribution over (the space of) functions in R^T.

While the covariance function k defined above is a general way to model the behavior of data, we aim to describe each anomaly A by its own covariance function k_A. That is, we aim to learn a covariance function k_A, also denoted as a kernel expression in the domain of machine learning, by fitting combinations of well-known kernels, such as

• the constant kernel k_C(t, t') = λ ∈ R,
• the linear kernel k_LIN(t, t') = (t − l) · (t' − l),
• the squared exponential kernel k_SE(t, t') = exp(−|t − t'|² / (2l²)),
• or the periodic kernel k_PER(t, t') = exp(−2 sin²((t − t') / 2) / l²).

In order to individually fit a kernel expression to each anomaly based on the aforementioned kernels, we use the compositional kernel model, as utilized for instance in [17]. This allows us to decompose an anomaly into individual components, which can be ranked by their contribution towards explaining the data. As an example, an anomaly A with a highly weighted linear kernel k_LIN indicates a hidden linearity component, while a highly weighted periodic kernel k_PER indicates an inherent periodicity in the anomaly. The resulting kernel expressions are reported and discussed in the next section.

Anomaly  BIC    Kernel Expression
0        -799   C*PER + C*PER + C*PER
1        -706   C*SE*PER + C*SE + C
2        -604   C*PER + C*PER + C*PER + C
3        -921   C*SE*PER + C*PER + C
4        -742   C*PER + C*PER + C*SE + C
5        -543   C*SE*LIN + C*SE + C*WN + C
6        -630   C*PER + C*SE + C*WN + C
7        -1020  C*PER + C*PER + C*PER + C*SE + C
8        -762   C*SE*PER + C*PER + C
9        -1025  C*PER + C*PER + C*SE + C
10       -424   C*PER + C*SE + C*SE
11       -849   C*PER + C*PER + C*SE + C
12       -311   C*SE*PER + C*PER + C
13       -860   C*LIN + C*PER + C*PER + C*PER + C
14       -339   C*PER + C*SE + C*SE
15       -590   C*SE*PER + C*PER + C*SE
16       -503   C*PER + C*SE + C
17       -602   C*SE*PER + C*SE + C*WN + C
18       -545   C*PER + C*SE + C*SE + C
19       -804   C*PER + C*SE + C*WN + C
20       -281   C*PER + C*SE + C*SE
21       -426   C*PER + C*PER + C*SE
22       -425   C*SE*PER + C*PER + C*SE
23       -975   C*SE*PER + C*PER + C
24       -1181  C*PER*LIN + C*PER + C*SE
25       -880   C*PER*PER + C*PER + C*PER + C
26       -455   C*PER + C*PER + C*SE
27       -542   C*PER + C*SE + C*SE

Table 1: Discovered kernel structures and the Bayesian Information Criterion (BIC) for the encountered 28 anomalies.

4 PRELIMINARY RESULTS
In this section, we report and discuss the results of our preliminary performance evaluation. For this purpose, we use the recently introduced MONSOON IoT dataset [5], which comprises 357,383 data records in total. This dataset is based on a real production line of coffee capsules, and the attribute under observation is the plastification time, that is, the time needed to melt (plastify) the plastic for the actual injection molding cycle. More information about this process can be found in [3].

An overview of this attribute, i.e. the plastification time, as a function of the cycle number is shown in Figure 1. As can be seen in the figure, while the normal plastification time is approximately 4.2 seconds, it drops down to less than 3 seconds in case of an anomaly. Supported by domain experts, we identified 28 anomalies in total in this dataset, of which three are shown in the above figure.

In the first series of experiments, we computed the best-fitting kernel expressions by means of the ABCD algorithm. The results are shown in Table 1 for each anomaly. Together with the kernel expression of the corresponding anomaly, we also show the Bayesian Information Criterion (BIC) value, which models the trade-off between model accuracy and model size. As can be seen in the table, all anomalies are well described by their corresponding kernel expressions (lower BIC values indicate a better fit and vice versa). Surprisingly, many kernel expressions do not show a linear component k_LIN, although some anomalies clearly show a linear tendency. We found that this is due to overfitting of the kernel expression in the ABCD algorithm. We aim to address this issue in future research.
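To make the fitting procedure concrete, the following sketch (our illustration with synthetic data and scikit-learn, not the authors' implementation) fits a handful of hand-picked candidate kernel expressions to an artificial anomaly and ranks them by BIC; the actual ABCD algorithm grows and scores such expressions automatically, and scikit-learn's DotProduct kernel is used here as a stand-in for the paper's linear kernel.

```python
# Illustrative sketch only (synthetic data, scikit-learn kernels), not the
# authors' implementation. We fit a few hand-picked candidate kernel
# expressions to an artificial anomaly and rank them by BIC; ABCD instead
# searches the space of expressions automatically.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, ConstantKernel as C, DotProduct, ExpSineSquared)

def bic(gp, n):
    # BIC = |theta| * ln(n) - 2 * log marginal likelihood (lower is better).
    return gp.kernel_.theta.size * np.log(n) - 2.0 * gp.log_marginal_likelihood_value_

# Synthetic "anomaly": a noisy periodic signal over 100 production cycles.
t = np.linspace(0.0, 10.0, 100).reshape(-1, 1)
v = np.sin(2.0 * np.pi * t).ravel() + 0.1 * np.random.default_rng(0).normal(size=100)

# Candidate kernel expressions built from the base kernels of Section 3.
# DotProduct stands in for the paper's linear kernel (t - l) * (t' - l).
candidates = {
    "C*SE":         C(1.0) * RBF(1.0),
    "C*PER":        C(1.0) * ExpSineSquared(1.0, 1.0),
    "C*LIN + C":    C(1.0) * DotProduct() + C(1.0),
    "C*PER + C*SE": C(1.0) * ExpSineSquared(1.0, 1.0) + C(1.0) * RBF(1.0),
}

scores = {}
for name, kernel in candidates.items():
    # alpha adds observation noise to the diagonal for numerical stability.
    gp = GaussianProcessRegressor(kernel=kernel, alpha=0.01,
                                  normalize_y=True).fit(t, v)
    scores[name] = bic(gp, len(t))

best = min(scores, key=scores.get)
print(best, round(scores[best], 1))
```

Lower BIC indicates a better trade-off between fit and complexity; for the periodic toy signal above, the periodic candidates should win over the linear one.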
In the second series of experiments, we evaluated how well the kernel expression of a certain anomaly fits other anomalies. The results, in the form of the corresponding BIC values, are summarized in Table 2. As can be seen in this table, kernel expressions of a certain anomaly do in general not fit other anomalies. One reason for this behavior is the high degree of idiosyncrasy of the anomalies. Another reason might be the overfitting issue mentioned above.

To sum up, we have investigated the potential of describing anomalies in IoT sensor data by means of kernel expressions. Our preliminary results indicate that our proposal is well suited for this purpose. As one major challenge, we found that the problem of overfitting needs to be addressed in future research.
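The cross-fitting experiment summarized above can be sketched in the same spirit (again with hypothetical synthetic anomalies, not the authors' code): the kernel expression fitted to one anomaly is re-evaluated, with its hyperparameters frozen, on a second, differently structured anomaly, and both fits are scored by BIC.

```python
# Illustrative sketch only (hypothetical synthetic anomalies), not the
# authors' code. The kernel expression fitted to anomaly A is transferred,
# hyperparameters frozen, to a differently structured anomaly B and both
# fits are scored by BIC.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel as C, ExpSineSquared

def bic(gp, n):
    # BIC = |theta| * ln(n) - 2 * log marginal likelihood (lower is better).
    return gp.kernel_.theta.size * np.log(n) - 2.0 * gp.log_marginal_likelihood_value_

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 100).reshape(-1, 1)
anomaly_a = np.sin(2.0 * np.pi * t).ravel() + 0.05 * rng.normal(size=100)        # periodic
anomaly_b = np.exp(-0.5 * (t.ravel() - 5.0) ** 2) + 0.05 * rng.normal(size=100)  # smooth bump

# Fit a periodic kernel expression (C*PER) to anomaly A ...
gp_a = GaussianProcessRegressor(kernel=C(1.0) * ExpSineSquared(1.0, 1.0),
                                alpha=0.01, normalize_y=True).fit(t, anomaly_a)
bic_self = bic(gp_a, len(t))

# ... and re-evaluate the fitted expression on anomaly B without
# re-optimizing its hyperparameters (optimizer=None).
gp_ab = GaussianProcessRegressor(kernel=gp_a.kernel_, optimizer=None,
                                 alpha=0.01, normalize_y=True).fit(t, anomaly_b)
bic_cross = bic(gp_ab, len(t))

print(f"self-BIC: {bic_self:.1f}, cross-BIC: {bic_cross:.1f}")
```

A markedly higher cross-BIC mirrors the observation in Table 2 that kernel expressions of a certain anomaly do in general not fit other anomalies.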
5 CONCLUSIONS AND FUTURE WORK
In this paper, we have addressed the problem of discovering the inherent structures of anomalies arising in IoT sensor data. To this end, we have proposed to model and describe anomalies by means of kernel expressions, which are combinations of well-known kernels. The results of our empirical analysis show that our proposal is suitable for modeling differently structured anomalies. Moreover, the results indicate that Gaussian processes provide a powerful tool for future algorithmic investigations of IoT sensor data.

In future work, we aim to address the problem of overfitting by modifying the grammar used within the ABCD algorithm for computing the kernel expressions. In addition, we aim to further develop our proposal in order to not only describe anomalies but also detect them (which is not the focus of the current paper). For this purpose, we aim to measure similarity in IoT sensor data by incorporating Gaussian processes into adaptive distance-based similarity models, such as the Signature Matching Distance [6], and query processing algorithms [2, 4].

Table 2: Evaluation of the BIC for every kernel expression against every anomaly.

ACKNOWLEDGMENTS
The project underlying this paper has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 723650 (MONSOON). This paper reflects only the authors' views and the commission is not responsible for any use that may be made of the information it contains.

REFERENCES
[1] Bryan Auslander, Kalyan Moy Gupta, and David W. Aha. 2011. A comparative evaluation of anomaly detection algorithms for maritime video surveillance. In Proc. SPIE 8019, Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense X (SPIE Proceedings), Edward M. Carapezza (Ed.). SPIE, 801907. https://doi.org/10.1117/12.883535
[2] Christian Beecks and Max Berrendorf. 2018. Optimal k-Nearest-Neighbor Query Processing via Multiple Lower Bound Approximations. In IEEE International Conference on Big Data, Big Data 2018, Seattle, WA, USA, December 10-13, 2018. IEEE, 614–623. https://doi.org/10.1109/BigData.2018.8622493
[3] Christian Beecks, Shreekantha Devasya, and Ruben Schlutter. 2019. Machine Learning for Enhanced Waste Quantity Reduction: Insights from the MONSOON Industry 4.0 Project. In Machine Learning for Cyber Physical Systems, Jürgen Beyerer, Christian Kühnert, and Oliver Niggemann (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 1–6.
[4] Christian Beecks and Alexander Graß. 2016. Multi-step threshold algorithm for efficient feature-based query processing in large-scale multimedia databases. In 2016 IEEE International Conference on Big Data, BigData 2016, Washington DC, USA, December 5-8, 2016. IEEE, 596–605. https://doi.org/10.1109/BigData.2016.7840652
[5] Christian Beecks, Alexander Grass, and Shreekantha Devasya. 2018. Metric Indexing for Efficient Data Access in the Internet of Things. In IEEE International Conference on Big Data, Big Data 2018, Seattle, WA, USA, December 10-13, 2018. IEEE, 5132–5136. https://doi.org/10.1109/BigData.2018.8622387
[6] Christian Beecks, Steffen Kirchhoff, and Thomas Seidl. 2013. Signature matching distance for content-based image retrieval. In International Conference on Multimedia Retrieval, ICMR'13, Dallas, TX, USA, April 16-19, 2013. ACM, 41–48. https://doi.org/10.1145/2461466.2461474
[7] Markus Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. 2000. LOF: Identifying Density-Based Local Outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. ACM, 93–104.
[8] Arno Candel, Erin LeDell, Viraj Parmar, and Anisha Arora. 2018. Deep Learning with H2O. http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/DeepLearningBooklet.pdf (Accessed on 01/08/2019).
[9] Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly Detection: A Survey. ACM Comput. Surv. 41, 3, Article 15 (July 2009), 58 pages. https://doi.org/10.1145/1541880.1541882
[10] R. Domingues, F. Buonora, R. Senesi, and O. Thonnard. 2016. An Application of Unsupervised Fraud Detection to Passenger Name Records. In 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshop (DSN-W). 54–59. https://doi.org/10.1109/DSN-W.2016.21
[11] David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum, and Zoubin Ghahramani. 2013. Structure Discovery in Nonparametric Regression through Compositional Kernel Search. arXiv:1302.4922
[12] Eleazar Eskin, Andrew Arnold, Michael Prerau, Leonid Portnoy, and Sal Stolfo. 2002. A Geometric Framework for Unsupervised Anomaly Detection. In Applications of Data Mining in Computer Security, Daniel Barbará and Sushil Jajodia (Eds.). Advances in Information Security, Vol. 6. Springer US, Boston, MA, 77–101. https://doi.org/10.1007/978-1-4615-0953-0_4
[13] Markus Goldstein and Seiichi Uchida. 2016. A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data. (2016).
[14] Alexander Graß, Christian Beecks, and Jose Angel Carvajal Soto. 2019. Unsupervised Anomaly Detection in Production Lines. In Machine Learning for Cyber Physical Systems, Jürgen Beyerer, Christian Kühnert, and Oliver Niggemann (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 18–25.
[15] Yunseong Hwang, Anh Tong, and Jaesik Choi. 2016. Automatic Construction of Nonparametric Relational Regression Models for Multiple Time Series. In ICML 2016: Proceedings of the 33rd International Conference on Machine Learning (Proceedings of Machine Learning Research), Maria Florina Balcan and Kilian Q. Weinberger (Eds.), Vol. 48. PMLR, 3030–3039.
[16] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation Forest. In Eighth IEEE International Conference on Data Mining, 2008, Fosca Giannotti (Ed.). IEEE, Piscataway, NJ, 413–422. https://doi.org/10.1109/ICDM.2008.17
[17] James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, and Zoubin Ghahramani. 2014. Automatic Construction and Natural-Language Description of Nonparametric Regression Models. arXiv:1402.4304
[18] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12 (Nov. 2011), 2825–2830.
[19] Clifton Phua, Vincent C. S. Lee, Kate Smith-Miles, and Ross W. Gayler. 2010. A Comprehensive Survey of Data Mining-based Fraud Detection Research. CoRR abs/1009.6119 (2010). arXiv:1009.6119
[20] Carl Edward Rasmussen and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.
[21] David Renaudie, Maria A. Zuluaga, and Rodrigo Acuna-Agost. 2018. Benchmarking Anomaly Detection Algorithms in an Industrial Context: Dealing with Scarce Labels and Multiple Positive Types. In IEEE International Conference on Big Data. 1227–1236.
[22] Peter J. Rousseeuw. 1984. Least median of squares regression. Journal of the American Statistical Association 79, 388 (1984), 871–880.
[23] Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. 2001. Estimating the Support of a High-Dimensional Distribution. Neural Comput. 13, 7 (July 2001), 1443–1471. https://doi.org/10.1162/089976601750264965
[24] M. Tavallaee, N. Stakhanova, and A. A. Ghorbani. 2010. Toward Credible Evaluation of Anomaly-Based Intrusion-Detection Methods. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 40, 5 (September 2010), 516–524. https://doi.org/10.1109/TSMCC.2010.2048428