=Paper= {{Paper |id=Vol-2322/dsi4-4 |storemode=property |title=Gaussian Processes for Anomaly Description in Production Environments |pdfUrl=https://ceur-ws.org/Vol-2322/dsi4-4.pdf |volume=Vol-2322 |authors=Christian Beecks,Kjeld Willy Schmidt,Fabian Berns,Alexander Graß |dblpUrl=https://dblp.org/rec/conf/edbt/BeecksSBG19 }} ==Gaussian Processes for Anomaly Description in Production Environments== https://ceur-ws.org/Vol-2322/dsi4-4.pdf
Gaussian Processes for Anomaly Description in Production Environments

Christian Beecks
University of Münster and Fraunhofer Institute for Applied Information Technology FIT, Germany
christian.beecks@uni-muenster.de

Kjeld Willy Schmidt
University of Münster, Germany
kjeld.schmidt@uni-muenster.de

Fabian Berns
University of Münster, Germany
fabian.berns@uni-muenster.de

Alexander Grass
Fraunhofer Institute for Applied Information Technology FIT, Germany
alexander.grass@fit.fraunhofer.de

ABSTRACT
Concomitant with the rapid spread of cyber-physical systems and the advancement of technologies
from the Internet of Things, many modern production environments are characterized by vast
amounts of sensor data which are generated throughout different stages of production processes.
In this paper, we propose a novel method for discovering the inherent structures of anomalies
arising in IoT sensor data. Our idea consists in modeling and describing anomalies by means of
kernel expressions, which are combinations of well-known kernels. The results of our empirical
analysis show that our proposal is suitable for modeling differently structured anomalies.
Moreover, the results indicate that Gaussian processes provide a powerful tool for future
algorithmic investigations of IoT sensor data.

1    INTRODUCTION
Concomitant with the rapid spread of cyber-physical systems and the advancement of technologies
from the Internet of Things (IoT), many modern production environments are characterized by vast
amounts of sensor data which are generated throughout different stages of production processes.
These sensor data streams are often considered as valuable information sources with a high
economic potential and are characterized by high volume, velocity and variety. Their data-driven
value is indisputable for optimizing and fine-tuning industrial production processes.
   Monitoring sensor data from complex production processes in order to detect outliers or
low-performing production behavior caused by undesired drifts and trends, which we summarize as
anomalies, is a challenging task. Due not only to the massive amount of sensor data but also to
the different types of anomalies, which are potentially unknown in advance, manual or automatic
inspection systems are frequently supported by anomaly detection algorithms. While recent years
have witnessed the development of different anomaly detection algorithms, cf. the work of
Renaudie et al. [21] for a recent performance evaluation in an industrial context, comparatively
little effort has been devoted to investigating the inherent structure of an anomaly.
   In this paper, we thus propose a novel method to discover the inherent structure of an
anomaly. Our idea consists in modeling and describing anomalies by means of kernel expressions,
which are combinations of well-known kernels. By fitting kernel expressions to the corresponding
sensor data, we are able to decompose the inherent structure of an anomaly and to describe its
individual behavior, such as linearity and periodicity, in natural language. For this purpose, we
make use of Gaussian processes [20] and the Compositional Kernel Search model [11]. We carry out
our analysis on the recently proposed IoT dataset [5], a real-world industry 4.0 dataset, which
has been collected within the EU project MONSOON¹. To sum up, we make the following
contributions:
   • We propose a machine-learning-based method in order to model anomalies and to describe their
     inherent components.
   • We enrich the MONSOON IoT dataset with a novel ground truth derived from domain experts in
     order to further stimulate research of anomaly detection algorithms on this real-world
     dataset.
   The paper is structured as follows. In Section 2, we outline related work. In Section 3, we
briefly introduce Gaussian processes and their application to adapt kernel expressions to sensor
data. The preliminary results of our proposed method are reported and discussed in Section 4,
before we conclude our paper with an outlook on future research directions in Section 5.

¹ www.spire2030.eu/monsoon

First International Workshop on Data Science for Industry 4.0.
Copyright ©2019 for the individual papers by the papers’ authors. Copying permitted for private
and academic purposes. This volume is published and copyrighted by its editors.
Published in the Workshop Proceedings of the EDBT/ICDT 2019 Joint Conference (March 26, 2019,
Lisbon, Portugal) on CEUR-WS.org.

2    RELATED WORK
Strongly related to our approach are anomaly detection algorithms. There is a plethora of these
algorithms including Z-Score [10], Mahalanobis Distance-Based, Empirical Covariance Estimation
[18] [9], Mahalanobis Distance-Based, Robust Covariance Estimation [22] [9], Subspace-based PCA
Anomaly Detector [9], One-Class SVM [23] [18] [9] [12], Isolation Forest (I-Forest) [16] [18],
Gaussian Mixture Model [18] [9] [19], Deep Auto-Encoder [8], Local Outlier Factor [7] [18] [9]
[1], Least Squares Anomaly Detector [24], GADPL [14] and k-nearest Neighbour [13] [1] [12].
   While these algorithms are all possible options for anomaly detection, as shown in different
surveys such as [13], [19] and [9], they are not directly suited for describing the inherent
structure of anomalies, which is the major focus of this paper. We choose Gaussian processes for
anomaly description due to their capability not only to gather statistical indicators but also to
deliver the very characteristics of specific anomalous behavior from the data [20].

Figure 1: An example of the MONSOON IoT dataset with three anomalies.

   For describing these characteristics, Lloyd et al. [17] have proposed the Automatic Bayesian
Covariance Discovery System that adapts the Compositional Kernel Search Algorithm [11] by
adding intuitive natural language descriptions of the function classes described by their models.
In [15], these models are expanded to discover kernel structures which are able to explain
multiple time series at once.
   In this work, we make use of these algorithms in order to describe the inherent structures of
anomalies, as shown in the following section.

Anomaly    BIC     Kernel Expression
0          -799    C*PER + C*PER + C*PER
1          -706    C*SE*PER + C*SE + C
2          -604    C*PER + C*PER + C*PER + C
3          -921    C*SE*PER + C*PER + C
4          -742    C*PER + C*PER + C*SE + C
5          -543    C*SE*LIN + C*SE + C*WN + C
6          -630    C*PER + C*SE + C*WN + C
7          -1020   C*PER + C*PER + C*PER + C*SE + C
8          -762    C*SE*PER + C*PER + C
9          -1025   C*PER + C*PER + C*SE + C
10         -424    C*PER + C*SE + C*SE
11         -849    C*PER + C*PER + C*SE + C
12         -311    C*SE*PER + C*PER + C
13         -860    C*LIN + C*PER + C*PER + C*PER + C
14         -339    C*PER + C*SE + C*SE
15         -590    C*SE*PER + C*PER + C*SE
16         -503    C*PER + C*SE + C
17         -602    C*SE*PER + C*SE + C*WN + C
18         -545    C*PER + C*SE + C*SE + C
19         -804    C*PER + C*SE + C*WN + C
20         -281    C*PER + C*SE + C*SE
21         -426    C*PER + C*PER + C*SE
22         -425    C*SE*PER + C*PER + C*SE
23         -975    C*SE*PER + C*PER + C
24         -1181   C*PER*LIN + C*PER + C*SE
25         -880    C*PER*PER + C*PER + C*PER + C
26         -455    C*PER + C*PER + C*SE
27         -542    C*PER + C*SE + C*SE
Table 1: Discovered kernel structures and the Bayesian Information Criterion (BIC) for the
encountered 28 anomalies.

3     GAUSSIAN PROCESSES
In this section, we describe the analysis of anomalies in sensor data via Gaussian processes. To
this end, we assume the sensor data to be univariate² and an anomaly $A$ to be a finite
subsequence of timestamp-value pairs $A = \{(t_i, v_i)\}_{i=1}^{n}$ with timestamps $t_i \in T$
and values $v_i \in \mathbb{R}$.
   As we do not know in advance the number of values and the distances between individual
timestamps, we can also think of an anomaly $A$ as a mathematical function
$A : T \to \mathbb{R}$, which assigns to every timestamp $t \in T$ a real value
$v(t) \in \mathbb{R}$. By considering the individual values $v(t)$ to be random variables
following a Gaussian distribution, we can formalize the Gaussian process as

      $$v(t) \sim \mathcal{GP}(m(t), k(t, t')),$$

where $m(t) = \mathbb{E}[v(t)]$ is the mean function and
$k(t, t') = \mathbb{E}[(v(t) - m(t)) \cdot (v(t') - m(t'))]$ is the covariance function
$k : T \times T \to \mathbb{R}$. In other words, a Gaussian process is a stochastic process over
random variables, where every finite subset of random variables from the Gaussian process follows
a multivariate normal distribution. The distribution of the Gaussian process is the joint
distribution of all of these random variables and it is thus a probability distribution over (the
space of) functions in $\mathbb{R}^T$.
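To make the definition above more concrete, the following minimal sketch evaluates a Gaussian process at a finite set of timestamps and draws a few functions from it. It is an illustration only, not the implementation used in this paper: the zero mean function, the squared exponential covariance with unit length scale, the jitter term, the timestamp grid and the names `mean_fn`, `cov_fn` and `sample_gp_prior` are all assumptions made for the example.

```python
import numpy as np


def mean_fn(t):
    """Assumed mean function m(t); a zero mean is chosen purely for illustration."""
    return np.zeros_like(t, dtype=float)


def cov_fn(t1, t2, length_scale=1.0):
    """Assumed covariance function k(t, t'): a squared exponential with length scale l."""
    diff = t1[:, None] - t2[None, :]
    return np.exp(-diff**2 / (2.0 * length_scale**2))


def sample_gp_prior(timestamps, n_samples=3, jitter=1e-6, seed=0):
    """Draw n_samples functions v(t) ~ GP(m, k), evaluated at finitely many timestamps.

    Restricted to a finite set of timestamps, the process is a multivariate
    normal with mean vector m(t_i) and covariance matrix k(t_i, t_j).
    """
    t = np.asarray(timestamps, dtype=float)
    mean = mean_fn(t)
    # A small jitter on the diagonal keeps the Cholesky factorization numerically stable.
    cov = cov_fn(t, t) + jitter * np.eye(len(t))
    chol = np.linalg.cholesky(cov)
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((len(t), n_samples))
    # mean + chol @ z has covariance chol @ chol.T = cov; one row per sampled function.
    return (mean[:, None] + chol @ z).T


timestamps = np.linspace(0.0, 10.0, 50)
samples = sample_gp_prior(timestamps)
print(samples.shape)  # (3, 50)
```

Each row of `samples` is one realization of $v(t)$ on the chosen grid; replacing `cov_fn` with a different covariance function changes the kind of functions that are drawn.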
   While the covariance function $k$ defined above is a general way to model the behavior of
data, we aim to describe each anomaly $A$ by its own covariance function $k_A$. That is, we aim
to learn a covariance function $k_A$, which is also called a kernel expression in machine
learning, by fitting combinations of well-known kernels, such as
     • the constant kernel $k_{C}(t, t') = \lambda \in \mathbb{R}$,
     • the linear kernel $k_{LIN}(t, t') = (t - l) \cdot (t' - l)$,
     • the squared exponential kernel $k_{SE}(t, t') = \exp\left(-\frac{|t - t'|^2}{2l^2}\right)$,
     • or the periodic kernel $k_{PER}(t, t') = \exp\left(-\frac{2 \sin^2\left(\frac{t - t'}{2}\right)}{l^2}\right)$.
   In order to individually fit a kernel expression to each anomaly based on the aforementioned
kernels, we use the compositional kernel model, as utilized for instance in [17]. This allows us
to decompose an anomaly into individual components, which can be ranked by their contribution
towards explaining the data. As an example, an anomaly $A$ with a highly weighted linear kernel
$k_{LIN}$ indicates a hidden linearity component while a highly weighted periodic kernel
$k_{PER}$ indicates an inherent periodicity in the anomaly.
   The resulting kernel expressions are reported and discussed in the next section.

² It is noteworthy that this approach also applies to multivariate data.

4   PRELIMINARY RESULTS
In this section, we report and discuss the results of our preliminary performance evaluation. For
this purpose, we use the recently introduced MONSOON IoT dataset [5] which comprises 357,383 data
records in total. This dataset is based on a real production line of coffee capsules and the
attribute under observation is the plastification time, that is the time which is needed to melt
(plastify) the plastic melt for the actual injection molding cycle. More information about this
process can be found in [3].
   An overview of this attribute value, i.e. the plastification time, as a function of the cycle
number is shown in Figure 1. As can be seen in the figure, while the normal plastification time
is at approximately 4.2 seconds, it drops down to less than 3 seconds in case of an anomaly.
Supported by domain experts, we identified 28 anomalies in total in this dataset, of which three
are shown in Figure 1.
   In the first series of experiments, we computed the best fitting kernel expressions by means
of the ABCD algorithm. The results are shown in Table 1 for each anomaly. Together with the
kernel expression of the corresponding anomaly, we also show the Bayesian Information Criterion
(BIC) value which models the trade-off between model accuracy and size. As can be seen in




                                                                       -97
                                                                       -85




                                                                       -69

                                                                       -77
                                                                       20




                                                                       35
the table, all anomalies are well described by their corresponding




                                                                       6690

                                                                       3580

                                                                       3175

                                                                       5836

                                                                       1630

                                                                       9121




                                                                       1934
                                                                       1349
                                                                       -783
                                                                       -732
                                                                       -825
                                                                       -825
                                                                       -748
                                                                       -711
                                                                       -797
                                                                       -790
                                                                       -796

                                                                       -843

                                                                       -825

                                                                       -740

                                                                       -813

                                                                       -804


                                                                       -545
                                                                       -842

                                                                       -809
                                                                       632


                                                                       866
                                                                       -74
kernel expression (lower BIC values indicate better fit and vice




                                                                       19
versa). Surprisingly many kernel expressions do not show a lin-




                                                                       1186

                                                                       5856

                                                                       4713

                                                                       1180



                                                                       4203
                                                                       -453
                                                                       -536
                                                                       -518
                                                                       -534
                                                                       -554
                                                                       -364
                                                                       -505
                                                                       -532
                                                                       -552
                                                                       -531

                                                                       -525

                                                                       -560

                                                                       -536

                                                                       -524
                                                                       -545
                                                                       -467

                                                                       -458
                                                                       -504
                                                                       -551

                                                                       -532

                                                                       -535
                                                                       497

                                                                       439
ear component k LIN , although some anomalies clearly show this

                                                                       18
linear tendency. We figure out that this is due to overfitting of




                                                                       16475



                                                                       13876

                                                                       15531

                                                                       12023

                                                                       15564




                                                                       12148
                                                                       1010




                                                                       8959




                                                                       1939




                                                                       6945
                                                                       -453
                                                                       -552
                                                                       -370
                                                                       -347
                                                                       -564
                                                                       -555
                                                                       -386
                                                                       -514
                                                                       -417

                                                                       -369

                                                                       -471

                                                                       -486

                                                                       -602

                                                                       -188



                                                                       -388

                                                                       -387
                                                                       802
                                                                       -37
the kernel expression in the ABCD algorithm. We aim to address         17
this issue in future research.
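
The trade-off that the BIC captures can be illustrated with a minimal sketch. It assumes the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of kernel hyperparameters, n the number of data points, and L the maximized (marginal) likelihood; this section does not spell out the formula. The function name `bic` and all numbers below are hypothetical and are not values from the paper.

```python
import math


def bic(log_likelihood: float, num_params: int, num_points: int) -> float:
    """Standard BIC (assumed definition): size penalty minus twice the log-likelihood."""
    return num_params * math.log(num_points) - 2.0 * log_likelihood


# Hypothetical models on the same 200 data points (illustrative numbers only).
# The larger model fits slightly better but pays a higher size penalty.
simple = bic(log_likelihood=120.0, num_params=3, num_points=200)
complex_ = bic(log_likelihood=123.0, num_params=9, num_points=200)

print("simple kernel expression :", round(simple, 2))
print("complex kernel expression:", round(complex_, 2))
print("simple preferred:", simple < complex_)
```

Under this definition, a small gain in likelihood does not compensate for six additional hyperparameters, so the simpler expression receives the lower (better) BIC.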




   In the second series of experiments, we evaluated how well
the kernel expression of one anomaly fits other anomalies.
The results, in the form of the corresponding BIC values, are
summarized in Table 2. As can be seen in this table, kernel
expressions of a certain anomaly in general do not fit other
anomalies. One reason for this behavior is the high degree of
idiosyncrasy of the anomalies. Another reason might be the
overfitting issue mentioned above.
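
A cross-fit evaluation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a zero-mean Gaussian process with fixed (not optimized) hyperparameters, a fixed noise variance, the standard BIC definition, synthetic data, and hypothetical parameter counts. All function names (`k_lin`, `k_per`, `log_marginal_likelihood`, `bic`) are introduced here for illustration only.

```python
import math


def k_lin(x, y, c=0.0, s=1.0):
    """Linear kernel with offset c and scale s (fixed, assumed values)."""
    return s * (x - c) * (y - c)


def k_per(x, y, p=1.0, ell=1.0, s=1.0):
    """Periodic kernel with period p and length scale ell (fixed, assumed values)."""
    return s * math.exp(-2.0 * math.sin(math.pi * abs(x - y) / p) ** 2 / ell ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def log_marginal_likelihood(kernel, xs, ys, noise=0.1):
    """Log marginal likelihood of a zero-mean GP with noise variance `noise`."""
    n = len(xs)
    cov = [[kernel(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(cov)
    # Forward substitution solves L z = y, so that y^T K^-1 y = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


def bic(log_lik, num_params, num_points):
    """Standard BIC (assumed definition); lower values indicate a better fit."""
    return num_params * math.log(num_points) - 2.0 * log_lik


# Synthetic stand-ins for two anomalies (not data from the paper).
xs = [i / 10.0 for i in range(30)]
anomalies = {
    "linear": [0.5 * x for x in xs],
    "periodic": [math.sin(2.0 * math.pi * x) for x in xs],
}
# Kernel and hypothetical hyperparameter count used in the BIC penalty.
kernels = {"linear": (k_lin, 2), "periodic": (k_per, 3)}

# table[(kernel, anomaly)] = BIC of fitting that kernel to that anomaly.
table = {
    (k_name, a_name): bic(log_marginal_likelihood(kern, xs, ys), n_par, len(xs))
    for k_name, (kern, n_par) in kernels.items()
    for a_name, ys in anomalies.items()
}
for key, value in sorted(table.items()):
    print(key, round(value, 1))
```

In this toy setting, each kernel obtains a lower BIC on the anomaly whose structure it matches than the other kernel does, which mirrors the qualitative observation that a kernel expression tailored to one anomaly tends to fit others poorly.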
   To sum up, we have investigated the potential of describing
anomalies in IoT sensor data by means of kernel expressions.
Our preliminary results indicate that our proposal is well suited
for this purpose. As one major challenge, we found that the
problem of overfitting needs to be addressed in future research.

[Table 2: BIC values obtained when fitting the kernel expression of
one anomaly to other anomalies.]

5   CONCLUSIONS AND FUTURE WORK
                                                                       12783




                                                                       17842

                                                                       15087

                                                                       15353

                                                                       15094



                                                                       15895




                                                                       13005
                                                                       12860
                                                                       2069




                                                                       1813



                                                                       3848


                                                                       6420
                                                                       1317

                                                                       7569
                                                                       -705
                                                                       -741
                                                                       -634

                                                                       -726
                                                                       -711
                                                                       -610
                                                                       -762
                                                                       -680



                                                                       -707



                                                                       -724




                                                                       -687

                                                                       -616
                                                                       419

                                                                       652
                                                                       8




In this paper, we have addressed the problem of discovering the
inherent structures of anomalies arising in IoT sensor data. To this
end, we have proposed to model and describe anomalies by means
of kernel expressions, which are combinations of well-known
kernels. The results of our empirical analysis show that our proposal
is suitable for modeling differently structured anomalies.
Moreover, the results indicate that Gaussian processes provide a
powerful tool for future algorithmic investigations of IoT sensor
data.
   In future work, we aim to address the problem of overfitting
by modifying the grammar used within the ABCD algorithm for
computing the kernel expressions. In addition, we aim to further
develop our proposal in order to not only describe anomalies
but also detect anomalies (which is not the focus of the current
paper). For this purpose, we aim to measure similarity in IoT
sensor data by incorporating Gaussian processes into adaptive
distance-based similarity models, such as the Signature Matching
Distance [6], and query processing algorithms [2, 4].
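
   As an illustration of how kernel expressions can be compared by
their BIC, the following minimal sketch scores three candidate
expressions on a synthetic signal. It is not the ABCD implementation
used in this paper: the toy data, the fixed hyperparameters, the
parameter counts, and all function names are assumptions chosen for
brevity, and the hyperparameters are not optimized as a real kernel
search would do.

```python
import math

def lin(x, y):
    """Linear kernel LIN (offset fixed at 0; an illustrative choice)."""
    return x * y

def per(x, y, period=1.0, length=1.0):
    """Periodic kernel PER with assumed period and length scale."""
    return math.exp(-2.0 * math.sin(math.pi * abs(x - y) / period) ** 2 / length ** 2)

def add(k1, k2):
    """Kernel expression k1 + k2 (sums of kernels are again kernels)."""
    return lambda x, y: k1(x, y) + k2(x, y)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def log_marginal_likelihood(kernel, xs, ys, noise_std=0.1):
    """Log marginal likelihood of a zero-mean GP with Gaussian noise."""
    n = len(xs)
    cov = [[kernel(xs[i], xs[j]) + (noise_std ** 2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    l = cholesky(cov)
    # Forward substitution solves L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i])
    data_fit = -0.5 * sum(v * v for v in z)
    complexity = -sum(math.log(l[i][i]) for i in range(n))  # -0.5 * log|K|
    return data_fit + complexity - 0.5 * n * math.log(2.0 * math.pi)

def bic(log_lik, n_params, n):
    """Bayesian information criterion; lower values are better."""
    return -2.0 * log_lik + n_params * math.log(n)

# Synthetic signal with a linear trend and a periodic component.
xs = [i / 10 for i in range(40)]
ys = [0.5 * x + math.sin(2.0 * math.pi * x) for x in xs]

# Candidate expressions with assumed parameter counts (noise included).
candidates = {
    "LIN": (lin, 1),
    "PER": (per, 3),
    "LIN + PER": (add(lin, per), 3),
}
scores = {
    name: bic(log_marginal_likelihood(k, xs, ys), p, len(xs))
    for name, (k, p) in candidates.items()
}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: BIC = {score:.1f}")
print("selected expression:", best)
```

   Because the synthetic signal combines a trend with a periodicity,
the composite expression obtains the lowest BIC in this sketch, which
mirrors the idea of reading the structure of an anomaly from the
best-scoring kernel expression.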
Table 2: Evaluation of the BIC for every kernel expression
against every anomaly.

ACKNOWLEDGMENTS
The project underlying this paper has received funding from
the European Union’s Horizon 2020 research and innovation
program under grant agreement No 723650 (MONSOON). This
paper reflects only the authors’ views and the commission is not
responsible for any use that may be made of the information it
contains.

REFERENCES
 [1] Bryan Auslander, Kalyan Moy Gupta, and David W. Aha. 2011. A
     comparative evaluation of anomaly detection algorithms for maritime video
     surveillance. In Proc. SPIE 8019, Sensors, and Command, Control,
     Communications, and Intelligence (C3I) Technologies for Homeland Security
     and Homeland Defense X (SPIE Proceedings), Edward M. Carapezza (Ed.).
     SPIE, 801907. https://doi.org/10.1117/12.883535
 [2] Christian Beecks and Max Berrendorf. 2018. Optimal k-Nearest-Neighbor
     Query Processing via Multiple Lower Bound Approximations. In IEEE
     International Conference on Big Data, Big Data 2018, Seattle, WA, USA,
     December 10-13, 2018. IEEE, 614–623.
     https://doi.org/10.1109/BigData.2018.8622493
 [3] Christian Beecks, Shreekantha Devasya, and Ruben Schlutter. 2019. Machine
     Learning for Enhanced Waste Quantity Reduction: Insights from the
     MONSOON Industry 4.0 Project. In Machine Learning for Cyber Physical
     Systems, Jürgen Beyerer, Christian Kühnert, and Oliver Niggemann (Eds.).
     Springer Berlin Heidelberg, Berlin, Heidelberg, 1–6.
 [4] Christian Beecks and Alexander Graß. 2016. Multi-step threshold algorithm for
     efficient feature-based query processing in large-scale multimedia databases.
     In 2016 IEEE International Conference on Big Data, BigData 2016, Washington
     DC, USA, December 5-8, 2016. IEEE, 596–605.
     https://doi.org/10.1109/BigData.2016.7840652
 [5] Christian Beecks, Alexander Grass, and Shreekantha Devasya. 2018. Metric
     Indexing for Efficient Data Access in the Internet of Things. In IEEE
     International Conference on Big Data, Big Data 2018, Seattle, WA, USA,
     December 10-13, 2018. IEEE, 5132–5136.
     https://doi.org/10.1109/BigData.2018.8622387
 [6] Christian Beecks, Steffen Kirchhoff, and Thomas Seidl. 2013. Signature
     matching distance for content-based image retrieval. In International
     Conference on Multimedia Retrieval, ICMR’13, Dallas, TX, USA, April 16-19,
     2013. ACM, 41–48. https://doi.org/10.1145/2461466.2461474
 [7] Markus Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. 2000.
     LOF: Identifying Density-Based Local Outliers. In Proceedings of the 2000 ACM
     SIGMOD International Conference on Management of Data. ACM, 93–104.
 [8] Arno Candel, Erin LeDell, Viraj Parmar, and Anisha Arora. 2018. Deep
     Learning with H2O.
     http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/DeepLearningBooklet.pdf
     (Accessed on 01/08/2019).
 [9] Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly
     Detection: A Survey. ACM Comput. Surv. 41, 3, Article 15 (July 2009),
     58 pages. https://doi.org/10.1145/1541880.1541882
[10] R. Domingues, F. Buonora, R. Senesi, and O. Thonnard. 2016. An Application
     of Unsupervised Fraud Detection to Passenger Name Records. In 2016 46th
     Annual IEEE/IFIP International Conference on Dependable Systems and Networks
     Workshop (DSN-W). 54–59. https://doi.org/10.1109/DSN-W.2016.21
[11] David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum,
     and Zoubin Ghahramani. 2013. Structure Discovery in Nonparametric
     Regression through Compositional Kernel Search. arXiv:1302.4922
[12] Eleazar Eskin, Andrew Arnold, Michael Prerau, Leonid Portnoy, and Sal Stolfo.
     2002. A Geometric Framework for Unsupervised Anomaly Detection. In
     Applications of Data Mining in Computer Security, Daniel Barbará and Sushil
     Jajodia (Eds.). Advances in Information Security, 1568-2633, Vol. 6. Springer
     US, Boston, MA, 77–101. https://doi.org/10.1007/978-1-4615-0953-0_4
[13] Markus Goldstein and Seiichi Uchida. 2016. A Comparative Evaluation of
     Unsupervised Anomaly Detection Algorithms for Multivariate Data. (2016).
[14] Alexander Graß, Christian Beecks, and Jose Angel Carvajal Soto. 2019.
     Unsupervised Anomaly Detection in Production Lines. In Machine Learning
     for Cyber Physical Systems, Jürgen Beyerer, Christian Kühnert, and Oliver
     Niggemann (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 18–25.
[15] Yunseong Hwang, Anh Tong, and Jaesik Choi. 2016. Automatic Construction
     of Nonparametric Relational Regression Models for Multiple Time Series.
     In ICML 2016: Proceedings of the 33rd International Conference on Machine
     Learning (Proceedings of Machine Learning Research), Maria Florina Balcan
     and Kilian Q. Weinberger (Eds.), Vol. 48. PMLR, 3030–3039.
[16] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation Forest. In
     Eighth IEEE International Conference on Data Mining, 2008, Fosca Giannotti
     (Ed.). IEEE, Piscataway, NJ, 413–422. https://doi.org/10.1109/ICDM.2008.17
[17] James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum,
     and Zoubin Ghahramani. 2014. Automatic Construction and Natural-Language
     Description of Nonparametric Regression Models. arXiv:1402.4304
[18] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel,
     Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron
     Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David
     Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011.
     Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12 (Nov. 2011),
     2825–2830. http://dl.acm.org/citation.cfm?id=1953048.2078195
[19] Clifton Phua, Vincent C. S. Lee, Kate Smith-Miles, and Ross W. Gayler. 2010.
     A Comprehensive Survey of Data Mining-based Fraud Detection Research.
     CoRR abs/1009.6119 (2010). arXiv:1009.6119 http://arxiv.org/abs/1009.6119
[20] Carl Edward Rasmussen and Christopher K. I. Williams. 2006. Gaussian
     Processes for Machine Learning (Adaptive Computation And Machine Learning).
     The MIT Press.
[21] David Renaudie, Maria A. Zuluaga, and Rodrigo Acuna-Agost. 2018.
     Benchmarking Anomaly Detection Algorithms in an Industrial Context: Dealing
     with Scarce Labels and Multiple Positive Types. In IEEE International
     Conference on Big Data. 1227–1236.
[22] Peter J Rousseeuw. 1984. Least median of squares regression. Journal of the
     American statistical association 79, 388 (1984), 871–880.
[23] Bernhard Schölkopf, John C. Platt, John C. Shawe-Taylor, Alex J. Smola, and
     Robert C. Williamson. 2001. Estimating the Support of a High-Dimensional
     Distribution. Neural Comput. 13, 7 (July 2001), 1443–1471.
     https://doi.org/10.1162/089976601750264965
[24] M. Tavallaee, N. Stakhanova, and A. A. Ghorbani. 2010. Toward Credible
     Evaluation of Anomaly-Based Intrusion-Detection Methods. IEEE Transactions
     on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 40, 5
     (September 2010), 516–524. https://doi.org/10.1109/TSMCC.2010.2048428