=Paper=
{{Paper
|id=Vol-2289/paper12
|storemode=property
|title=Anomaly Detection using Similarity-based One-Class SVM for Network Traffic Characterization
|pdfUrl=https://ceur-ws.org/Vol-2289/paper12.pdf
|volume=Vol-2289
|authors=Bouchra Lamrini, Augustin Gjini, Simon Daudin, Pascal Pratmarty, François Armando, Louise Travé-Massuyès
|dblpUrl=https://dblp.org/rec/conf/safeprocess/LamriniGDPAT18
}}
==Anomaly Detection using Similarity-based One-Class SVM for Network Traffic Characterization==
Bouchra Lamrini¹, Augustin Gjini¹, Simon Daudin¹, François Armando¹, Pascal Pratmarty¹ and Louise Travé-Massuyès²

¹ LivingObjects, Toulouse, France
e-mail: {bouchra.lamrini,augustin.gjini,simon.daudin,françois.armando,pascal.pratmarty}@livingobjects.com

² LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
e-mail: louise@laas.fr
Abstract: In this paper, we investigate an unsupervised machine learning method based on one-class Support Vector Machines for anomaly detection in network traffic. In the absence of any prior expert knowledge on anomalous data, we propose the use of a similarity measure for Multivariate Time Series to evaluate the output results and select the best model. A set of Key Performance Indicators, oriented towards network and traffic monitoring, is used to demonstrate the promising performance of the unsupervised learning approach.

Index Terms: Anomaly Detection, Support Vector Machines (SVMs), One-Class SVMs, Unsupervised Learning, Model Selection, Similarity Measure, Multivariate Time Series (MTS).

1 Introduction

Anomaly detection aims at identifying unusual patterns in data that do not conform to expected behavior [1]. These non-conforming patterns are referred to in different application fields as anomalies, aberrations, discordant observations, exceptions, novelties, outliers, peculiarities, contaminants, surprises, or strangeness. Applications range from intrusion detection, e.g. identifying strange patterns in network traffic that could signal a hack [2], to system health monitoring, e.g. spotting a malignant tumor in an MRI image scan [3], and from fraud detection in credit card transactions [4] to fault detection in operating environments [5]. In this paper we are interested in anomaly detection in network traffic.

Support Vector Machines (SVMs) have been one of the most successful machine learning techniques applicable to a wide variety of classification tasks. SVMs perform at least as well as other methods in terms of the generalization error [6]. Many factors contribute to the high popularity of SVMs today. First of all, their theoretical foundations have been deeply investigated, and they come with a convex optimization procedure ensuring that the global optimum will be reached. Moreover, the solution is sparse, making it very efficient in comparison to other kernel-based approaches [7]. In addition, they may use a non-linear transformation in the form of a kernel, which even allows SVMs to be considered as a dimensionality reduction technique [8]. One-Class SVMs [9] have been devised for cases in which only one class is known and the problem is to detect anything outside this class. This is known as novelty detection, and it refers to the automatic identification of unforeseen or abnormal phenomena [1; 10; 11], i.e. outliers, embedded in a large amount of normal data.

In contrast to traditional SVMs, One-Class SVMs (OC-SVM) learn a decision boundary that achieves maximum separation between the samples of the known class and the origin [12]. Only a small fraction of data points are allowed to lie on the other side of the decision boundary: those data points are considered as outliers.

Anomaly detection is particularly important for network traffic. The observed growth rate of informational and economic damage caused by intentional or unintentional attacks, faults, and anomalies has been a driving force behind efforts to ensure that network monitoring systems are able to detect abnormal activity and characterize it. The limitations of computing and storage resources require competence and ingenuity to effectively characterize ever-changing network traffic trends.

The non-availability of labelled data, the high cost of constituting labelled training data, and the need to identify anomalous and novel observations without necessarily having seen an example of that behaviour in the past are the main challenges tackled in this work. A central issue is model selection, i.e. the choice of the optimal hyper-parameters that define the OC-SVM learning configuration. This requires a method to evaluate the results.

This paper contributes to this problem by evaluating the results of the trained model through a comparison of the samples predicted normal with the samples in the training set. Because samples are composed of a set of signals over a temporal window, we propose to use a similarity index for Multivariate Time Series (MTS) called Eros (Extended Frobenius Norm). The results of the model are evaluated iteratively for different hyper-parameters of OC-SVM, and the model that evaluates best is selected. We show that OC-SVM in combination with the Eros index (see Section 4.1) can create automatically tuned, reliable classifiers at reasonable computation cost.

The remainder of this paper is organized as follows. Section 2 presents our case study. Section 3 is devoted to an overview of SVMs and One-Class SVMs. In Section 4, we present the Eros similarity measure used to search for the best training model for anomaly detection. Experimental results and related discussions are provided in Section 5 to demonstrate the performance of the approach. Finally, Section 6 concludes the paper.

2 Case Study

Processing network traffic involves dealing with an immense amount of data that varies quickly and constantly. Given the volume of data involved, it is very easy for malicious activities to go undetected, especially without any a priori knowledge about the nature of the traffic, as is often the case in the network domain.
Figure 1: Training Data Set. The y-axis represents the KPI signals. From top to bottom: Total Incoming
Traffic, Total Outgoing Traffic, Server Delay, and Network Delay.
In this study, data was collected from a real-time monitoring platform dedicated to ensuring key application performance. 51 sites using applications of the same kind, with roughly the same uses at the same times, were chosen. For each application, we carefully selected four relevant Key Performance Indicators (KPIs):

1. Total Incoming Traffic.

2. Total Outgoing Traffic.

3. Server Delay, i.e. the connection time to the server, which sets the expiration time for sending a request.

4. Network Delay, which specifies how long it takes for a bit of data to travel across the network from one node to another.

A history of two months (3408 samples) of data generated at a 5-minute rate was collected for each of the four KPIs above and for each site. Data was segmented into time-windows w_i of 48 points each. Figure 1 shows the data contained in the training set. Each time-window w_i ∈ {w_0, ..., w_l}, l = 70, is delimited by lines in the figure, and there are 70 samples. Note that the data samples provided to the OC-SVM classification algorithm are multivariate, each composed of four KPI segments over the same time-window. The idea is to detect insidious problems that require an analysis of the interactions underlying the signals. We therefore want to detect "abnormal windows". Each segment is characterized by seven statistical attributes: minimum (MIN), maximum (MAX), mean (MEAN), median (MED), standard deviation (STD), number of average crossings (nbpMean), and the squared mean error (SME) computed between the raw data and a linear fit. These attributes are used by OC-SVM after normalization. Figure 2 shows the attributes built for the 70 segments of the four KPIs mentioned above. The seven attributes MEAN, MED, MAX, MIN, STD, nbpMean, and SME are illustrated in sub-figures from left to right. We can already notice some overruns that will contribute considerably to the classifier profile defined by the decision function.
Figure 2: Attributes built for training data KPIs. Each point is calculated on a segment of 48 points.
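To make the feature-extraction step concrete, the sketch below shows one way to compute the seven per-segment attributes with NumPy. It is a minimal illustration, not the authors' implementation: the function names are ours, and the exact definitions of the average-crossing count and of the SME (here, the mean squared residual of a degree-1 fit) are our reading of the text.

```python
import numpy as np

def segment_features(seg: np.ndarray) -> np.ndarray:
    """Compute the seven statistical attributes of one 48-point KPI segment."""
    mean = seg.mean()
    # Number of average crossings: sign changes of the mean-centered signal.
    nbp_mean = int(np.sum(np.diff(np.sign(seg - mean)) != 0))
    # Squared mean error between the raw data and a degree-1 (linear) fit.
    t = np.arange(len(seg))
    trend = np.polyval(np.polyfit(t, seg, 1), t)
    sme = float(np.mean((seg - trend) ** 2))
    return np.array([seg.min(), seg.max(), mean, np.median(seg),
                     seg.std(), nbp_mean, sme])

def window_features(window: np.ndarray) -> np.ndarray:
    """A window holds 4 KPI segments of 48 points -> 4 x 7 = 28 attributes."""
    return np.concatenate([segment_features(kpi) for kpi in window])

# Example with synthetic data: 70 windows, 4 KPIs, 48 samples each.
windows = np.random.rand(70, 4, 48)
X = np.vstack([window_features(w) for w in windows])   # shape (70, 28)
```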
3 An Unsupervised Similarity-based Method for Anomaly Detection

Support Vector Machines (SVMs) have always been of interest in anomaly detection because of their ability to provide non-linear classification through a kernel function. Via this short overview, we show that SVMs are theoretically well founded. We briefly introduce the basic concepts of SVMs and then focus on the OC-SVM approach adopted in this study. A more detailed presentation can be found in [13], and a good example is available online using the "LibSVM" library for Matlab.

3.1 Support Vector Machines

Let us consider traditional two-class support vector machines, in which we are given a set of n training instances S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, x_i ∈ R^d, where y_i ∈ {−1, +1} is the class label of the instance x_i. The linear SVM classifier recovers an optimal separating hyperplane maximizing the "margin" of the classifier, with equation w^T x + b = 0, where w ∈ F and b ∈ R are two parameters which determine the position of the decision hyperplane in feature space F (its orientation is tuned by w and its displacement by b). The decision function can thus be generally written as:

$$ f(x; w, b) = \mathrm{sign}(w^T x + b) \in \{-1, +1\} \quad (1) $$

where:

$$ \mathrm{sign}(w^T x + b) = \begin{cases} +1, & \text{if } w^T x + b \geq 0 \\ -1, & \text{otherwise} \end{cases} $$

The concept of SVMs is to find (w, b) such that the hyperplane is positioned at maximum distance from the nearest training samples of the two classes, in order to reduce the generalization error. This distance defines the "margin". SVMs were first proposed for linearly separable classification tasks, but they were later extended to non-linearly separable classification problems. Some samples are allowed to violate the margin (soft-margin SVMs), and a non-linear decision boundary can be obtained by projecting the data into a higher-dimensional space through a non-linear function Φ(x). Data points may not be linearly separable in their original space, but they are "lifted" into a feature space F where a hyperplane can separate them. When that hyperplane is projected back into the input space, it has a non-linear shape. To prevent the SVM classifier from over-fitting noisy data, slack variables ξ_i are introduced to allow some data points to lie within the margin, and the parameter C > 0 in (Eq. 2) tunes the trade-off between the classification error on the training data and margin maximization. The objective function of SVM classifiers has the following minimization formulation:

$$ \min_{w, b, \xi_i} \; \frac{\|w\|^2}{2} + C \sum_{i=1}^{n} \xi_i \quad (2) $$
Subject to:

$$ y_i (w^T \phi(x_i) + b) \geq 1 - \xi_i, \qquad \xi_i \geq 0, \; i = 1, ..., n $$

The minimization problem is solved using Lagrange multipliers α_i, i = 1, ..., n. The new decision rule for a data point x is:

$$ f(x) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i y_i K(x, x_i) + b \right) \quad (3) $$

Every α_i > 0 is weighted in the decision function and thus supports the machine. Since SVMs are considered to be sparse, relatively few Lagrange multipliers have a non-zero value.

The function K(x, x_i) = Φ(x)^T Φ(x_i) is known as the kernel function. Since the outcome of the decision function only relies on the dot-products of the vectors in the feature space F (i.e. all the pairwise distances between the vectors), it is not necessary to perform an explicit projection. As long as a function K provides the same results, it can be used instead. This is known as the kernel trick.

Popular choices for the kernel function are linear, polynomial, and sigmoidal. In this study, we used the Gaussian Radial Basis Function:

$$ K(x, x_i) = \exp\left( \frac{-\|x - x_i\|^2}{2\sigma^2} \right) \quad (4) $$

where σ ∈ R is a kernel parameter and ‖x − x_i‖ is the dissimilarity measure. With this set of formulas and concepts, we are able to classify a set of data points into two classes with a non-linear decision function.

The power of the method comes from using kernel functions, which enable it to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data points in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. Figure 3 illustrates a non-linearly separable data set classified by SVMs with two different kernel functions: linear and radial basis. The observations are plotted blue or magenta depending on the class, and the background is darker as the distance from the hyperplane increases. Scores are given in the bottom-right corners and show a significant increase for the non-linear kernel.

Figure 3: SVM results with two kernel functions.
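The contrast shown in Figure 3 is easy to reproduce with any SVM library. The sketch below uses scikit-learn (an assumption on our part; the example the paper points to relies on LibSVM for Matlab) to fit a linear and an RBF kernel on a synthetic, non-linearly separable data set and print the two scores.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Non-linearly separable data, in the spirit of the Figure 3 illustration.
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X, y)
    # The mean accuracy plays the role of the scores shown in Figure 3.
    print(kernel, clf.score(X, y))
```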
3.2 One-Class Support Vector Machines

One-Class SVMs (OC-SVMs) are used to separate the data of one specific class, the target class, from other data. They are trained with positive examples only, i.e. data points from the target class. There are two different approaches: the approach according to Schölkopf [13], which is presented in the next paragraph, and that according to Tax and Duin [14].

In the feature space F, the OC-SVM method basically separates all the data points from the origin by a hyperplane, and it maximizes the distance of this hyperplane to the origin. This results in a binary function which captures the region of the input space where the training data lives. Thus the function returns +1 in a "small" region (capturing the training data points) and −1 elsewhere. The quadratic programming minimization function is slightly different from the original one stated in (Eq. 2) and (Eq. 3):

$$ \min_{w, \xi_i, \rho} \; \frac{\|w\|^2}{2} + \frac{1}{\eta n} \sum_{i=1}^{n} \xi_i - \rho \quad (5) $$

Subject to:

$$ w \cdot \phi(x_i) \geq \rho - \xi_i, \qquad \xi_i \geq 0, \; i = 1, ..., n $$

Schölkopf et al. [13] reformulated SVMs to take the new regularization parameter η instead of the C of the original formulation (Eq. 2 and Eq. 3). The range of C is from zero to infinity, but η always lies in [0, 1]. η characterizes the solution in a nicely interpretable way: (1) it sets an upper bound on the fraction of outliers, i.e. the training examples regarded as out-of-class, and (2) it sets a lower bound on the number of training examples used as support vectors.

Again by using Lagrange techniques and a kernel function for the dot-product calculations, the decision function becomes:

$$ f(x) = \mathrm{sign}\left( (w \cdot \Phi(x)) - \rho \right) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i K(x, x_i) - \rho \right) \quad (6) $$

OC-SVMs thus create a hyperplane, characterized by w and ρ, which has maximal distance from the origin in the feature space F, hence separating all the data points from the origin.
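Classifiers with this formulation are available off the shelf. The following minimal sketch uses scikit-learn's OneClassSVM, where the `nu` parameter plays the role of η and `gamma` parameterizes the RBF kernel of Eq. (4), with gamma = 1/(2σ²) under scikit-learn's convention; the feature matrices are placeholders for the real per-window attributes of Section 2.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# One row per time-window: the 28 normalized attributes of Section 2.
X_train = np.random.rand(70, 28)   # placeholder for the real feature matrix
X_test = np.random.rand(24, 28)

# nu upper-bounds the fraction of outliers (the paper's eta); sklearn's
# RBF kernel exp(-gamma * ||x - x_i||^2) matches Eq. (4) with gamma = 1/(2*sigma^2).
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.1)
clf.fit(X_train)

pred = clf.predict(X_test)         # +1 = tagged normal, -1 = tagged abnormal
print("abnormal windows:", np.where(pred == -1)[0])
```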
4 Similarity-based Performance Evaluation for Model Selection

In this section, we address the problem of fitting the hyper-parameters of OC-SVM automatically, that is, the problem of automatic model selection. In the case of OC-SVM, this amounts to choosing the kernel parameter γ and the regularization parameter η. A pair (γ_i, η_j) is defined as a learning configuration.

For this purpose, we propose to run OC-SVM for several learning configurations and select the best configuration by evaluating the similarity between the KPI signals of the windows tagged normal by OC-SVM and the KPI windows of the training data, which are assumed to be normal examples. Since a sample window is composed of several KPI signals, we need a multidimensional similarity index for Multivariate Time Series (MTS).

4.1 The Similarity Index Eros

Multidimensional similarity measures aim to indicate simultaneously the level of similarity between several datasets (databases, data clusters, etc.). Unlike other methods [15; 16; 17] that assess the level of similarity between two variables while omitting the existing correlation between the set of variables, a multidimensional method takes into account the contribution of each variable in defining a global similarity measure. One of the methods processing MTS is Eros (Extended Frobenius Norm) [18]. The interest of this method lies in its ability to assess the similarity of MTS composed of different numbers of data points. It indeed uses the eigenvalues and eigenvectors of the covariance matrix, which has size n × n, n being the number of time series composing the MTS. In doing so, it also performs dimension reduction, because the number of observations is generally higher than the number of variables.

We briefly describe the similarity index Eros, based on the Frobenius norm, below. The definitions and notations used in this paper are taken from [19]. We first formally define the similarity index Eros. Next, we present the algorithm describing the similarity measure procedure and the approach proposed for model selection.

Definition 1. Eros (Extended Frobenius Norm). Let A and B be two MTS items of size m_A × n and m_B × n respectively. Let V_A and V_B be the two right eigenvector matrices obtained by applying Singular Value Decomposition (SVD) to the covariance matrices M_A and M_B, respectively. Let V_A = [a_1, ..., a_n] and V_B = [b_1, ..., b_n], where a_i and b_i are column-orthonormal vectors of size n. The Eros similarity of A and B is then defined as:

$$ \mathrm{Eros}(A, B, w) = \sum_{i=1}^{n} w_i \, |\langle a_i, b_i \rangle| = \sum_{i=1}^{n} w_i \, |\cos \theta_i| \quad (7) $$

where ⟨a_i, b_i⟩ is the inner product of a_i and b_i, w is a weight vector based on the eigenvalues of the MTS dataset with ∑_{i=1}^{n} w_i = 1, and θ_i is the angle between a_i and b_i. The range of Eros is between 0 and 1, with 1 being the most similar.

Definition 2. Singular Value Decomposition. Let A be a general real m × n matrix. The singular value decomposition (SVD) of A is the factorization:

$$ A = U \Sigma V^T \quad (8) $$

where U is a column-orthonormal m × r matrix, r is the rank of the matrix A, Σ is a diagonal r × r matrix of the eigenvalues γ_i of A, where γ_1 ≥ ··· ≥ γ_r ≥ 0, and V is a column-orthonormal n × r matrix. The eigenvalues and the corresponding eigenvectors are sorted in non-increasing order. V is called the right eigenvector matrix and U the left eigenvector matrix.

Yang et al. [18] describe the similarity index algorithm with the following steps, sketched in code after this list:

1. Compute the covariance matrix of each MTS.

2. Use SVD to decompose each covariance matrix.

3. Recover the eigenvalues and eigenvectors.

4. Compute the weight vector w by normalizing the eigenvalues [18].

5. Compute the similarity between MTS.
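The following sketch implements these five steps with NumPy. It is our illustration of Definition 1, with one stated simplification: the weight vector w is computed from the eigenvalues of the two items being compared, whereas [18] aggregates the eigenvalues over the whole MTS dataset.

```python
import numpy as np

def right_eigenvectors_and_values(mts: np.ndarray):
    """mts: (m observations, n series). SVD of the n x n covariance matrix."""
    cov = np.cov(mts, rowvar=False)
    _, s, vt = np.linalg.svd(cov)
    return vt.T, s                   # right eigenvector matrix V, eigenvalues

def eros(a: np.ndarray, b: np.ndarray) -> float:
    """Eros similarity of two MTS with the same number n of series.

    Simplification for this sketch: w is obtained by normalizing the
    averaged eigenvalues of the two items, whereas [18] aggregates the
    eigenvalues over the whole MTS dataset.
    """
    va, sa = right_eigenvectors_and_values(a)
    vb, sb = right_eigenvectors_and_values(b)
    s = (sa + sb) / 2.0
    w = s / s.sum()                  # sum_i w_i = 1
    # |<a_i, b_i>| = |cos(theta_i)| since the columns are orthonormal.
    return float(np.sum(w * np.abs(np.sum(va * vb, axis=0))))

# Two windows with 4 KPI series and different lengths: Eros still applies.
x = np.random.rand(48, 4)
y = np.random.rand(36, 4)
print(eros(x, y))                    # in [0, 1], 1 = most similar
```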
4.2 Automatic Model Selection

The first task is to define the learning configurations that will be tested with OC-SVM. We follow the steps below:

1. Define the hyper-parameter space and a procedure to explore this space. In our case, we set a min-max range and a variation step to constitute a grid of (β × β) value pairs, i.e. β values for each hyper-parameter.

2. Explore the hyper-parameter space and set OC-SVM accordingly: for each pair of values, one OC-SVM classifier is obtained after the learning step. The best configuration is retained by using the Eros similarity index on the validation data (25% of all data) and the training data (50% of all data). The corresponding OC-SVM classifier is taken as the best model.

3. Once the best model is found, anomaly detection is performed on new data to evaluate how well the model behaves.

The similarity between the windows tagged normal by OC-SVM, denoted MTS_k^normal, k = 0, ..., p, and the data windows of the training data (considered as normal), denoted MTS_l^learn, l = 1, ..., q, is obtained as follows. For every learning configuration [Figure 4] given by (γ_i, η_j):

1. Compute Eros for every window pair (MTS_k^normal, MTS_l^learn), k = 0, ..., p, and l = 1, ..., q.

2. Compute the average similarity Eros_mean over all the window pairs.

The best learning configuration is taken as the one leading to the maximal Eros_mean value over all considered learning configurations (γ_i, η_j). A sketch of this selection loop is given below.
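A possible shape of this selection loop is sketched here, reusing the `eros` function from Section 4.1 together with scikit-learn's OneClassSVM; the grid values are illustrative, not those used in the experiments.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def select_model(X_train, X_val, mts_train, mts_val, gammas, nus):
    """Grid search over (gamma, nu) pairs, scored by Eros_mean.

    X_*   : per-window feature matrices fed to OC-SVM.
    mts_* : the corresponding raw multivariate windows, aligned row-for-row,
            scored with the eros() function sketched in Section 4.1.
    """
    best_clf, best_score = None, -1.0
    for gamma in gammas:                  # beta x beta learning configurations
        for nu in nus:
            clf = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_train)
            normal = np.where(clf.predict(X_val) == 1)[0]
            if normal.size == 0:
                continue                  # no window tagged normal: skip
            # Eros_mean over all (validation-normal, training) window pairs.
            score = np.mean([eros(mts_val[k], mts_train[l])
                             for k in normal
                             for l in range(len(mts_train))])
            if score > best_score:
                best_clf, best_score = clf, score
    return best_clf, best_score

gammas = np.logspace(-3, 2, 10)           # beta = 10 values per hyper-parameter
nus = np.linspace(0.01, 0.5, 10)
```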
Figure 4: Diagram showing the model selection process.
5 Experiments on the Case Study

Our detection approach was applied to the case study presented in Section 2. A history of two months of data generated every 5 minutes for four KPIs was collected over 151 sites. The window segmentation [20] was performed after analyzing two points that can significantly impact the detection stage:

• the choice of the time-window length, i.e. the number of hours and samples to take into account in a window,

• the definition of a reliable methodology to normalize the training and testing datasets.

As mentioned in Section 2, the time-window length was chosen as 4 hours, i.e. 48 samples. Since access to the web applications takes place over a few hours, a window of four hours is considered a significant period for traffic analysis. As noted above, each time-window is characterized by seven statistical attributes: minimum (MIN), maximum (MAX), mean (MEAN), median (MED), standard deviation (STD), number of average crossings (nbpMean), and the squared mean error (SME) computed between the raw data and their linear fit. The attributes are computed for each time-window in order to obtain a multidimensional scatter plot in which each point represents a time-window. One of the major interests of segmentation and feature computation is to synthesize the information contained in a time-window. This allows the detection not only of singular points, but also of an atypical set of points, even if each point taken individually seems normal.

Acquired raw data provide KPIs with different ranges, so the features (attributes) themselves do not have homogeneous ranges. In order to guarantee good performance of the anomaly detection approach, we chose to normalize these attributes with respect to their maximal and minimal values, with a tolerance set by a threshold s ∈ [0, 1]. This standard preprocessing ensures that all the attributes contribute equally to the decision process, independently of the parameters governing the KPI dynamics.
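The text does not spell out how the threshold s enters the normalization; one plausible reading, used in the sketch below, is that s widens the observed min-max range by a fractional margin before scaling.

```python
import numpy as np

def minmax_normalize(X: np.ndarray, s: float = 0.1) -> np.ndarray:
    """Column-wise min-max scaling with a tolerance margin.

    The paper only states that a threshold s in [0, 1] adds a tolerance
    around the observed min/max; widening the range by a fraction s on
    each side is our interpretation, not a detail given in the text.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    margin = s * (hi - lo)
    lo, hi = lo - margin, hi + margin
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (X - lo) / span

X = np.random.rand(70, 28) * 100.0           # raw attributes, mixed ranges
Xn = minmax_normalize(X, s=0.1)              # values now within [0, 1]
```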
To automatically select the best model, the hyper-parameter space was discretized with a 10 × 10 grid, i.e. β = 10. 100 learning configurations were therefore evaluated to select the best model. This off-line task was performed for each application site and proved computationally feasible.

Figure 5 shows some of the test results (25% of all data). Out of 24 time-windows (w_i ∈ {w_0, ..., w_m}, m = 23), 4 anomalies were detected, represented by the 4 yellow-colored time-windows: w_3, w_4, w_10 and w_12. The results were confirmed with the Parallel Coordinates plots given in Figure 6.

In a Parallel Coordinates plot, each attribute is given its own axis and all the axes are placed parallel to each other. Values are plotted as a series of lines that connect across all the axes. This means that each line corresponds to one data window, for which we have 7 × 4 attributes (7 features for every KPI). The order in which the axes are arranged can impact the way the reader understands the data. One reason for this is that the relationships between adjacent variables are easier to perceive than those between non-adjacent variables. So re-ordering the axes can help in discovering patterns or correlations across variables. We clearly see that the four time-windows drawn as pink lines exhibit strange behavior compared to the normal windows drawn as green lines.
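Such a plot can be produced directly with pandas, as in the short sketch below; the window labels and colors are illustrative and mirror Figure 6 rather than reproduce it.

```python
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

# One row per time-window, one column per attribute (7 x 4 = 28 here),
# plus a label column separating normal and abnormal windows.
X = np.random.rand(24, 28)
labels = ["abnormal" if i in (3, 4, 10, 12) else "normal" for i in range(24)]
df = pd.DataFrame(X, columns=[f"att{i}" for i in range(28)])
df["class"] = labels

parallel_coordinates(df, "class", color=("green", "pink"))
plt.xticks(rotation=90)
plt.show()
```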
Presenting this type of detection can lead network administrators to adopt a different way of reasoning to characterize the nature of the traffic circulating on the network (normal, abnormal, critical, ...). It may help them to identify the different forms of anomaly in their network. Data analysis must give meaning to the data, with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. The value of the data lies in the story it tells.
Figure 5: Anomaly detection (yellow windows are detected abnormal). From top to bottom, KPIs appear
in this order on the y-axis: Total Incoming Traffic, Total Outgoing Traffic, Server Delay, and Network Delay
6 Conclusion

In this work, we applied the OC-SVM method to detect anomalies in real network traffic, contributing an automatic method based on the similarity index Eros [19] for setting the hyper-parameters which define the learning configuration. It provided very satisfactory results.

The advantages of novelty detection for complex processes like network traffic are multiple. In particular, there is no need for faulty data: a wide variety of anomaly cases exist, and it would be impossible to characterize them all or to gather the corresponding data. Challenges for future work relate to the fact that data comes in a stream, and dealing with the data in real time is quite tedious. The amount of data leads to cases where resources are limited. Novelty detection in a distributed framework is also to be investigated.

Figure 6: Parallel Coordinates plot illustrating the four abnormal windows. From top to bottom, the curves labeled in pink show, successively, the time-windows w_3, w_4, w_10 and w_12.

Acknowledgement. The authors thank Bertrand Le Marec and David Maisonneuve, the leading team of LivingObjects, for their support and valuable comments about the application.
References

[1] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3):15:1–15:58, 2009.

[2] V. Kumar. Parallel and distributed computing for cybersecurity. IEEE Distributed Systems Online, 6(10):1–9, 2005.

[3] C. Spence, L. Parra, and P. Sajda. Detection, synthesis and compression in mammographic image analysis with a hierarchical image probability model. In Proceedings of the IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA'01), 2001.

[4] E. Aleskerov, B. Freisleben, and B. Rao. Cardwatch: A neural network based database mining system for credit card fraud detection. In Proceedings of the IEEE/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr), pages 220–226, 1997.

[5] R. Fujimaki, T. Yairi, and K. Machida. An approach to spacecraft anomaly detection problem using kernel feature space. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD'05), pages 401–410, New York, NY, USA, 2005. ACM.

[6] C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.

[7] C.M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[8] W. Wang, Z. Xu, W. Lu, and X. Zhang. Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing, 55(3-4):643–663, 2003.

[9] M.A.F. Pimentel, D.A. Clifton, L. Clifton, and L. Tarassenko. A review of novelty detection. Signal Processing, 99:215–249, 2014.

[10] D. Dasgupta and S. Forrest. Novelty detection in time series data using ideas from immunology. In Proceedings of the 5th International Conference on Intelligent Systems, Reno, Nevada, 1996.

[11] E. Keogh, S. Lonardi, and W. Chiu. Finding surprising patterns in a time series database in linear time and space. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'02), pages 550–556, New York, NY, USA, 2002. ACM.

[12] B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001.

[13] B. Schölkopf, R. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt. Support vector method for novelty detection. In Proceedings of the 12th International Conference on Neural Information Processing Systems, pages 582–588, 1999.

[14] D.M.J. Tax and R.P.W. Duin. Support vector data description. Machine Learning, 54(1):45–66, 2004.

[15] G.E.A.P.A. Batista, X. Wang, and E.J. Keogh. A complexity-invariant distance measure for time series. In Proceedings of the SIAM International Conference on Data Mining (SDM), 2011.

[16] C.A. Ratanamahatana and E.J. Keogh. Making time-series classification more accurate using learned constraints. In Proceedings of the SIAM International Conference on Data Mining (SDM'04), pages 11–22, 2004.

[17] S. Park, W.W. Chu, J. Yoon, and C. Hsu. Efficient searches for similar subsequences of different lengths in sequence databases. In Proceedings of the 16th International Conference on Data Engineering, pages 23–32, 2000.

[18] K. Yang and C. Shahabi. A multilevel distance-based index structure for multivariate time series. In Proceedings of the 12th International Symposium on Temporal Representation and Reasoning, 2005.

[19] K. Yang and C. Shahabi. A PCA-based similarity measure for multivariate time series. In Proceedings of the Second ACM International Workshop on Multimedia Databases, 2004.

[20] S. Fuertes, G. Picart, J.-Y. Tourneret, L. Chaari, A. Ferrari, and C. Richard. Improving spacecraft health monitoring with automatic anomaly detection techniques. In Proceedings of the 14th International Conference on Space Operations, 2016.