=Paper= {{Paper |id=Vol-1297/020-25_paper-5 |storemode=property |title=Методы выявления аномалий: обзор (Methods for Anomaly Detection: a Survey) |pdfUrl=https://ceur-ws.org/Vol-1297/020-25_paper-5.pdf |volume=Vol-1297 |dblpUrl=https://dblp.org/rec/conf/rcdl/KalinichenkoST14 }} ==Методы выявления аномалий: обзор (Methods for Anomaly Detection: a Survey) == https://ceur-ws.org/Vol-1297/020-25_paper-5.pdf
                   Methods for Anomaly Detection: a Survey

© Leonid Kalinichenko (leonidandk@gmail.com)
© Ivan Shanin (ivan_shanin@mail.ru)
© Ilia Taraban (tarabanil@gmail.com)
Institute of Informatics Problems of RAS, Moscow

Proceedings of the 16th All-Russian Conference “Digital Libraries: Advanced Methods and Technologies, Digital Collections”, RCDL-2014, Dubna, Russia, October 13–16, 2014.

Abstract
    In this article we review different approaches to anomaly detection problems, their applications and specific features. We classify the methods according to the specificity of the data and discuss their applicability in different cases.

1 Introduction
    An anomaly (or outlier, deviant object, exception, rare event, peculiar object) is an important concept in data analysis. A data object is considered to be an outlier if it deviates significantly from the regular pattern of common data behaviour in a specific domain. Generally this means that the data object is “dissimilar” to the other observations in the dataset. It is very important to detect such objects during data analysis in order to treat them differently from the other data. For instance, anomaly detection methods are widely used for the following purposes:
    • Credit card (and mobile phone) fraud detection [1, 2];
    • Suspicious Web site detection [3];
    • Whole-genome DNA matching [4, 5];
    • ECG-signal filtering [6];
    • Suspicious transaction detection [7];
    • Analysis of digital sky surveys [8, 9].
    The anomaly detection problem has become a recognized, rapidly developing topic of data analysis. Many surveys and studies are devoted to this problem [1, 3, 4, 5, 10, 11]. The main purpose of this review is to reveal specific features of widely known statistical and machine learning methods that are used to detect anomalies. All methods considered are categorized by the form of data they are applied to.
    The paper is organized as follows. In Section 2 we introduce three generic data representations that are most commonly used in anomaly detection problems: Metric Data, Evolving Data and Multistructured Data. In Sections 3, 4 and 5 these data forms are discussed in detail; each form is related to a certain class of problems and appropriate methods, which are presented with application examples. In Section 6 we discuss specific features of the anomaly detection problem that have a strong impact on the methods used in this area. Section 7 contains conclusions and results of this review.

2 Data forms
    The precise definition of an outlier depends on the specific problem and its data representation. In this survey we establish a correspondence between concrete data representation forms and suitable anomaly detection methods. We assume that the data are usually presented in one of three forms: Metric Data, Evolving Data and Multistructured Data. Metric Data are the most common form of data representation, in which every object in a dataset has a certain set of attributes that makes it possible to operate with the notions of "distance" and "proximity". Evolving Data are presented as well-studied objects: Discrete Sequences, Time Series and Multidimensional Data Streams. The third form is Multistructured Data; by this term we understand data presented in unstructured, semi-structured or structured form. This data form may not have a rigid structure, and yet it can contain various data dependencies. The most usual task with this type of data is to extract attributes that would allow the use of metric data oriented methods of outlier analysis. In our survey Multistructured Data are specialized as Graph Data or Text Data.

3 Metric Data Oriented Methods
    In this section we consider methods that use the concepts of “metric” data, such as the distance between objects, the correlation between them, and the distribution of the data. We assume that the data in this case represent objects in a space, so-called points. The task is then to separate regular and irregular points, depending on the specific metric distance between objects in the space, or the correlation, or the spatial distribution of the points. In this case we consider a structured data type, i.e., objects that do not depend on time (time series are discussed in Section 4). The metric data form is the most widely used, because almost all entities can be represented as a structured object, a set of attributes, and thus as a point in a particular space [12]. Consequently, these methods are used in various applications, e.g., in medicine and astronomy. We subdivide methods based



on the notion of distance, on correlations, on data distributions and, finally, on the handling of high-dimensional data and categorical attributes. We now turn to a more detailed review of certain types of these methods.

3.1 Distance-Based Data
    The basic set of methods that use the notion of distance includes clustering methods, K-nearest neighbors and their derivatives. Clustering methods use the distance defined in the space to separate the data into homogeneous and dense groups (clusters). If a point is not included in any large cluster, it is classified as an anomaly. We can also assume that small clusters may be clusters of anomalous objects, because anomalies may have a similar structure, i.e., be clustered. The K-nearest neighbors method [13] is based on the concept of proximity. We consider the k nearest points of an object and apply a certain rule to decide whether the object is abnormal or not. A simple example of such a rule is the distance between objects: the farther an object is from its neighbors, the more likely it is abnormal. There are various kinds of rules, ranging from distance-based rules to rules based on the distribution of neighbors. For example, LOF (Local Outlier Factor) [14] is based on the density of objects in a neighborhood. Examples of clustering methods of anomaly detection in astronomy can be found in [15, 16, 17]. Besides classic clustering methods, many machine learning techniques can be used, e.g., modified neural network methods such as SOM (Self-Organizing Map) [18, 19].
    As an example, consider [27]. The authors propose their own clustering algorithm that also classifies anomalies. The main task in this case is to find erroneous values and interesting events in sensor data. Using the Intel Berkeley Research Lab dataset (2.3 million readings from 54 sensors) and a synthetic dataset, their algorithm reached a detection rate of 100% with false alarm rates of 0.10% and 0.09%, respectively. These experimental results show that their approach can detect dangerous events (such as forest fires, air pollution, etc.) as well as erroneous or noisy data.

3.2 Correlated Dimension Data
    The idea of these methods is based on the concept of correlation between data attributes. This situation is often found in real data because different attributes can be generated by the same processes. This effect makes it possible to use linear models and methods based on them. A simple example is linear regression: we fit a plane that describes the data and then pick as anomalous those objects that lie far away from this plane. PCA (Principal Component Analysis) [21] is also often used to reduce the dimensionality of the data. For this reason PCA is sometimes used in data preprocessing, as in [15]. But it can also be used directly to separate anomalies. In this case, the basic idea is that in the new dimensions it is easier to distinguish normal objects from abnormal ones [22].

3.3 Probabilistically Distributed Data
    In probabilistic methods, the main approach is to assume that the data satisfy some distribution law. Anomalous objects can then be defined as objects that do not satisfy this law. A classic example of these methods is EM [23, 24], an iterative algorithm based on the maximum likelihood method. Each iteration consists of an expectation step and a maximization step. The expectation step computes the expected value of the likelihood function under the current parameters, and the maximization step finds the parameters that maximize it. There are also methods based on statistics of the data distribution. These include the tail analysis of distributions (e.g., normal) and the use of the Markov, Chebyshev and Chernoff inequalities.
    An example of finding anomalies in sensors of rotating machinery is considered in [20]. In this task rolling element bearing failures are treated as anomalies. In practice, such frequent errors are one of the foremost causes of failures in rotating mechanical systems. In contrast to SVM-based approaches, the authors fit a Gaussian distribution. After the parameters of the distribution are calculated and a threshold is chosen, the anomalies are found. For testing they use vibration data from the NSF I/UCR Center for Intelligent Maintenance Systems (IMS – www.imscenter.net) and reach 97% accuracy.
    Other examples of the application of these methods can be found in [25, 26].

3.4 Categorical Data
    Most of the anomaly detection methods considered operate on continuous data, so one approach is to translate categorical attributes into continuous ones. For example, categorical data can be represented as a set of binary attributes. Certainly this kind of transformation may increase the dimension of the data, but this problem can be solved with dimensionality reduction methods. Different probabilistic approaches can also be used for processing categorical data. These approaches are not the only ones that can work with categorical data. For example, some methods may be partially modified to handle categorical data types: distance and proximity can be extended to categorical data.

3.5 High-Dimensional Data
    In various applications the problem of a large number of attributes often arises. This problem implies redundant attributes, ill-defined notions of distance between objects, and more sophisticated methods. For example, correlated dimension methods work much worse on a large number of attributes. The main way of solving these problems is to search for subspaces of attributes. Earlier we mentioned PCA, which is most commonly used for this task. But selecting a small number of attributes raises other problems. By changing the number of attributes, we lose information. Because of the small samples of anomalies, or the emergence of new types of anomalies, previously "abnormal" attributes can be lost.
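    The loss of information caused by attribute selection can be illustrated with a minimal sketch. It assumes hypothetical toy data with three attributes and scores each object by its mean distance to the k nearest neighbours (the simple distance rule of Section 3.1). The helper names knn_scores and project are illustrative and are not taken from the cited works.

```python
import math


def knn_scores(points, k=2):
    """Outlier score of each point: mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def project(points, keep):
    """Attribute selection: keep only the attributes whose indices are in `keep`."""
    return [tuple(p[i] for i in keep) for p in points]


# Hypothetical toy data (x, y, z): regular objects lie on a 4 x 4 grid with z = 0.
regular = [(float(x), float(y), 0.0) for x in range(4) for y in range(4)]
anomaly = (1.5, 1.5, 5.0)  # ordinary x and y, unusual z
data = regular + [anomaly]

full = knn_scores(data)                            # all three attributes
reduced = knn_scores(project(data, keep=(0, 1)))   # attribute z dropped

# Index of the most suspicious object in each representation.
top_full = max(range(len(data)), key=full.__getitem__)
top_reduced = max(range(len(data)), key=reduced.__getitem__)

print("most suspicious (all attributes):", data[top_full])
print("most suspicious (z dropped):     ", data[top_reduced])
```

    In this toy setting the anomalous object receives the largest score when all three attributes are used. Once the third attribute is dropped it receives the smallest score, because its remaining attributes are ordinary.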




    A more subtle approach to this problem is the Sparse Cube Method [28]. This technique is based on the analysis of the density distributions of projections of the data: a grid discretization is performed (at this point the data form a sparse hypercube), and an evolutionary algorithm is employed to find an appropriate lower-dimensional subspace.
    Many applications are confronted with the problem of high dimension; we take [29] as an example. Here the authors searched for images characterized by low quality, low illumination intensity or some collisions. They compare a PCA-based approach with their proposed one, which is based on random projections. After projection, LOF works with the neighborhood that was taken from the source space. Both approaches show good results, but the second is much faster at large dimensions than PCA and LOF.

4 Evolving Data
    It is very common that data is given in a temporal (or just consecutive) representation. Usually this is caused by the origin of the data. The temporal feature can be discrete or continuous, so the data can be presented as sequences or as time series. The methods that we review in this section can be applied to various common problems in medicine, economics, earth science, etc. We also review methods suitable for "on-line" outlier analysis in data streams.

4.1 Discrete Sequences Data
    There are many problems that need outlier detection in discrete sequences (web log analysis, DNA analysis, etc. [3, 4]). There are several ways to determine an outlier in data presented as a discrete sequence. We can analyze values at specific positions or test whether the whole sequence is deviant. Three models are used to measure deviation in these problems: distance-based, frequency-based and the Hidden Markov Model [10]. In the survey [30] the methods are divided into three groups: sequence-based, contiguous subsequence-based and pattern-based. The first group includes Kernel Based Techniques, Window Based Techniques and Markovian Techniques; contiguous subsequence methods include Window Scoring Techniques and Segmentation Based Techniques. Pattern-based methods include Substring Matching, Subsequence Matching and Permutation Matching Techniques [30].
    In the work [32] the classic host-based anomaly intrusion detection problem is solved. The study is devoted to Windows Native API systems (a specific Windows NT API that is used mostly during system boot), while most other works consider UNIX-based systems. The authors analyse system calls in order to detect abnormal behaviour that indicates an attack or intrusion. To solve this problem the authors use a sliding window method to establish a database of "normal patterns". Then the SVM method is used for anomaly detection, and in addition several window-based features are used to construct a detection rule. The method was tested on real data from Win2K and WinXP systems (including logs of important system processes such as svchost, Lsass, Inetinfo) and showed good results. Another practical example is given in [31].

4.2 Time Series Data
    If the data strongly depends on time, then we face the need to predict the forthcoming data and analyze the current trends. The most common sign of an outlier is a surprising change of trend. The methods considered are based on the well-developed apparatus of time series analysis, including Kalman filtering, autoregressive modeling, detection of unusual shapes with the Haar transform and various statistical techniques. Historically, the first approach to finding this sort of outlier used an idea from immunology [33].

5 Multistructured Data
    Sometimes the data is presented in a more complex form than a numerical "attribute / value" table. In this case it is important to understand what an outlier is by using the appropriate method of analysis. We will review two cases that need specific analysis: textual data (e.g., poll answers) and data presented as a graph (e.g., social network data).

5.1 Text Data
    With the development of communications and the World Wide Web, and especially with the advent of social networks, interest in the analysis of texts on the Internet has greatly increased. Within text analytics and anomaly detection, several major tasks can be distinguished: searching for abnormal texts (such as spam detection) and searching for non-standard texts (novelty detection). When solving these problems, the main difficulty is to represent texts as metric data, so that the previously described methods can be used. A simple way is to use standard metrics for texts, such as tf-idf. Extraction of entities from texts is also widespread. Using natural language processing techniques such as LSA (Latent Semantic Analysis) [34], it is possible to group texts and to combine this with standard anomaly detection methods. Due to the large number of texts, the learning is often supervised.
    The study in [36] is focused on spam detection. The algorithm uses the tf-idf measure to compute distances between messages. It then constructs a “normal” area using a training set. Afterwards the area's threshold determines whether an email is spam. LingSpam (2412 ham, 480 spam), SpamAssassin (4150 ham, 1896 spam) and TREC (7368 ham, 14937 spam) were selected as experimental data sets. The spam detector shows high accuracy and a low false positive rate on each dataset.

5.2 Graph Data
    In this section we review how methods of data analysis depend on the graph structure. The main



difference is that the graph can be large and complex or, on the contrary, the data can consist of many smaller and simpler graphs. The main problem here is to extract appropriate attributes from nodes, edges and subgraphs that allow the methods considered in Section 3 to be used. In the case of many small graphs, we review methods that extract numerical attributes from the graphs and treat them like data objects using the algorithms from Section 3. In the case of a large and complex graph we may be interested in node outliers, linkage outliers and subgraph outliers. Methods that analyze node outliers usually extract attributes from the given node and its neighborhood, but in the case of linkage outlier detection the concept of an outlier itself becomes very complex [10, 3]. We will consider an edge to be an outlier if it connects nodes from different dense clusters of nodes. The most popular methods are based on random graph theory, matrix factorization and spectral analysis techniques [10]. Another problem in this section is to detect subgraphs with deviant behavior and to determine their structure and attribute extraction [37].
    The concrete definition of an outlier node or edge can differ according to the specific problem. For example, in [38] several types of anomaly are considered: near-star, near-clique, heavy-vicinity and dominant edge. Anomalous subgraphs are often detected using the Minimum Description Length principle [39, 40, 41]. One of the most important applications today is social network data; many popular modern techniques are used in this area: Bayesian models [42], Markov random fields, the Ising model [43], the EM algorithm [44] as well as LOF [45].
    In [44] the authors apply anomaly detection methods to social networks. A social network contains information about its members and their meetings. The problem is to find abnormal meetings and to measure their degree of abnormality. The specificity of the problem is that the number of meetings is very small compared to the number of members, which makes it challenging to use common statistical methods. To solve the problem the authors use the notion of a hypergraph. The vertices of the hypergraph represent members of the social network and the edges represent meetings of the members (each edge of a hypergraph connects some set of vertices together). The anomalies are detected through density estimation on a p-dimensional hypercube (the EM algorithm tunes a two-component mixture). The method is tested on synthetic data and shows relatively low estimation error. It is also considered to be scalable, which makes it very valuable for large social networks.

6 Specific features of anomaly detection methods compared to general machine learning and statistical methods
    In this article we show how various data mining methods that re-use general machine learning and statistical algorithms are applied to anomaly detection. The anomaly detection problem has its own specific features, which make it possible to tune the appropriate general algorithms and turn them into more efficient ones.
    Let us consider one of the basic concepts of machine learning, the classification problem. The anomaly detection problem can be considered a classification problem, in which case the data is assumed to contain a class of anomalies. Most methods that solve classification problems assume that data classes have some sort of inner predictable structure. But the only prediction that can be made about anomalies is that these objects do not resemble non-outlier "normal" data. In this case, modeling the outlier class can be senseless and unproductive. Instead, one should pay attention to the structure of the normal data and its laws of distribution.
    Machine learning methods can be divided into three groups: supervised, semi-supervised and unsupervised methods. The first group is the most studied. It requires a labeled "training" dataset, and this is exactly the situation described above: the information about the outlier class is used to tune a model of it in order to predict its structure, which often has a very complex or random nature. Semi-supervised methods use information only about the "normal" class, so they are better suited to the anomaly detection problem, as are unsupervised methods, which do not use any information besides the structure and configuration of the unlabeled data.
    Another important specific feature of the anomaly detection problem is that abnormal objects are usually significantly rare (compared to the non-outlier objects). This makes it hard to construct a reliable training dataset for supervised methods. Also, if this effect is not present in the data, most known methods will suffer from high alarm rates [47, 48].

7 Conclusion
    In this paper we introduced an approach to classifying different anomaly detection problems according to the way the data are presented. We reviewed different applications of outlier analysis in various cases. At the end we summarized specific features of the methods suitable for the outlier analysis problem. Our future plans include preparing a university master's-level course focused on anomaly detection as well as working on anomaly detection in various fields (e.g., finding peculiar objects in massive digital sky surveys).

References
 [1] Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A Survey. ACM Computing Surveys, 41(3), 1–58. doi:10.1145/1541880.1541882
 [2] Kou, Y., Lu, C., Sinvongwattana, S., & Huang, Y.-P. (2004). Survey of Fraud Detection Techniques, 749–754.



 [3] Pan Y., Ding X. Anomaly based web phishing                       Advances in Intrusion Detection Lecture Notes in
     page detection // Computer Security Applications                 Computer Science. Vol. 2820, 36–54.
     Conference, 2006. ACSAC'06. 22nd Annual. –                [20]   Purarjomandlangrudi A., Ghapanchi A. H.,
     IEEE, 2006. – pp. 381–392.                                       Esmalifalak M. A Data Mining Approach for
 [4] Tzeng, J.-Y., Byerley, W., Devlin, B., Roeder, K.,               Fault Diagnosis: An Application of Anomaly
     & Wasserman, L. (2003). Outlier Detection and                    Detection Algorithm // Measurement. – 2014.
     False Discovery Rates for Whole-Genome DNA                [21]   Abdi, H., & Williams, L. J. (2010). Principal
     Matching. Journal of the American Statistical                    component analysis. Wiley Interdisciplinary
     Association, 98(461), 236–246.                                   Reviews: Computational Statistics, 2(4), 433–
     doi:10.1198/016214503388619256                                   459. doi:10.1002/wics.101
 [5] Wu, B. (2007). Cancer outlier differential gene           [22]   Dutta H. et al. Distributed Top-K Outlier
     expression detection. Biostatistics (Oxford,                     Detection from Astronomy Catalogs using the
     England), 8(3), 566–75.                                          DEMAC System // SDM. – 2007.
     doi:10.1093/biostatistics/kxl029                          [23]   Cansado, A., & Soto, A. (2008). Unsupervised
 [6] Lourenço A. et al. Outlier detection in non-                     Anomaly Detection in Large Databases Using
     intrusive ECG biometric system // Image Analysis                 Bayesian Networks. Network, 1–37.
     and Recognition. – Springer Berlin Heidelberg,            [24]   Zhu, X. (2007). CS838-1 Advanced NLP : The
     2013. – pp. 43–52.                                               EM Algorithm K-means Clustering, (6), 1–6.
 [7] Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2012).            [25]   Spence, C., Parra, L., & Sajda, P. (2001).
     Isolation-Based Anomaly Detection. ACM                           Detection, Synthesis and Compression in
     Transactions on Knowledge Discovery from Data,                   Mammographic Image Analysis with a
     6(1), 1–39. doi:10.1145/2133360.2133363                          Hierarchical Image Probability Model, 3–10.
 [8] Djorgovski, S. G., Brunner, R. J., Mahabal, A. A.,        [26]   Pelleg, D., & Moore, A. (n.d.). Active Learning
     & Odewahn, S. C. (2001). Exploration of Large                    for Anomaly and Rare-Category Detection.
     Digital Sky Surveys. Observatory, 1–18.
                                                               [27]   Fawzy A., Mokhtar H. M. O., Hegazy O. Outliers
 [9] Djorgovski, S. G., Mahabal, A. A., Brunner, R. J.,               detection and classification in wireless sensor
     Gal, R. R., Castro, S., Observatory, P., Carvalho,               networks // Egyptian Informatics Journal. – 2013.
     R. R. De, et al. (2001a). Searches for Rare and                  – Vol. 14, No. 2. – pp. 157–164.
     New Types of Objects, 225, 52–63.
                                                               [28]   Aggarwal C. C., Philip S. Y. An effective and
[10] Aggarwal, C. C. (2013). Outlier Analysis                         efficient algorithm for high-dimensional outlier
      (introduction). doi:10.1007/978-1-4614-6396-2                   detection // The VLDB Journal. – 2005. – Vol. 14,
[11] Chandola, V., Banerjee, A., & Kumar, V. (2009).                  No. 2. – pp. 211–221.
      Anomaly detection: A Survey. ACM Computing               [29]   De Vries, T., Chawla, S., & Houle, M. E. (2010).
      Surveys, 41(3), 1–58.                                           Finding Local Anomalies in Very High
      doi:10.1145/1541880.1541882                                     Dimensional Space. 2010 IEEE
[12] Berti-Équille, L. (2009). Data Quality Mining:                   International Conference on Data Mining, 128–137.
      New Research Directions. Current.                               doi:10.1109/ICDM.2010.151
[13] Stevens, K. N., Cover, T. M., & Hart, P. E.               [30]   Chandola V., Banerjee A., Kumar V. Anomaly
      (1967). Nearest Neighbor Pattern Classification.                detection for discrete sequences: A survey
      EEE Transactions on Information Theory 13, I,                   // Knowledge and Data Engineering, IEEE
      21–27.                                                          Transactions on. – 2012. – Т. 24, № 5. – С. 823–
[14] Breunig, M. M., Kriegel, H., Ng, R. T., & Sander,                839.
      J. (2000). LOF?: Identifying Density-Based Local         [31]   Budalakoti S., Srivastava A. N., Otey M. E.
      Outliers, 1–12.                                                 Anomaly detection and diagnosis algorithms for
[15] Borne, K., &Vedachalam, A. (2010).                               discrete symbol sequences with applications to
      EFFECTIVE OUTLIER DETECTION IN                                  airline safety // Systems, Man, and Cybernetics,
      SCIENCE DATA STREAMS. ReCALL, 1–15.                             Part C: Applications and Reviews, IEEE
[16] Borne, K. (n.d.). Surprise Detection in                          Transactions on. – 2009. – Т. 39, №. 1. – С. 101–
      Multivariate Astronomical Data.                                 113.
[17] Henrion, M., Hand, D. J., Gandy, A., &Mortlock,           [32]   Wang M., Zhang C., Yu J. Native API based
      D. J. (2013). CASOS: a Subspace Method for                      windows anomaly intrusion detection method
      Anomaly Detection in High Dimentional                           using SVM // Sensor Networks, Ubiquitous, and
      Astronomical Databases. Statistical Analysis and                Trustworthy Computing, 2006. IEEE
      Data Mining, 6(1), 1–89.                                        International Conference on. – IEEE, 2006. –
[18] Networks, K. (n.d.). Data Mining Self –                          Т. 1. – С. 6.
      Organizing Maps, 1–20.                                   [33]   Dasgupta D., Forrest S. Novelty detection in time
[19] Manikantan Ramadas, Shawn Ostermann, Brett                       series data using ideas from immunology
      TjadenDetecting Anomalous Network Traffic                       // Proceedings of the international conference on
      with Self-organizing Maps.(2003) Recent                         intelligent systems. – 1996. – С. 82–87.




[34] Susan T. Dumais (2005). "Latent Semantic
     Analysis". Annual Review of Information
     Science and Technology 38: 188.
     doi:10.1002/aris.1440380105
[35] Allan, J., Papka, R., & Lavrenko, V. (1998).
     On-line New Event Detection and Tracking.
[36] Laorden C. et al. Study on the effectiveness of
     anomaly detection for spam filtering
     // Information Sciences. – 2014. – Vol. 277. –
     pp. 421–444.
[37] Kil, H., Oh, S.-C., Elmacioglu, E., Nam, W., &
     Lee, D. (2009). Graph Theoretic Topological
     Analysis of Web Service Networks.
     World Wide Web, 12(3), 321–343.
     doi:10.1007/s11280-009-0064-6
[38] Akoglu L., McGlohon M., Faloutsos C. Oddball:
     Spotting anomalies in weighted graphs
     // Advances in Knowledge Discovery and Data
     Mining. – Springer Berlin Heidelberg, 2010. –
     pp. 410–421.
[39] Noble C. C., Cook D. J. Graph-based anomaly
     detection // Proceedings of the ninth ACM
     SIGKDD international conference on Knowledge
     discovery and data mining. – ACM, 2003. –
     pp. 631–636.
[40] Eberle W., Holder L. Discovering structural
     anomalies in graph-based data // Data Mining
     Workshops, 2007. ICDM Workshops 2007.
     Seventh IEEE International Conference on. –
     IEEE, 2007. – pp. 393–398.
[41] Chakrabarti D. Autopart: Parameter-free graph
     partitioning and outlier detection // Knowledge
     Discovery in Databases: PKDD 2004. – Springer
     Berlin Heidelberg, 2004. – pp. 112–124.
[42] Heard N. A. et al. Bayesian anomaly detection
     methods for social networks // The Annals of
     Applied Statistics. – 2010. – Vol. 4, No. 2. –
     pp. 645–662.
[43] Horn C., Willett R. Online anomaly detection
     with expert system feedback in social networks
     // Acoustics, Speech and Signal Processing
     (ICASSP), 2011 IEEE International Conference
     on. – IEEE, 2011. – pp. 1936–1939.
[44] Silva J., Willett R. Detection of anomalous
     meetings in a social network // Information
     Sciences and Systems, 2008. CISS 2008. 42nd
     Annual Conference on. – IEEE, 2008. –
     pp. 636–641.
[45] Bhuyan M., Bhattacharyya D., Kalita J. Network
     anomaly detection: methods, systems and tools. –
     2013.
[46] Portnoy L., Eskin E., Stolfo S. Intrusion
     Detection with Unlabeled Data Using Clustering
     (2001) // ACM Workshop on Data Mining
     Applied to Security (DMSA 01).
[47] Laorden C. et al. Study on the effectiveness of
     anomaly detection for spam filtering
     // Information Sciences. – 2014. – Vol. 277. –
     pp. 421–444.
[48] Fawzy A., Mokhtar H. M. O., Hegazy O. Outliers
     detection and classification in wireless sensor
     networks // Egyptian Informatics Journal. – 2013.
     – Vol. 14, No. 2. – pp. 157–164.
[49] Yu M. A nonparametric adaptive CUSUM
     method and its application in network anomaly
     detection // International Journal of
     Advancements in Computing Technology. –
     2012. – Vol. 4, No. 1. – pp. 280–288.
[50] Muniyandi A. P., Rajeswari R., Rajaram R.
     Network anomaly detection by cascading
     k-Means clustering and C4.5 decision tree
     algorithm // Procedia Engineering. – 2012. –
     Vol. 30. – pp. 174–182.
[51] Muda Z. et al. A K-Means and Naive Bayes
     learning approach for better intrusion detection
     // Information Technology Journal. – 2011. –
     Vol. 10, No. 3. – pp. 648–655.
[52] Kavuri V. C., Liu H. Hierarchical clustering
     method to improve transrectal ultrasound-guided
     diffuse optical tomography for prostate cancer
     imaging // Academic Radiology. – 2014. – Vol. 21,
     No. 2. – pp. 250–262.
[53] Li S., Tung W. L., Ng W. K. A novelty detection
     machine and its application to bank failure
     prediction // Neurocomputing. – 2014. – Vol. 130. –
     pp. 63–72.
[54] Cogranne R., Retraint F. Statistical detection of
     defects in radiographic images using an adaptive
     parametric model // Signal Processing. – 2014. –
     Vol. 96. – pp. 173–189.
[55] Daneshpazouh A., Sami A. Entropy-Based
     Outlier Detection Using Semi-Supervised
     Approach with Few Positive Examples // Pattern
     Recognition Letters. – 2014.
[56] Rahmani A. et al. Graph-based approach for
     outlier detection in sequential data and its
     application on stock market and weather data
     // Knowledge-Based Systems. – 2014. – Vol. 61. –
     pp. 89–97.



