=Paper=
{{Paper
|id=Vol-2327/ESIDA2
|storemode=property
|title=eX2: a framework for interactive anomaly detection
|pdfUrl=https://ceur-ws.org/Vol-2327/IUI19WS-ESIDA-2.pdf
|volume=Vol-2327
|authors=Ignacio Arnaldo,Kalyan Veeramachaneni,Mei Lam
|dblpUrl=https://dblp.org/rec/conf/iui/ArnaldoVL19
}}
==eX2: a framework for interactive anomaly detection==
eX2: a framework for interactive anomaly detection
Ignacio Arnaldo (iarnaldo@patternex.com), PatternEx, San Jose, CA, USA
Mei Lam (mei@patternex.com), PatternEx, San Jose, CA, USA
Kalyan Veeramachaneni (kalyanv@mit.edu), LIDS, MIT, Cambridge, MA, USA
ABSTRACT

We introduce eX2 (coined after "explain" and "explore"), a framework based on explainable outlier analysis and interactive recommendations that enables cybersecurity researchers to efficiently search for new attacks. We demonstrate the framework with both publicly available and real-world cybersecurity datasets, showing that eX2 improves the detection capability of stand-alone outlier analysis methods, thereby improving the efficiency of so-called threat hunting activities.

CCS CONCEPTS

• Security and privacy → Intrusion/anomaly detection and malware mitigation; • Human-centered computing → User interface management systems; • Information systems → Recommender systems;

KEYWORDS

Anomaly detection; interactive machine learning; explainable machine learning; cybersecurity; recommender systems

ACM Reference Format:
Ignacio Arnaldo, Mei Lam, and Kalyan Veeramachaneni. 2019. eX2: a framework for interactive anomaly detection. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 5 pages.

IUI Workshops'19, March 20, 2019, Los Angeles, USA
Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION

The cybersecurity community is embracing machine learning to transition from a reactive to a predictive strategy for threat detection. At the same time, most research works at the intersection of cybersecurity and machine learning focus on building complex models for a specific detection problem [11], but rarely translate into real-world solutions. Arguably, one of the biggest weak spots of these works is the use of datasets that lack generality, realism, and representativeness [3].

To break out of this situation, the first step is to devise efficient strategies to obtain representative datasets. To that end, intelligent tools and interfaces are needed to enable security researchers to carry out threat hunting activities, i.e., to search for attacks in real-world cybersecurity datasets. Threat hunting solutions remain vastly unexplored in the research community, and pose open challenges in combining the fields of outlier analysis, explainable machine learning, and recommender systems.

In this paper, we introduce eX2, a threat hunting framework based on interactive anomaly detection. The detection relies on outlier analysis, given that new attacks are expected to be rare and to exhibit distinctive features. At the same time, special attention is dedicated to providing interpretable, actionable results for analyst consumption. Finally, the framework exploits human-data interactions to recommend the exploration of regions of the data deemed problematic by the analyst.

2 RELATED WORK

Anomaly detection methods have been extensively studied in the machine learning community [1, 6, 10]. The strategy based on Principal Component Analysis used in this work is inspired by [14], while the method introduced to retrieve feature contributions based on the analysis of feature projections into the principal components is closely related to [7].

Given the changing nature of cyber-attacks, many researchers resort to anomaly detection for threat detection. The majority of these works focus on building sophisticated models [13, 15], but do not exploit analyst interactions with the data to improve detection rates. Recent works explore a human-in-the-loop detection paradigm by leveraging a combination of outlier analysis, used to identify new threats, and supervised learning to improve detection rates over time [2, 8, 16]. However, these works do not consider two critical aspects of cybersecurity. First, they do not provide explanations for the anomalies (note that [2] provides predefined visualizations based on prior attack knowledge, but it does not account for new attacks exhibiting unique patterns). Second, none of these works exploits interactive strategies upon the confirmation of a new attack by an analyst, therefore missing an opportunity to improve the detection recall and the label acquisition process.

3 FINDING ANOMALIES

We leverage Principal Component Analysis (PCA) to find cases that violate the correlation structure of the main bulk of the data. To detect these rare cases, we analyze the projection from the original variables to the principal components' space, followed by the inverse projection (or reconstruction) from the principal components to the original variables. If only the first principal components (the components that explain most of the variance in the data) are used for projection and reconstruction, we ensure that the reconstruction error will be low for the majority of the examples, while remaining high for outliers. This is because the first principal components explain the variance of normal cases, while the last principal components explain outlier variance [1].

Let X be a p-dimensional dataset. Its covariance matrix Σ can be decomposed as Σ = P × D × P^T, where P is an orthonormal matrix whose columns are the eigenvectors of Σ, and D is the diagonal matrix containing the corresponding eigenvalues λ_1 ... λ_p, where the eigenvectors and their corresponding eigenvalues are sorted in
[Plots omitted: panels (a) Credit card, (b) KDDCup, (c) ATO show score distributions; panels (d) Credit card, (e) KDDCup, (f) ATO show normalized scores.]
Figure 1: Score distributions (a, b, c) and normalized scores (d, e, f) for three datasets obtained with the PCA method and the corresponding fitted distributions.
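The covariance decomposition Σ = P × D × P^T described in Section 3 can be sketched with numpy; a minimal illustration under our own naming, not the authors' implementation:

```python
import numpy as np

def sorted_pca(X):
    """Eigendecomposition of the covariance matrix, Sigma = P D P^T,
    with eigenvectors sorted by decreasing eigenvalue (most variance first)."""
    Sigma = np.cov(X, rowvar=False)           # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort: largest variance first
    return eigvecs[:, order], eigvals[order]  # P (columns = eigenvectors), lambda_1..lambda_p
```

Since Σ is symmetric, `numpy.linalg.eigh` is the appropriate routine; it returns eigenvalues in ascending order, hence the explicit re-sorting to match the paper's convention.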
decreasing order of significance (the first eigenvector accounts for the most variance, etc.).

The projection of the dataset into the principal component space is given by Y = XP. Note that this projection can be performed with a reduced number of principal components. Let Y^j be the projected dataset using the top j principal components: Y^j = X × P^j. In the same way, the reverse projection (from the principal component space to the original space) is given by R^j = (P^j × (Y^j)^T)^T, where R^j is the reconstructed dataset using the top j principal components.

We define the outlier score of point X_i = [x_{i1} ... x_{ip}] as:

$$\mathrm{score}(X_i) = \sum_{j=1}^{p} \left| X_i - R_i^j \right| \times ev(j), \qquad ev(j) = \frac{\sum_{k=1}^{j} \lambda_k}{\sum_{k=1}^{p} \lambda_k} \qquad (1)$$

Note that ev(j) represents the cumulative percentage of variance explained with the top j principal components. This means that the higher j is, the more variance is accounted for within components 1 to j. With this score definition, large deviations in the top principal components are not heavily weighted, while deviations in the last principal components are. Outliers present large deviations in the last principal components, and thus will receive high scores.

Normalizing outlier scores: As shown in Figure 1, the outlier detection method assigns a low score to most examples, and the distribution presents a long right tail. At the same time, the range of the scores depends on the dataset, which limits the method's interpretability. To overcome this situation, we project all scores into a common space, in such a way that scores can be interpreted as probabilities. To that end, we model PCA-based outlier scores with a Weibull distribution (overlaid in the figures in red). Note that the Weibull distribution is flexible and can model a wide variety of shapes. For a given score S, its outlier probability corresponds to the cumulative distribution function evaluated at S: F(S) = P(X ≤ S). Figure 1 shows the final scores F for each of the analyzed datasets. We can see that, with this technique, the final scores approximately follow a long right-tailed distribution in the [0, 1] domain. Note that these scores can be interpreted as the probability that a randomly picked example will present a lower or equal score.

4 EXPLAINING AND EXPLORING ANOMALIES

Interpretability in machine learning can be achieved by explaining the model that generates the results, or by explaining each model outcome [9]. In this paper, we focus on the latter, given that the goal is to provide explanations for each individual anomaly. More formally, we consider an anomaly detection strategy given by b(X^p) = S, where b is a black-box detector, X^p is a dataset with p features, and S is the space of scores generated by the detector. The goal is to find an explanation e ∈ ε for each x ∈ X^p, where ε represents the domain of interpretable explanations. We approach this problem as finding a function f such that, for each vector x ∈ X^p, the corresponding explanation is given by e = f(x, b).

In this paper, we introduce a procedure f tailored to PCA that generates explanations e = {C, V}, where C contains the contribution of each feature to the score, and V is a set of visualizations that highlight the difference between the analyzed example and the bulk of the population.

Retrieving feature contributions: In this first step, we retrieve the contribution of each feature of the dataset to the final outlier score via model inspection. Note that we leverage matrix operations to simultaneously retrieve the feature contributions for all the examples; we proceed as follows:

(1) Project one feature at a time using all principal components. For feature i, the projected data is given by Y_i = X_i × P, where the matrix P contains all p eigenvectors.

(2) Compute the feature contribution C_i of feature i as:

$$C_i = \sum_{j=1}^{p} Y_i^j \times ev(j) \qquad (2)$$

where Y_i^j is the projected value of the i-th feature on the j-th principal component, and ev(j) is the cumulative percentage of variance explained with the top j principal components given in Equation 1. In other words, the higher the absolute values projected with the last principal components, the higher the contribution of the feature to the outlier score.

(3) In a last step, we normalize the feature contributions to obtain a unit vector C for each sample:

$$C_i = \frac{C_i}{\sum_{j=1}^{p} C_j} \qquad (3)$$
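The scoring and explanation pipeline of Sections 3 and 4 (Equations 1-3, plus the Weibull normalization) can be sketched in Python with numpy and scipy. This is a minimal illustration, not the authors' implementation: the norm used in |X_i − R_i^j| and the exact form of the single-feature projection Y_i = X_i × P are left implicit in the text, so the L2 norm and the per-entry products below are our assumptions.

```python
import numpy as np
from scipy.stats import weibull_min

def pca_outlier_scores(X):
    """Outlier scores (Eq. 1), per-feature contributions (Eqs. 2-3), and
    Weibull-normalized probabilities for a numeric dataset X (n x p)."""
    Xc = X - X.mean(axis=0)                       # center the data
    lam, P = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(lam)[::-1]                 # sort by decreasing eigenvalue
    lam, P = lam[order], P[:, order]
    ev = np.cumsum(lam) / lam.sum()               # ev(j): cumulative explained variance

    # Eq. 1: residual of the top-j reconstruction, weighted by ev(j)
    # (assumption: |X_i - R_i^j| is read as the L2 norm of the residual row)
    n, p = Xc.shape
    scores = np.zeros(n)
    for j in range(1, p + 1):
        Pj = P[:, :j]
        Rj = Xc @ Pj @ Pj.T                       # project and reconstruct with top j comps
        scores += np.linalg.norm(Xc - Rj, axis=1) * ev[j - 1]

    # Eq. 2: per-feature contributions; entry [s, i, j] is feature i of sample s
    # projected on component j (assumption: our single-feature reading of Y_i = X_i x P)
    C = np.abs(Xc[:, :, None] * P[None, :, :]) @ ev
    C = C / C.sum(axis=1, keepdims=True)          # Eq. 3: normalize to sum to 1 per sample

    # Fit a Weibull distribution to the scores and map them to [0, 1] via its CDF
    shape, loc, scale = weibull_min.fit(scores, floc=0)
    probs = weibull_min.cdf(scores, shape, loc=loc, scale=scale)
    return scores, probs, C
```

Because ev(j) grows with j, a residual that survives even when almost all components are used (i.e., a deviation along the last components) is counted in nearly every term of the sum, which is exactly the weighting behavior the score definition asks for.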
[Plots omitted: (a) pie chart of feature contributions (num_cred_cards 58.29%, num_addresschg 17.94%, addr_verify_fail 11.40%, new_ip 7.09%, rest 5.13%); (b, c) scatter plots of num_addresschg vs num_cred_cards.]
Figure 2: Explanation of an outlier (in red) of the account takeover dataset (ATO): (a) feature contributions; (b) distribution of
the population in the subspace formed by the top 2 contributing features; (c) nearest neighbors (green) in the 2D subspace.
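The recommendation step illustrated in Figure 2c, retrieving the nearest neighbors of a confirmed outlier within the 2D subspace of its top contributing features, can be sketched as follows (the distance metric is not specified in the paper; Euclidean distance and the function name are our assumptions):

```python
import numpy as np

def recommend_in_subspace(X, outlier_idx, feature_pair, k=5):
    """Top-k nearest neighbors of a confirmed outlier, with distances computed
    only in the 2D subspace of the given feature pair (not over all features)."""
    sub = X[:, list(feature_pair)]                 # restrict to the 2D subspace
    d = np.linalg.norm(sub - sub[outlier_idx], axis=1)
    d[outlier_idx] = np.inf                        # exclude the outlier itself
    return np.argsort(d)[:k]                       # indices of the k closest entities
```

Restricting the distance computation to the visualized subspace is the point of the strategy: entities that look similar in the discriminant features confirmed by the analyst are surfaced, even if they differ elsewhere.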
This way, for each outlier, we obtain a contribution score in the [0, 1] domain for each feature in the dataset. To illustrate this step, we show in Figure 2a the feature contributions to the score of an outlier of the ATO dataset; we can see that num_cred_cards contributed the most to the example's score (58.29%), followed by num_addresschg and addr_verify_fail (17.94% and 11.40% respectively).

Visualizing anomalies: Once the feature contributions are extracted, the system generates a series of visualizations to show each outlier in relation to the rest of the population. For ease of interpretation, these visualizations are generated in low-dimensional feature subspaces as follows:

(1) Retrieve the top-m features ranked by contribution score.

(2) For each pair of features (x_i, x_j) in the top-m, display the joint distribution of the population in a 2D scatter plot as shown in Figure 2b. Note that in the example m = 2 and that the analyzed outlier is highlighted in red. In the case of large datasets, the visualizations are limited to 10K randomly picked samples.

With this approach, we obtain intuitive visualizations in low-dimensional subspaces of the original features, in such a way that outliers are likely to stand out with respect to the rest of the population.

Exploring via recommendations in feature subspaces: As the analyst interacts with the visualizations and confirms relevant findings, the framework recommends the investigation of entities with similar characteristics. These recommendations are interactive and correspond to searching for the top-k nearest neighbors in the feature subspaces used to visualize the data (as opposed to using all the features for distance computation). As shown in Figure 2c, the recommendations highlighted in green help narrow down the search for further anomalies.

This strategy, recommending based on similarities computed in feature subsets, exploits user interactions with the data. The intuition is that, upon confirming the relevance of an outlier with the provided visualizations, the user identifies discriminant feature sets that are not known a priori. Thus, points close to the identified anomaly in the resulting subspaces are likely to be relevant in turn.

5 EXPERIMENTAL WORK

Datasets: We evaluate the framework's capability to find, explain, and explore anomalies with four outlier detection datasets, of which three are publicly available (WDBC, KDDCup, and Credit Card) and one is a real-world dataset built with logs generated by an online application:

- WDBC dataset: this dataset is composed of 367 rows and 30 numerical features, and includes 10 anomalies. We consider the version available at [5] introduced by Campos et al. [4]. Note that this is not a cybersecurity dataset, but it has been included to cover a wider range of scenarios.

- KDDCup 99 dataset (KDD): We consider the pre-processed version introduced in [4], in which categorical values are one-hot encoded and duplicates are eliminated. This version is composed of 48113 rows and 79 features, and counts 200 malicious anomalies.

- Credit card dataset (CC): used in a Kaggle competition [12], the dataset is composed of 284807 rows and 29 numerical features, and counts 492 anomalies.

- Account takeover dataset (ATO): this real-world dataset was built using web logs from an online application over three months. Each row corresponds to the summarized activity of a user during a 24-hour time window (midnight to midnight). It is composed of 317163 rows and 25 numerical features, and counts 318 identified anomalies.¹

Detection rates and analysis of top outliers: Table 1 shows the detection metrics of the PCA-based method and Local Outlier Factor (LOF), a standard outlier analysis baseline, on each of the datasets. The detection performance of LOF is superior for the smaller dataset, WDBC. However, PCA-based outlier analysis outperforms LOF on the three cybersecurity datasets (KDD, CC, and ATO). This observation validates the choice of PCA, given that it not only outperforms LOF, but also provides interpretability as explained in Section 4.

Despite improving on the results of LOF on the cybersecurity datasets, we can see that the precision and recall metrics of the PCA-based method remain low. For instance, when looking at the top 100 outliers, the precision of our method (noted P@100 in the table) is

¹ Like most real-world datasets, ATO is not fully labeled; therefore, the metrics presented in the following need to be interpreted accordingly.
Dataset           Method  AUROC  AUPR   P@10   R@10   P@50   R@50   P@100  R@100  P@200  R@200  P@500  R@500
WDBC              LOF     0.982  0.834  0.800  0.800  0.180  0.900  0.100  1.000  0.050  1.000  0.020  1.000
WDBC              PCA     0.899  0.219  0.300  0.300  0.160  0.800  0.090  0.900  0.050  1.000  0.020  1.000
KDDCup            LOF     0.606  0.029  0.000  0.000  0.240  0.060  0.170  0.085  0.105  0.105  0.054  0.135
KDDCup            PCA     0.977  0.136  0.300  0.015  0.260  0.065  0.210  0.105  0.220  0.220  0.138  0.345
Credit card       LOF     0.654  0.015  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
Credit card       PCA     0.954  0.255  0.400  0.008  0.620  0.063  0.480  0.098  0.500  0.203  0.282  0.287
Account takeover  LOF     0.568  0.004  0.000  0.000  0.020  0.003  0.010  0.003  0.005  0.003  0.004  0.006
Account takeover  PCA     0.861  0.010  0.100  0.003  0.020  0.003  0.020  0.006  0.020  0.013  0.014  0.022

Table 1: Anomaly detection metrics of Local Outlier Factor (LOF) and the method based on PCA used in our framework.
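The P@k and R@k columns in Table 1 follow the standard top-k definitions: the fraction of the k highest-scored examples that are true anomalies, and the fraction of all anomalies captured among them. A minimal sketch (function name is ours):

```python
import numpy as np

def precision_recall_at_k(scores, labels, k):
    """P@k and R@k: precision and recall over the k highest-scored examples.
    `labels` is a 0/1 array marking the true anomalies."""
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k largest scores
    hits = labels[top_k].sum()             # true anomalies among the top k
    return hits / k, hits / labels.sum()
```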
[Plots omitted: 2D scatter plots of num_addresschg vs num_cred_cards, num_addresschg vs addr_verify_fail, and num_cred_cards vs addr_verify_fail.]
Figure 3: Visualization of the top ATO outlier with respect to the bulk of the population in 2D feature subspaces of interest.
The recommendations performed by the system are shown in green.
0.210, 0.480, and 0.020 for KDDCup, CC, and ATO respectively. This observation indicates that not all outliers are malicious, and justifies the effort dedicated to providing interactive exploration of the data to increase anomaly detection rates.

Explain and explore: We show in Figure 3 the visualizations and recommendations generated for the top ATO outlier. The framework appropriately selects feature subsets such that the analyzed outlier (shown in red) stands out with respect to the population (blue), i.e., outliers fall in sparse regions of the selected subspaces. The top 3 contributing features retrieved by the framework are the number of address changes (num_addresschg), the number of credit cards used (num_cred_cards), and whether the user failed the address verification (addr_verify_fail). In the first plot (num_addresschg vs num_cred_cards), we can clearly see why the highlighted user is suspicious: he/she used four credit cards, and changed the delivery address more than 90 times. The plot also shows five additional users recommended by the system upon confirmation of the threat by an analyst. The recommended users present an elevated number of address changes, and used one or more credit cards.

To further evaluate the exploratory strategy based on recommendations, Figure 4 shows the detection rate obtained with PCA alone versus the metrics obtained with the combination of PCA and recommendations. To obtain the latter metrics, we simulate investigations of the top-m (m ∈ {10, 25, 50, 100, 200, 500}) outliers (i.e., we reveal the ground truth) and consider the top-10 recommended entries for the confirmed threats. In all cases, interactive anomaly detection improves the precision. In particular, we can see a significant precision improvement for the KDD and CC datasets for investigation budgets in the 50-200 range.

[Plot omitted: precision versus investigation budget curves for AD and IAD on the WDBC, KDD, CC, and ATO datasets.]
Figure 4: Precision versus investigation budget of anomaly detection alone based on PCA (AD) and interactive anomaly detection combining both PCA and recommendations (IAD).

6 CONCLUSION

We have introduced the eX2 framework for threat hunting activities. The framework leverages principal component analysis to generate interpretable anomalies, and exploits analyst-data interactions to recommend the exploration of problematic regions of the data. The results presented in this work with three cybersecurity datasets show that eX2 outperforms detection strategies based on stand-alone outlier analysis.
REFERENCES
[1] Charu C. Aggarwal. 2013. Outlier Analysis. Springer. https://doi.org/10.1007/
978-1-4614-6396-2
[2] Anaël Beaugnon, Pierre Chifflier, and Francis Bach. 2017. ILAB: An Interactive
Labelling Strategy for Intrusion Detection. In RAID 2017: Research in Attacks,
Intrusions and Defenses. Atlanta, United States. https://hal.archives-ouvertes.fr/
hal-01636299
[3] E. Biglar Beigi, H. Hadian Jazi, N. Stakhanova, and A. A. Ghorbani. 2014. Towards
effective feature selection in machine learning-based botnet detection approaches.
In 2014 IEEE Conference on Communications and Network Security. 247–255.
[4] Guilherme O. Campos, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello,
Barbora Micenková, Erich Schubert, Ira Assent, and Michael E. Houle. 2016.
On the evaluation of unsupervised outlier detection: measures, datasets, and
an empirical study. Data Mining and Knowledge Discovery 30, 4 (01 Jul 2016),
891–927.
[5] Guilherme O. Campos, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello,
Barbora Micenková, Erich Schubert, Ira Assent, and Michael E. Houle. 2018.
Datasets for the evaluation of unsupervised outlier detection. www.dbs.ifi.lmu.
de/research/outlier-evaluation/DAMI/
[6] Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly Detection:
A Survey. ACM Comput. Surv. 41, 3, Article 15 (July 2009), 58 pages. https:
//doi.org/10.1145/1541880.1541882
[7] XuanHong Dang, Barbora Micenková, Ira Assent, and Raymond T. Ng. 2013. Local Outlier Detection with Interpretation. In Machine Learning and Knowledge Discovery in Databases, Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Železný (Eds.). Lecture Notes in Computer Science, Vol. 8190. Springer Berlin Heidelberg, 304–320. https://doi.org/10.1007/978-3-642-40994-3_20
[8] S. Das, W. Wong, T. Dietterich, A. Fern, and A. Emmott. 2016. Incorporating
Expert Feedback into Active Anomaly Discovery. In 2016 IEEE 16th International
Conference on Data Mining (ICDM). 853–858. https://doi.org/10.1109/ICDM.2016.
0102
[9] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca
Giannotti, and Dino Pedreschi. 2018. A Survey of Methods for Explaining Black
Box Models. ACM Comput. Surv. 51, 5, Article 93 (Aug. 2018), 42 pages. https:
//doi.org/10.1145/3236009
[10] Victoria Hodge and Jim Austin. 2004. A Survey of Outlier Detection Method-
ologies. Artif. Intell. Rev. 22, 2 (Oct. 2004), 85–126. https://doi.org/10.1023/B:
AIRE.0000045502.10941.a9
[11] Heju Jiang, Jasvir Nagra, and Parvez Ahammad. 2016. SoK: Applying Machine
Learning in Security-A Survey. arXiv preprint arXiv:1611.03186 (2016).
[12] Kaggle. 2018. Credit Card Fraud Detection Dataset. www.kaggle.com/isaikumar/
creditcardfraud
[13] Benjamin J. Radford, Leonardo M. Apolonio, Antonio J. Trias, and Jim A. Simpson.
2018. Network Traffic Anomaly Detection Using Recurrent Neural Networks.
CoRR abs/1803.10769 (2018). arXiv:1803.10769 http://arxiv.org/abs/1803.10769
[14] Mei-Ling Shyu, Shu-Ching Chen, Kanoksri Sarinnapakorn, and LiWu Chang. 2003. A novel anomaly detection scheme based on principal component classifier. In Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third IEEE International Conference on Data Mining (ICDM'03). 172–179.
[15] Aaron Tuor, Samuel Kaplan, Brian Hutchinson, Nicole Nichols, and Sean Robin-
son. 2017. Deep Learning for Unsupervised Insider Threat Detection in Struc-
tured Cybersecurity Data Streams. CoRR abs/1710.00811 (2017). arXiv:1710.00811
http://arxiv.org/abs/1710.00811
[16] K. Veeramachaneni, I. Arnaldo, V. Korrapati, C. Bassias, and K. Li. 2016. AI 2 :
Training a Big Data Machine to Defend. In 2016 IEEE 2nd International Conference
on Big Data Security on Cloud (BigDataSecurity). 49–54.