<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Domain Knowledge Aided Explainable Artificial Intelligence for Intrusion Detection and Response</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
<institution>Sheikh Rabiul Islam, William Eberle, Sheikh K. Ghafoor, Ambareen Siraj, Mike Rogers, Department of Computer Science, Tennessee Technological University, Cookeville</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>23</fpage>
      <lpage>25</lpage>
      <abstract>
<p>Artificial Intelligence (AI) has become an integral part of modern-day security solutions for its ability to learn very complex functions and handle “Big Data”. However, the lack of explainability and interpretability of successful AI models is a key stumbling block when trust in a model's prediction is critical. This leads to human intervention, which in turn results in a delayed response or decision. While there have been major advancements in the speed and performance of AI-based intrusion detection systems, the response is still at human speed when it comes to explaining and interpreting a specific prediction or decision. In this work, we infuse popular domain knowledge (i.e., CIA principles) into our model for better explainability and validate the approach on a network intrusion detection test case. Our experimental results suggest that the infusion of domain knowledge provides better explainability as well as a faster decision or response. In addition, the infused domain knowledge generalizes the model to work well with unknown attacks, as well as opens the path to adapt to a large stream of network traffic from numerous IoT devices.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Most of the recent advancements in Artificial Intelligence
(AI), and more specifically Machine Learning (ML), have
come from complex non-linear models such as Deep
Neural Networks, Ensemble Methods, and Support Vector
Machines. These models are also known as “black box” models
because they are difficult to interpret and explain, owing to
their inherent non-linearity, many parameters, and very
complex transformations. In addition, some algorithms require
a very large number of samples (i.e., large training sets) to
work efficiently, making it very difficult to figure out what
the model learned from the dataset and which portion of the
dataset has more influence on the output (Kabul 2018).</p>
      <p>
Due to these challenges, black box models lack
explainability and interpretability, ultimately resulting in a lack
of trust in the model and its predictions, as well as possibly
leading to a delayed human response or decision. This
limitation also raises ethical issues in sensitive domains
such as finance (e.g., credit approval), health care (e.g.,
disease diagnosis), and security (e.g., identifying targets). For
instance, AI and ML are becoming an integral part of
security solutions and defense. To mitigate the unethical use
of AI as well as to promote the responsible use of AI
systems, various governments have started taking different
precautionary initiatives. Recently, the European Union
implemented the “right to explanation”, under which a user can
ask for an explanation of an algorithmic decision (Goodman and
Flaxman 2017). In addition, more recently the US
government introduced a new bill, the “Algorithmic Accountability
Act”, which would require companies to assess their
machine learning systems for bias and discrimination, with a
need to take corrective measures
        <xref ref-type="bibr" rid="ref30 ref31">(Wyden 2019)</xref>
        . The U.S.
Department of Defense (DoD) has identified explainability
as a key stumbling block in the adoption of AI-based
solutions in many of their projects. Their DARPA division has
invested $2 billion in an Explainable Artificial Intelligence
(XAI) program (Turek
        <xref ref-type="bibr" rid="ref19 ref21 ref24 ref25">2019; Rankin 2019</xref>
        ).
      </p>
      <p>
        Network intrusions are a common cyber-crime activity,
estimated to cost around $6 trillion annually in damages by
2021
        <xref ref-type="bibr" rid="ref12">(Doyle 2019)</xref>
        . To combat these attacks, an
Intrusion Detection System (IDS) is a security system that
monitors networks and computer systems (Hodo et al. 2016).
Research in AI-based IDS has shown promising results (Hodo
et al. 2016),
        <xref ref-type="bibr" rid="ref28">(Shone et al. 2018)</xref>
        , (Kim et al. 2016), (Javaid et
al. 2016), (Li, Sun, and Wang 2012), and has become an
integral part of security solutions due to its capability of
learning complex, nonlinear functions and analyzing large data
streams from numerous connected devices. A recent survey
by
        <xref ref-type="bibr" rid="ref11">(Dong and Wang 2016)</xref>
        suggests that deep learning-based
methods are accurate and robust to a wide range of attacks
and sample sizes. However, there are concerns regarding the
sustainability of current approaches (e.g., intrusion
detection/prevention systems) when faced with the demands of
modern networks and the increasing level of human
interaction
        <xref ref-type="bibr" rid="ref28">(Shone et al. 2018)</xref>
        . In the age of IoT and Big Data,
an increasing number of connected devices and associated
streams of network traffic have exacerbated the problem.
In addition, delays in detection and response increase the
chance of zero-day exploitation, whereby an attacker
discovers a previously unknown vulnerability and
immediately initiates an attack. However,
improved explainability of an AI model could quicken
interpretation, making it more feasible to accelerate the response.
      </p>
      <p>
Explainability is the extent to which the internal working
mechanism of a machine or AI system can be explained
in human terms, while interpretability is the extent to which
a cause and effect (i.e., an understanding of what is
happening) can be observed within a system. In other words,
interpretability is a form of abstract knowledge about what is
happening, and explainability is detailed,
step-by-step knowledge of what is happening
        <xref ref-type="bibr" rid="ref17">(Montavon, Samek,
and Müller 2018)</xref>
        , (Turek 2019). However, while some of the
literature treats interpretability and explainability as the same,
they are actually two different traits of a model. Just because
a model can be interpreted does not mean that it can be
explained, as explainability needs to go beyond the algorithm
(Lipton 2016).
      </p>
      <p>
        Explainability and interpretability of a model could be
achieved before, during, and after modeling. From the
literature, we find that interpretability in pre-modeling (i.e.,
before modeling) is under-explored.
        <xref ref-type="bibr" rid="ref14">(Miller 2018)</xref>
        argues that
explainability should incorporate knowledge from other
domains, such as philosophy, psychology, and cognitive
science, so that the explanation is not based solely on the
researcher's intuition of what constitutes a good explanation.
However, we also find that the use of domain knowledge for
explainability is under-explored. In this work, we introduce
a novel approach for an AI-based explainable intrusion
detection and response system, and demonstrate its
effectiveness by infusing a popular network security principle (CIA
principle) into the model for better explainability and
interpretability of the decision.
      </p>
      <p>
        We use a recent and comprehensive IDS dataset
(CICIDS2017), which covers the necessary criteria and
includes common, up-to-date attacks such as DDoS, Brute Force, XSS,
SQL Injection, Infiltration, Portscan, and Botnet. We
infuse the CIA principles into the model, which provides a concise
and interpretable set of important features. Computer
security rests on the CIA principles: C stands for
confidentiality, the concealment of information or resources; I stands for
integrity, the trustworthiness of data or resources; and A stands
for availability, the ability to use the information or resource
desired
        <xref ref-type="bibr" rid="ref13">(Matt and others 2006)</xref>
        . For instance, a security
compromise in confidentiality could happen through
eavesdropping on unencrypted data, a compromise in integrity could
happen through an unauthorized attempt to change data, and a
compromise in availability could happen through the
deliberate denial of access to data or a service.
      </p>
      <p>We also convert the domain knowledge infused features
into three features, C, I, and A, by quantitatively
computing the compromise associated with each of them for each
record. Output expressed in terms of this generalized, newly
constructed feature set provides better explainability
with a negligible compromise in performance. We also found
that this generalization provides more resiliency against
unknown attacks.</p>
      <p>In summary, our contributions in this work are as follows:
(1) we demonstrate a method for the collection and use of
domain knowledge in an intrusion detection/response
system; (2) we introduce a way to bring popular security
principles (e.g., CIA principles) to aid in interpretability and
explainability; (3) our experimental results show that infusing
domain knowledge into “black box” models can make them
more explainable with little or no compromise in
performance; and (4) domain knowledge infusion increases
generalizability, which leads to better resiliency against unknown
attacks.</p>
      <p>We start with a background of related work (Section 2)
followed by a description of our proposed approach, an
intuitive description of standard supervised algorithms, and an
overview of the dataset (Section 3) used in this work. In
Section 4, we describe our experiments, followed by Section 5
which contains discussion on results from the experiments.
We conclude with limitations and future work in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Background</title>
      <p>
        Research in Explainable Artificial Intelligence (XAI) is a
reemerging field, after the earlier work of (Chandrasekaran,
Tanner, and Josephson 1989), (Swartout and Moore 1993),
and (Swartout 1985). Previous work primarily focused on
explaining the decision processes of knowledge-based
systems and expert systems. The classical learning paradigm
Explanation-Based Learning (EBL), introduced in the early
’80s, can also be regarded as a precursor of explainability.
EBL involves learning a problem-solving technique by
observing and analyzing solutions to a specific problem
        <xref ref-type="bibr" rid="ref10">(DeJong 1981)</xref>
        , (Mitchell, Keller, and Kedar-Cabelli 1986). The
main reason for the renewed interest in XAI research has
stemmed from recent advancements in AI and ML and their
application to a wide range of areas, as well as concerns over
unethical use and undesired biases in the models. In
addition, recent concerns and laws by different governments are
necessitating more research in XAI.
      </p>
      <p>
        <xref ref-type="bibr" rid="ref32">(Yang and Shafto 2017)</xref>
        use Bayesian Teaching, where
a smaller subset of examples is used to train the model
instead of the whole dataset. The subset of examples is chosen
by domain experts, as those examples are most relevant to the
problem of interest. However, choosing the
right subset of examples in the real world is challenging.
      </p>
      <p>
        <xref ref-type="bibr" rid="ref11">(Lei, Barzilay, and Jaakkola 2016)</xref>
        propose an approach
for sentiment analysis, where a subset of text from the whole
text is selected as the rationale for the prediction. In addition,
the selected subset of text is concise and sufficient
to act as a substitute for the original text, and still capable of
making the correct prediction. Although their approach
outperforms available attention-based models (from deep
learning) with variable-length input (e.g., a model for document
summarization), it is limited to text analysis.
      </p>
      <p>
        When the explanation is based on feature importance, it is
necessary to keep in mind that features that are globally
important may not be important in the local context, and vice
versa
        <xref ref-type="bibr" rid="ref11 ref23">(Ribeiro, Singh, and Guestrin 2016)</xref>
        .
        <xref ref-type="bibr" rid="ref11 ref23">(Ribeiro, Singh,
and Guestrin 2016)</xref>
        propose a novel explanation technique
capable of explaining the prediction of any classifier (i.e., in
the model agnostic way) with a locally interpretable model
(i.e., in the vicinity of the instance being predicted) around
the prediction. Their concern is on two issues: (1) whether
the user should trust the prediction of the model and act on
that, and (2) whether the user should trust a model to
behave reasonably-well when deployed. In addition, they
involve human judgment in their experiment (i.e., human in
the loop) to decide whether to trust the model or not.
      </p>
      <p>(Kim et al. 2017) propose a concept attribution-based
approach (i.e., sensitivity to concept) that provides an
interpretation of the neural network’s internal state in terms of
human-friendly concepts. Their approach, Testing with CAV
(TCAV), quantifies the prediction’s sensitivity to the high
dimensional concept. For example, given a user-defined set of
examples that defines the concept “striped”, TCAV can quantify
the influence of “striped” on the prediction of “zebra” as a
single number. To learn the high-dimensional concepts, they
use a Concept Activation Vector (CAV); CAVs are learned
by training a linear classifier that can distinguish between
the activations produced by a particular concept's examples
and those of other examples in any layer.</p>
      <p>
        Most of these approaches try to find out how the
prediction deviates from the base/average scenario. LIME
        <xref ref-type="bibr" rid="ref11 ref23">(Ribeiro,
Singh, and Guestrin 2016)</xref>
        tries to generate an
explanation by locally (i.e., using local behavior)
approximating the model with an interpretable model (e.g., decision
trees, a linear model). However, it is limited by using only a
linear model to approximate the local behavior. (Lundberg
and Lee 2017) propose “SHAP” which unifies seven
previous approaches: LIME
        <xref ref-type="bibr" rid="ref11 ref23">(Ribeiro, Singh, and Guestrin 2016)</xref>
        ,
DeepLIFT
        <xref ref-type="bibr" rid="ref29">(Shrikumar, Greenside, and Kundaje 2017)</xref>
        , Tree
Interpreter
        <xref ref-type="bibr" rid="ref1">(Ando 2019)</xref>
        , QII
        <xref ref-type="bibr" rid="ref11 ref9">(Datta, Sen, and Zick 2016)</xref>
        ,
Shapley sampling values (Štrumbelj and Kononenko 2014),
Shapley regression values (Lipovetsky and Conklin 2001),
and Layer-wise relevance propagation
        <xref ref-type="bibr" rid="ref2">(Bach et al. 2015)</xref>
        to
explain the prediction of any machine
learning model. While SHAP comes with theoretical guarantees
about consistency and local accuracy from game theory, it
needs to run many evaluations of the original model to
estimate a single vector of feature importance (Lundberg 2019).
ELI5 also uses the LIME algorithm internally for
explanations. In addition, ELI5 is not truly model agnostic, mostly
limited to tree-based and other parametric or linear models.
Furthermore, Tree Interpreter is limited to only tree-based
approaches (e.g., Random Forest, Decision Trees).
      </p>
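<p>The cost pattern noted above (SHAP-style attribution needs many evaluations of the original model per explanation) can be illustrated with a brute-force exact Shapley computation on a toy three-feature linear model. The model, the zero baseline, and the helper names are illustrative stand-ins, not SHAP's actual algorithm:</p>

```python
import math
from itertools import permutations

import numpy as np

# Brute-force exact Shapley values for a 3-feature toy linear model,
# illustrating why SHAP-style attribution is expensive: the model must be
# evaluated at every step of every feature ordering (n! orderings here).
# The model and zero baseline are illustrative, not SHAP's actual routine.

def model(x, mask, baseline=0.0):
    # "Missing" features (mask False) are replaced by a baseline value
    filled = [xi if m else baseline for xi, m in zip(x, mask)]
    return 2 * filled[0] + filled[1] - filled[2]

def shapley(x):
    n = len(x)
    phi = np.zeros(n)
    for order in permutations(range(n)):
        mask = [False] * n
        prev = model(x, mask)
        for j in order:
            mask[j] = True
            cur = model(x, mask)
            phi[j] += cur - prev        # marginal contribution of feature j
            prev = cur
    return phi / math.factorial(n)

# For a linear model, each feature's Shapley value is its term's
# contribution relative to the baseline.
print(shapley([1.0, 1.0, 1.0]))  # [ 2.  1. -1.]
```

<p>Even for three features this requires 3! orderings of incremental model calls, which is why practical SHAP implementations rely on sampling and model-specific shortcuts.</p>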
      <p>
        AI-based IDSs have continued to show promising
performance (Hodo et al. 2016),
        <xref ref-type="bibr" rid="ref28">(Shone et al. 2018)</xref>
        ,(Kim et al.
2016),(Javaid et al. 2016),(Li, Sun, and Wang 2012).
        <xref ref-type="bibr" rid="ref28">(Shone
et al. 2018)</xref>
        propose an approach combining both
shallow (Random Forest) and deep (Autoencoder) learning,
capable of analyzing a wide range of network traffic and
outperforming mainstream Deep Belief Networks (DBNs). In a
literature survey on traditional IDS vs deep learning IDS by
        <xref ref-type="bibr" rid="ref11">(Dong and Wang 2016)</xref>
        , the authors suggest that deep learning-based
methods provide better accuracy across a wide range of
sample sizes and a variety of network traffic and
attacks. However, in all of the previous work, there
are still long training times and a reliance on a human
operator
        <xref ref-type="bibr" rid="ref28">(Shone et al. 2018)</xref>
        .
      </p>
      <p>
        However, incorporating domain knowledge for
explainability has garnered little attention. Previously, we
introduced the concept of infusing domain knowledge (Islam et
al. 2019), albeit for bankruptcy prediction with a limited
focus.
        <xref ref-type="bibr" rid="ref14">(Miller 2018)</xref>
has argued that incorporating
knowledge from different domains will provide better
explainability. In addition, (Kim et al. 2017) use the prediction’s
sensitivity to high dimensional concepts (e.g., the concept
“striped” to “Zebra”) for explaining the prediction.
Furthermore, both LIME
        <xref ref-type="bibr" rid="ref11 ref23">(Ribeiro, Singh, and Guestrin 2016)</xref>
        and
SHAP (Lundberg and Lee 2017) use a simplified input
mapping—mapping the original input to a simplified set of input.
To the best of our knowledge, none of the models
incorporate domain knowledge with a focus towards better
explainability and interpretability. Although our proposed
conceptual model incurs a negligible compromise in accuracy,
it provides better explainability and interpretability, as well as
scalability to Big Data problems.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Methodology</title>
      <p>The proposed approach consists of two components: a
feature generalizer, which gives a generalized feature set with
the help of domain knowledge in two different ways; and an
evaluator that produces and compares the results from the
“black box” model for multiple configurations of features:
domain knowledge infused features, newly constructed
features from domain knowledge infused features, selected
features, and all features.
The feature generalizer (Figure 1, top portion) takes the
original features of the dataset (X1, X2, …, Xn ∈ X, where X
is the set of all features) and infuses domain knowledge to
produce or reconstruct a concise, more interpretable
feature set (X1′, X2′, …, Xk′ ∈ X′, where X′ is the universal
set of original/transformed/constructed features, and here k
is much smaller than n) in two different ways:
Feature Mapping As stated earlier, we use the CIA
principles, which stand for confidentiality, integrity, and
availability, as our domain knowledge. We analyze all types of attacks
for associated compromises in each component of CIA
principles (see Table 1). The Heartbleed vulnerability is related
to a compromise in confidentiality as an attacker could gain
access to the memory of the systems protected by the
vulnerable version of the OpenSSL. A Web attack (e.g., Sql
injection) is related to a compromise in confidentiality and
integrity (e.g., read/write data using injected query), and
availability (e.g., flooding the database server with injected
complex queries like a cross join). An infiltration attack is related to
a compromise in confidentiality as it normally exploits a
software vulnerability (e.g., in Adobe Acrobat Reader) to create a
backdoor and reveal information (e.g., IPs). A port scan attack
is related to a compromise in confidentiality as the attacker
sends packets with varying destination ports to learn the
services and operating systems from the replies. All DoS and
DDoS attacks are related to a compromise in availability as
they aim to hamper the availability of a service or data.
Furthermore, SSH-Patator and FTP-Patator are brute force attacks
and are usually responsible for a compromise in
confidentiality. A botnet (i.e., robot network, a network of
malware-infected computers) could provide a remote shell, file
upload/download options, screenshot capture, and key
logging, which have the potential for compromises in all of
confidentiality, integrity, and availability.</p>
      <p>
        Furthermore, from the feature ranking of the
original dataset provider
        <xref ref-type="bibr" rid="ref26">(Sharafaldin, Lashkari, and Ghorbani
2018)</xref>
        , for each type of attack, we take the top three
features according to their importance (i.e., feature importance
from Random Forest Regressor) and calculate the mapping
(Table 2) with related compromises under CIA principles.
For example, the feature Average Packet Size is renamed as
Avg Packet Size - A where -A indicates that it is a key feature
for the compromise of availability (see Table 2). To get this
mapping between feature and associated compromises, we
first find the mapping between an attack and related
compromises (from Table 1, formulated as Equation 2). In other
words, Formula 1 gives the name of the associated attack
where the feature is among the top three features for identifying
that particular attack, and Formula 2 gives the associated
compromises in C, I, or A from the attack name. Thus, with the
help of domain knowledge, we keep 22 features (see Table
2) out of a total of 78 features. We will refer to these features
as the domain features. The feature descriptions in Table 2
are taken from the data processing software’s website (net
2019).
      </p>
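<p>The two-step mapping just described can be sketched in code as follows. Both lookup tables are abridged, illustrative fragments of Tables 1 and 2, and the helper name is hypothetical:</p>

```python
# Sketch of the two-step feature mapping (Formulas 1 and 2 in the text).
# Both lookup tables are abridged, illustrative fragments of Tables 1 and 2.

# f(feature) -> attacks for which the feature ranks in the top three
FEATURE_TO_ATTACKS = {
    "ACK Flag Count": ["SSH-Patator"],
    "Avg Packet Size": ["DDoS"],
    "Active Mean": ["DoS Slowhttp", "Infiltration"],
}

# f(attack) -> associated compromises under the CIA principles
ATTACK_TO_CIA = {
    "SSH-Patator": "C",
    "DDoS": "A",
    "DoS Slowhttp": "A",
    "Infiltration": "C",
}

def rename_feature(feature):
    """Append the union of CIA compromises of the feature's attacks."""
    compromises = sorted({c for attack in FEATURE_TO_ATTACKS[feature]
                          for c in ATTACK_TO_CIA[attack]})
    return f"{feature} - {''.join(compromises)}"

print(rename_feature("Avg Packet Size"))  # Avg Packet Size - A
print(rename_feature("Active Mean"))      # Active Mean - AC
```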
      <p>f(feature) → attack (1)
f(attack) → C, I, or A (2)
Feature Construction We also construct three new
features, C, I, and A, from the domain features by
quantitatively calculating compromises associated with each of the
domain features. For that purpose, we calculate the
correlation coefficient vector of the dataset to understand whether
the increase in the value of a feature has a positive or
negative impact on the target variable. We then convert the
correlation coefficient (a.k.a. coeff) vector V into 1 or -1 based
on whether each correlation coefficient is positive or
negative. We also group the domain features and their
corresponding coeff tuples into three groups. Using Formulas 3, 4,
and 5, we aggregate each group (from C, I, and A) of domain
features into the three new features C, I, and A. We also scale
all feature values from 0 to 1 before starting the aggregation
process. During the aggregation for a particular group (e.g.,
C), if the correlation coefficient vector (e.g., Vi) for a feature
(e.g., Ci) of that group has a negative value, then the product
of the feature value and the correlation coefficient for that
feature is deducted, and vice-versa if positive. In addition,
when a feature is liable for more than one compromise, the
feature value is split between the associated elements of CIA
principles.</p>
      <p>C = ∑_{i=0}^{n} C_i V_i (3)</p>
      <p>I = ∑_{i=0}^{n} I_i V_i (4)</p>
      <p>A = ∑_{i=0}^{n} A_i V_i (5)</p>
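<p>The construction step above (scaling, correlation-sign weighting, splitting multi-principle features, and the per-group aggregation of Formulas 3 through 5) can be sketched as follows. The data, group assignments, and weights are synthetic stand-ins, not the CICIDS2017 features:</p>

```python
import numpy as np

# Sketch of Formulas 3-5: aggregate scaled domain features into three new
# features C, I, and A, weighted by the sign (+1/-1) of each feature's
# correlation with the target. Data and group assignments are synthetic.

rng = np.random.default_rng(0)
X = rng.random((100, 4))                                 # 4 domain features
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0.5).astype(float)    # toy target

# Scale all feature values to [0, 1] before aggregation
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# V: sign of the correlation coefficient between each feature and the target
V = np.sign([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Group membership: which columns contribute to C, I, and A. A feature tied
# to more than one principle (column 3, a "CIA" feature here) splits its value.
groups = {"C": [0, 3], "I": [1, 3], "A": [2, 3]}
weight = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0 / 3}

cia = {}
for name, cols in groups.items():
    # Signed, weighted sum of the group's scaled feature values (Formulas 3-5)
    cia[name] = sum(weight[j] * X_scaled[:, j] * V[j] for j in cols)

print({k: round(float(v.mean()), 3) for k, v in cia.items()})
```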
      <sec id="sec-3-1">
        <title>3.3 Evaluator</title>
        <p>The task of the evaluator (Figure 1, bottom side) is to
execute (supervised models or algorithms) and compare the
performance (in detecting malicious and benign records) of
four different types of configurations of features, as follows:
(1) using all features, (2) using selected features (selection
is done by feature selection algorithm), (3) using domain
knowledge infused features, and (4) using newly constructed
features C, I, and A from domain knowledge infused
features. In addition, the evaluator performs the following two
tests:
1. Explainability Test: The purpose of this test is to discover
the comparative advantages or disadvantages of
incorporating domain knowledge in the experiment; and
2. Generalizability Test: The purpose of this test is to
analyze how different approaches perform in unknown or
unseen attack detection. We delete all training records for
a particular attack one at a time and investigate the
performance of the model on the same test set, which includes
records from unknown or unseen attacks. Details of these
tests are described in Section 4.
</p>
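<p>The evaluator's comparison loop can be sketched as follows. The feature configurations, the synthetic data, and the tiny nearest-centroid classifier are illustrative stand-ins (the paper's experiments use the black-box models of Section 3.4 on CICIDS2017 features):</p>

```python
import numpy as np

# Sketch of the evaluator loop: train the same model on each feature
# configuration and compare accuracy. The configurations, data, and the
# tiny nearest-centroid "model" are illustrative stand-ins.

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy labels

configs = {
    "all_features": [0, 1, 2, 3, 4, 5],
    "selected_features": [0, 1, 2],       # e.g., chosen by a feature selector
    "domain_features": [0, 1],            # domain-knowledge infused subset
    "constructed_CIA": [0],               # e.g., aggregated C/I/A columns
}

def nearest_centroid_accuracy(cols):
    Xc, split = X[:, cols], n // 2
    Xtr, ytr, Xte, yte = Xc[:split], y[:split], Xc[split:], y[split:]
    centroids = np.array([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    pred = np.argmin(((Xte[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    return (pred == yte).mean()

for name, cols in configs.items():
    print(f"{name}: accuracy = {nearest_centroid_accuracy(cols):.3f}")
```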
      </sec>
      <sec id="sec-3-2">
        <title>3.4 Algorithms</title>
        <p>We use six different algorithms for predicting malicious
records: one is a probabilistic classifier based on
Bayes' theorem (Naive Bayes), and the remaining five are supervised
“black box” models. The algorithm descriptions are taken
from our previous work (Islam et al. 2019).</p>
      </sec>
      <sec id="sec-3-3">
        <title>Artificial Neural Network (ANN)</title>
        <p>An Artificial Neural Network is a non-linear model, capable of mimicking
human brain functions to some extent. It consists of an input
layer, one or multiple hidden layer(s), and the output layer.
Each layer consists of multiple neurons that help to learn the
complex pattern. Each subsequent layer learns more abstract
concepts before it finally merges into the output layer.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Support Vector Machine (SVM)</title>
        <p>
          The Support Vector Machine (SVM) was first introduced by
          <xref ref-type="bibr" rid="ref3">(Boser, Guyon, and
Vapnik 1992)</xref>
          and has been used for many supervised
classification tasks. In addition to linear classification, the model
can learn an optimal hyperplane that separates instances of
different classes using a highly non-linear implicit mapping
of input vectors in high dimensional feature space (i.e.,
kernel trick) (Hooman et al. 2016). When the number of
samples is very high (i.e., millions), it becomes very costly in terms
of computation time.
        </p>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <caption>
            <p>Renamed features and the attacks for which each feature ranks among the top three; the suffixes -C, -I, and -A indicate the associated compromises.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Renamed feature</th>
                <th>In top 3 features of attack</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>ACK Flag Count - C</td><td>SSH-Patator</td></tr>
              <tr><td>Active Mean - AC</td><td>DoS Slowhttp, Infiltration</td></tr>
              <tr><td>Active Min - A</td><td>DoS Slowhttp</td></tr>
              <tr><td>Avg Packet Size - A</td><td>DDoS</td></tr>
              <tr><td>Bwd IAT Mean - A</td><td>DoS slowloris</td></tr>
              <tr><td>Bwd Packet Length Std - AC</td><td>DoS Hulk, DoS GoldenEye, DDoS, Heartbleed</td></tr>
              <tr><td>Bwd Packets/s - CIA</td><td>DoS Hulk, Bot, PortScan</td></tr>
              <tr><td>Fwd IAT Mean - A</td><td>DoS slowloris</td></tr>
              <tr><td>Fwd IAT Min - A</td><td>DoS slowloris, DoS GoldenEye</td></tr>
              <tr><td>Fwd Packet Length Mean - CIA</td><td>Benign, Bot</td></tr>
              <tr><td>Fwd Packets/s - C</td><td>FTP-Patator</td></tr>
              <tr><td>Fwd PSH Flags - C</td><td>FTP-Patator</td></tr>
              <tr><td>Flow Duration - AC</td><td>DDoS, DoS slowloris, DoS Hulk, DoS Slowhttp, Infiltration, Heartbleed</td></tr>
              <tr><td>Flow IAT Mean - A</td><td>DoS GoldenEye</td></tr>
              <tr><td>Flow IAT Min - A</td><td>DoS GoldenEye</td></tr>
              <tr><td>Flow IAT Std - A</td><td>DDoS, DoS Slowhttp, DoS Hulk</td></tr>
              <tr><td>Init Win Bytes Fwd - CIA</td><td>Web Attack</td></tr>
              <tr><td>PSH Flag Count - C</td><td>PortScan</td></tr>
              <tr><td>Subflow Fwd Bytes - CIA</td><td>Benign, SSH-Patator, Web Attack, Bot, Heartbleed, Infiltration</td></tr>
              <tr><td>SYN Flag Count - C</td><td>FTP-Patator</td></tr>
              <tr><td>Total Length of Fwd Packets - CIA</td><td>Benign, SSH-Patator, Web Attack, Bot, Heartbleed, Infiltration</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
Random Forest (RF) A Random Forest is a tree-based
ensemble technique developed by
          <xref ref-type="bibr" rid="ref4">(Breiman 2001)</xref>
          for the
supervised classification task. In RF, many trees are
generated from the bootstrapped subsamples (i.e., random sample
drawn with replacement) of the training data. In each tree,
the splitting attribute is chosen from a smaller random
subset of attributes of that tree (i.e., the chosen split attribute
that is the best among that random subset). This
randomness helps to make trees less correlated as correlated trees
make the same kinds of prediction errors and can overfit the
model. In less correlated trees, a few trees may be wrong
but many others will be right and as a group the trees can
move in the right direction as the output from all the trees
are averaged for the final prediction.
        </p>
        <p>Extra Trees (ET) Extremely Randomized Trees or
Extra Trees (ET) is a tree-based ensemble technique similar
to RF. The only difference is in the process of splitting
attribute selection and determining the threshold (cutoff)
value; both are chosen in an extremely random fashion
(Islam, Ghafoor, and Eberle 2018). Similar to RF, a random
subset of features are taken into consideration for the split
selection, but instead of choosing the most discriminative
cut off threshold, ET cut off thresholds are set to random
values. Thus, the best of these randomly chosen values is set
as the threshold for the splitting rule (ens 2019) on a
particular node. Unlike DT, RF has multiple trees which leads to
a reduced variance. However, bias is introduced, as a subset
of the whole feature set is chosen for each tree instead of
all features. ET was proposed by (Geurts, Ernst, and
Wehenkel 2006), and has achieved a state of the art
performance in some anomaly/intrusion detection research (Islam
2018), (Islam, Eberle, and Ghafoor 2018),(Islam, Ghafoor,
and Eberle 2018).</p>
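<p>The split-selection difference described above can be sketched on a single candidate feature. The data, labels, and Gini-impurity helper are illustrative stand-ins, not the library implementations used in the experiments:</p>

```python
import numpy as np

# Sketch of the split-selection difference: a Random-Forest-style node picks
# the best cut-off among observed values, while Extra Trees draws the
# cut-off at random within the feature's range. Data is synthetic.

rng = np.random.default_rng(0)
x = rng.random(200)                 # one candidate feature
y = (x > 0.6).astype(int)           # toy labels, perfectly separable at 0.6

def gini(labels):
    if len(labels) == 0:
        return 0.0
    p = labels.mean()
    return 2 * p * (1 - p)          # Gini impurity for two classes

def split_impurity(threshold):
    left, right = y[x <= threshold], y[x > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

# Random-Forest-style: the most discriminative threshold among observed values
best = min(np.unique(x), key=split_impurity)

# Extra-Trees-style: one uniformly random threshold in the feature's range
random_threshold = rng.uniform(x.min(), x.max())

print(f"best threshold ~ {best:.2f}, impurity = {split_impurity(best):.2f}")
```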
        <p>Gradient Boosting (GB) (Friedman 2001) generalized
AdaBoost into the Gradient Boosting algorithm to allow a
variety of loss functions. Here, the shortcoming of weak learners
is identified using the gradient, instead of the highly weighted
data points used in AdaBoost. Gradient Boosting (GB) is a
classifier/regression model in the form of an ensemble of
weak prediction models, such as Decision Trees. It works
sequentially like the AdaBoost algorithm, in that each
subsequent model tries to minimize the loss function (i.e., Mean
Squared Error) by paying special focus on instances that
were hard to get right in the previous model.</p>
        <p>
          Naive Bayes (NB) Naive Bayes algorithm is based on
Bayes Theorem, which was formulated in the seventeenth
century. It is a supervised, simple, and comparatively fast
algorithm based on statistics. In a real-world problem, it is
unusual that all features are independent. However, Naive
Bayes assumes conditional independence among features
and surprisingly works well in many cases. It also requires a
small amount of training data to estimate the necessary
parameters (nai 2019). This assumption of Naive Bayes helps
to avoid lots of computations (e.g., computing the
conditional probability for each feature with others) and makes
it a faster algorithm. Besides, the avoidance of a
conditional probability calculation helps (the class conditional
feature distribution can be independently estimated as
onedimensional distribution) in Big Data problems where the
curse of dimensionality is a concern. However, NB is a bad
estimator of probability
          <xref ref-type="bibr" rid="ref33">(Zhang 2004)</xref>
          . We use the Bernoulli
Naive Bayes (Manning, Raghavan, and Schütze 2010) for our
experiments, where each feature is assumed to be binary-valued.
        </p>
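<p>The conditional-independence shortcut described above can be sketched with a tiny Bernoulli Naive Bayes on binary features. The data, labels, and smoothing constants are illustrative, not the experimental setup:</p>

```python
import numpy as np

# Tiny Bernoulli Naive Bayes sketch on synthetic binary features: class-
# conditional feature probabilities are estimated independently per feature
# (no joint table), which is the computational shortcut described above.
# Laplace smoothing (+1/+2) keeps estimated probabilities away from 0 and 1.

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 5))      # binary features
y = (X[:, 0] | X[:, 1]).astype(int)        # toy labels

classes = np.array([0, 1])
prior = np.array([(y == c).mean() for c in classes])
# theta[c, j] = P(feature j is 1 | class c): one number per (class, feature)
theta = np.array([(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2)
                  for c in classes])

def predict(x):
    # log P(c) + sum_j log P(x_j | c): independence turns the joint into a sum
    log_lik = (np.log(theta) * x + np.log(1 - theta) * (1 - x)).sum(axis=1)
    return int(classes[np.argmax(np.log(prior) + log_lik)])

print(predict(np.array([1, 1, 0, 0, 0])))  # → 1 (features 0 and 1 drive the label)
```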
      </sec>
      <sec id="sec-3-5">
        <title>3.5 Data</title>
        <p>
          In this work, we use a recent and comprehensive IDS dataset,
CICIDS2017, published in 2017, which covers the necessary
criteria and includes common, up-to-date attacks such as DoS, DDoS,
Brute Force, XSS, SQL Injection, Infiltration, Portscan,
and Botnet. In fact, this dataset was created to eliminate
the shortcomings (e.g., lack of traffic diversity and
volume, lack of variety of attacks, anonymized packet
information, and being out of date) of previous well-known IDS
datasets such as DARPA98, KDD99, ISC2012, ADFA13, DEFCON,
CAIDA, LBNL, CDX, Kyoto, Twente, and UMASS, dating back to
1998. This is a labeled dataset containing 78 network
traffic features (some features are listed in Table 2) extracted
and calculated from pcap files using the CICFlowMeter
software (Lashkari et al. 2017) for all benign and intrusive flows
          <xref ref-type="bibr" rid="ref26">(Sharafaldin, Lashkari, and Ghorbani 2018)</xref>
          . This IDS
dataset includes seven common, updated families of attacks
satisfying real-world criteria, and is publicly available at
https://www.unb.ca/cic/datasets/ids-2017.html.
        </p>
        <p>
Each record of the dataset is labeled with the particular type
of attack. We create a new feature, “Class”, as the
target feature. We set the value of “Class” to 1 for
all records labeled with any of the 14 attack types, as those
are malicious/intrusive, and to 0 for the remaining records,
as those are benign. In the whole dataset there are
2,830,743 records in total covering 14 different attacks;
2,273,097 are benign and 557,646 are malicious.
The ratio of malicious to benign records is thus approximately
24.5%, giving us an imbalanced dataset, which hurts the
performance of some machine learning algorithms (e.g.,
through bias toward the majority class). To overcome this
problem, we use the well-known oversampling technique SMOTE
          <xref ref-type="bibr" rid="ref7">(Chawla et al.
2002)</xref>
          to oversample the minority class. In
          <xref ref-type="bibr" rid="ref11">(Dong and Wang
2016)</xref>
          , the authors use SMOTE to overcome the same issue in
their empirical comparison of traditional vs
deep learning-based IDS. SMOTE creates synthetic samples
rather than simply oversampling with replacement: the
minority class is oversampled by creating new examples along
the line segments joining a minority sample to its k nearest
minority-class neighbors, where k is chosen based on the
percentage of oversampling required (i.e., it is a
hyperparameter of the algorithm)
          <xref ref-type="bibr" rid="ref7">(Chawla et al. 2002)</xref>
        </p>
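        <p>The interpolation step of SMOTE described above can be sketched as follows (a simplified illustration under our own naming, not the reference implementation; in practice an off-the-shelf SMOTE is used):</p>
        <p>
```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples: pick a minority sample,
    pick one of its k nearest minority-class neighbors, and interpolate a
    new point at a random position on the segment between them."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]
        d = np.linalg.norm(X_min - x, axis=1)  # distances to minority samples
        d[i] = np.inf                          # exclude the sample itself
        neighbors = np.argsort(d)[:k]          # indices of k nearest neighbors
        nb = X_min[rng.choice(neighbors)]
        gap = rng.random()                     # random position along the segment
        synthetic.append(x + gap * (nb - x))
    return np.array(synthetic)
```
        </p>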
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <sec id="sec-4-1">
        <title>Experimental Setup</title>
        <p>
We execute the experiments on a GPU-enabled Linux
machine with 12 GB of RAM and a Core i7 processor. All
supervised machine learning algorithms are implemented using
the Python-based Scikit-learn (sci 2019) library; in addition,
we use TensorFlow (ten 2019) for the Artificial Neural
Network. Due to resource limitations, instead of using the whole
dataset, we take a stratified sample of the data that is big
enough (i.e., 300K records) for a single GPU-enabled
commodity machine. We make the sampled dataset available to
the research community at (sam 2019). Furthermore, we use
70% of the data for training the models and keep 30% of the
data as a holdout set to test the models, confirming that the
target class has the same ratio in both sets. To avoid the adverse
effect of class imbalance on classification performance, we
re-sample the minority class of the training set using SMOTE
          <xref ref-type="bibr" rid="ref7">(Chawla et al. 2002)</xref>
          to balance the dataset. However, we do
not re-sample the test set, as real-world data is skewed and
oversampling the test set would exhibit overoptimistic
performance.
        </p>
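        <p>The stratified 70/30 holdout described above can be sketched as follows (our own minimal version; in practice Scikit-learn's stratified splitting utilities do the same job):</p>
        <p>
```python
import numpy as np

def stratified_split(y, test_frac=0.3, seed=0):
    """Return train/test index arrays such that every class keeps the
    same ratio in both sets (per-class shuffling and slicing)."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)      # all indices of this class
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)
```
        </p>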
        <p>
We run all supervised machine learning algorithms using
four different approaches:
1. With all features: using all 78 features of the dataset
without discarding any.
2. With selected features: using a Random Forest
Regressor (adapting the work of
          <xref ref-type="bibr" rid="ref26">(Sharafaldin, Lashkari,
and Ghorbani 2018)</xref>
          ) to select important features of the
dataset, which gives us 50 features with a nonzero
influence on the target variable.
3. With domain knowledge infused features: using the infused
domain knowledge features (see Section 3.2); we use
the term domain features interchangeably as a short form.
4. With newly constructed features from domain
knowledge infused features: using the features C, I, and A
newly constructed (see Section 3.2) from the domain knowledge
infused features; we use the term domain
features-constructed interchangeably as a short form.
        </p>
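        <p>Feature setting 2 (selected features) can be sketched with Scikit-learn as follows (a hedged sketch: thresholding importances at zero is our reading of "nonzero influence", and the hyperparameters shown are illustrative, not the paper's):</p>
        <p>
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_nonzero_features(X, y, seed=0):
    """Fit a Random Forest Regressor and keep the indices of features
    whose impurity-based importance is nonzero."""
    rf = RandomForestRegressor(n_estimators=50, random_state=seed)
    rf.fit(X, y)
    return np.flatnonzero(rf.feature_importances_ > 0)
```
        </p>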
        <p>We conduct two types of experiments using each of
the four feature settings. In the explainability test, we run six
supervised algorithms (RF, ET, SVM, GB, ANN, and NB) using the
four described feature settings and report the results in
Section 5.1. Unlike NB, the other classifiers are “black box” in
nature; NB is a probabilistic classifier based on Bayes'
Theorem with a strong conditional independence assumption among
features. The main purpose of including NB in the experiment
is the generalizability test. To test the generalizability of
the approach, we train a classifier without any representative
of a particular attack and test it in the presence of that
attack, in order to classify it as malicious or benign. To be
more specific, we delete all records of a particular attack
from the training set, train the classifier on the records of
the remaining 13 attacks, and test the classifier on all 14
attacks. We report the percentage of the deleted attack's records
that are correctly detected as malicious (see Section 5.2), and
repeat this, one by one, for all 14 attacks. We make the source
code available to the research community to replicate the
experiments at (pro 2019).</p>
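        <p>The generalizability test loop described above can be sketched as follows (our own naming; the hypothetical `fit_predict` closure stands in for training any of the six classifiers):</p>
        <p>
```python
import numpy as np

def unknown_attack_detection_rate(X, y, attack, holdout, fit_predict):
    """Drop every record of the `holdout` attack from training, then
    report the fraction of those records the trained model still flags
    as malicious (y == 1). Repeating over all attacks gives the full
    generalizability test."""
    train = attack != holdout              # benign plus the other attacks
    predict = fit_predict(X[train], y[train])
    held = attack == holdout               # records of the unseen attack
    return float(np.mean(predict(X[held]) == 1))
```
        </p>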
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>The following sections discuss results from the two
categories of experiments previously described.</p>
      <sec id="sec-5-1">
        <title>Findings from Explainability Test</title>
        <p>Comparing the performance using all features vs selected
features, Table 3 shows that models using all features
(denoted with an appended -A, for instance RF-A) tend to show
better results on all performance metrics. However, the
difference from the selected-features setting is negligible
(&lt;.0007 for RF) on every performance metric, which is likely
because only features with little significance were
eliminated. In addition, Random Forest outperforms the other
algorithms (SVM, ET, GB, ANN, and NB) under this feature
setting (i.e., using all features), so we take the results
using all features as the baseline against which to compare our
proposed approach.</p>
        <p>Before comparing the results of our approach with the
all-features baseline, we seek the better of the two
domain-related feature settings of our proposed approach. In
other words, comparing the domain knowledge infused features
against the features (C, I, and A) newly constructed from
them, we find that, in almost all cases, the model with domain
knowledge infused features (denoted with an appended -D1, for
instance RF-D1) performs better than its counterpart (see
Table 4). For RF the maximum performance gap is .2 in
recall, while for ET that gap is .048 with similar precision. As
the domain features (22 features) contain far more detail
than the newly constructed features C, I, and A (3 features),
the constructed setting loses some detail. In terms of
individual algorithms, RF is again the clear winner, this time
using domain features. Although NB and ANN exhibit better
recall using constructed features, that comes with compromises
in precision. So, overall, we consider the domain features
setting better than the constructed features setting.</p>
        <p>
From the comparison of all features vs selected features
(Table 3), the best feature setting is all features; from the
comparison of domain features vs constructed features (Table 4),
the best is domain features. So we further compare the
performance of models using these two best settings: all features
(i.e., the baseline) vs domain features. We find that, among all
models, RF using all features (denoted with an appended -A,
for instance RF-A) performs better than all other algorithms
(see Table 5 and Figure 3). Interestingly, RF using domain
knowledge infused features (denoted with an appended -D1, for
instance RF-D1) also shows promising performance: the
difference between the two on any performance metric is
negligible (&lt;.005). In fact, the result of RF using the
domain knowledge infused feature setting is better than what
          <xref ref-type="bibr" rid="ref26">(Sharafaldin, Lashkari, and Ghorbani 2018)</xref>
          report using the same dataset. The slight improvement might
stem from the experimental settings (e.g., training/test set
split, re-sampling techniques). Furthermore, in the domain
knowledge infused feature setting we use only 22 of the
78 features, where each feature indicates the associated
compromise (e.g., of confidentiality, integrity, or
availability), producing results that are more explainable and
interpretable than the counterpart's. The prediction for a
particular sample can be represented as:
        </p>
        <p>
P(D) = b + Σ_{g=0}^{G} contribution(g) (6)
        </p>
        <p>where b is the model average, g indexes the generalized
domain features (e.g., ACK Flag Count - C), and P(D) is the
probability value of the decision. Instead of using the
contribution from each individual domain feature, we can express
the output in terms of the contribution from each element of the
domain concept; for that, we aggregate the contributions of all
features into three groups (C, I, and A). This enables an
analyst to understand the nature of the attack more quickly
(Figure 2). For instance, when the greater portion of the
feature contribution for a sample comes from features tagged
with -A (i.e., Availability), it might be a DDoS attack, which
usually comes with very heavy compromises in the availability
of data or services. We use the iml package of the programming
language R to generate the breakdown of feature contributions
for a particular sample's prediction (Figure 2).</p>
        <p>
Recall that the purpose of this test is to measure resiliency
against unknown attacks. First, we use Random Forest (RF),
the best-performing algorithm so far, with all four feature
settings. As shown in Table 6 and Figure 4, except for the
constructed feature setting (denoted by Cons.), the
performance of the feature settings (all, selected, and domain) is
similar. The constructed features fail to provide comparable
performance for RF, as that setting has only three features and
loses data detail (i.e., it over-generalizes). Surprisingly, a
few unknown attacks are detectable only with the domain
knowledge infused features; for instance, Web Attack Sql
Injection is detected as suspicious only by the domain knowledge
infused features. Overall, although the domain knowledge
infused feature setting performs slightly worse than the
all-features setting, it comes with an explainable feature set and
the added capability of identifying a few unknown attacks.</p>
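        <p>Aggregating per-feature contributions into the three domain-concept groups, as described above, can be sketched as follows (the feature names and contribution values are hypothetical; the tags follow the "-C/-I/-A" suffix convention used in the text):</p>
        <p>
```python
def cia_breakdown(contributions):
    """Collapse tagged per-feature contributions into C/I/A percentages
    so an analyst can read the dominant kind of compromise at a glance.
    Keys must end in a -C, -I, or -A tag."""
    totals = {"C": 0.0, "I": 0.0, "A": 0.0}
    for name, value in contributions.items():
        tag = name.rsplit("-", 1)[-1].strip()  # trailing -C / -I / -A tag
        totals[tag] += abs(value)
    total = sum(totals.values()) or 1.0        # guard against an empty input
    return {group: 100.0 * v / total for group, v in totals.items()}
```
        </p>
        <p>For a sample whose contribution mass sits mostly in Availability-tagged features, the A share is high, hinting at a DDoS-style attack.</p>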
        <p>To reiterate, the constructed feature set consists of only
three features (C, I, and A), constructed by aggregating the
domain knowledge infused features. Being composed of only
three features, it is an extreme generalization that loses
much of the data's detail. However, it also comes with an
exceptional capability, which we discovered after applying a
statistical approach (Naive Bayes) to the dataset. We find
(see Table 7) that the newly constructed feature setting is
best for NB, which detects unknown attacks with accuracy
similar to that of RF under the other feature settings in
Table 6. Most interestingly, this feature set of only three
features (C, I, and A) takes comparatively little time to
execute and comes with the added benefit of very good
explainability. Once the prediction is expressed as a percentage
of influence from each of C, I, and A, the analyst can
perceive the level of compromise more intuitively from the hints
about the type of attack (e.g., a DDoS will show a high
percentage for A, i.e., a compromise in Availability).</p>
        <p>
However, Tables 3, 4, and 5 show that NB's speed comes
at a cost in precision and recall (i.e., it produces
comparatively more false positives and false negatives). In
addition, NB is a poor probability estimator for the predicted
output
          <xref ref-type="bibr" rid="ref33">(Zhang 2004)</xref>
          . Still, NB with the
constructed feature setting could be recommended as an
additional IDS for quick interpretation of huge volumes of traffic
data, provided its decision is treated as tentative and subject
to a further sanity check. We also calculate the average time
taken by each algorithm over all four feature settings and
find that NB is the fastest algorithm: RF, ET, GB, ANN,
and SVM take 2.80, 9.27, 77.06, 15.07, and 444.50 times
more execution time than NB, respectively. Moreover, the best
algorithm, RF (1st in terms of the performance metrics and 2nd
in terms of execution time), can be executed in parallel using
Apache Spark for far better run-time
          <xref ref-type="bibr" rid="ref8">(Chen et al. 2016)</xref>
          , making it highly scalable to big data problems.
        </p>
        <p>Overall, domain knowledge infusion provides better
explainability with a negligible compromise in performance. In
addition, the generalization provides better execution time
and resiliency against unknown attacks.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Future Work</title>
      <p>AI-based approaches have become an integral part of
security solutions due to their potential for handling “Big Data”
and diverse network traffic. Cybercrime-related damages
continue to rise, and network intrusions are a key tactic.
Although AI-based IDSs provide accelerated speeds in intrusion
detection, the response is still at human speed wherever there
is a human in the loop, and the lack of explainability of
AI-based models is a key reason for this bottleneck. To mitigate
this problem, we infuse the CIA principles (i.e., domain
knowledge) into the AI-based black box model for better
explainability and generalizability. Our experimental results, on a
comprehensive, up-to-date, real-world network intrusion
dataset, show realizable successes in better explainability. In
addition, the infused domain knowledge helps in detecting
unknown attacks, as it generalizes the problem, which
ultimately opens the door to accommodating big data.</p>
      <p>Going forward, finding an optimal way to segregate the
contribution of each participating feature, per sample, while
accounting for interactions among features (correlations among
features complicate explanations) will aid the explainability
of individual predictions. In addition, to ensure trust,
estimating the level of uncertainty in the model will be another
extension of this work. Open challenges remain around
explainability and interpretability, such as agreement on what an
explanation is and for whom, a formalism for explanations, and
quantifying the human comprehensibility of an explanation.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgment</title>
      <p>Thanks to Tennessee Tech's Cyber-security Education,
Research and Outreach Center (CEROC) for supporting this
research.</p>
      <p>2019. Ensemble methods.
https://scikit-learn.org/stable/modules/ensemble.html.
Friedman, J. H. 2001. Greedy function approximation: a
gradient boosting machine. Annals of Statistics 1189–1232.
Geurts, P.; Ernst, D.; and Wehenkel, L. 2006. Extremely
randomized trees. Machine Learning 63(1):3–42.
Goodman, B., and Flaxman, S. 2017. European Union
regulations on algorithmic decision-making and a “right to
explanation”. AI Magazine 38(3):50–57.</p>
      <p>Hodo, E.; Bellekens, X.; Hamilton, A.; Dubouilh, P.-L.;
Iorkyase, E.; Tachtatzis, C.; and Atkinson, R. 2016. Threat
analysis of iot networks using artificial neural network
intrusion detection system. In 2016 International Symposium on
Networks, Computers and Communications (ISNCC), 1–6.
IEEE.</p>
      <p>Hooman, A.; Marthandan, G.; Yusoff, W. F. W.; Omid, M.;
and Karamizadeh, S. 2016. Statistical and data mining
methods in credit scoring. The Journal of Developing
Areas 50(5):371–381.</p>
      <p>Islam, S. R.; Eberle, W.; Bundy, S.; and Ghafoor, S. K. 2019.
Infusing domain knowledge in ai-based” black box” models
for better explainability with application in bankruptcy
prediction. arXiv preprint arXiv:1905.11474.</p>
      <p>Islam, S. R.; Eberle, W.; and Ghafoor, S. K. 2018. Credit
default mining using combined machine learning and heuristic
approach. arXiv preprint arXiv:1807.01176.</p>
      <p>Islam, S. R.; Ghafoor, S. K.; and Eberle, W. 2018.
Mining illegal insider trading of stocks: A proactive approach.
In 2018 IEEE International Conference on Big Data (Big
Data), 1397–1406. IEEE.</p>
      <p>Islam, S. R. 2018. An efficient technique for mining bad
credit accounts from both olap and oltp. Ph.D. Dissertation,
Tennessee Technological University.</p>
      <p>Javaid, A.; Niyaz, Q.; Sun, W.; and Alam, M. 2016. A
deep learning approach for network intrusion detection
system. In Proceedings of the 9th EAI International Conference
on Bio-inspired Information and Communications
Technologies (formerly BIONETICS), 21–26. ICST (Institute for
Computer Sciences, Social-Informatics and . . . .</p>
      <p>Kabul, I. K. 2018. Explainable AI. https://www.kdnuggets.com/2018/11/interpretability-trust-ai-machinelearning.html.</p>
      <p>Kim, J.; Kim, J.; Thu, H. L. T.; and Kim, H. 2016. Long
short term memory recurrent neural network classifier for
intrusion detection. In 2016 International Conference on
Platform Technology and Service (PlatCon), 1–5. IEEE.
Kim, B.; Wattenberg, M.; Gilmer, J.; Cai, C.; Wexler, J.;
Viegas, F.; and Sayres, R. 2017. Interpretability beyond feature
attribution: Quantitative testing with concept activation
vectors (tcav). arXiv preprint arXiv:1711.11279.</p>
      <p>Lashkari, A. H.; Draper-Gil, G.; Mamun, M. S. I.; and
Ghorbani, A. A. 2017. Characterization of tor traffic using time
based features. In ICISSP, 253–262.</p>
      <p>Lei, T.; Barzilay, R.; and Jaakkola, T. 2016. Rationalizing
neural predictions. arXiv preprint arXiv:1606.04155.
Li, Z.; Sun, W.; and Wang, L. 2012. A neural network based
distributed intrusion detection system on cloud platform. In
2012 IEEE 2nd international conference on Cloud
Computing and Intelligence Systems, volume 1, 75–79. IEEE.
Lipovetsky, S., and Conklin, M. 2001. Analysis of
regression in game theory approach. Applied Stochastic Models in
Business and Industry 17(4):319–330.</p>
      <p>Lipton, Z. C. 2016. The mythos of model interpretability.
arXiv preprint arXiv:1606.03490.</p>
      <p>Lundberg, S. M., and Lee, S.-I. 2017. A unified approach
to interpreting model predictions. In Advances in Neural
Information Processing Systems, 4765–4774.</p>
      <p>Lundberg, S. 2019. Shap vs Lime. https://github.com/slundberg/shap/issues/19.</p>
      <p>Manning, C.; Raghavan, P.; and Schütze, H. 2010.
Introduction to information retrieval. Natural Language Engineering
16(1):100–103.
Štrumbelj, E., and Kononenko, I. 2014. Explaining
prediction models and individual predictions with feature
contributions. Knowledge and Information Systems 41(3):647–665.
Swartout, W. R., and Moore, J. D. 1993. Explanation in
second generation expert systems. In Second Generation Expert
Systems. Springer. 543–585.</p>
      <p>Swartout, W. R. 1985. Rule-based expert systems: The
MYCIN experiments of the Stanford Heuristic Programming
Project: B. G. Buchanan and E. H. Shortliffe (Addison-Wesley,
Reading, MA, 1984); 702 pages.
2019. TensorFlow. https://www.tensorflow.org/.</p>
      <p>Turek, M. 2019. Explainable AI. https://www.darpa.mil/program/explainable-artificial-intelligence.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Ando</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2019</year>
          . Interpreting random forests. http://blog.datadive.net/interpreting-random-forests/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Binder</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Montavon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Klauschen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ; Müller, K.-R.; and
          <string-name>
            <surname>Samek</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation</article-title>
          .
          <source>PloS one 10</source>
          <volume>(7)</volume>
          :
          <fpage>e0130140</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Boser</surname>
            ,
            <given-names>B. E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I. M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V. N.</given-names>
          </string-name>
          <year>1992</year>
          .
          <article-title>A training algorithm for optimal margin classifiers</article-title>
          .
          <source>In Proceedings of the fifth annual workshop on Computational learning theory</source>
          ,
          <volume>144</volume>
          -
          <fpage>152</fpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Random forests</article-title>
          .
          <source>Machine learning 45(1)</source>
          :
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          1989.
          <article-title>Explaining control strategies in problem solving</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>IEEE Intelligent Systems (1)</source>
          :
          <fpage>9</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Chawla</surname>
            ,
            <given-names>N. V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bowyer</surname>
            ,
            <given-names>K. W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>L. O.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Kegelmeyer</surname>
            ,
            <given-names>W. P.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>Smote: synthetic minority over-sampling technique</article-title>
          .
          <source>Journal of artificial intelligence research</source>
          <volume>16</volume>
          :
          <fpage>321</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bilal</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Weng</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>A parallel random forest algorithm for big data in a spark cloud computing environment</article-title>
          .
          <source>IEEE Transactions on Parallel and Distributed Systems</source>
          <volume>28</volume>
          (
          <issue>4</issue>
          ):
          <fpage>919</fpage>
          -
          <lpage>933</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Datta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Zick</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems</article-title>
          .
          <source>In 2016 IEEE symposium on security and privacy (SP)</source>
          ,
          <fpage>598</fpage>
          -
          <lpage>617</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>DeJong</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>1981</year>
          .
          <article-title>Generalizations based on explanations</article-title>
          .
          <source>In IJCAI</source>
          , volume
          <volume>81</volume>
          ,
          <fpage>67</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Comparison deep learning method to traditional methods using for network intrusion detection</article-title>
          .
          <source>In 2016 8th IEEE International Conference on Communication Software and Networks (ICCSN)</source>
          ,
          <fpage>581</fpage>
          -
          <lpage>585</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Doyle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Don't be lulled into a false sense of security</article-title>
          . https://www.securityroundtable.org/dont-lulled-falsesense-cybersecurity/.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Matt</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , et al.
          <year>2006</year>
          . Introduction to computer security.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Explanation in artificial intelligence: Insights from the social sciences</article-title>
          .
          <source>Artificial Intelligence.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          1986.
          <article-title>Explanation-based generalization: A unifying view</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>Machine learning 1(1)</source>
          :
          <fpage>47</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Montavon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Samek</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; and Müller, K.-R.
          <year>2018</year>
          .
          <article-title>Methods for interpreting and understanding deep neural networks</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>Digital Signal Processing</source>
          <volume>73</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          2019.
          <article-title>Naive bayes</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>2019. Netflow meter. http://netflowmeter.ca/netflowmeter.html.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          2019.
          <article-title>domain-knowledge-aided code</article-title>
          . https://github.com/SheikhRabiul/domain-knowledge-aided-explainable-ai-for-intrusion-detection-and-response.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Rankin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>The dark secret at the heart of ai</article-title>
          . https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-of-ai/.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Ribeiro</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Why should I trust you?: Explaining the predictions of any classifier</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</source>
          ,
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          2019.
          <article-title>Domain-knowledge-aided dataset</article-title>
          . https://github.com/SheikhRabiul/domain-knowledge-aided-explainable-ai-for-intrusion-detection-and-response/tree/master/data/combined_sampled.zip.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          2019.
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Sharafaldin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lashkari</surname>
            ,
            <given-names>A. H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ghorbani</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Toward generating a new intrusion detection dataset and intrusion traffic characterization</article-title>
          .
          <source>In ICISSP</source>
          ,
          <fpage>108</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Shone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ngoc</surname>
            ,
            <given-names>T. N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Phai</surname>
            ,
            <given-names>V. D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>A deep learning approach to network intrusion detection</article-title>
          .
          <source>IEEE Transactions on Emerging Topics in Computational Intelligence</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>41</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>Shrikumar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Greenside</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Kundaje</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Learning important features through propagating activation differences</article-title>
          .
          <source>In Proceedings of the 34th International Conference on Machine Learning - Volume</source>
          <volume>70</volume>
          ,
          <fpage>3145</fpage>
          -
          <lpage>3153</lpage>
          . JMLR.org.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Wyden</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Algorithmic accountability</article-title>
          . https://www.wyden.senate.gov/imo/media/doc/Algorithmic%20Accountability%20Act%20of%202019%20Bill%20Text.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>S. C.-H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shafto</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Explainable artificial intelligence via bayesian teaching</article-title>
          .
          <source>In NIPS 2017 Workshop on Teaching Machines, Robots, and Humans</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>The optimality of naive Bayes</article-title>
          .
          <source>AA</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          ):
          <fpage>3</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>