<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Comparing rule mining approaches for classification with reasoning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Kopp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukáš Bajer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marek Jílek</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cisco Systems, Cognitive Research Team in Prague</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Information Technology, Czech Technical University in Prague Thákurova 9</institution>
          ,
          <addr-line>160 00 Prague</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Computer Science, Academy of Sciences of the Czech Republic Pod Vodárenskou věží 2</institution>
          ,
          <addr-line>182 07 Prague</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>2203</volume>
      <fpage>52</fpage>
      <lpage>58</lpage>
      <abstract>
        <p>Classification serves an important role in domains such as network security or health care. Although these domains require understanding of the classifier's decision, there are only a few classification methods trying to justify or explain their results. Classification rules and decision trees are generally considered comprehensible. Therefore, this study compares the classification performance and comprehensibility of a random forest classifier with classification rules extracted by Frequent Item Set Mining, Logical Item Set Mining and by the Explainer algorithm, which was previously proposed by the authors.</p>
      </abstract>
      <kwd-group>
        <kwd>Classification</kwd>
        <kwd>Comprehensibility</kwd>
        <kwd>Random Forest</kwd>
        <kwd>Rule Mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Classification is one of the main directions of machine
learning research. In the past decades, researchers developed
classification models which, when learnt properly, can beat
humans in tasks in which we believed they would never succeed,
like handwritten text recognition [?, ?], or even tasks that
were specifically designed to be unsolvable for computers,
like CAPTCHAs [?] and other human interaction proofs [?].</p>
      <p>Current state-of-the-art classifiers excel in predictive
performance, but they lack other quality characteristics needed
for the human decision process: deep neural networks, for
instance, do not provide explanations of their decisions. Yet,
reasoning justifying decisions is critical in many application
domains such as medicine or network security. In medicine,
physicians would rather believe a less precise model if the
classification is justified by an explanation they can
understand. In the network security domain, analysts are often
overwhelmed by alarms, and investigation without context or an
explanation of why an alarm was raised takes a substantial
amount of time. Recent ransomware affairs [?] have shown that
time is critical when an infection occurs.</p>
      <p>This work focuses on the comparison of the context provided
by a random forest classifier and three different rule mining
approaches. Random forests were chosen as a representative of
state-of-the-art classifiers that is also able to provide a
reasonable justification of its decisions. The precision of
classifiers based on sets of rules highly depends on the process
of mining rules from data. We tested the quality of rules mined
by Frequent Item Set Mining [?] and Logical Item Set Mining [?],
and of rules extracted by the Explainer algorithm [?]. The
comparison focuses on the precision and recall over time and
also on the quality of the presented context. The experiments
were done on a dataset from the network security domain.</p>
      <p>The rest of the work is organized as follows. The next
section covers the related work in the field of comprehensible
classification. Section ?? briefly introduces all the used rule
mining approaches, which are experimentally evaluated in
Section ??, followed by the conclusion and future plans.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related work</title>
      <p>The pressing issue of comprehensible predictive models
has been intensively studied in the field of anomaly
detection. The first work considering anomaly explanation
was [?]. It defined an explanation as “provision of a
description or an explanation of why an identified
anomalous sample is exceptional”. The proposed method first
detected all distance-based anomalies in the whole attribute
space. Then, it identified the smallest subspace in which
the anomaly could still be detected and used that subspace
as an explanation.</p>
      <p>In the approach presented in [?], artificial samples are
generated in the vicinity of each sample x. Then a
classifier is trained to separate the artificial samples from real
samples. If x was an anomaly, then artificial samples
should be separated easily, which would result in the
classifier having low error and vice-versa.</p>
      <p>The method proposed in [?] derives the anomaly score
from the frequency of histogram bins, from which the
method also extracts context and explanation of the
anomaly.</p>
      <p>Authors of [?] used the probabilistic RBF kernels to
extract and compare feature impact (positive and negative)
for each class in multi-class classification problems.</p>
      <p>Most of the recent prior art stops the explanation
after identifying the set of features by which the sample
under investigation can be differentiated from the rest. The
Explainer [?] describes the sample by a set of association
rules.</p>
      <p>A comparison of the comprehensibility of well-established
classifiers such as decision trees, nearest neighbours or
Bayesian networks was done in [?].</p>
      <p>The authors of [?] compared multiple rule mining
approaches, their general similarities and differences, and
also their computational efficiency. This paper, in contrast,
focuses on the comprehensibility and complexity of the context
provided to domain experts by different rule mining
approaches.</p>
      <p>The most advanced approach we are aware of is [?]. It
introduced a framework for measuring not only comprehensibility
(whether a prediction is presented in a way that is easy to
understand) but also justifiability (whether a model is in line
with the domain knowledge). We would like to use this framework
in our future work.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Rule mining approaches</title>
      <p>Association rules are defined as an implication of the form
A ⇒ C, where A, C ⊆ I and I = {i1, i2, ..., id } is a set of
binary attributes called items. Association rules are typically
mined from a database X = {x1, x2, ..., xn}. Each line x ∈ X
represents a set of items x ⊆ I and is called a transaction.
In this paper, we work with a specific type of association
rules, often called classification rules, that are also defined
as A ⇒ C. Here, A ⊆ I and C ∈ C = {c1, c2, ..., ck} is a
single item representing a particular class.</p>
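      <p>To make the definitions above concrete, here is a minimal sketch of this representation in Python; all item names, transactions and the rule are illustrative, not taken from the paper's data:</p>

```python
# Sketch: items, transactions and classification rules as Python sets.
# All names here are illustrative, not from the paper's dataset.

def rule_matches(antecedent, transaction):
    """A rule A => C fires on a transaction x iff A is a subset of x."""
    return antecedent <= transaction

# A tiny "database" X of transactions over items I.
X = [
    {"SB:Blocked", "CB:MalBinary"},
    {"CE:jQuery", "AB:SuspDomain", "CB:ClickFraud"},
    {"CE:jQuery"},
]

# Classification rule A => C with a single class item as consequent.
A = {"CE:jQuery", "AB:SuspDomain"}
C = "malicious"

fired = [t for t in X if rule_matches(A, t)]
print(len(fired))  # the rule fires on exactly one transaction
```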
      <p>The rest of this section surveys the rule mining approaches
used in this paper.</p>
      <sec id="sec-3-1">
        <title>3.1 Frequent item set mining</title>
        <p>The Frequent Item Set Mining (FISM) is a general
framework for discovering groups of items (item sets) that are
often seen together. The first algorithm for mining association
rules, called GUHA, was published by Petr Hájek in [?]. In
fact, it was a mathematical method for the automatic search
of hypotheses valid in given data, based on generalized
quantifiers of Boolean predicate logic, and the quantifier
corresponding to association rules was called founded
almost implication. The very same approach was
rediscovered for data mining purposes as the Apriori algorithm by
Agrawal almost thirty years later [?].</p>
        <p>Since then, a variety of improvements to the original
Apriori algorithm have been proposed [?, ?], and several new
algorithms for frequent item set mining have been invented,
e.g., FP-Growth [?], LCM [?] or Eclat [?]. This paper uses
the FP-Growth algorithm as the most time- and memory-efficient
representative of the FISM algorithms.</p>
        <p>The FP-Growth algorithm starts by building a specific prefix
tree called a frequent pattern tree (FP-tree). First, the
frequencies of all items i ∈ I are calculated. Then, all items
i with a frequency lower than the user-specified threshold
θminFreq are filtered out from all transactions x ∈ X. Items
remaining in the filtered transactions are sorted in descending
order according to their frequency. The prefix tree is built by
inserting the filtered and sorted transactions.</p>
        <p>Once the FP-tree is built, it is recursively traversed in
a bottom-up manner, mining frequent item sets lying on the path
from a leaf to the root. Based on the FP-tree construction
process, each transaction is mapped to a path in the FP-tree.
The FP-tree structure also guarantees that all frequent item
sets present in the database X can be found on the path from
some leaf to the root. Moreover, one path in the FP-tree may
represent frequent item sets in multiple transactions, because
items are ordered by descending frequency, so frequent items
are arranged closer to the top of the FP-tree and are more
likely to be shared. This property also ensures the memory
efficiency of FP-Growth.</p>
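        <p>The construction step described above can be sketched in a few lines; this illustrative snippet (our own names, with the mining of the tree omitted) only filters, sorts and inserts transactions:</p>

```python
from collections import Counter

# Sketch of the FP-tree construction step described above: filter
# infrequent items, sort each transaction by descending item frequency,
# insert into a prefix tree. Illustrative only; real FP-Growth also
# mines the finished tree recursively.

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def build_fp_tree(transactions, min_freq):
    freq = Counter(i for t in transactions for i in t)
    root = Node(None)
    for t in transactions:
        # keep only frequent items, most frequent first (ties by name)
        kept = sorted((i for i in t if freq[i] >= min_freq),
                      key=lambda i: (-freq[i], i))
        node = root
        for item in kept:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root

X = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}, {"b", "c"}]
tree = build_fp_tree(X, min_freq=2)
# 'a' and 'b' (freq 3) share a prefix path; 'd' (freq 1) is dropped
print(sorted(tree.children))  # prints: ['a', 'b']
```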
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Logical item set mining</title>
        <p>The Logical Item Set Mining (LISM) [?] is an alternative
approach to mining association rules from data. The key
difference from FISM is that LISM captures logical relations
not only between frequent items but also extracts strong
relations between rarely occurring items. By leveraging
indirect relationships between items, it can even discover
relations between item sets that are not present in the
dataset. The algorithm has counting, consistency, denoising
and discovery phases.</p>
        <p>During the counting phase, co-occurrence counts,
marginal counts and total counts are calculated.</p>
        <p>Co-occurrence count ψ(ia, ib) for every pair of items
(ia, ib) ∈ I × I, where ia ≠ ib, is defined as the number of
transactions in which both items co-occurred:</p>
        <p>ψ(ia, ib) = ∑_{j=1}^{n} δ(ia ∈ xj) δ(ib ∈ xj),   (1)</p>
        <p>where δ(condition) is an indicator function which is 1
if the condition is true and 0 otherwise. The results are
stored in the symmetrical matrix Φ = [φ(ia, ib)], which is
usually very sparse.</p>
        <p>Marginal count ψ(ia) is defined as the number of pairs
in which the item ia ∈ I occurred with some other item:</p>
        <p>ψ(ia) = ∑_{ib ∈ I, ia ≠ ib} ψ(ia, ib).   (2)</p>
        <p>Total count ψ0 is defined as the total number of pairs
in which some item co-occurred with some other item:</p>
        <p>ψ0 = (1/2) ∑_{ia ∈ I} ψ(ia) = (1/2) ∑_{ia ∈ I} ∑_{ib ∈ I} ψ(ia, ib).   (3)</p>
        <p>These three results are then used as estimates of the
co-occurrence and marginal probabilities:</p>
        <p>P(ia, ib) = ψ(ia, ib) / ψ0,   P(ia) = ψ(ia) / ψ0.   (4)</p>
        <p>The consistency phase reduces the effect of noise and
amplifies the importance of rare items that are consistently
seen together. A variety of similarity measures can be
employed, e.g., cosine, Jaccard or point-wise mutual
information. The cosine similarity, defined as</p>
        <p>φ(ia, ib) = P(ia, ib) / √(P(ia) P(ib)) ∈ [0, 1],   (5)</p>
        <p>was used in our experiments.</p>
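        <p>A minimal sketch of the counting phase and the cosine consistency of equation (5), with our own helper names and toy data, might look like this:</p>

```python
from itertools import combinations
from math import sqrt

# Sketch of the LISM counting and consistency phases: pairwise
# co-occurrence counts psi(ia, ib), marginal counts psi(ia), the total
# count psi0, and the cosine consistency of equation (5). Illustrative
# only; names and data are ours.

def cooccurrence_counts(transactions):
    psi = {}
    for t in transactions:
        for ia, ib in combinations(sorted(t), 2):
            psi[(ia, ib)] = psi.get((ia, ib), 0) + 1
    return psi

def cosine_consistency(psi, ia, ib):
    pair = psi.get(tuple(sorted((ia, ib))), 0)
    # marginal count: sum over all pairs containing the item
    marg = lambda i: sum(c for (a, b), c in psi.items() if i in (a, b))
    psi0 = sum(psi.values())  # equals (1/2) * sum of all marginals
    # phi = P(ia, ib) / sqrt(P(ia) P(ib)) with P(.) = psi(.) / psi0
    return (pair / psi0) / sqrt((marg(ia) / psi0) * (marg(ib) / psi0))

X = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
psi = cooccurrence_counts(X)
print(round(cosine_consistency(psi, "a", "b"), 3))  # prints: 0.816
```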
        <p>The iterative denoising phase uses the co-occurrence
consistencies obtained in the previous iteration to remove
noisy co-occurrence counts in Φ. Then, marginal and total
counts are recomputed solely from the updated Φ; there is no
need to touch the data again. As the last step of every
iteration, the consistency counts are updated.</p>
        <p>In the discovery phase, a graph is created from the
denoised co-occurrence consistency matrix Φ. Given Φ, a
logical item set is defined as a set of items where each item
has a high co-occurrence consistency with all other items in
the set. Such sets are found by applying an algorithm for
finding maximal cliques, e.g., [?], to the co-occurrence
consistency graph.</p>
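        <p>The discovery phase can be sketched by thresholding the consistency values into a graph and enumerating its maximal cliques; Bron–Kerbosch is used below as a simple stand-in for whichever maximal-clique algorithm the paper's citation refers to, and all values are hypothetical:</p>

```python
# Sketch of the LISM discovery phase: build a graph with an edge wherever
# the co-occurrence consistency exceeds a threshold, then enumerate
# maximal cliques. Bron-Kerbosch is a stand-in algorithm chosen by us.

def bron_kerbosch(R, P, X, adj, out):
    if not P and not X:
        out.append(frozenset(R))  # R is a maximal clique
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P.remove(v)
        X.add(v)

def logical_item_sets(phi, threshold):
    nodes = {i for pair in phi for i in pair}
    adj = {v: set() for v in nodes}
    for (ia, ib), c in phi.items():
        if c >= threshold:
            adj[ia].add(ib)
            adj[ib].add(ia)
    out = []
    bron_kerbosch(set(), set(nodes), set(), adj, out)
    return {c for c in out if len(c) > 1}  # keep non-trivial cliques

# Hypothetical consistency values phi(ia, ib):
phi = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.7, ("c", "d"): 0.2}
print(logical_item_sets(phi, threshold=0.5))
```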
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Explainer</title>
        <p>The Explainer [?] is a tree-based algorithm designed to
explain why a sample xa ∈ X is an anomaly with respect
to the rest of the data in X. The output can be a set of
the most important features or a set of association rules.
Properties of extracted rules were studied in [?].</p>
        <p>To explain an anomaly xa, the Explainer trains a set of
specific decision trees to separate xa from the rest of the data
in X. Rules are extracted from each of those trees and
assembled into a set of association rules. The key steps
are summarized in Algorithm ??.</p>
        <p>Algorithm 1 Summary of the Explainer algorithm for a
single anomaly xa.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Input:</title>
        <p>data – input dataset; xa – anomalous sample; size –
training set size; nT – the number of trees to be trained.</p>
        <p>Output: rules – rules explaining xa</p>
        <p>1: Forest ← ∅
2: for i ← 1 . . . nT do
3:     T ← createTrainingSet(data, size, xa)
4:     t ← trainTree(T)
5:     Forest ← Forest ∪ t
6: end for
7: rules ← extractRules(Forest)</p>
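        <p>A compact Python rendering of Algorithm 1 might look as follows. It is a sketch under our own simplifications: the greedy split minimizing |Sa(h)| considers only axis-aligned candidates of the form value &gt; θ, and all helper names and data are ours, not the paper's:</p>

```python
import random

# Sketch of Algorithm 1. Training data are numeric feature vectors; a
# tree greedily picks the split (feature f, threshold theta) minimizing
# the number of normal samples staying on the anomaly's side of the
# split, i.e. |Sa(h)| from the text, until the anomaly's leaf is pure.
# All helper names are ours, not the paper's.

def create_training_set(data, size, xa):
    normal = [x for x in data if x is not xa]
    return [xa] + random.sample(normal, min(size, len(normal)))

def train_tree(T, xa):
    """Return the anomaly's root-to-leaf path as [(feature, theta), ...]."""
    rules, side = [], [x for x in T if x is not xa]
    while side:  # the anomaly's leaf is not yet pure
        best = None
        for f in range(len(xa)):
            for x in side:
                theta = (x[f] + xa[f]) / 2
                if xa[f] > theta:  # candidate split h: value_f > theta
                    sa = [y for y in side if y[f] > theta]
                    if best is None or len(sa) < best[0]:
                        best = (len(sa), f, theta, sa)
        if best is None:
            break  # anomaly not separable with '>' splits alone
        _, f, theta, side = best
        rules.append((f, theta))
    return rules

def explainer(data, xa, size=1000, n_trees=10):
    forest = [train_tree(create_training_set(data, size, xa), xa)
              for _ in range(n_trees)]
    return forest  # each tree yields one conjunction of (f, theta) atoms

random.seed(0)
normal = [(random.random(), random.random()) for _ in range(50)]
xa = (5.0, 0.5)  # clearly anomalous in feature 0
rules = explainer(normal + [xa], xa, size=20, n_trees=3)
print(all(any(f == 0 for f, _ in tree) for tree in rules))  # prints: True
```

Each returned tree is the list of atomic conditions along the anomaly's path, matching the conjunction in equation (7).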
        <p>During the training set selection, a dataset X =
{x1, x2, . . . , xn} is split into two disjoint sets Xa, containing
anomalous samples, and Xn, containing normal samples.
Then, a training set T contains the anomaly xa as one class
and a subset of Xn as the other. The simplest strategy is to
select k samples randomly from Xn with uniform
probability. This approach is computationally efficient and was
shown to work well when compared to more sophisticated
approaches [?].</p>
        <p>Training a tree is very similar to standard random
forests [?]. A candidate node is found and the optimal
splitting function is applied at that node. This greedy
procedure repeats until the specified stopping criteria are met.</p>
        <p>The node that contains xa is always the one being split
in the Explainer algorithm. The standard procedure to find
the splitting function h is maximizing the information gain
over the space of all possible splitting functions H. But as
there is only a single point xa in the anomaly class,
maximizing the information gain is equivalent to minimizing
the size of the node containing xa:</p>
        <p>arg min_{h ∈ H} |Sa(h)|,   (6)</p>
        <p>where Sa ⊂ T denotes the subset of the training set
containing the anomaly xa after the split. The training is
stopped when all leaves are pure (contain samples from a
single class).</p>
        <p>Once a tree is trained, it is used for rule extraction.
Let h_{f1,θ1}, . . . , h_{fd,θd} be the set of splitting
functions, with features f1, . . . , fd and thresholds
θ1, . . . , θd, used in the inner nodes on the path from the
root to the leaf with xa. Then xa is explained as a
conjunction of atomic conditions:</p>
        <p>(xf1 &gt; θ1) ∧ (xf2 &gt; θ2) ∧ . . . ∧ (xfd &gt; θd),   (7)</p>
        <p>which is the output of the algorithm. This conjunction
can be read as “the sample is anomalous because it is greater
than threshold θ1 in feature f1 and greater than θ2 in
feature f2 and . . . than the majority of samples”. Each tree
provides one such conjunction; the conjunctions are then
aggregated.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Experiments</title>
      <p>This section experimentally compares the classification
performance of the three described rule mining approaches
with the random forest classifier. The section starts with a
description of the dataset used in our research and with the
settings of the performed experiments. The comparison is then
based on the precision and recall measures, and the final part
concludes with a thorough discussion of the differences
between the explanations provided by each approach.</p>
      <sec id="sec-4-1">
        <title>4.1 Dataset description</title>
        <p>The dataset in this research consists of 8 consecutive
weeks of network traffic collected by the Cognitive Threat
Analytics (CTA) intrusion detection system. It contains
about 9 million samples, where each sample is a collection
of all network events observed on a particular network host
within a 24-hour window. In the terms used in market basket
analysis, each sample x is a transaction containing a subset
of all events/items x ⊆ I.</p>
        <p>CTA distinguishes about 300 events falling into four
categories:</p>
        <p>[Figure: precision (upper graph) and recall of FISM, LISM, the Explainer and RF over the testing weeks; dataset starting date Sep 2017.]</p>
        <p>
• signature based (SB) – produced by matching
behavioural signatures handmade by a domain expert.
• classifier based (CB) – created by supervised
classifiers trained on historical data.
• anomaly based (AB) – created by the anomaly
detection engine which consists of 70 individual detectors
(statistical, volumetric, proximity, targeted, domain
specific).
• contextual events (CE) – describe various network
behaviours to provide an additional context, e.g., file
download, direct access on raw IP, software update.</p>
        <p>These events together serve as high level behavioural
indicators that can be used to create a behavioural profile
of a malware family. This behavioural profile can be used
to identify malware infections in their early stages and stop
them before they do any harm.</p>
        <p>The database was labelled by the CTA engine such that
transactions of network hosts infected by either a banking
trojan, click fraud, an information stealer or malware
distribution were marked as positive/malicious (4801 and
6463 transactions in the training and testing sets,
respectively) and the other transactions were labelled
negative/benign (3.75 mil. and 5.23 mil. transactions,
respectively).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Experimental setup</title>
        <p>All following experiments used 3 weeks of data,
approximately 3.75 million transactions, for training/rule
mining and 5 weeks, 5.23 million transactions, for testing.</p>
        <p>Parameters of the random forest were set as follows:
number of trees = 19; maximal depth of a tree = 25;
number of features per split = 100.</p>
        <p>The Explainer was set to train 10 trees per positive
sample while selecting a random training set of size 1000
samples.</p>
        <p>The FP-growth algorithm was set to produce only rules
with support higher than 3 transactions.</p>
        <p>The parameters of LISM were: similarity measure =
cosine (??); θcooc = 1 · 10−7; θcons = 2. Cliques discovered
using this setting were split into rules 3–5 items long.
Details about the optimization of the parameters can be found
in [?].</p>
        <p>As a last step, all rules were filtered to have a
precision of at least 80% on the training set. The filtered
rules were then used in the following experiments.</p>
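        <p>The filtering step can be sketched as follows; the rule representation, transactions and all names are ours, not the paper's:</p>

```python
# Sketch of the final filtering step: keep only rules whose precision on
# the training transactions is at least 80%. Representation and names
# are illustrative, not from the paper.

def precision(rule, transactions):
    antecedent, target = rule
    fired = [label for t, label in transactions if antecedent <= t]
    hits = sum(1 for label in fired if label == target)
    return hits / len(fired) if fired else 0.0

def filter_rules(rules, transactions, min_precision=0.8):
    return [r for r in rules if precision(r, transactions) >= min_precision]

train = [
    ({"AB:ShadowUser", "CB:ClickFraud"}, "malicious"),
    ({"AB:ShadowUser", "CB:ClickFraud"}, "malicious"),
    ({"AB:ShadowUser"}, "benign"),
    ({"CE:jQuery"}, "benign"),
]
rules = [({"AB:ShadowUser", "CB:ClickFraud"}, "malicious"),
         ({"AB:ShadowUser"}, "malicious")]
kept = filter_rules(rules, train)
print(len(kept))  # prints: 1 -- only the first rule reaches 80%
```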
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Classification efficacy</title>
        <p>The described rule filtering, based on the minimal
precision of 80% on the training set, resulted in very few
false positive predictions: all classifiers reached more than
90% precision on all the testing sets, as depicted in the
upper graph in Figure ??. While the rules mined by the
Explainer and the FISM algorithm provide stable results with
precision reaching roughly 99% during all five testing weeks,
the rules generated by LISM and the random forest exhibit
greater variance over time.</p>
        <p>The ability to discover the malicious content in the
testing data, as measured by the classifiers’ recall, differs
considerably more. Both the Explainer’s and the random
forest’s sets of rules were able to identify more than 80%
of the malicious samples in the testing sets (except in the
last week). The graph in Figure ??, on the other hand,
clearly shows that the highest precision of the
LISM-generated rules is accompanied by the lowest recall.</p>
        <p>The LISM is able to generate complex and very descriptive
rules that, however, can hardly be located in the data. This
results in only 34 generated rules reaching the 80% training
precision threshold, which is probably the cause of the
lowest recall. The 100% precision in 4 out of 5 testing weeks
is a surprisingly good result, though.</p>
        <p>The performance of the FISM and the Explainer differs
mainly in recall, where the better results of the Explainer
are probably caused by the algorithm’s focus on the shortest
and strongest rules; only 14 rules reached the 80% precision
threshold. The FISM, on the other hand, tends to generate
longer rules, which are harder to match.</p>
        <p>The random forest classifier, with its up to
25-level-deep decision trees, is able to identify the highest
number of malicious testing transactions. At the same time,
the high complexity of the trees causes the lowest observed
precision of the four compared classifiers.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4 Context comparison</title>
        <p>This section presents the comparison of context provided
by the random forest classifier, Explainer, Logical Item Set
Mining and Frequent Item Set Mining. Examples of mined
rules are presented and discussed in detail.</p>
      </sec>
      <sec id="sec-4-5">
        <title>Random Forest</title>
        <p>In general, random forests can provide two types of
context: feature importance and classification rules.</p>
        <p>The feature importance provides less information than
rules. It represents aggregated global knowledge for all
classes in the training set. It can be extracted per class
if a random forest is trained for each class separately.
Our database represents a dichotomy problem with both
classes being in fact mixtures of multiple sub-classes that
we are unable to separate.</p>
        <p>Classification rules provide more specific/local
information. Unfortunately, trees in a random forest are
typically trained deep to be as precise as possible. In other
words, the path from the root to a leaf will be long and,
therefore, the extracted rule will also be long. The other
drawback of training random forests directly on transactions
is so-called negative evidence. In the network security
domain, negative evidence arises when a network host is
marked as infected because of some behaviour it does not
exhibit, e.g., “a host is infected because it didn’t download
an image”.</p>
        <p>A real example of a rule with multiple pieces of negative
evidence extracted directly from the random forest used in
the experiments is: “If you are not infected by a click fraud
and you are not infected by an information stealer and you
are not infected by the Sality trojan and you are not
infected by the malware called Gamarue and you don’t have an
encrypted connection, then you are infected.” As this simple
yet real example shows, rules extracted from a random forest
may be more puzzling than explaining.</p>
      </sec>
      <sec id="sec-4-6">
        <title>Explainer</title>
        <p>The Explainer can easily be set to provide rules with
only positive evidence. The trouble is that it was designed
to extract the smallest set of the shortest possible rules.
The extracted rules contained the real causes of incidents
but not any additional context that would simplify the
investigation. Of the rules created by the Explainer, 14 had
a precision higher than 80%. They typically contained only
one event produced by a supervised classifier. A second event
was present in only 3 cases and there was no rule longer than
that. Selected examples of longer rules follow:</p>
        <sec id="sec-4-6-1">
          <title>1. AB:ShadowUser CB:ClickFraud</title>
          <p>2. CB:SuspAdvertising CB:MaliciousAdvertising</p>
          <p>The first example contains an anomaly based event
AB:ShadowUser and a classifier based event CB:ClickFraud.
AB:ShadowUser identifies network hosts that visit a high
number of network domains which nobody else visits. The
CB:ClickFraud event is created by the random forest
classifier trained to discover malware from the click fraud
family. The click fraud malware family is known for visiting
a lot of weird pages without the user’s knowledge and is
often used for clicking on web advertisements to generate
money.</p>
          <p>The second rule contains two classifier based events
indicating that a host visited a lot of suspicious
advertisement pages (CB:SuspAdvertising), some of them
probably malicious (CB:MaliciousAdvertising). Here, the
malware started by showing additional unwanted
advertisements, banners and pop-ups and ended by exfiltrating
sensitive data.</p>
        </sec>
      </sec>
      <sec id="sec-4-7">
        <title>LISM</title>
        <p>Rules extracted by the Logical Item Set Mining create
very logical and justifiable connections between items. All
items in a rule are always very strong indicators of a
particular threat. Unfortunately, they all appear together
very rarely; but if they do, it certainly is a serious
infection. LISM created 34 rules with precision over 80%. The
length of the rules ranges from 3 to 5 items. Examples
follow:</p>
        <sec id="sec-4-7-1">
          <title>1. SB:Blocked CB:MalBinary SB:Sality</title>
          <p>2. CB:ClickFraud CB:Malwartising CB:MalwareDistr</p>
          <p>The first example shows a download of a malicious
binary (CB:MalBinary) from a blacklisted domain (SB:Blocked).
Furthermore, it has a well-known signature of a malware
called Sality (SB:Sality). Each of these events is strong
enough evidence to trigger an immediate remediation alert on
its own.</p>
          <p>The second rule shows a nice example of a malicious
escalation. At first, a host was infected by a click fraud
malware (CB:ClickFraud), which was visiting shady advertising
sites (CB:MalAdvertising). Then, the host started to download
and distribute additional malicious modules
(CB:MalwareDistr).</p>
        </sec>
      </sec>
      <sec id="sec-4-8">
        <title>FISM</title>
        <p>The Frequent Item Set Mining provided the best
explanation/context of all compared methods. Rules contain
all important indicators while providing reasonable
additional context. FISM created 454 rules, with length
ranging from 3 to 5, that satisfied the 80% precision
threshold. Examples follow:</p>
        <p>1. CE:jQuery AB:SuspDomain CB:ClickFraud</p>
        <sec id="sec-4-8-1">
          <title>2. AB:PathCount AB:ShadowUser CE:WPManagment</title>
        </sec>
        <sec id="sec-4-8-2">
          <title>CB:ClickFraud</title>
          <p>The first example shows a host that downloaded a
modified javascript library jQuery (CE:jQuery) from a
suspicious network domain (AB:SuspDomain). This modified
library was the source of the infection, which was later
detected by the random forest classifier trained for the
detection of the click fraud malware family
(CB:ClickFraud).</p>
          <p>The second rule shows a sophisticated example of click
fraud capabilities (CB:ClickFraud). In this case, the malware
had a password cracker module installed and was used to crack
into the administration sections of web pages created using
WordPress (CE:WPManagment). Again, we can see a host visiting
a large number of low-probability network domains
(AB:ShadowUser), WordPress blogs in this case, where on most
of these pages an outlying number of paths was visited
(AB:PathCount), specifically only one (when a webpage is
opened via a browser, it normally downloads many different
directories and files, such as cascading style sheets, the
HTML file, an image folder, javascript libraries, etc.).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>This paper compared the classification performance and
comprehensibility of a random forest classifier with
classification rules extracted by the Frequent Item Set
Mining, the Logical Item Set Mining and the Explainer
algorithm, which was previously proposed by the authors. All
the algorithms showed surprisingly similar precision, with
the rules extracted by LISM performing slightly better, with
the exception of one week. From the recall point of view, the
best performing algorithm was the random forest, followed by
the rules extracted by the Explainer.</p>
      <p>The comparison of the provided explanations revealed that
rules extracted from random forests trained on transaction
data can be more puzzling than comprehensible. The
relationships between items mined by LISM were not only
comprehensible but also justifiable (in line with the domain
knowledge). Unfortunately, they are very rarely seen in the
real data. Rules extracted by the Explainer tend to reveal
the root cause of an incident but provide little to no
additional context. Rules mined by FISM appeared to be an
optimal mix of root cause events together with a reasonable
amount of additional context.</p>
      <p>As future work, we would like to extend our research into
other application domains and also to implement the work of
Martens et al. [?] and compare their numerical
representations of comprehensibility and justifiability with
our domain knowledge.</p>
      <sec id="sec-5-1">
        <title>Acknowledgement</title>
        <p>The research reported in this paper has been supported
by the Czech Science Foundation (GA ČR) grant 1701251 and by
the Czech Technical University student grant
SGS17/209/OHK3/3T/18.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , Tomasz Imielin´ski, and Arun Swami.
          <article-title>Mining association rules between sets of items in large databases</article-title>
          .
          <source>In ACM SIGMOD Record</source>
          , volume
          <volume>22</volume>
          , pages
          <fpage>207</fpage>
          -
          <lpage>216</lpage>
          . ACM,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ramakrishnan</given-names>
            <surname>Srikant</surname>
          </string-name>
          , et al.
          <article-title>Fast algorithms for mining association rules</article-title>
          .
          <source>In Proceedings of the International Conference on Very Large Data Bases (VLDB)</source>
          , volume
          <volume>1215</volume>
          , pages
          <fpage>487</fpage>
          -
          <lpage>499</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Angiulli</surname>
          </string-name>
          , Fabio Fassetti, and
          <string-name>
            <given-names>Luigi</given-names>
            <surname>Palopoli</surname>
          </string-name>
          .
          <article-title>Detecting outlying properties of exceptional objects</article-title>
          .
          <source>ACM Transactions on Database Systems (TODS)</source>
          ,
          <volume>34</volume>
          (
          <issue>1</issue>
          ):
          <fpage>7</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Zachary K</given-names>
            <surname>Baker</surname>
          </string-name>
          and
          <string-name>
            <given-names>Viktor K</given-names>
            <surname>Prasanna</surname>
          </string-name>
          .
          <article-title>Efficient hardware data mining with the Apriori algorithm on FPGAs</article-title>
          .
          <source>In 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2005)</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          . IEEE,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Borgelt</surname>
          </string-name>
          .
          <article-title>Efficient implementations of Apriori and Eclat</article-title>
          .
          <source>In FIMI03: Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Leo</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <article-title>Random forests</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Randy</given-names>
            <surname>Carraghan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Panos M</given-names>
            <surname>Pardalos</surname>
          </string-name>
          .
          <article-title>An exact algorithm for the maximum clique problem</article-title>
          .
          <source>Operations Research Letters</source>
          ,
          <volume>9</volume>
          :
          <fpage>375</fpage>
          -
          <lpage>382</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Kumar</given-names>
            <surname>Chellapilla</surname>
          </string-name>
          , Kevin Larson, Patrice Simard, and
          <string-name>
            <given-names>Mary</given-names>
            <surname>Czerwinski</surname>
          </string-name>
          .
          <article-title>Designing human friendly human interaction proofs (HIPs)</article-title>
          .
          <source>In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source>
          , pages
          <fpage>711</fpage>
          -
          <lpage>720</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Salvador</given-names>
            <surname>Espana-Boquera</surname>
          </string-name>
          , Maria Jose Castro-Bleda, Jorge Gorbe-Moya, and Francisco Zamora-Martinez.
          <article-title>Improving offline handwritten text recognition with hybrid HMM/ANN models</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>33</volume>
          (
          <issue>4</issue>
          ):
          <fpage>767</fpage>
          -
          <lpage>779</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Alex Alves</given-names>
            <surname>Freitas</surname>
          </string-name>
          .
          <article-title>Comprehensible classification models: a position paper</article-title>
          .
          <source>SIGKDD Explorations</source>
          ,
          <volume>15</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Petr</given-names>
            <surname>Hájek</surname>
          </string-name>
          , Ivan Havel, and
          <string-name>
            <given-names>Metoděj</given-names>
            <surname>Chytil</surname>
          </string-name>
          .
          <article-title>The GUHA method of automatic hypotheses determination</article-title>
          .
          <source>Computing</source>
          ,
          <volume>1</volume>
          (
          <issue>4</issue>
          ):
          <fpage>293</fpage>
          -
          <lpage>308</lpage>
          ,
          <year>1966</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Jiawei</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jian</given-names>
            <surname>Pei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yiwen</given-names>
            <surname>Yin</surname>
          </string-name>
          .
          <article-title>Mining frequent patterns without candidate generation</article-title>
          .
          <source>In ACM SIGMOD Record</source>
          , volume
          <volume>29</volume>
          , pages
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . ACM,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Jochen</given-names>
            <surname>Hipp</surname>
          </string-name>
          , Ulrich Güntzer, and
          <string-name>
            <given-names>Gholamreza</given-names>
            <surname>Nakhaeizadeh</surname>
          </string-name>
          .
          <article-title>Algorithms for association rule mining - a general survey and comparison</article-title>
          .
          <source>ACM SIGKDD Explorations newsletter</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>58</fpage>
          -
          <lpage>64</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Marek</given-names>
            <surname>Jilek</surname>
          </string-name>
          .
          <article-title>Machine learning based context extraction for network security incidents</article-title>
          .
          <source>Master's thesis</source>
          , Faculty of Information Technology, Czech Technical University in Prague,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Edwin</given-names>
            <surname>Knorr</surname>
          </string-name>
          and
          <string-name>
            <given-names>Raymond T</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <article-title>Algorithms for mining distance-based outliers in large datasets</article-title>
          .
          <source>In Proceedings of the International Conference on Very Large Data Bases (VLDB)</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Kopp</surname>
          </string-name>
          and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Holeňa</surname>
          </string-name>
          .
          <article-title>Evaluation of association rules extracted during anomaly explanation</article-title>
          .
          <source>In ITAT 2015: Information Technologies - Applications and Theory</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Kopp</surname>
          </string-name>
          , Matěj Nikl, and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Holeňa</surname>
          </string-name>
          .
          <article-title>Breaking CAPTCHAs with convolutional neural networks</article-title>
          .
          <source>In ITAT 2017: Information Technologies - Applications and Theory</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Shailesh</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V</given-names>
            <surname>Chandrashekar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C V</given-names>
            <surname>Jawahar</surname>
          </string-name>
          .
          <article-title>Logical itemset mining</article-title>
          .
          <source>In IEEE 12th International Conference on Data Mining Workshops (ICDMW 2012)</source>
          , pages
          <fpage>603</fpage>
          -
          <lpage>610</lpage>
          . IEEE,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>David</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jan</given-names>
            <surname>Vanthienen</surname>
          </string-name>
          , Wouter Verbeke, and
          <string-name>
            <given-names>Bart</given-names>
            <surname>Baesens</surname>
          </string-name>
          .
          <article-title>Performance of classification models from a user perspective</article-title>
          .
          <source>Decision Support Systems</source>
          ,
          <volume>51</volume>
          (
          <issue>4</issue>
          ):
          <fpage>782</fpage>
          -
          <lpage>793</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Barbora</given-names>
            <surname>Micenková</surname>
          </string-name>
          , Raymond T Ng,
          <string-name>
            <given-names>Xuan-Hong</given-names>
            <surname>Dang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ira</given-names>
            <surname>Assent</surname>
          </string-name>
          .
          <article-title>Explaining outliers by subspace separability</article-title>
          .
          <source>In IEEE 13th International Conference on Data Mining (ICDM 2013)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Lukáš</given-names>
            <surname>Neumann</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jiří</given-names>
            <surname>Matas</surname>
          </string-name>
          .
          <article-title>Real-time scene text localization and recognition</article-title>
          .
          <source>In Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <source>2012 IEEE Conference on</source>
          , pages
          <fpage>3538</fpage>
          -
          <lpage>3545</lpage>
          . IEEE,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Tomáš</given-names>
            <surname>Pevný</surname>
          </string-name>
          and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Kopp</surname>
          </string-name>
          .
          <article-title>Explaining anomalies with sapling random forests</article-title>
          .
          <source>In Information Technologies - Applications and Theory Workshops, Posters, and Tutorials (ITAT 2014)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Marko</given-names>
            <surname>Robnik-Šikonja</surname>
          </string-name>
          , Igor Kononenko, and Erik Štrumbelj.
          <article-title>Quality of classification explanations with PRBF</article-title>
          .
          <source>Neurocomputing</source>
          ,
          <volume>96</volume>
          :
          <fpage>37</fpage>
          -
          <lpage>46</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Margaret</given-names>
            <surname>Rouse</surname>
          </string-name>
          .
          <article-title>Ransomware</article-title>
          ,
          <year>2018</year>
          . https://searchsecurity.techtarget.com/definition/ransomware [Cited 15 May
          <year>2018</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Takeaki</given-names>
            <surname>Uno</surname>
          </string-name>
          , Masashi Kiyomi, and
          <string-name>
            <given-names>Hiroki</given-names>
            <surname>Arimura</surname>
          </string-name>
          .
          <article-title>LCM ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets</article-title>
          .
          <source>In FIMI</source>
          , volume
          <volume>126</volume>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Mohammed Javeed</given-names>
            <surname>Zaki</surname>
          </string-name>
          .
          <article-title>Scalable algorithms for association mining</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering (TKDE)</source>
          ,
          <volume>12</volume>
          (
          <issue>3</issue>
          ):
          <fpage>372</fpage>
          -
          <lpage>390</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>