<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluation of Association Rules Extracted during Anomaly Explanation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Kopp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cisco Systems, Cognitive Research Team in Prague</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Information Technology, Czech Technical University in Prague, Thákurova 9</institution>
          ,
          <addr-line>160 00 Prague</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2</institution>
          ,
          <addr-line>182 07 Prague</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>143</fpage>
      <lpage>149</lpage>
      <abstract>
        <p>Discovering anomalies within data is very important nowadays, because it helps to uncover interesting events. Consequently, a considerable number of anomaly detection algorithms has been proposed in the last few years. Only a few papers about anomaly detection have at least mentioned why some samples were labelled as anomalous. Therefore, we proposed a method that extracts rules explaining an anomaly from an ensemble of specifically trained decision trees, called a sapling random forest. Our method is able to interpret the output of an arbitrary anomaly detector. The explanation is given as conjunctions of atomic conditions, which can be viewed as antecedents of association rules. In this work we focus on the selection, post-processing and evaluation of those rules. The main goal is to present a small number of the most important rules. To achieve this, we use quality measures such as lift and confidence boost. The resulting sets of rules are experimentally and empirically evaluated on two artificial datasets and one real-world dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Anomaly detection</kwd>
        <kwd>anomaly interpretation</kwd>
        <kwd>association rules</kwd>
        <kwd>confidence boost</kwd>
        <kwd>random forest</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        According to IBM research [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], there were 2.7
zettabytes of data in the digital universe as of April 2012, and this
amount doubles approximately every 40 months.
      </p>
      <p>
        Not only it is almost impossible to process such huge
amounts of data, we are actually not interested in the
raw data, but rather in the salient knowledge and
interesting patterns contained in them. This is the reason why
anomaly detection, especially unsupervised anomaly
detection, becomes more and more important [
        <xref ref-type="bibr" rid="ref1 ref23">1, 23</xref>
        ].
Despite it can be formalised as a binary classification, it
entails different issues and challenges than those in
supervised classification. For example, anomalous events
often adapt to appear normally and even normal behaviour
evolve over time. Furthermore, defining a normal regions
is very difficult, especially when the boundary between
normal and anomalous is not always precise.
      </p>
      <p>
        For the purposes of this paper, we consider anomalies
equal to outliers as defined by Hawkins [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]: “An outlier is an
observation which deviates so much from the other
observations as to arouse suspicions that it was generated by
a different mechanism.”
      </p>
      <p>A more formal definition would necessarily reduce
the number of plausible anomaly detectors and/or
application domains. This would conflict with our goal to provide
a solution as general as possible.</p>
      <p>
        Even though anomaly detection techniques are aimed at
only a minority of samples, the importance of and demand
for them grows rapidly. Real-world applications range
from network security [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], bioinformatics [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and
financial fraud detection [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] to astronomy and space
exploration [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The identification of anomalies is only half of the
whole task. The second and equally important half is the
interpretation. In high-dimensional domains, such as
network security or bioinformatics, where hundreds or even
thousands of features are common, a proper
interpretation is crucial. Therefore, anomalies have to be interpreted
clearly, as a feature subset that explains their deviation from
ordinary data, or even better as a set of association rules.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] we proposed a method of anomaly explanation
based upon specifically trained ensembles of decision trees
called sapling random forests (SRF). The main idea
behind it is to view the explanation as a feature selection
and classification problem. Specifically, the goal is to find
features in which the margin between the anomalous sample
and the normal samples is maximised. Therefore, SRF
returns a subset of features, or rather rules on these
features, describing why a sample has been identified as an
anomaly.
      </p>
      <p>The main drawback of direct rule extraction from
our sapling random forests is the large number of rules,
with some of them introduced by an unfortunate training set
selection. Partially, these issues can be solved by confidence
and/or support thresholds. But for our ultimate goal, to
present the minimal number of rules containing the
maximal amount of useful information, such a simple approach
is insufficient. Therefore, in this paper we focus on the proper
selection, post-processing and evaluation of rule sets
extracted from sapling random forests during anomaly
explanation. We tested association rule quality measures such
as lift and confidence boost. This paper is a work in progress,
and we would like to extend the number of tested quality
measures by some subjective measures such as novelty.</p>
      <p>The rest of this paper is organised as follows. The next
section briefly reviews related work. Section 3 describes
the SRF principles and its training, followed by the rule
extraction process in Section 4. The selected quality
measures of association rules are presented in Section 5.
Experimental evaluation is described in Section 6, and
Section 7 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        For more information about anomaly detection we
refer to the recent book of Aggarwal [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This book
provides an exhaustive listing of anomaly detection
algorithms and their applications in different domains.
Another source may be [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which is briefer but very well
written. To the best of our knowledge, there have been only a few
works addressing not only the identification of anomalies, but
also their explanation.
      </p>
      <p>
        Knorr et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] focused on what kind of knowledge
should be extracted and provided to the user. Strong and
weak outliers were defined and searched within data by
distance-based algorithms described in detail in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Dang et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] presented an algorithm identifying and
explaining anomalies. The algorithm starts by selecting,
based on quadratic entropy, a set of neighbouring samples
that are presented to a Fisher linear discriminant
classifier to seek an optimal half-space in which a detected
anomaly is well separated. The process of interpretation
is entangled with the presented method of identification
of anomalies. The difference from our work is that SRF can
be used after an arbitrary anomaly detection algorithm to
interpret its results.
      </p>
      <p>
        The approach most similar to ours, and the most recent,
is [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Their approach, as well as ours, can interpret the
output of an arbitrary anomaly detector as a subset of
features. They use classification accuracy for outlier ranking.
The main drawback of this approach is that it needs
balanced training sets, which are created by generating artificial
samples around the anomalous point. In contrast to this
work, our approach can handle unbalanced training sets
easily and returns not only feature subsets but feature
subsets with rules on them, providing even more information
about the anomaly. Furthermore, we simplify the analysis
by clustering, which enables interpreting similar anomalies
at once [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        On the other hand, there are many papers about
association rules. This paper was inspired mainly by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which
is about measuring the redundancy and information quality
of sets of association rules. The author presents a
measure called confidence boost and an algorithm to produce
a small set of association rules using this measure. A
very extensive list of interestingness measures can be found
in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. It contains a lot of inspiration for our future work.
      </p>
      <p>
        An alternative approach, well described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], may be
so-called subjective measures. A typical example is the
novelty, sometimes called unexpectedness, of a rule with
respect to user-provided domain knowledge or against
another rule set. Because these terms are ambiguous, there
are multiple approaches to measuring them. The approach
in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] inspired our future work.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Sapling Random Forest</title>
      <p>
        This section outlines the principles of sapling random forests.
SRF, proposed by us in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], is a method able to explain the output of an
arbitrary anomaly detector. It is
a random forest of specifically trained decision trees.
Because the produced trees are small, they are called saplings
rather than trees. The produced explanations show features in
which the inspected samples differ the most from the rest of the
data. These features are used to produce association rules,
which are more informative than a mere set of features. An
outline of the whole method is given in Algorithm 1.
      </p>
      <p>Algorithm 1: Summary of the SRF explanation method.</p>
      <preformat>
y ← anomalyDetector(data)
for all x ∈ data with y(x) = anomaly do
    T ← createTrainingSet(size, method)
    t ← trainTree(T)
    SRF ← SRF ∪ {t}
end for
extractRules(SRF)
      </preformat>
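      <p>
        For concreteness, the loop of Algorithm 1 can be sketched in
Python. This is a minimal sketch, assuming scikit-learn; the Isolation
Forest detector, the training set size k, the number of saplings and
the tree depth are illustrative placeholders, not the configuration
used in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <preformat>
# Minimal sketch of the SRF loop. All parameters are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.tree import DecisionTreeClassifier

def train_srf(X, k=50, n_saplings=5, rng=np.random.default_rng(0)):
    y = IsolationForest(random_state=0).fit_predict(X)  # -1 marks anomalies
    normal, srf = X[y == 1], {}
    for i in np.flatnonzero(y == -1):
        saplings = []
        for _ in range(n_saplings):
            # uniform random sampling strategy (Section 3.1);
            # kNN selection is the alternative
            T = normal[rng.choice(len(normal), size=k, replace=False)]
            Xt = np.vstack([X[i][None, :], T])
            yt = np.r_[1, np.zeros(len(T), dtype=int)]  # anomaly vs. normal
            saplings.append(DecisionTreeClassifier(max_depth=3).fit(Xt, yt))
        srf[i] = saplings  # rules are extracted from these later (Section 4)
    return srf
      </preformat>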
      <sec id="sec-3-1">
        <title>Training Set Selection</title>
        <p>
          A dataset X = {x_1, x_2, . . . , x_l}, where x ∈ R^d, can be split
into two disjoint sets: X^a, containing anomalous samples,
and X^n, containing normal samples. Then, a training set T
contains the anomaly x^a as one class and a subset of X^n
as the other. The first strategy for creating training sets is
to select the k nearest neighbours of x^a from X^n. This
strategy is sensible for algorithms detecting local anomalies,
as according to [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] they are more general than algorithms
detecting global anomalies. The drawback of this strategy
is its computational complexity.
        </p>
        <p>
          The second strategy is to select k samples randomly
from X^n with uniform probability. The advantage of this
approach is the possibility to generate more than one
training set per anomaly by repeating the sampling process.
More training sets lead to more saplings per anomaly and
to a more robust explanation, but at the expense of a more
complicated aggregation of the rules extracted from them (see
Section 4). A comparison of both approaches can be found
in [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
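        <p>Both strategies amount to a few lines of NumPy. A hedged
sketch follows; the Euclidean metric in the kNN variant and all
parameter names are our illustrative assumptions:</p>
        <preformat>
import numpy as np

def knn_training_set(x_a, X_n, k):
    # k nearest normal neighbours of the anomaly x_a (Euclidean distance)
    d = np.linalg.norm(X_n - x_a, axis=1)
    return X_n[np.argsort(d)[:k]]

def random_training_sets(X_n, k, n_sets, rng=np.random.default_rng(0)):
    # n_sets uniform random subsets of the normal data, one per sapling
    return [X_n[rng.choice(len(X_n), size=k, replace=False)]
            for _ in range(n_sets)]
        </preformat>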
      </sec>
      <sec id="sec-3-2">
        <title>Training a Sapling</title>
        <p>
          For simplicity consider sapling a binary decision tree with
typical height 1-3. In the SRF method, there are always
two leaves at the maximal depth, one of which contains
only an anomaly xa and the other containing only normal
samples. The saplings small height has two reasons. First,
training sets are relatively small. Second, according to the
anomaly isolation approach [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], if the analysed sample
is an anomaly, it should be separated easily from the rest
of data, resulting again into small trees. Therefore, if the
height of a sapling is higher than expected it should be
taken into consideration that the explained sample may not
be an anomaly.
        </p>
        <p>The standard procedure to find the splitting function h
for a new internal node is maximising the information gain
over the space of all possible splitting functions H:
arg max_{h ∈ H} − ∑_{b ∈ {L,R}} (|S_b(h)| / |S|) H(S_b),
(1)
where S is the subset of the training set T reaching the
leaf being split, S_L(h) = {x ∈ S | h(x) = +1}, S_R(h) =
{x ∈ S | h(x) = −1}, and H(S) is the entropy of S.</p>
        <p>The second commonly used approach involves
minimising the Gini impurity:
arg min_{h ∈ H} ∑_{b ∈ {L,R}} (1 − p_a(S_b(h))^2 − p_n(S_b(h))^2),
(2)
where p_a(S_b(h)) and p_n(S_b(h)) denote the fractions of
anomalous and normal samples in S_b(h), respectively.</p>
        <p>For the experiments presented in this paper we used
the information gain.</p>
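        <p>As an illustration, the search for the best axis-aligned split
can be sketched as follows. This is a minimal sketch; maximising the
information gain is equivalent to (1), because the parent entropy H(S)
does not depend on h:</p>
        <preformat>
import numpy as np

def entropy(y):
    # Shannon entropy of integer class labels y
    p = np.bincount(np.asarray(y, dtype=int)) / len(y)
    p = p[p &gt; 0]
    return -np.sum(p * np.log2(p))

def best_split(S, y):
    # Exhaustive search over axis-aligned splits h(x) = sign(x_j - theta),
    # maximising the information gain, i.e. the criterion of eq. (1).
    best = (None, None, -np.inf)
    for j in range(S.shape[1]):
        for theta in np.unique(S[:, j])[:-1]:  # keeps both children non-empty
            left = S[:, j] &lt;= theta
            gain = entropy(y) - (left.mean() * entropy(y[left])
                                 + (~left).mean() * entropy(y[~left]))
            if gain &gt; best[2]:
                best = (j, theta, gain)
    return best  # feature index, threshold, information gain
        </preformat>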
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Extraction and Evaluation of Rules</title>
      <p>Once a sapling is grown, it is used to explain the
anomaly x^a. Let h_{j1,θ1}, . . . , h_{jd,θd} be the set of splitting
functions, with features j_1, . . . , j_d and thresholds θ_1, . . . , θ_d,
used in the inner nodes on the path from the root to the leaf
with the anomaly x^a. Then x^a is explained as a conjunction
of atomic conditions:
c = (x_{j1} &gt; θ_1) ∧ (x_{j2} &gt; θ_2) ∧ . . . ∧ (x_{jd} &gt; θ_d),
(3)
which is the output of the algorithm. This conjunction can
be read as “the sample is anomalous because it is greater
than threshold θ_1 in feature j_1 and greater than θ_2 in
feature j_2 and . . . than the majority (or nearest neighbour)
samples.” Because the resulting trees are very small, the
explanation is compact.</p>
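      <p>Assuming the saplings are scikit-learn decision trees (an
illustrative assumption, not necessarily the paper's own
implementation), a conjunction of type (3) can be read off the
decision path of x^a; unlike the simplified notation of (3), the
emitted conditions use "&lt;=" or "&gt;" according to the actual
branch taken:</p>
      <preformat>
from sklearn.tree import DecisionTreeClassifier

def explain_anomaly(sapling: DecisionTreeClassifier, x_a):
    # Walk the decision path of x_a and emit the conjunction of
    # atomic conditions, one per inner node on the path.
    t, node, conds = sapling.tree_, 0, []
    while t.children_left[node] != -1:  # -1 marks a leaf
        j, theta = t.feature[node], t.threshold[node]
        if x_a[j] &lt;= theta:
            conds.append(f"x{j + 1} &lt;= {theta:.2f}")
            node = t.children_left[node]
        else:
            conds.append(f"x{j + 1} &gt; {theta:.2f}")
            node = t.children_right[node]
    return " ∧ ".join(conds)
      </preformat>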
      <p>The situation is more difficult when more saplings per
anomaly have been grown, as each sapling provides one
conjunction of type (3). Using more than one sapling per
anomaly improves robustness for training sets created by
uniform sampling. The problem is that returning the set of
all conjunctions C is undesirable, as the primary objective,
the explanation of the anomaly to a human, would not be
met. Hence, the algorithm needs to aggregate the conjunctions
in C.</p>
      <p>For simplicity of the following notation, consider 2d
items, such that two items are assigned to each
feature: one for "&lt;" conditions, the other for "&gt;" conditions. Denote this
set of 2d items F. Then we can group the rules into rule
sets R_f according to the item set f ⊆ F they share.</p>
      <p>Based on |R_f|, the algorithm discards groups of low
importance by sorting them in descending order, and then
using only the first k groups such that their cumulative
frequency is greater than a threshold τ, which we recommend
to be 0.90 or 0.95. Using the adopted notation, k is
determined as
k = arg min_k (1 / ∑_{f ∈ F} |R_f|) ∑_{i=1}^{k} |R_{f_i}| &gt; τ,
(4)
where it is assumed that the R_f are sorted according to their
size, to simplify the notation. We have also investigated
the complementary approach, where groups are selected
if they were used with a frequency higher than a specified
threshold. But the presented strategy, based on the
cumulative frequency, showed more consistent results in our
experiments.</p>
      <p>Once the set of groups with decision rules is selected,
we create one rule r̄_f for every rule set R_f. The threshold for
each item f_j is calculated as the average of all thresholds
within the rule set R_f:
θ̄_j = (1 / |R_f|) ∑_{i=1}^{|R_f|} θ_{i,j}.
(5)</p>
      <p>By this approach we obtain one representative rule for
each feature set f as:
c̄_f = x_{j1} &gt; θ̄_1 ∧ x_{j2} &gt; θ̄_2 ∧ . . . ∧ x_{jt} &gt; θ̄_t.
(6)</p>
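      <p>A minimal sketch of this aggregation step follows, assuming
each extracted conjunction is given as a list of (feature, operator,
threshold) triples; the data layout is an illustrative assumption:</p>
      <preformat>
from collections import defaultdict
import numpy as np

def aggregate(conjunctions, tau=0.9):
    # conjunctions: one list of (feature, op, theta) triples per sapling,
    # with op in {"&lt;", "&gt;"}. Groups them by their item set f,
    # keeps the first k groups whose cumulative frequency exceeds tau
    # (eq. 4) and averages thresholds within each group (eqs. 5 and 6).
    groups = defaultdict(list)
    for c in conjunctions:
        f = tuple(sorted((j, op) for j, op, _ in c))
        groups[f].append({(j, op): th for j, op, th in c})
    ranked = sorted(groups.items(), key=lambda kv: -len(kv[1]))
    total, cum, kept = len(conjunctions), 0, []
    for f, rules in ranked:
        cum += len(rules)
        kept.append([(j, op, np.mean([r[(j, op)] for r in rules]))
                     for j, op in f])
        if cum / total &gt; tau:
            break
    return kept  # one representative rule per kept item set
      </preformat>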
    </sec>
    <sec id="sec-5">
      <title>Measuring Quality of Rules</title>
      <p>This section reviews selected quality measures of
association rules. Typical association rules are of the form
A → Y, where A and Y are item sets. In the rules extracted from
SRF, the items are atomic conditions and Y always means “is
anomalous”. Therefore, our rules are of the form:
r_f = c_f → y,
(7)
where c_f is a conjunction of atomic conditions like (3), y =
x ∈ X^a and f ⊆ F. The r_f in its full form then looks as:
r_f(x) = x_{f1} &gt; θ_{f1} ∧ x_{f2} &gt; θ_{f2} ∧ . . . ∧ x_{fn} &gt; θ_{fn} → x ∈ X^a,
(8)
where n is the maximal index in the item set f.</p>
      <p>
        For this kind of rules, support [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is calculated as:
supp(c_f) = |{x ∈ X | c_f(x)}| / |X|,
(9)
supp(y) = |X^a| / |X|,
(10)
and gives the proportion of data points which satisfy the
antecedent c_f, respectively the consequent y. It is used to
measure the importance of a rule or as a frequency
constraint. The disadvantage of support is that infrequent rules
are often discarded. This is a much bigger problem than it
might seem, because we are generating rules for anomalies,
which are rare by definition.
      </p>
      <p>
        Another frequently used measure is confidence [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]:
conf(c_f → y) = supp(c_f → y) / supp(c_f).
(11)
      </p>
      <p>It estimates the conditional probability of the consequent
being true on condition that the antecedent is true. The
trouble with confidence is caused by its sensitivity to the
frequency of y. Because all rules extracted from SRF
have the same consequent, the rule rankings produced by
lift and confidence would be the same.</p>
      <p>
        The third measure we used is lift [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
lift(c_f → y) = conf(c_f → y) / supp(y),
(12)
which measures how many times more often the
antecedent c_f and the consequent y occur together than expected
if they were statistically independent. Lift does not suffer
from the rare items problem. Because in our experiments
the consequent always has the same frequency, there is no
need to measure both confidence and lift.
      </p>
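      <p>For our rules, these measures reduce to a few operations on
boolean masks. A minimal sketch (the mask-based representation is an
illustrative assumption):</p>
      <preformat>
import numpy as np

def rule_measures(antecedent, anomalous):
    # supp, conf and lift (eqs. 9-12) for one rule c_f -&gt; y;
    # antecedent[i] is True iff sample i satisfies c_f,
    # anomalous[i] is True iff sample i belongs to X^a.
    supp_c = antecedent.mean()                                # eq. (9)
    supp_y = anomalous.mean()                                 # eq. (10)
    supp_rule = np.logical_and(antecedent, anomalous).mean()  # supp(c_f -&gt; y)
    conf = supp_rule / supp_c                                 # eq. (11)
    lift = conf / supp_y                                      # eq. (12)
    return supp_c, supp_rule, conf, lift
      </preformat>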
      <p>
        Finally, confidence boost introduced by Balcázar [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is
calculated as:
      </p>
      <p>β(r_f) = conf(r_f) / max{conf(r′_{f′}) | supp(r′_{f′}) &gt; σ, r_f ≢ r′_{f′}, f′ ⊆ f},
(13)
where σ is a support threshold and r_f ≢ r′_{f′} denotes the
inequivalence of the rules r_f and r′_{f′}, which for our simple case,
where all consequents are the same, means that f ≠ f′.
From (13) it is evident that f′ ⊂ f.</p>
      <p>If the set of confidences in the denominator is empty, the
confidence boost is by convention set to infinity.</p>
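      <p>In our same-consequent setting, a sketch of (13) is
straightforward, because the inequivalence condition reduces to
f′ ⊂ f; the dictionary layout below is an illustrative assumption:</p>
      <preformat>
def confidence_boost(rules, sigma=0.0):
    # rules: dict mapping the frozenset item set f to (supp, conf).
    # Returns beta(r_f) of eq. (13); since all consequents coincide,
    # only rules on proper subsets of f compete in the denominator.
    beta = {}
    for f, (_, conf_f) in rules.items():
        competitors = [c for g, (s, c) in rules.items()
                       if s &gt; sigma and g &lt; f]  # g &lt; f: proper subset
        beta[f] = conf_f / max(competitors) if competitors else float("inf")
    return beta
      </preformat>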
    </sec>
    <sec id="sec-6">
      <title>Experiments</title>
      <p>
        For the experimental evaluation we used a synthetic
three-layer donut dataset, the well-known Fisher's iris dataset, and the
Letter Recognition dataset from the UCI repository [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <sec id="sec-6-1">
        <title>Three-Layer Donut</title>
        <p>The three-layer donut dataset contains 1000 normal
samples forming a two-dimensional toroid (donut). There
are 200 anomalies, one half inside the toroid and the
other half outside of it. For this dataset we created 10 rules
per anomaly using SRF, resulting in 2000 rules. After the
simple aggregation described in Section 4, only 8 rules were
left. All of them are printed in Table 1, sorted by their
respective support. All rules before aggregation were used to
calculate the confidence boost. Otherwise, many rules would
have a confidence boost equal to infinity, because there are
no rules defined on any subset of their item sets f.</p>
        <p>Table 1: Rules extracted for the three-layer donut dataset, sorted by support.</p>
        <preformat>
rule                              supp
r1 = x1 &gt; −0.33 ∧ x2 &gt; −0.39      0.38
r2 = x1 &gt; −0.33 ∧ x2 &lt; 0.3        0.37
r3 = x1 &lt; 0.4 ∧ x2 &lt; 0.34         0.37
r4 = x1 &lt; 0.37 ∧ x2 &gt; −0.37       0.37
r5 = x2 &gt; 2.2                     0.02
r6 = x1 &gt; 2.3                     0.02
r7 = x2 &lt; −2.4                    0.01
r8 = x1 &lt; −2.4                    0.01
        </preformat>
        <p>The fact is that the rules r5-r8 have quite small support
but, according to the other measures and our intuition, they
are very important. The small supports are due to the small
number of data points explained; let us recall that anomalies
are only one sixth of all data points in this dataset. Both
lift and confidence boost mostly reflect our subjective
expectations.</p>
        <p>All the rules are depicted in Figure 1. It is evident that the
presented rules cannot separate the anomalies from the normal
samples perfectly. Especially difficult are the anomalies inside
the donut. To separate those inner anomalies perfectly, it
would be necessary to combine more rules together, for
example r1 and r3, or r2 and r4.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Iris</title>
        <p>The virginica species was selected as the anomalous class for
the iris dataset. Five rules per anomaly were produced
using SRF, resulting in 250 rules. After aggregation we
got 6 rules. They are written in Table 2 with their
respective quality measures. The confidence boost was calculated
using all 250 rules.</p>
        <p>Table 2: Rules extracted for the iris dataset (the virginica
species as anomalies).</p>
        <preformat>
r1 = x1 &gt; 6 ∧ x4 &gt; 1.7
r2 = x4 &gt; 2
r3 = x3 &gt; 5.5
r4 = x2 &lt; 2.8 ∧ x4 &gt; 1.6
r5 = x1 &gt; 7.3
r6 = x2 &lt; 2.2
        </preformat>
        <p>The main problem with all those rules is that almost
every one of them can sufficiently separate the anomalies from
the normal samples. None of the presented measures could help
in selecting the most informative, yet as small as possible, set
of rules, because the presented rules are seen as informationally
equivalent by all the quality measures. This does not say much
about the differences between the quality measures, but it
justifies the rule extraction process, because all the generated rules
have high scores.</p>
        <p>Figure 2 shows the iris dataset with the rules r1, r2 and r5.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Letter Recognition</title>
        <p>This dataset was created as a classification problem with
26 classes, one class for each letter of the English alphabet.
The characters were obtained from 20 different fonts and
randomly distorted to produce 20,000 unique samples
represented as 16-dimensional numerical vectors. The letter X was
selected as the anomaly class. SRF produced more than
15,000 rules, which were reduced by aggregation to 1955.
The aggregated rules with support higher than 0.10 are
presented in Table 3. The ranking of those rules is plotted
in Figure 3. It is evident that the rankings given by lift and
confidence boost differ substantially.</p>
        <p>It is nearly impossible to evaluate all the rules. Therefore,
we have selected only those with confidence boost higher
than one (202 rules) and those with lift higher than one
(446 rules). The confidence boost selected the smaller rule
set, in which almost all rules looked plausible. On the other
hand, it missed some really interesting ones rated most highly
by lift. The confidence boost tends to choose shorter,
more similar rules, whereas lift prefers richer and more
heterogeneous rules. Therefore, from our point of view, the
best selection strategy is choosing the top k rules according to
the lift ranking. The top 10 rules chosen from the whole
set by lift and by confidence boost, regardless of their support,
are in Table 4.</p>
        <p>[Figure 3: ranking of the aggregated rules by lift and by confidence boost.]</p>
        <p>Table 4: The top 10 rules according to lift and according to confidence boost (β).</p>
        <preformat>
top 10 lift rules                    top 10 β rules
x8 &lt; 1.8 ∧ x15 &gt; 7.8                 x4 &lt; 1.2 ∧ x9 &gt; 8
x2 &lt; 1.8 ∧ x3 &gt; 4.5 ∧ x9 &gt; 8.8      x2 &lt; 1.5 ∧ x9 &gt; 8.8
x5 &gt; 6.8 ∧ x8 &lt; 1.8 ∧ x15 &gt; 7       x8 &lt; 1.2 ∧ x9 &gt; 8.8
x5 &gt; 5 ∧ x9 &gt; 8.2 ∧ x10 &lt; 5.2       x9 &gt; 8 ∧ x14 &lt; 5.2
x4 &lt; 2.2 ∧ x6 &gt; 8.2 ∧ x9 &gt; 7.8      x11 &gt; 12 ∧ x16 &lt; 4.5
x2 &lt; 5.6 ∧ x3 &gt; 7.2 ∧ x8 &lt; 1.2      x6 &gt; 9.8 ∧ x8 &lt; 1.2
x1 &gt; 4.5 ∧ x8 &lt; 1.8 ∧ x15 &gt; 7.8     x4 &gt; 8.8 ∧ x14 &lt; 5.2
x7 &lt; 5.8 ∧ x8 &lt; 2.5 ∧ x15 &gt; 7.8     x5 &gt; 8.5 ∧ x14 &lt; 5.2
x2 &lt; 4.8 ∧ x9 &gt; 8 ∧ x11 &gt; 9.8       x8 &lt; 1.2 ∧ x16 &gt; 9.9
x11 &gt; 9 ∧ x14 &gt; 13                  x12 &gt; 10 ∧ x16 &lt; 5.2
        </preformat>
        <p>Still, there are too many rules to draw conclusions.
In our future work we are going to investigate more
measures of interestingness and novelty, which will hopefully
help us to reduce the number of extracted rules even further.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In this paper, we presented a novel approach for the
explanation of the output of an arbitrary anomaly detector
using sapling random forests. The explanation is given as
conjunctions of atomic conditions, which can be viewed
as antecedents of association rules. Due to the extraction
method, the individual rules are short and
comprehensible. The main drawback was that the rule sets for the
bigger dataset were large and redundant. Therefore, we
applied multiple quality measures to evaluate them and to
select the rules with the desired properties. The performed
experiments showed that none of the presented measures fully
reflects our expectations. Of the considered measures, lift
looks the most promising. But this paper is a work
in progress, and we do not view this observation as a final
conclusion.</p>
      <p>For our future work, we would like to have a measure
that will rate the novelty of a rule with respect to the set of
previously selected rules. The first idea is to choose those
rules that describe anomalies not covered by the already
selected rules. The second idea is to select rules which
may describe already covered anomalies, but using a
completely different set of features. The last thing we would
like to work on is finding a way of concatenating the mined
rules to make smaller yet precise rule sets.</p>
      <sec id="sec-7-1">
        <title>Acknowledgement</title>
        <p>The research reported in this paper has been supported by
the Czech Science Foundation (GAČR) grant 13-17187S.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C. C.</given-names>
          </string-name>
          :
          <article-title>Outlier analysis</article-title>
          . Springer,
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Imieliński, T.,
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Mining association rules between sets of items in large databases</article-title>
          .
          <source>In: ACM SIGMOD Record</source>
          , volume
          <volume>22</volume>
          ,
          <fpage>207</fpage>
          -
          <lpage>216</lpage>
          , ACM,
          <year>1993</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Balcázar</surname>
            ,
            <given-names>J. L.</given-names>
          </string-name>
          :
          <article-title>Formal and computational properties of the confidence boost of association rules</article-title>
          .
          <source>ACM Transactions on Knowledge Discovery from Data (TKDD) 7(4)</source>
          (
          <year>2013</year>
          ),
          <fpage>19</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Brin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motwani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Dynamic itemset counting and implication rules for market basket data</article-title>
          .
          <source>In: SIGMOD</source>
          <year>1997</year>
          ,
          <source>Proceedings ACM SIGMOD International Conference on Management of Data</source>
          ,
          <fpage>255</fpage>
          -
          <lpage>264</lpage>
          , Tucson, Arizona, USA, May 1997
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Chandola</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Anomaly detection: a survey</article-title>
          .
          <source>ACM Computing Surveys (CSUR) 41(3)</source>
          (
          <year>2009</year>
          ),
          <fpage>15</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Christen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goiser</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Quality and complexity measures for data linkage and deduplication</article-title>
          .
          <source>In: Quality Measures in Data Mining</source>
          ,
          <fpage>127</fpage>
          -
          <lpage>151</lpage>
          , Springer,
          <year>2007</year>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Dang</surname>
            ,
            <given-names>X. -H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Micenková</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Assent</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
          </string-name>
          , R. T.:
          <article-title>Local outlier detection with interpretation</article-title>
          .
          <source>In: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013)</source>
          ,
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>de Vries</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chawla</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Houle</surname>
            ,
            <given-names>M. E.</given-names>
          </string-name>
          :
          <article-title>Finding local anomalies in very high dimensional space</article-title>
          .
          <source>In: IEEE 10th International Conference on Data Mining (ICDM</source>
          <year>2010</year>
          ),
          <year>2010</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Fujimaki</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yairi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Machida</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>An approach to spacecraft anomaly detection problem using kernel feature space</article-title>
          .
          <source>In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining</source>
          ,
          <year>2005</year>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Garcia-Teodoro</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diaz-Verdejo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maciá-Fernández</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vázquez</surname>
          </string-name>
          , E.:
          <article-title>Anomaly-based network intrusion detection: techniques, systems and challenges</article-title>
          .
          <source>Computers &amp; Security</source>
          , 2009
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Hawkins</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          :
          <article-title>Identification of outliers</article-title>
          , volume
          <volume>11</volume>
          . Springer,
          <year>1980</year>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Jalali-Heravi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaïane</surname>
            ,
            <given-names>O. R.:</given-names>
          </string-name>
          <article-title>A study on interestingness measures for associative classifiers</article-title>
          .
          <source>In: Proceedings of the 2010 ACM Symposium on Applied Computing</source>
          ,
          <volume>1039</volume>
          -
          <fpage>1046</fpage>
          , ACM,
          <year>2010</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Karr</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Big data brings marketing big numbers</article-title>
          . [https://www.marketingtechblog.com/ibm-big-data-marketing/],
          <year>2012</year>
          [Online; accessed 19-June-2015].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Knorr</surname>
            ,
            <given-names>E. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
          </string-name>
          , R. T.:
          <article-title>Algorithms for mining distancebased outliers in large datasets</article-title>
          .
          <source>In: Proceedings of the International Conference on Very Large Data Bases</source>
          ,
          <year>1998</year>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Knorr</surname>
            ,
            <given-names>E. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
          </string-name>
          , R. T.:
          <article-title>Finding intensional knowledge of distance-based outliers</article-title>
          .
          <source>In: VLDB</source>
          ,
          <year>1999</year>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Kopp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pevný</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holeňa</surname>
          </string-name>
          , M.:
          <article-title>Interpreting and clustering outliers with sapling random forests</article-title>
          .
          <source>In: Information Technologies - Applications and Theory Workshops, Posters, and Tutorials (ITAT 2014)</source>
          ,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Lichman</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>UCI machine learning repository</article-title>
          . [http://archive.ics.uci.edu/ml/]. University of California, Irvine, School of Information and Computer Sciences,
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Using general impressions to analyze discovered classification rules</article-title>
          .
          <source>In: KDD</source>
          ,
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>1997</year>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ting</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z. -H.</given-names>
          </string-name>
          :
          <article-title>Isolation forest</article-title>
          .
          <source>In: Eighth IEEE International Conference on Data Mining (ICDM</source>
          <year>2008</year>
          ),
          <year>2008</year>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Micenková</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
          </string-name>
          , R. T.,
          <string-name>
            <surname>Dang</surname>
            ,
            <given-names>X. -H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Assent</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Explaining outliers by subspace separability</article-title>
          .
          <source>In: IEEE 13th International Conference on Data Mining (ICDM</source>
          <year>2013</year>
          ),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Pevný</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kopp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Explaining anomalies with sapling random forests</article-title>
          .
          <source>In: Information Technologies - Applications and Theory Workshops, Posters, and Tutorials (ITAT 2014)</source>
          ,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Pradnya</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khanuja</surname>
            ,
            <given-names>H. K.</given-names>
          </string-name>
          :
          <article-title>Article: A survey on outlier detection in financial transactions</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>108</volume>
          (
          <issue>17</issue>
          ) (
          <year>December 2014</year>
          ),
          <fpage>23</fpage>
          -
          <lpage>25</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Rousseeuw</surname>
            ,
            <given-names>P. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leroy</surname>
            ,
            <given-names>A. M.:</given-names>
          </string-name>
          <article-title>Robust regression and outlier detection</article-title>
          . John Wiley &amp; Sons, 2005
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hastie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Outlier sums for differential gene expression analysis</article-title>
          .
          <source>Biostatistics</source>
          ,
          <year>2007</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>