<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LionForests: Local Interpretation of Random Forests</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ioannis Mollas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nick Bassiliades</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ioannis Vlahavas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grigorios Tsoumakas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, Aristotle University of Thessaloniki</institution>
          <addr-line>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>As ML systems become integrated into every aspect of people's lives, research on methods for interpreting such systems is necessary, rather than an exclusive focus on enhancing their performance. Strengthening the trust between these systems and people will accelerate this integration. Many medical and retail banking/finance applications use state-of-the-art ML techniques to predict certain aspects of new instances, so explainability is a key requirement for human-centred AI approaches. Tree ensembles, like random forests, are widely accepted solutions for these tasks, yet at the same time they are avoided due to their black-box, uninterpretable nature, creating an unreasonable paradox. In this paper, we provide a methodology for shedding light on the predictions of the misjudged family of tree ensemble algorithms. By using classic unsupervised learning techniques and an enhanced similarity metric to wander among the transparent trees inside a forest, following breadcrumbs, the interpretable essence of tree ensembles emerges. An interpretation provided by these systems using our approach, which we call “LionForests”, can be a simple, comprehensive rule.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Machine learning (ML) models are becoming pervasive in our
society and everyday life. Such models may contain errors, or may be
subject to manipulation from an adversary. In addition, they may be
mirroring the biases that exist in the data from which they were
induced. For example, Apple’s new credit card has recently been
investigated over claims that it gives women lower credit [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], IBM Watson
Health was accused of suggesting unsafe treatments for patients [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
and the state-of-the-art object detection model YOLOv2 is easily tricked
by specially designed patches [
        <xref ref-type="bibr" rid="ref10 ref41">10, 41</xref>
        ]. Being able to understand how
an ML model operates and why it predicts a particular outcome is
important for engineering safe and unbiased intelligent systems.
      </p>
      <p>
        Unfortunately, many families of highly accurate (and thus
popular) models, such as deep neural networks and tree ensembles, are
opaque: humans cannot understand the inner workings of such
models and/or the reasons underpinning their predictions. This has
recently motivated the development of a large body of research on
interpretable ML (IML), concerned with the interpretation of black box
models [
        <xref ref-type="bibr" rid="ref1 ref13 ref14 ref21 ref23 ref29 ref39">1, 13, 14, 21, 23, 29, 39</xref>
        ].
      </p>
      <p>
        Methods for interpreting ML models are categorised, among other
dimensions [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], into global ones that uncover the whole logic and
structure of a model, and local ones that aim to interpret a single
prediction, such as “Why does this patient have to be hospitalised
immediately?”. This work focuses on the latter category. Besides their utility
in uncovering errors and biases, local interpretation methods are in
certain domains a prerequisite due to legal frameworks, such as the
General Data Protection Regulation (GDPR) [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] of the EU and the
Equal Credit Opportunity Act of the US<sup>2</sup>.
      </p>
      <p>
        Another important dimension along which IML methods can be
categorised concerns the type of ML model that they are interpreting [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Model-agnostic methods [
        <xref ref-type="bibr" rid="ref31 ref37 ref38">31, 37, 38</xref>
        ] can be applied to any type of
model, while model-specific methods [
        <xref ref-type="bibr" rid="ref30 ref33 ref34 ref4">4, 30, 33, 34</xref>
        ] are engineered
for a specific type of model. Methods of the former category have
wider applicability, but they just approximately explain the models
they are applied to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This work focuses on the latter category of
methods, proposing a technique specific to tree ensembles [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ],
which are very effective in several applications involving tabular and
time-series data [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ].
      </p>
      <p>
        Past work on model-specific interpretation techniques for tree
ensembles is limited [
        <xref ref-type="bibr" rid="ref34 ref43">34, 43</xref>
        ]. iForest [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ], a global and local
interpretation system for random forests (RF), provides insights into a
decision through a visualisation tool. However, such a visual
explanation tool is very complex for non-expert users and, at the same
time, requires user interaction in order to construct the local
interpretations. Another instance-level interpretation technique for RF [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]
produces an interpretation in the form of a list of features with their
ranges, accompanied by an influence metric. If the list of features is
extensive and the ranges are very narrow, the interpretation can be
considered unreliable and untrustworthy, because small changes in
the features will render the interpretation useless. Finally, neither
method handles categorical data appropriately.
      </p>
      <p>
        To address the above problems, we introduce a local,
model-specific approach for interpreting an individual prediction of an RF
through a single rule in natural language form. The primary goal is
to reduce the number of features and the secondary goal is to broaden the
feature ranges, producing more robust, indisputable and intuitive
interpretations. Additionally, categorical features are handled
properly, providing intelligible information about them throughout the
rules. The constructed rule is presented as the interpretation. We
call this technique “LionForests” (Local Interpretation Of raNdom
FORESTS) and we use its path and feature selection ability, based
on unsupervised techniques like association rules [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and k-medoids
clustering [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] to process the interpretations in order to make them
more comprehensible.
      </p>
      <p>The rest of this work is structured as follows. First,
Section 2 introduces the related work, presenting interpretation
techniques applicable to RF models. In Section 3, we present
the methodology of our approach to the interpretation of RF models.
In Section 4, we perform the quantitative and qualitative analysis.
Finally, we conclude and present possible future directions of our
strategy in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Works addressing the interpretation of tree ensembles
are either model-agnostic or model-specific solutions, with
a global or local scope, as presented with a similar taxonomy in a
recent survey [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>
        A family of model-agnostic interpretation techniques for
black-box models, including tree ensembles, concerns the efficient
calculation of feature importance. These are variations of feature
permutation methods [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], partial dependence plots [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] and individual
conditional expectation [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], which are global-based. SHAP [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] is an
alternative method to compute feature importance for both global and
local aspects of any black-box model.
      </p>
      <p>
        Specifically, on tree ensembles, the most common techniques
include the processes of extracting, measuring, pruning and selecting
rules from the trees to compute the feature importance [
        <xref ref-type="bibr" rid="ref11 ref40">11, 40</xref>
        ]. A
technique studied by many researchers [
        <xref ref-type="bibr" rid="ref11 ref12 ref25 ref44">11, 12, 25, 44</xref>
        ]
attempts to globally interpret tree ensembles using single-tree
approximations. However, this method, as its name implies, only approximates the
performance of the model it seeks to explain. This approach is therefore
considered highly problematic and has been criticised, because it is not feasible to
summarise a complex model like a tree ensemble in a single tree [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. An
additional approach on interpreting tree ensembles focuses on
clustering the trees of an ensemble using a tree dissimilarity metric [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        iForest [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ], a model-specific global and local approach, utilises
a path distance metric to project an instance’s paths to a
two-dimensional space via t-Distributed Stochastic Neighbour
Embedding (t-SNE) [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. The path distance metric they propose treats
two paths as distant in two cases: a) if a feature exists in only one of
the two paths, the distance between those two paths increases; b) if
a feature exists in both paths, the distance increases according to
the non-common ranges of the feature on the paths divided by half.
The total distance, which is the aggregation of those cases over all the
features, is finally divided by the total number of features appearing
in at least one of the two paths. Besides the projection of the paths
(Figure 1a), they provide a feature summary (Figure 1b) and a decision
path flow (Figure 1c), which is an overview of the paths. In the feature summary,
a stacked area plot visualises every path’s range for a specific feature,
while the decision path flow plot visualises the paths themselves.
However, this information is not provided automatically. The user has
to draw a lasso (Figure 1f) around some points (paths) in the paths
projection plot in order to get the feature summary and paths overview.
Requiring the user to select the appropriate paths is a critical weakness,
simply because the user can easily choose the wrong paths, too small a set of
paths, or even paths entirely different from those responsible
for the prediction. That may lead to an incorrect feature summary and
paths overview, and thus to a faulty interpretation.
      </p>
      <p>
        Lastly, one technique [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] interprets tree ensembles at the
instance level (a local technique), providing as interpretation a set of
features with their ranges, ranked based on their contribution (see
Figure 2). The interpretation process consists of two parts. Firstly,
they calculate the influence of a feature by monitoring the changes
of the activated nodes for a specific instance’s prediction. This
influence is later used for the ranking process. The second step is to
find the narrowest range across all trees for every feature. However,
they do not attempt to expand these ranges. They also claim that
their influence metric assigns zero influence to some features and that,
by removing them, they could offer more compact
explanations. Nevertheless, there is no guarantee that the features
with a non-zero influence will be present in at least
half plus one of the paths, which is needed to preserve the prediction for an instance.
Finally, they do not manage categorical features properly.
      </p>
    </sec>
    <sec id="sec-3">
      <title>OUR APPROACH</title>
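      <p>Our objective is to provide local interpretations of RF binary
classifiers. In an RF, a set of techniques, such as data and feature sampling, is used
in order to train a collection <inline-formula><tex-math>T</tex-math></inline-formula> of weak trees. These trained trees then
vote for an instance’s prediction:</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math><![CDATA[h(x_i) = \frac{1}{|T|} \sum_{t \in T} h_t(x_i)]]></tex-math>
      </disp-formula>
      <p>where <inline-formula><tex-math>h_t(x_i)</tex-math></inline-formula> is the vote cast by the tree <inline-formula><tex-math>t \in T</tex-math></inline-formula> for the instance
<inline-formula><tex-math>x_i \in X</tex-math></inline-formula>, representing the probability <inline-formula><tex-math>P(C = c_j \mid X = x_i)</tex-math></inline-formula> of <inline-formula><tex-math>x_i</tex-math></inline-formula> to
be assigned to class <inline-formula><tex-math>c_j \in C = \{0, 1\}</tex-math></inline-formula>, thus</p>
      <disp-formula>
        <tex-math><![CDATA[h_t(x_i) = \begin{cases} 1 & \text{if } P(C = 1 \mid X = x_i) \geq 0.5 \\ 0 & \text{if } P(C = 0 \mid X = x_i) > 0.5 \end{cases}]]></tex-math>
      </disp-formula>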
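      <p>The voting scheme of Eq. 1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the LionForests implementation: the function names and the example probabilities are hypothetical, and ties at exactly 0.5 are assumed to count as a vote for class 1.</p>
      <preformat><![CDATA[
```python
from typing import Sequence


def tree_vote(p_class1: float) -> int:
    """Vote h_t(x) of one tree: 1 if its P(C=1 | x) is at least 0.5, else 0.

    Assumption: a tie at exactly 0.5 counts as a vote for class 1.
    """
    return 1 if p_class1 >= 0.5 else 0


def forest_score(p_class1_per_tree: Sequence[float]) -> float:
    """h(x): the mean of the individual tree votes (Eq. 1)."""
    votes = [tree_vote(p) for p in p_class1_per_tree]
    return sum(votes) / len(votes)


# Hypothetical class-1 probabilities returned by five trees for one instance.
probs = [0.9, 0.7, 0.2, 0.55, 0.4]
score = forest_score(probs)  # three of the five trees vote for class 1
```
]]></preformat>
      <p>With these hypothetical probabilities, three of the five trees vote for class 1, so the forest favours class 1.</p>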
      <p>Each decision tree <inline-formula><tex-math>t \in T</tex-math></inline-formula> is a directed graph, and by deconstructing
its structure, we are able to derive a set <inline-formula><tex-math>P_t</tex-math></inline-formula> of paths from the root
to the leaves. Therefore, every instance can be classified with one
of these paths. A path <inline-formula><tex-math>p \in P_t</tex-math></inline-formula> is a conjunction of conditions, and
the conditions are features and values with the relations ≤ and &gt;. For
example, a path from the tree in Figure 3 could be the following:</p>
      <p>We present LionForests, a framework for interpreting RF
models at the instance level. LionForests is a pipeline of actions:
a) feature-ranges extraction, b) reduction through b1) association rules,
b2) clustering and b3) random selection, c) categorical feature
handling and, finally, d) interpretation composition.</p>
    </sec>
    <sec id="sec-4">
      <title>Feature-Ranges Extraction</title>
      <p>
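        Our approach delivers local explanations for RF. Assume we have
an RF of <inline-formula><tex-math>N</tex-math></inline-formula> trees and an instance <inline-formula><tex-math>x</tex-math></inline-formula>, which is classified by the forest
as <inline-formula><tex-math>c_j \in C = \{0, 1\}</tex-math></inline-formula>. We focus on the <inline-formula><tex-math>K \geq N/2</tex-math></inline-formula> trees of the forest
that classify <inline-formula><tex-math>x</tex-math></inline-formula> as <inline-formula><tex-math>c_j</tex-math></inline-formula>. For each feature, we compute, from each of
the <inline-formula><tex-math>K</tex-math></inline-formula> trees in which it appears, its range of values as imposed by the
conditions involving this feature in the path from the root to the leaf.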
Figure 4 shows an example of these ranges for feature ‘variance’ with
range 1:::0:1 from the Banknote dataset [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] for a particular RF of
50 trees and a particular instance whose value for the skew feature
is 0.179. Figure 6 shows the corresponding stacked area plot. The
highlighted (cyan/light grey) area represents the intersection of these
ranges, which will always contain the instance’s value for the
specific feature. Moreover, no matter how much the feature value
changes, as long as it stays within this intersection range, the
instance will not follow a different decision path in these trees.
      </p>
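      <p>As a rough sketch of the extraction step for a single tree, the conditions met along one root-to-leaf path can be collapsed into one range per feature. The representation of a path as (feature, relation, threshold) triples, the function name and all thresholds below are assumptions made for illustration; only the feature names are taken from the Banknote dataset.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]  # (feature, "<=" or ">", threshold)
Range = Tuple[float, float]         # (lower, upper)


def path_ranges(path: List[Condition],
                bounds: Dict[str, Range]) -> Dict[str, Range]:
    """Collapse the conditions of one root-to-leaf path into one range per feature.

    `bounds` holds the global (min, max) of every feature and is used as the
    starting range for features that are only bounded from one side.
    """
    ranges: Dict[str, Range] = {}
    for feature, relation, threshold in path:
        lower, upper = ranges.get(feature, bounds[feature])
        if relation == "<=":
            upper = min(upper, threshold)
        elif relation == ">":
            lower = max(lower, threshold)
        else:
            raise ValueError(f"unsupported relation: {relation}")
        ranges[feature] = (lower, upper)
    return ranges


# Hypothetical path; features are assumed to be scaled to [-1, 1].
bounds = {"variance": (-1.0, 1.0), "skew": (-1.0, 1.0)}
path = [("skew", "<=", 0.4), ("variance", ">", -0.2), ("skew", ">", 0.1)]
ranges = path_ranges(path, bounds)
```
]]></preformat>
      <p>Repeating this for each of the trees that voted for the predicted class yields, for every feature, one range per tree in which the feature appears.</p>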
      <p>To give a brief example, consider the following three paths:</p>
      <preformat>p1: if f1 ≤ 0.6 and ... then Class_A
p2: if f1 ≤ 0.6 and f1 &gt; 0.469 and ... then Class_A
p3: if f1 &gt; 0 and f1 ≤ 1 and ... then Class_A</preformat>
      <p>For the feature f<sub>1</sub>, the intersection of the ranges in these paths is
0.47 ≤ f<sub>1</sub> ≤ 0.6. This intersection range will always contain the
instance’s value for the specific feature. Moreover, no matter how
much the feature value changes, as long as it stays within
this intersection range, the decision paths are not going to change.
For example, if the value of the instance for the feature f<sub>1</sub> was 0.5 and
the value changes to 0.52, each tree will still take its decision through
the same path. Summarising the aforementioned, an interpretation
can have this shape:</p>
      <p>1. A lot of paths can lead to an explanation with many features and, by
extension, to an unintelligible explanation and a frustrated user.</p>
      <p>2. A lot of paths will lead to a small, strict and very specific feature
range. For example, the instance’s value for f<sub>1</sub> was 0.5 and the
intersection range of all paths for this feature turns out to be 0.47 ≤ f<sub>1</sub> ≤ 0.6,
while the feature range is [−1, 1]. A narrow range like this
would result in a negative impression of the model,
which will be considered unstable and unreliable. A broader
range would be less refutable.</p>
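      <p>The intersection in the three-path example can be checked with a short sketch. Each range is stored as an exclusive lower and an inclusive upper bound, matching the relations &gt; and ≤; the lower bound of p1 is assumed to be the feature minimum −1, and the function names are hypothetical. The computed lower bound is the raw threshold 0.469, which the text rounds to 0.47.</p>
      <preformat><![CDATA[
```python
from typing import List, Tuple

Range = Tuple[float, float]  # (exclusive lower bound, inclusive upper bound)


def intersect(ranges: List[Range]) -> Range:
    """Intersection of the ranges that several paths impose on one feature."""
    lower = max(r[0] for r in ranges)
    upper = min(r[1] for r in ranges)
    if lower >= upper:
        raise ValueError("the ranges do not overlap")
    return lower, upper


def stays_on_same_paths(value: float, ranges: List[Range]) -> bool:
    """True if `value` satisfies the conditions of every path."""
    return all(lower < value <= upper for lower, upper in ranges)


# Ranges of f1 in p1, p2 and p3 (p1 has no lower condition, so -1 is assumed).
f1_ranges = [(-1.0, 0.6), (0.469, 0.6), (0.0, 1.0)]
common = intersect(f1_ranges)
```
]]></preformat>
      <p>Both 0.5 and 0.52 fall inside the intersection, so every tree keeps its path, whereas a value such as 0.3 would violate the conditions of p2.</p>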
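      <p>Consequently, we formulate the optimisation problem (Eq. 3): we
minimise the number of features needed to satisfy the paths of a subset
<inline-formula><tex-math>T'</tex-math></inline-formula> of the original trees, such that the subset retains the same classification result
as the original set of trees and its size is equal to or greater than
the quorum, in order to ensure
consistency with the results of the original RF model. Here
<inline-formula><tex-math>\diamond</tex-math></inline-formula> stands for one of the relations ≤ or &gt;.</p>
      <disp-formula>
        <label>(3)</label>
        <tex-math><![CDATA[\begin{aligned}
\underset{F' \subseteq F}{\text{minimise}} \quad & |F'| \\
\text{subject to} \quad & p = \{ f_i \diamond v_j \mid f_i \in F' \},\; p \in P_t \;\; \forall t \in T', \\
& \left\lfloor \frac{1}{|T|} \sum_{t \in T} h_t(x_i) + \frac{1}{2} \right\rfloor = \left\lfloor \frac{1}{|T'|} \sum_{t \in T'} h_t(x_i) + \frac{1}{2} \right\rfloor, \\
& |T'| \geq \text{quorum}
\end{aligned}]]></tex-math>
      </disp-formula>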
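      <p>To give an example of the expression <inline-formula><tex-math>\lfloor \frac{1}{|T|} \sum_{t \in T} h_t(x_i) + \frac{1}{2} \rfloor</tex-math></inline-formula>: when
70 out of <inline-formula><tex-math>|T| = 100</tex-math></inline-formula> trees are voting for class 1, then we have
<inline-formula><tex-math>\lfloor \frac{1}{100} \cdot 70 + 0.5 \rfloor = \lfloor 1.2 \rfloor \rightarrow 1</tex-math></inline-formula>. On the other hand, if 25 out of
<inline-formula><tex-math>|T| = 100</tex-math></inline-formula> trees are voting for class 1 (the minority), then we have
<inline-formula><tex-math>\lfloor \frac{1}{100} \cdot 25 + 0.5 \rfloor = \lfloor 0.75 \rfloor \rightarrow 0</tex-math></inline-formula>. Therefore, we are aiming to find
the smallest <inline-formula><tex-math>T' \subseteq T</tex-math></inline-formula> which will produce the same classification as
the original <inline-formula><tex-math>T</tex-math></inline-formula> trees.</p>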
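      <p>The rounding rule and the quorum requirement can be checked numerically with the following sketch. The helper names are hypothetical, and the quorum is assumed to be “half plus one” computed with integer division.</p>
      <preformat><![CDATA[
```python
import math
from typing import Sequence


def forest_class(votes: Sequence[int]) -> int:
    """Class decided by a set of tree votes: floor(mean vote + 1/2)."""
    return math.floor(sum(votes) / len(votes) + 0.5)


def quorum(n_trees: int) -> int:
    """Half plus one of the trees (assumption: integer division)."""
    return n_trees // 2 + 1


def is_valid_subset(all_votes: Sequence[int], subset_votes: Sequence[int]) -> bool:
    """A subset is acceptable if it reaches the quorum and keeps the decision."""
    return (len(subset_votes) >= quorum(len(all_votes))
            and forest_class(subset_votes) == forest_class(all_votes))


votes_majority = [1] * 70 + [0] * 30   # 70 of 100 trees vote for class 1
votes_minority = [1] * 25 + [0] * 75   # 25 of 100 trees vote for class 1
```
]]></preformat>
      <p>Keeping 51 of the 70 trees that voted for class 1 satisfies both conditions, while keeping only 50 of them does not reach the quorum.</p>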
    </sec>
    <sec id="sec-5">
      <title>Reduction through Association Rules</title>
      <p>
        The first step of the reduction process begins by using association
rules. Association rules [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] mining is an unsupervised technique,
which is used as a tool to extract knowledge from large datasets
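and explore relations between attributes. In association rules, the
attributes are called items <inline-formula><tex-math>I = \{i_1, i_2, \dots, i_n\}</tex-math></inline-formula>. Each dataset
contains sets of items, called itemsets <inline-formula><tex-math>T = \{t_1, t_2, \dots, t_m\}</tex-math></inline-formula>, where
<inline-formula><tex-math>t_i \subseteq I</tex-math></inline-formula>. Using all possible items of a dataset, we can find all the
rules <inline-formula><tex-math>X \Rightarrow Y</tex-math></inline-formula>, where <inline-formula><tex-math>X, Y \subseteq I</tex-math></inline-formula>. <inline-formula><tex-math>X</tex-math></inline-formula> is called the antecedent, while <inline-formula><tex-math>Y</tex-math></inline-formula>
is called the consequent. In association rules, the goal is to calculate the
support and confidence of each rule in order to find useful relations.
A simple observation is that <inline-formula><tex-math>X</tex-math></inline-formula> is independent of <inline-formula><tex-math>Y</tex-math></inline-formula> when the
confidence is critically low. Furthermore, an <inline-formula><tex-math>X</tex-math></inline-formula> with high
support is probably very important.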
      </p>
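      <p>Support and confidence follow their standard definitions: the support of an itemset is the fraction of itemsets in the dataset that contain it, and the confidence of a rule is the support of antecedent and consequent together divided by the support of the antecedent. A minimal sketch, with hypothetical function names and a made-up dataset of four itemsets:</p>
      <preformat><![CDATA[
```python
from typing import List, Set


def support(itemsets: List[Set[str]], items: Set[str]) -> float:
    """Fraction of itemsets that contain all of `items`."""
    return sum(items <= t for t in itemsets) / len(itemsets)


def confidence(itemsets: List[Set[str]],
               antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of the rule antecedent => consequent.

    Assumes the antecedent occurs at least once, otherwise the ratio is undefined.
    """
    return support(itemsets, antecedent | consequent) / support(itemsets, antecedent)


# Made-up dataset of four itemsets over the items a, b and c.
dataset = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
support_a = support(dataset, {"a"})                 # a occurs in 3 of 4 itemsets
confidence_a_b = confidence(dataset, {"a"}, {"b"})  # b occurs in 2 of those 3
```
]]></preformat>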
      <sec id="sec-5-1">
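        <p>
          But how can we use association rules in random forests? We are
going to do this at the path level. The items <inline-formula><tex-math>I</tex-math></inline-formula> will contain the features
<inline-formula><tex-math>F</tex-math></inline-formula> of our original dataset. The dataset <inline-formula><tex-math>T</tex-math></inline-formula>, which we are going to use to
mine the association rules, will contain sets of features that represent
each path, <inline-formula><tex-math>t_i = \{ i_j \mid i_j = f_j,\; f_j \diamond v_k \in p_i,\; p_i \in P \}</tex-math></inline-formula>, where <inline-formula><tex-math>\diamond</tex-math></inline-formula> stands for ≤ or &gt;. It is important
to mention that we keep only the presence of a feature in the path,
and we discard its value <inline-formula><tex-math>v_k</tex-math></inline-formula>. Then, it is feasible to apply association
rules techniques, like the apriori algorithm [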
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>The next step is to sort the association rules extracted by the
apriori algorithm in ascending order of confidence.
For the rule <inline-formula><tex-math>X \Rightarrow Y</tex-math></inline-formula> with the lowest confidence, we take
<inline-formula><tex-math>X</tex-math></inline-formula> and add its items to the list of features. Afterwards, we
calculate the number of paths whose conjunctions are
satisfied by the new feature set. If the number of paths is at least
half plus one of the total number of trees, we have found the
reduced set of paths. We have a quorum! Otherwise, we iterate and
add more features from the antecedent of the next rule. By
using this technique, we reduce the number of features and obtain
the new feature set <inline-formula><tex-math>F' \subseteq F</tex-math></inline-formula>. Reducing the features can lead to a
reduced set of paths too, because paths containing conjunctions with
the redundant features will no longer be valid. Thus, for every path <inline-formula><tex-math>p</tex-math></inline-formula>
we have the following representation:</p>
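        <p>To illustrate this, consider a toy dataset of four features <inline-formula><tex-math>F = [f_1, f_2, f_3, f_4]</tex-math></inline-formula>
and an RF model with five estimators <inline-formula><tex-math>T = [t_1, t_2, t_3, t_4, t_5]</tex-math></inline-formula>.
For every instance <inline-formula><tex-math>x</tex-math></inline-formula>, from each <inline-formula><tex-math>t_i \in T</tex-math></inline-formula> we can
extract a path <inline-formula><tex-math>p_i</tex-math></inline-formula>. Suppose that for the instance <inline-formula><tex-math>x</tex-math></inline-formula> we have five
paths:</p>
        <preformat>p1: if f1 and f2 and f4 then Class_A
p2: if f1 and f3 and f4 then Class_A
p3: if f1 and f2 and f4 then Class_A
p4: if f3 and f4 then Class_A
p5: if f4 then Class_A</preformat>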
        <p>Then, we can compute the association rules using apriori. Our
objective is to create a set of features <inline-formula><tex-math>F' \subseteq F</tex-math></inline-formula>. We take the first rule,
f<sub>4</sub> ⇒ (f<sub>3</sub>, f<sub>1</sub>), the rule with the lowest confidence. This rule informs
us that f<sub>4</sub>, which has the highest support value, exists in 80% of
the paths without (f<sub>1</sub>, f<sub>3</sub>). Thus, the first thing we add to our
feature list is the antecedent of this rule, f<sub>4</sub>. After adding the feature, we
count how many paths can be satisfied with the features of
F′ = [f<sub>4</sub>]. Only one path is valid (p<sub>5</sub>), which is not enough because we
need a quorum. Skipping all the association rules having the chosen
features in their antecedents, the next rule is f<sub>1</sub> ⇒ f<sub>3</sub>. f<sub>1</sub>
has a support value of 0.6, and the rule has a confidence of 0.33. This means
that in 66.6% of the paths containing f<sub>1</sub>, f<sub>3</sub> is absent. We add f<sub>1</sub> to
the feature list, so now F′ = [f<sub>1</sub>, f<sub>4</sub>]. With this feature list, only
p<sub>5</sub> is activated again. Hence, we need another feature. The next
rule is f<sub>3</sub> ⇒ (f<sub>1</sub>, f<sub>4</sub>), with a support of 0.4 for f<sub>3</sub> and a confidence of
0.5. After adding f<sub>3</sub>, the paths p<sub>2</sub>, p<sub>4</sub> and p<sub>5</sub> are valid.</p>
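        <p>The feature accumulation of this toy example can be replayed with the sketch below. It does not run apriori: the order of the antecedents is simply taken from the walkthrough above, and the helper names are hypothetical. A path counts as valid when all of its features belong to the selected set.</p>
        <preformat><![CDATA[
```python
# Features that appear in each of the five toy paths.
paths = {
    "p1": {"f1", "f2", "f4"},
    "p2": {"f1", "f3", "f4"},
    "p3": {"f1", "f2", "f4"},
    "p4": {"f3", "f4"},
    "p5": {"f4"},
}
quorum = len(paths) // 2 + 1  # half plus one of the five trees


def valid_paths(selected):
    """Names of the paths whose features all belong to the selected set."""
    return sorted(name for name, features in paths.items() if features <= selected)


# Antecedents in the order visited in the walkthrough (not computed here).
antecedent_order = [{"f4"}, {"f1"}, {"f3"}]

selected = set()
history = []
for antecedent in antecedent_order:
    selected |= antecedent
    satisfied = valid_paths(selected)
    history.append((sorted(selected), satisfied))
    if len(satisfied) >= quorum:
        break  # enough paths are satisfied: we have a quorum
```
]]></preformat>
        <p>The loop stops after the third antecedent, when three of the five paths are satisfied by the three selected features.</p>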
        <p>In the aforementioned example, we managed to reduce the features
from four to three and the paths from five to three as well. The reduction
effect can be observed more clearly when this method is applied to datasets with plenty of features and models
with more estimators. Section 4
seeks to explore that effect through a set of experiments.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Reduction through Clustering and Random Selection</title>
      <p>
        However, association rules may not be able to reduce the number of
features and, consequently, the number of paths. In that case, a
second reduction technique, based on clustering, is applied. Clustering
is another group of unsupervised ML techniques, alongside
association rules. k-medoids [
        <xref ref-type="bibr" rid="ref27 ref5">5, 27</xref>
        ] is a well-known clustering
algorithm, which uses an existing element of
the dataset as the centre of each cluster. This element is called the medoid. k-medoids, like other
clustering techniques, needs a distance or dissimilarity metric to
find the optimum clusters. Thus, clustering paths
requires a path-specific distance or dissimilarity metric.
      </p>
      <p>
        We designed a path similarity metric, presented in Algorithm 1. This
similarity metric is close to the distance metric introduced in iForest [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ],
but eliminates some minor problems of the latter. According to
this algorithm, if a feature is absent from both paths, the similarity of
these paths increases by 1. When a feature is present in both paths,
the similarity increases by a value between 0 and 1, which is the
intersection of the two ranges normalised by the union of the two ranges.
The similarity metric is biased towards the absence of a feature from
both paths, because in that case it always assigns one full similarity point. However,
this is a desirable property for our goal of minimising the feature set.
      </p>
      <p>Algorithm 1: Path similarity metric</p>
      <preformat>input : p_i, p_j, feature_names, min_max_feature_values
return: similarity_ij
s_ij ← 0
for f in feature_names do
    if f in p_i and f in p_j then
        find l_i, u_i, l_j, u_j lower and upper bounds
        inter ← min(u_i, u_j) − max(l_i, l_j)
        union ← max(u_i, u_j) − min(l_i, l_j)
        if inter &gt; 0 and union ≠ 0 then
            s_ij ← s_ij + inter / union
        end
    else if f not in p_i and f not in p_j then
        s_ij ← s_ij + 1
    end
end
return s_ij / len(feature_names)</preformat>
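      <p>Algorithm 1 translates almost directly into Python. In the sketch below a path is assumed to be a dictionary mapping each feature that appears in it to its (lower, upper) range, with open sides already filled in from the feature’s minimum and maximum values; this representation and the example paths are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

Path = Dict[str, Tuple[float, float]]  # feature -> (lower, upper) range


def path_similarity(p_i: Path, p_j: Path, feature_names: List[str]) -> float:
    """Similarity of two paths, following the steps of Algorithm 1."""
    score = 0.0
    for f in feature_names:
        if f in p_i and f in p_j:
            (l_i, u_i), (l_j, u_j) = p_i[f], p_j[f]
            inter = min(u_i, u_j) - max(l_i, l_j)
            union = max(u_i, u_j) - min(l_i, l_j)
            if inter > 0 and union != 0:
                score += inter / union   # overlap of the two ranges, in (0, 1]
        elif f not in p_i and f not in p_j:
            score += 1.0                 # feature absent from both paths
    return score / len(feature_names)


# Hypothetical paths over three features.
features = ["f1", "f2", "f3"]
path_a = {"f1": (0.0, 0.5), "f2": (-1.0, 0.0)}
path_b = {"f1": (0.25, 1.0)}
similarity_ab = path_similarity(path_a, path_b, features)
```
]]></preformat>
      <p>Here f1 contributes 0.25 (an overlap of 0.25 over a union of 1), f2 contributes nothing because it appears in only one path, and f3 contributes 1 because it is absent from both, giving a similarity of 1.25 / 3.</p>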
      <p>In Algorithm 2, we calculate the k-medoids and their clusters
using the similarity metric of Algorithm 1. Afterwards, we order
the medoids based on the number of paths they cover
in their clusters. Then, we collect paths from the larger clusters into
a list, until we acquire at least a quorum. By adding the larger
clusters first, the possibility of feature reduction increases, because
the paths inside a cluster tend to be more similar to each other.
Additionally, the biased similarity metric would cluster paths with fewer
irrelevant features, leading to a subset of paths that are satisfied with
a smaller set of features.</p>
      <p>Algorithm 2: Paths reduction through k-medoids clustering</p>
      <preformat>input : similarity_matrix, no_of_estimators, paths, no_of_medoids
return: paths
quorum ← no_of_estimators / 2 + 1
m ← kmedoids(similarity_matrix, no_of_medoids)
sorted_m ← sort_by_key(m, descending = True)
count ← 0, size ← 0, reduced_paths ← []
while size &lt; quorum and count &lt; len(sorted_m) do
    for j in m[sorted_m[count]] do
        reduced_paths.append(paths[j])
    end
    count ← count + 1, size ← len(reduced_paths)
end
if size ≥ quorum then
    paths ← reduced_paths
end
return paths</preformat>
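      <p>The selection step of Algorithm 2 can be sketched as follows. The clustering itself is not implemented here: the sketch assumes that a k-medoids run has already produced a mapping from each medoid to the indices of the paths in its cluster, and that the medoids are ordered by cluster size, as described in the text. The function name and the example clusters are hypothetical.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List


def reduce_paths(clusters: Dict[int, List[int]],
                 paths: List[str],
                 n_estimators: int) -> List[str]:
    """Collect whole clusters, largest first, until the quorum is reached."""
    quorum = n_estimators // 2 + 1
    # Medoids ordered by the number of paths in their cluster, largest first.
    ordered = sorted(clusters, key=lambda medoid: len(clusters[medoid]), reverse=True)
    reduced: List[str] = []
    for medoid in ordered:
        if len(reduced) >= quorum:
            break
        reduced.extend(paths[j] for j in clusters[medoid])
    # Fall back to the original paths if the quorum could not be reached.
    return reduced if len(reduced) >= quorum else paths


# Hypothetical forest of 10 trees, 8 of which voted for the predicted class.
paths = [f"p{i}" for i in range(8)]
clusters = {2: [0, 1, 2, 3], 5: [4, 5], 7: [6, 7]}  # medoid -> path indices
reduced = reduce_paths(clusters, paths, n_estimators=10)
```
]]></preformat>
      <p>With a quorum of six, the largest cluster (four paths) and the next cluster (two paths) are enough, so the last cluster is dropped.</p>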
      <p>Performing clustering does not guarantee feature reduction, but
there is a probability of an unanticipated reduction of the feature set.
This procedure attempts to reduce the number of paths down to
the quorum. Unlike the association rules method, which may
fail to reduce features, clustering will significantly reduce the
number of paths. At the end of the reduction process through
clustering, random selection is applied to the paths to obtain the minimum
acceptable number of paths, in case the reduction via clustering has not
reached the quorum.</p>
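      <p>One possible reading of the random selection step is sketched below: if more paths than the quorum remain after clustering, a random sample of exactly quorum paths is kept. This interpretation, the function name and the seed parameter are assumptions, not details confirmed by the text.</p>
      <preformat><![CDATA[
```python
import random
from typing import List, Optional


def random_selection(paths: List[str], n_estimators: int,
                     seed: Optional[int] = None) -> List[str]:
    """Randomly keep exactly `quorum` paths when more than that remain."""
    quorum = n_estimators // 2 + 1
    if len(paths) <= quorum:
        return list(paths)  # nothing left to remove
    return random.Random(seed).sample(paths, quorum)


# Hypothetical example: 8 remaining paths in a forest of 10 trees.
remaining = [f"p{i}" for i in range(8)]
selected = random_selection(remaining, n_estimators=10, seed=0)
```
]]></preformat>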
    </sec>
    <sec id="sec-8">
      <title>Handling Categorical Features</title>
      <p>
        It is possible, even expected, to deal with a dataset containing
categorical features. Of course, a transformation through OneHot or
Ordinal [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ] encoding is used to make good use of these data.
This transformed type of information is then acceptable to
ML systems. But is there any harm to interpretability caused
by the use of encoding methods? Sadly, yes! Using ordinal
encoding will transform a feature like country=[GR, UK, ...] to
country=[0, 1, 2, ...]. As a result, we lose the intelligibility of the
feature. On the other hand, using OneHot encoding will dramatically
increase the number of features, leading to over-length and
incomprehensible interpretations, by transforming the feature country to
country_GR=[0, 1], country_UK=[0, 1], and so forth. Since the
encoding transformations are part of the feature engineering and are
not fixed, we cannot construct a fully automated process to
inverse transform the features into human-interpretable forms within
the interpretations.
      </p>
      <p>Nevertheless, LionForests provides two automated
encoding processes, using either OneHot or Ordinal encoding, and
their inverse transformations for the interpretation
extraction. Feature-ranges of Ordinal encoded data are transformed like
(1 ≤ country ≤ 2) → (country=[UK, FR]), while feature-ranges of
OneHot encoded data are transformed like (0.5 ≤ country_UK ≤ 1) → (country=UK),
and feature-ranges like (0 ≤ country_GR ≤ 0.49) are removed.
The excluded OneHot encoded features of the categorical feature
should appear to the user as possible alternative values. If a feature
reduction method removes one of the OneHot encoded features, it
will not appear in the feature’s list of alternative values, but in the
list of values that do not influence the prediction. For this reason,
the categorical features will appear in the interpretations with a
notation ‘c’, like ‘categorical_feature<sup>c</sup> = value’. Depending on
the application and the interpretation, the user will be able to request
a list of alternative values or simply hover over the
feature to reveal the list. Section 4.3.3 showcases transformations
of OneHot encoded features, as well as one example of a OneHot
encoded feature’s list of alternative values.</p>
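      <p>The two inverse transformations can be illustrated with a small sketch. It is not the LionForests code: the function names, the 0.5 threshold for deciding which OneHot column is active, and the extra ‘age’ feature are assumptions for illustration, while the country examples mirror the ones in the text.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[float, float]  # (lower, upper) bound of a feature-range


def decode_ordinal(feature_range: Range, categories: List[str]) -> List[str]:
    """Map an Ordinal encoded feature-range back to the categories it covers."""
    lower, upper = feature_range
    return [name for code, name in enumerate(categories) if lower <= code <= upper]


def decode_onehot(ranges: Dict[str, Range],
                  feature: str) -> Tuple[Optional[str], List[str]]:
    """Return the selected value and the alternative values of a OneHot feature.

    A column whose range starts at 0.5 or above is treated as the selected
    value; the remaining columns of the feature become alternative values.
    """
    selected: Optional[str] = None
    alternatives: List[str] = []
    prefix = feature + "_"
    for column, (lower, _upper) in ranges.items():
        if not column.startswith(prefix):
            continue  # column belongs to another feature
        value = column[len(prefix):]
        if lower >= 0.5:
            selected = value
        else:
            alternatives.append(value)
    return selected, alternatives


ordinal_values = decode_ordinal((1, 2), ["GR", "UK", "FR"])
onehot_ranges = {"country_UK": (0.5, 1.0), "country_GR": (0.0, 0.49), "age": (0.1, 0.3)}
onehot_value, onehot_alternatives = decode_onehot(onehot_ranges, "country")
```
]]></preformat>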
    </sec>
    <sec id="sec-9">
      <title>Interpretation Composition</title>
      <p>
        These processes are part of the LionForests technique, which, in the
end, produces an interpretation in the form of a feature-range rule.
Lastly, LionForests combines the ranges of the features in the
reduced feature set to a single natural language rule. The order of
appearance of the feature ranges in the rule is determined by using a
global interpretation method, such as the SHAP TreeExplainer [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]
or the Scikit-learn [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] RF model’s built-in feature importance attribute,
for smaller and larger datasets, respectively. One notable example of
an interpretation is the following:
      </p>
      <disp-formula>
        <label>(5)</label>
        <tex-math><![CDATA[\text{‘if } 0 \leq f_1 \leq 0.5 \text{ and } -0.5 \leq f_3 \leq 0.15 \text{ then class\_A’}]]></tex-math>
      </disp-formula>
      <p>We interpret this feature-range rule as follows: “As long as the value of
f<sub>1</sub> is between 0 and 0.5, and the value of f<sub>3</sub> is between
-0.5 and 0.15, the system will classify this instance as class
A. If the value of f<sub>1</sub>, f<sub>3</sub> or both surpasses the limits of its range, then
the prediction may change. Note that the features are ranked by
their influence”. This type of interpretation is comprehensible and
human-readable. Thus, if we manage to keep such rules short, they
could be an ideal way to explain an RF model. A way to deal with an
over-length rule could be to hide the last n feature-ranges, which
will be the least important due to the ordering process. At the same
time, users will have the ability to expand their rules to explore the hidden
feature-ranges. We do not completely exclude those
features, but only hide them, because otherwise we would
affect the correctness of both the explanation and the prediction. An
example is showcased in Section 4.3.3.</p>
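      <p>Composing the final rule amounts to ordering the reduced feature-ranges by importance and joining them into one sentence. A minimal sketch, in which the function name and the importance scores are hypothetical stand-ins for the output of a global interpretation method:</p>
      <preformat><![CDATA[
```python
from typing import Dict, Tuple


def compose_rule(ranges: Dict[str, Tuple[float, float]],
                 importance: Dict[str, float],
                 predicted_class: str) -> str:
    """Join the feature-ranges, most important first, into a single rule."""
    ordered = sorted(ranges, key=lambda f: importance.get(f, 0.0), reverse=True)
    parts = [f"{ranges[f][0]:g} <= {f} <= {ranges[f][1]:g}" for f in ordered]
    return "if " + " and ".join(parts) + f" then {predicted_class}"


# Feature-ranges from the example rule; the importance scores are made up.
ranges = {"f3": (-0.5, 0.15), "f1": (0.0, 0.5)}
importance = {"f1": 0.7, "f3": 0.3}
rule = compose_rule(ranges, importance, "class_A")
```
]]></preformat>
      <p>Hiding the last n feature-ranges of an over-length rule would then correspond to displaying only the first entries of the ordered list while keeping the full list available on request.</p>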
    </sec>
    <sec id="sec-10">
      <title>EXPERIMENTAL RESULTS</title>
      <p>We first discuss the setup of our experiments. Then, we present
quantitative results, followed by qualitative results for the explanation of
particular instances.</p>
    </sec>
    <sec id="sec-11">
      <title>Experimental Setup</title>
      <p>The implementation of LionForests, as well as of all the experiments
in this paper, is available in the LionLearn repository on GitHub<sup>3</sup>.</p>
      <p>
        Our experiments were conducted on the following three tabular
binary classification datasets: Banknote authentication [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], Heart
(Statlog) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and Adult Census [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. The top part of Table 1 shows
the number of instances and features of each dataset. Particularly,
Adult Census contains 6 numerical and 8 categorical features. The
eight categorical features are transformed through OneHot encoding
into 74 numerical, resulting in a total of 80 features.
      </p>
      <p>
        We used the RandomForestClassifier from Scikit-learn [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] and
the MinMaxScaler with feature range [ 1; 1]. In order to work with
3 https://github.com/intelligence-csd-auth-gr/LionLearn
optimised models, a 10-fold cross-validation grid search was carried
out on each dataset, using the following set of parameters and
values: max_depth {1, 5, 7, 10}, max_features {‘sqrt’, ‘log2’, 75%,
None4}, min_samples_leaf {1, 2, 5, 10, 10%}, bootstrap {True,
False}, n_estimators {10, 100, 500, 1000}. The parameters’ values
that achieved the best F1 score, along with the score itself are shown
in the middle and bottom part of Table 1, respectively.
= ‘sqrt’). Thus, the overlapping features among the different paths are
fewer, and by extension, techniques relying on feature selection, like
association rules, cannot perform the feature reduction, as well as
path reduction. However, LionForests is an ensemble of techniques,
and we have revealed with these quantitative experiments the
importance of each part. We may infer that the LionForests strategy is
considerably effective in reducing both features and paths.
Banknote
      </p>
      <p>Heart (Statlog)</p>
    </sec>
    <sec id="sec-12">
      <title>Quantitative Results</title>
      <p>For each dataset, we train one RF model, using all data, with the
parameters shown in Table 1. Then we apply LionForests to all
instances of each dataset and report the mean feature and path
reduction in Table 2.</p>
      <p>In terms of path reduction, we notice that among the three
reduction techniques, random selection consistently leads to the best
results across all datasets, achieving an average reduction of 45.69%.
Clustering is the second-best method with only slightly worse results
in the first two datasets, but much worse results in the Adult Census
dataset, achieving an average of 41.94% reduction. Association rules
is the worst technique, achieving zero reduction in the Adult
Census dataset and an average of 19.90% reduction. Combining random
selection with other techniques does not lead to improved results.</p>
      <p>With respect to feature reduction, we find that the association rules
strategy leads to the best results in the first two datasets, reaching
an average reduction of 26.04%. The other two techniques achieve
negligible reduction in these two datasets. In banknote
authentication combining the three methods improves the reduction slightly
from 30.70% to 30.85%. In Adult Census on the other hand,
association rules achieve zero reduction of features similarly to paths.
The best result in this dataset is achieved by random selection,
followed closely by clustering. Combining these two techniques
improves slightly the reduction to 10%.</p>
      <p>The weak performance of association rules in Adult Census is
related to the small number of estimators (100) and the huge number of
features (80). In particular, each of the estimators can have a
maximum of eight features, as resulted from the grid search (max features
4 None = all features</p>
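      <p>The bound of eight features follows from the 80 features of Adult Census and the ‘sqrt’ setting. The sketch below reproduces this arithmetic for the four max_features options of the grid. The helper resolve_max_features is hypothetical; it mirrors our understanding of how scikit-learn converts max_features into a feature count, where, to our knowledge, the resulting limit applies to the candidate features considered at each split.</p>
      <preformat>
```python
import math


def resolve_max_features(max_features, n_features):
    """Turn a max_features setting into a number of features (hypothetical helper)."""
    if max_features is None:
        return n_features  # None means all features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, float):
        # A float is read as a fraction of the features, e.g. 0.75 for 75%.
        return max(1, int(max_features * n_features))
    return int(max_features)


# Adult Census: 6 numerical features plus 74 one-hot encoded features.
n_features = 6 + 74
limits = {
    str(option): resolve_max_features(option, n_features)
    for option in ("sqrt", "log2", 0.75, None)
}
```
      </preformat>
      <p>With 80 features, ‘sqrt’ resolves to eight features, which is the value discussed above.</p>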
      <sec id="sec-12-1">
        <title>Reduction Technique</title>
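        <table-wrap>
          <caption>
            <p>Table 2 (Banknote Authentication columns): feature and path reduction (%) for each combination of reduction techniques. X marks an applied technique, - a technique that is not applied.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Association Rules</th><th>Clustering</th><th>Random Based</th><th>Feature</th><th>Path</th></tr>
            </thead>
            <tbody>
              <tr><td>X</td><td>X</td><td>X</td><td>30.85 ± 5.1e-2</td><td>49.47 ± 5.5e-15</td></tr>
              <tr><td>-</td><td>X</td><td>X</td><td>2.15 ± 1.8e-1</td><td>49.47 ± 5.5e-15</td></tr>
              <tr><td>X</td><td>-</td><td>X</td><td>30.70 ± 0.0</td><td>49.47 ± 5.5e-15</td></tr>
              <tr><td>X</td><td>X</td><td>-</td><td>30.84 ± 4.4e-2</td><td>48.56 ± 2.4e-2</td></tr>
              <tr><td>X</td><td>-</td><td>-</td><td>30.70 ± 0.0</td><td>27.02 ± 0.0</td></tr>
              <tr><td>-</td><td>X</td><td>-</td><td>2.12 ± 1.3e-1</td><td>47.57 ± 3.6e-2</td></tr>
              <tr><td>-</td><td>-</td><td>X</td><td>0.0 ± 0.0</td><td>49.47 ± 5.5e-15</td></tr>
            </tbody>
          </table>
        </table-wrap>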
      </sec>
    </sec>
    <sec id="sec-13">
      <title>Qualitative Results</title>
      <p>The LionForests technique provides consistent and robust rules that are
less disputable than other interpretations, because they are more
compact and have broader ranges for the features, while at the same time
presenting categorical data in a human-comprehensible form. In this
section, we provide an example from each dataset to demonstrate how
RF models can be interpreted efficiently.</p>
      <sec id="sec-13-1">
        <title>Banknote Authentication</title>
        <p>We observe that the complete methodology, incorporating
association rules, clustering and random selection reduction, achieves the
highest performance on both feature and path reduction for the first
dataset, Banknote Authentication. We also provide an example of a
pair of explanations (1) without and (2) with LionForests:</p>
        <p>1. ‘if 2.4 ≤ variance ≤ 6.83 and 1.82 ≤ curtosis ≤ 2.13 and […] then fake banknote’</p>
        <p>2. ‘if 2.4 ≤ variance […] then fake banknote’</p>
        <p>The reduced rule (2) is smaller than the original by two features.
In addition, the “curtosis” feature has a wider range. The instance has
a value of 1.92 for the “curtosis” feature, and in the original rule this
value is marginal for the very narrow range 1.82 ≤ curtosis ≤ 2.13,
indicating that a small change may lead to a different outcome;
this is not the case for the reduced rule. In addition, changing
the skew value from 2.64 to 4, which is outside the range of the
feature in the original rule, does not change the prediction and
produces the same reduced rule. We observe the same result
when we change the value of the “entropy” feature, as well as when
we tweak both “skew” and “entropy”.</p>
      </sec>
      <sec id="sec-13-2">
        <title>Heart Disease</title>
        <p>Again, with LionForests we achieve both higher feature and path
reduction ratios. There are thirteen features included in this particular
dataset and, as a result, the interpretations may have a total of thirteen
features. In fact, in this case the reduction of features is essential to
provide comprehensible interpretations. We choose an example, and
we present the original rule (1) and the reduced rule (2):</p>
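        <table-wrap>
          <caption>
            <p>Table 2 (Adult Census columns): feature and path reduction (%) for each combination of reduction techniques. X marks an applied technique, - a technique that is not applied.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Association Rules</th><th>Clustering</th><th>Random Based</th><th>Feature</th><th>Path</th></tr>
            </thead>
            <tbody>
              <tr><td>X</td><td>X</td><td>X</td><td>10.03 ± 1.5e-2</td><td>44.18 ± 5.5e-15</td></tr>
              <tr><td>-</td><td>X</td><td>X</td><td>10.03 ± 1.5e-2</td><td>44.18 ± 5.5e-15</td></tr>
              <tr><td>X</td><td>-</td><td>X</td><td>9.11 ± 1.4e-2</td><td>44.18 ± 5.5e-15</td></tr>
              <tr><td>X</td><td>X</td><td>-</td><td>7.82 ± 1.4e-2</td><td>36.26 ± 2.6e-2</td></tr>
              <tr><td>X</td><td>-</td><td>-</td><td>0.0 ± 0.0</td><td>0.0 ± 0.0</td></tr>
              <tr><td>-</td><td>X</td><td>-</td><td>7.82 ± 1.4e-2</td><td>36.26 ± 2.6e-2</td></tr>
              <tr><td>-</td><td>-</td><td>X</td><td>9.11 ± 1.4e-2</td><td>44.18 ± 5.5e-15</td></tr>
            </tbody>
          </table>
        </table-wrap>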
        <sec id="sec-13-2-1">
          <title>Mean Reduction %</title>
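          <table-wrap>
            <caption>
              <p>Table 2 (mean columns): mean feature and path reduction (%) over the three datasets for each combination of reduction techniques. X marks an applied technique, - a technique that is not applied.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Association Rules</th><th>Clustering</th><th>Random Based</th><th>Feature</th><th>Path</th></tr>
              </thead>
              <tbody>
                <tr><td>X</td><td>X</td><td>X</td><td>20.75</td><td>45.69</td></tr>
                <tr><td>-</td><td>X</td><td>X</td><td>4.06</td><td>45.69</td></tr>
                <tr><td>X</td><td>-</td><td>X</td><td>20.39</td><td>45.69</td></tr>
                <tr><td>X</td><td>X</td><td>-</td><td>20.01</td><td>42.57</td></tr>
                <tr><td>X</td><td>-</td><td>-</td><td>17.36</td><td>19.90</td></tr>
                <tr><td>-</td><td>X</td><td>-</td><td>3.32</td><td>41.94</td></tr>
                <tr><td>-</td><td>-</td><td>X</td><td>3.04</td><td>45.69</td></tr>
              </tbody>
            </table>
          </table-wrap>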
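          <p>The mean columns are the plain averages of the per-dataset reductions. As a small check, the sketch below recomputes two rows from the per-dataset values reported in Table 2, listed in the order Banknote Authentication, Heart Disease, Adult Census. The dictionary layout and the helper mean_reduction are our own illustrative choices.</p>
          <preformat>
```python
# Per-dataset reductions (%) taken from Table 2, in the order
# Banknote Authentication, Heart Disease, Adult Census.
reductions = {
    "all three techniques": {
        "feature": [30.85, 21.37, 10.03],
        "path": [49.47, 43.41, 44.18],
    },
    "association rules only": {
        "feature": [30.70, 21.37, 0.0],
        "path": [27.02, 32.67, 0.0],
    },
}


def mean_reduction(values):
    """Average the per-dataset reductions and round to two decimals."""
    return round(sum(values) / len(values), 2)


means = {
    name: {kind: mean_reduction(vals) for kind, vals in columns.items()}
    for name, columns in reductions.items()
}
```
          </preformat>
          <p>The recomputed values agree with the reported means, for instance 20.75% feature reduction and 45.69% path reduction when all three techniques are combined.</p>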
        </sec>
        <sec id="sec-13-2-2">
          <title>Reduction %</title>
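          <table-wrap>
            <caption>
              <p>Table 2 (Heart Disease columns): feature and path reduction (%) for each combination of reduction techniques. X marks an applied technique, - a technique that is not applied.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Association Rules</th><th>Clustering</th><th>Random Based</th><th>Feature</th><th>Path</th></tr>
              </thead>
              <tbody>
                <tr><td>X</td><td>X</td><td>X</td><td>21.37 ± 0.0</td><td>43.41 ± 0.0</td></tr>
                <tr><td>-</td><td>X</td><td>X</td><td>8.6e-3 ± 1.3e-2</td><td>43.41 ± 0.0</td></tr>
                <tr><td>X</td><td>-</td><td>X</td><td>21.37 ± 0.0</td><td>43.41 ± 0.0</td></tr>
                <tr><td>X</td><td>X</td><td>-</td><td>21.37 ± 0.0</td><td>42.88 ± 2.2e-2</td></tr>
                <tr><td>X</td><td>-</td><td>-</td><td>21.37 ± 0.0</td><td>32.67 ± 0.0</td></tr>
                <tr><td>-</td><td>X</td><td>-</td><td>5.7e-3 ± 1.1e-2</td><td>41.98 ± 7.4e-2</td></tr>
                <tr><td>-</td><td>-</td><td>X</td><td>2.9e-3 ± 8.6e-3</td><td>43.41 ± 0.0</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>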
1. ‘if 6.5 ≤ reversable defect ≤ 7.0 and 3.5 ≤ chest pain ≤ 4.0
and 0.0 ≤ number of major vessels ≤ 0.5 and 1.55 ≤
oldpeak ≤ 1.7 and 0.5 ≤ exercise induced angina ≤ 1.0
and 128.005 ≤ maximum heart rate achieved ≤ 130.998
and 1.5 ≤ the slope of the peak exercise ≤ 2.5 and
sex<sub>c</sub> = Male and 184.999 ≤ serum cholestoral ≤
199.496 and 29.002 ≤ age ≤ 41.497 and 0.0 ≤
resting electrocardiographic results ≤ 0.5 and
119.0 ≤ resting blood pressure ≤ 121.491 and
0.0 ≤ fasting blood sugar ≤ 0.5 then presence’</p>
          <p>2. ‘if 6.5 ≤ reversable defect ≤ 7.0 and 3.5 ≤ chest pain ≤ 4.0
and 0.0 ≤ number of major vessels ≤ 0.5 and 1.55 ≤
oldpeak ≤ 1.7 and 0.5 ≤ exercise induced angina ≤ 1.0
and 128.005 ≤ maximum heart rate achieved ≤ 133.494
and 1.5 ≤ the slope of the peak exercise ≤ 2.5 and
184.999 ≤ serum cholestoral ≤ 199.496 and 119.0 ≤
resting blood pressure ≤ 121.491 then presence’</p>
          <p>The reduced rule is shorter than the original rule by four features.
In addition, the “maximum heart rate achieved” feature has a wider
range in the reduced rule. Changing the “sex” value from ‘Male’ (1)
to ‘Female’ (0) does not change the reduced rule at all. When we tweak
the “age” value from 35 to 15, the reduced rule again remains the
same. Thus, features like “age”, “sex”, “resting electrocardiographic
results” and “fasting blood sugar” cannot influence the prediction
for this instance.</p>
        </sec>
      </sec>
      <sec id="sec-13-3">
        <title>Adult Census</title>
        <p>Finally, we present a pair of explanations provided by LionForests
for an instance of Adult Census, without (1) and with (2) reduction:</p>
        <p>1. ‘if marital status<sub>c</sub> = Married and sex<sub>c</sub> = Female and
education<sub>c</sub> = HS_grad and workclass<sub>c</sub> = Private and
94721 ≤ fnlwgt ≤ 161182 and 47 ≤ age ≤ 53 and 15 ≤
hours per week ≤ 25 and native country<sub>c</sub> = Jamaica and
[other 2 feature-ranges] then income &gt;50K’</p>
        <p>2. ‘if marital status<sub>c</sub> = Married and sex<sub>c</sub> = Female and
education<sub>c</sub> = HS_grad and workclass<sub>c</sub> = Private and
87337 ≤ fnlwgt ≤ 382719 and 47 ≤ age ≤ 63 and 15 ≤
hours per week ≤ 99 and native country<sub>c</sub> = Jamaica and
[other 2 feature-ranges] then income &gt;50K’</p>
        <p>The reduced rule is thirteen features smaller than the original rule.
This is not directly obvious because some one-hot categorical
features are not presented. For example, only valid features such as
“marital_status_Married” and “education_HS_grad” are presented,
as described in Section 3.4. Furthermore, we observe that the ranges
of “age”, “fnlwgt” and “hours per week” are broader. Specifically,
the “age” range increased from [47, 53] to [47, 63], while the “hours per
week” range expanded from [15, 25] to [15, 99]. Moreover, we can
explore the alternative values of the categorical feature “native country”.
In Table 3, the first list refers to the values that may change the
prediction of an instance, while the second shows the values that cannot
affect the prediction. In the original rule this type of information was
not available because the non-affecting values were present.</p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>CONCLUSION</title>
      <p>Table 3: Possible values of “native_country” that may affect the prediction, and values that preserve the prediction.</p>
      <p>
        Providing helpful explanations is a challenging task. Providing
comprehensible and undeniable explanations is even tougher. Since
model-agnostic approaches produce approximations, i.e. near-optimal
rather than optimal explanations, attempting to interpret every
black-box model indiscriminately will not lead to the desired
outcome. In this work, we introduced a model-specific, local
approach for obtaining real interpretations of random forest
predictions. Other works [
        <xref ref-type="bibr" rid="ref34 ref43">34, 43</xref>
        ] attempt to provide explanations of this
form, but they do not try to make them more comprehensible or
indisputable. A user may not be familiar with iForest’s [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ]
visualisation tool. Besides, an interpretation containing a lot of features with
narrow ranges [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] may lead to incomprehensible and untrustworthy
rules. The proposed technique, LionForests, provides users with
short rules as interpretations in natural language, which, by
widening the feature ranges, are also more reliable and trustworthy.
We use classic unsupervised methods, such as association rules and
k-medoids clustering, to achieve feature and path reduction.
      </p>
      <p>Nevertheless, LionForests is not a complete solution either, since
it is not appropriate for model inspection tasks. For example, if
a researcher is working to build a reliable and stable model aiming
for the highest performance, a visualisation tool like iForest may be
preferred. Our approach, in contrast, is the best and easiest way of
providing an interpretation to a non-expert user. Lastly, reducing the
number of paths to the quorum, in order to minimise the features and at
the same time widen the features’ ranges, yields a discounted
probability for the classification of the instance, which raises
questions about the prediction’s reliability. This can be countered
by introducing a threshold parameter to the reduction,
requiring the algorithm to retain at least a specific percentage of the paths.</p>
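      <p>One possible reading of such a threshold is sketched below. This is an assumption on our part rather than a described implementation: we take the quorum to be a strict majority of the trees, we assume one path per tree, and we interpret the threshold as a fraction of all trees. The function names are hypothetical.</p>
      <preformat>
```python
import math


def quorum(n_trees: int) -> int:
    """Smallest number of agreeing trees that still forms a strict majority."""
    return n_trees // 2 + 1


def min_paths_to_keep(n_trees: int, threshold: float = 0.0) -> int:
    """Paths to retain: at least the quorum and at least `threshold` of all trees."""
    if threshold > 1.0 or 0.0 > threshold:
        raise ValueError("threshold must lie in [0, 1]")
    return max(quorum(n_trees), math.ceil(threshold * n_trees))


def max_path_reduction(n_trees: int, threshold: float = 0.0) -> float:
    """Upper bound (in %) on the path reduction permitted by the threshold."""
    kept = min_paths_to_keep(n_trees, threshold)
    return 100.0 * (n_trees - kept) / n_trees
```
      </preformat>
      <p>Under these assumptions, a forest of 100 trees could lose at most 49 paths without a threshold, while a threshold of 0.75 would cap the reduction at 25 paths and so keep the vote closer to that of the full forest.</p>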
      <p>
        Future research will explore the impact of tuning parameters, such
as the number of estimators or the maximum number of features, on
the reduction of features and paths. We also aim to apply LionForests
to tree ensembles other than random forests, as well as to
various datasets and data types. FPGrowth [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], and its variant
FPMax [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], will be tested against the Apriori algorithm. In addition, we
will consider the possibility of adapting LionForests to other tasks,
such as multi-class or multi-label classification, as well as regression.
We will also explore the possibility of using LionForests
interpretations to provide descriptive narratives through counter-examples.
Ultimately, by means of a qualitative, human-oriented analysis, we will
try to explore this promising method in order to prove its
intelligibility and its necessity as a foundation for human-centred artificial
intelligence systems based on interpretable ML methods.
      </p>
    </sec>
    <sec id="sec-15">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This paper is supported by the European Union’s Horizon 2020
research and innovation programme under grant agreement No
825619, AI4EU Project5.</p>
    </sec>
    <sec id="sec-16">
      <title>REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Amina</given-names>
            <surname>Adadi</surname>
          </string-name>
          and Mohammed Berrada, '
          <article-title>Peeking inside the black-box: A survey on explainable artificial intelligence (xai)'</article-title>
          , IEEE Access,
          <volume>6</volume>
          ,
          <fpage>52138</fpage>
          -
          <lpage>52160</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , Tomasz Imieliński, and Arun Swami, '
          <article-title>Mining association rules between sets of items in large databases', in Acm sigmod record</article-title>
          , volume
          <volume>22</volume>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>216</lpage>
          . ACM, (
          <year>1993</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ramakrishnan</given-names>
            <surname>Srikant</surname>
          </string-name>
          , et al., '
          <article-title>Fast algorithms for mining association rules'</article-title>
          ,
          <source>in Proc. 20th int. conf. very large data bases, VLDB</source>
          , volume
          <volume>1215</volume>
          , pp.
          <fpage>487</fpage>
          -
          <lpage>499</lpage>
          , (
          <year>1994</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Bach</surname>
          </string-name>
          , Alexander Binder, Grégoire Montavon, Frederick Klauschen,
          <string-name>
            <surname>Klaus-Robert Müller</surname>
          </string-name>
          , and Wojciech Samek, '
          <article-title>On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation'</article-title>
          ,
          <source>PloS one</source>
          ,
          <volume>10</volume>
          (
          <issue>7</issue>
          ),
          <year>e0130140</year>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bauckhage</surname>
          </string-name>
          , '
          <article-title>Numpy/scipy recipes for data science: kmedoids clustering', Researchgate</article-title>
          . Net, February, (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Leo</given-names>
            <surname>Breiman</surname>
          </string-name>
          , 'Random forests',
          <source>Machine learning</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ),
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          , (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Angela</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Ibm's watson gave unsafe recommendations for treating cancer</article-title>
          . https://cutt.ly/keHQDma,
          <year>2018</year>
          . Accessed:
          <fpage>2019</fpage>
          -11- 18.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Tianqi</given-names>
            <surname>Chen</surname>
          </string-name>
          and Carlos Guestrin, '
          <article-title>Xgboost: A scalable tree boosting system'</article-title>
          ,
          <source>in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining</source>
          , pp.
          <fpage>785</fpage>
          -
          <lpage>794</lpage>
          . ACM, (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>HA</given-names>
            <surname>Chipman</surname>
          </string-name>
          , EI George, and RE McCulloch, '
          <article-title>Making sense of a forest of trees'</article-title>
          ,
          <source>Computing Science and Statistics</source>
          ,
          <volume>84</volume>
          -
          <fpage>92</fpage>
          , (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Samantha</given-names>
            <surname>Cole</surname>
          </string-name>
          .
          <article-title>This trippy t-shirt makes you invisible to ai</article-title>
          . https://cutt.ly/FeHQHAa,
          <year>2019</year>
          . Accessed:
          <fpage>2019</fpage>
          -11-18.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Houtao</given-names>
            <surname>Deng</surname>
          </string-name>
          , '
          <article-title>Interpreting tree ensembles with intrees'</article-title>
          ,
          <source>International Journal of Data Science and Analytics</source>
          ,
          <volume>7</volume>
          (
          <issue>4</issue>
          ),
          <fpage>277</fpage>
          -
          <lpage>287</lpage>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Domingos</surname>
          </string-name>
          , '
          <article-title>Knowledge discovery via multiple models'</article-title>
          ,
          <source>Intelligent Data Analysis</source>
          ,
          <volume>2</volume>
          (
          <issue>1-4</issue>
          ),
          <fpage>187</fpage>
          -
          <lpage>202</lpage>
          , (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Filip Karlo</given-names>
            <surname>Došilović</surname>
          </string-name>
          , Mario Brčić, and Nikica Hlupić, '
          <article-title>Explainable artificial intelligence: A survey'</article-title>
          ,
          <source>in 2018 41st International convention on information and communication technology, electronics and microelectronics (MIPRO)</source>
          , pp.
          <fpage>0210</fpage>
          -
          <lpage>0215</lpage>
          . IEEE, (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Mengnan</given-names>
            <surname>Du</surname>
          </string-name>
          , Ninghao Liu, and Xia Hu, '
          <article-title>Techniques for interpretable machine learning'</article-title>
          , arXiv preprint arXiv:
          <year>1808</year>
          .
          <volume>00033</volume>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Dheeru</given-names>
            <surname>Dua</surname>
          </string-name>
          and
          <string-name>
            <given-names>Casey</given-names>
            <surname>Graff</surname>
          </string-name>
          .
          <source>UCI machine learning repository</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Niall</given-names>
            <surname>Firth</surname>
          </string-name>
          .
          <article-title>Apple card is being investigated over claims it gives women lower credit limits</article-title>
          . https://cutt.ly/oeGYCx5,
          <year>2019</year>
          . Accessed:
          <fpage>2019</fpage>
          -11-18.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Fisher</surname>
          </string-name>
          , Cynthia Rudin, and Francesca Dominici, '
          <article-title>All models are wrong but many are useful: Variable importance for black-box, proprietary, or misspecified prediction models, using model class reliance'</article-title>
          , arXiv preprint arXiv:
          <year>1801</year>
          .
          <volume>01489</volume>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Goldstein</surname>
          </string-name>
          , Adam Kapelner, Justin Bleich, and Emil Pitkin, '
          <article-title>Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation'</article-title>
          ,
          <source>Journal of Computational and Graphical Statistics</source>
          ,
          <volume>24</volume>
          (
          <issue>1</issue>
          ),
          <fpage>44</fpage>
          -
          <lpage>65</lpage>
          , (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Gösta</given-names>
            <surname>Grahne</surname>
          </string-name>
          and Jianfei Zhu, '
          <article-title>Efficiently using prefix-trees in mining frequent itemsets</article-title>
          .',
          <source>in FIMI</source>
          , volume
          <volume>90</volume>
          , (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Th</surname>
          </string-name>
          <string-name>
            <surname>Gries</surname>
          </string-name>
          , '
          <article-title>On classification trees and random forests in corpus linguistics: Some words of caution and suggestions for improvement'</article-title>
          ,
          <source>Corpus Linguistics and Linguistic Theory</source>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Riccardo</given-names>
            <surname>Guidotti</surname>
          </string-name>
          , Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi, '
          <article-title>A survey of methods for explaining black box models'</article-title>
          ,
          <source>ACM computing surveys (CSUR)</source>
          ,
          <volume>51</volume>
          (
          <issue>5</issue>
          ),
          <fpage>93</fpage>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>MAISSAE</given-names>
            <surname>HADDOUCHI and ABDELAZIZ</surname>
          </string-name>
          <string-name>
            <surname>BERRADO</surname>
          </string-name>
          , '
          <article-title>A survey of methods and tools used for interpreting random forest'</article-title>
          ,
          <source>in 2019 1st International Conference on Smart Systems and Data Science (ICSSD)</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . IEEE, (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Tameru</given-names>
            <surname>Hailesilassie</surname>
          </string-name>
          , '
          <article-title>Rule extraction algorithm for deep neural networks: A review'</article-title>
          ,
          <source>arXiv preprint arXiv:1610.05267</source>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Jiawei</given-names>
            <surname>Han</surname>
          </string-name>
          , Jian Pei, Yiwen Yin, and Runying Mao, '
          <article-title>Mining frequent patterns without candidate generation: A frequent-pattern tree approach'</article-title>
          ,
          <source>Data mining and knowledge discovery</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ),
          <fpage>53</fpage>
          -
          <lpage>87</lpage>
          , (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Satoshi</given-names>
            <surname>Hara</surname>
          </string-name>
          and Kohei Hayashi, '
          <article-title>Making tree ensembles interpretable: A bayesian model selection approach'</article-title>
          ,
          <source>in Proceedings of the TwentyFirst International Conference on Artificial Intelligence</source>
          and Statistics, eds.,
          <source>Amos Storkey and Fernando Perez-Cruz</source>
          , volume
          <volume>84</volume>
          <source>of Proceedings of Machine Learning Research</source>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>85</lpage>
          ,
          <string-name>
            <surname>Playa</surname>
            <given-names>Blanca</given-names>
          </string-name>
          , Lanzarote, Canary Islands, (
          <volume>09</volume>
          -
          <fpage>11</fpage>
          Apr
          <year>2018</year>
          ). PMLR.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Trevor</given-names>
            <surname>Hastie</surname>
          </string-name>
          , Robert Tibshirani, and Jerome Friedman,
          <article-title>The elements of statistical learning: data mining, inference, and prediction</article-title>
          , Springer Science &amp; Business Media,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Leonard</given-names>
            <surname>Kaufman and Peter J Rousseeuw</surname>
          </string-name>
          , '
          <article-title>Clustering by means of medoids. statistical data analysis based on the l1 norm'</article-title>
          , Y. Dodge, Ed,
          <fpage>405</fpage>
          -
          <lpage>416</lpage>
          , (
          <year>1987</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Ron</given-names>
            <surname>Kohavi</surname>
          </string-name>
          , '
          <article-title>Scaling up the accuracy of naive-bayes classifiers: a decision-tree hybrid'</article-title>
          ,
          <source>in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining</source>
          , p. to appear, (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Zachary</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Lipton</surname>
          </string-name>
          , '
          <article-title>The mythos of model interpretability', Commun</article-title>
          . ACM,
          <volume>61</volume>
          (
          <issue>10</issue>
          ),
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          , (
          <year>September 2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Scott M</given-names>
            <surname>Lundberg</surname>
          </string-name>
          , Gabriel Erion, Hugh Chen, Alex DeGrave,
          <string-name>
            <surname>Jordan M Prutkin</surname>
          </string-name>
          , Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and
          <string-name>
            <surname>Su-In</surname>
            <given-names>Lee</given-names>
          </string-name>
          , '
          <article-title>Explainable ai for trees: From local explanations to global understanding'</article-title>
          , arXiv preprint arXiv:
          <year>1905</year>
          .
          <volume>04610</volume>
          , (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Scott M</given-names>
            <surname>Lundberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>Su-In</given-names>
            <surname>Lee</surname>
          </string-name>
          , '
          <article-title>A unified approach to interpreting model predictions'</article-title>
          ,
          <source>in Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , eds., I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Garnett</surname>
          </string-name>
          , pp.
          <fpage>4765</fpage>
          -
          <lpage>4774</lpage>
          , Curran Associates, Inc., (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Laurens</given-names>
            <surname>van der Maaten</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          , '
          <article-title>Visualizing data using t-SNE'</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>9</volume>
          (Nov),
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          , (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Ioannis</given-names>
            <surname>Mollas</surname>
          </string-name>
          , Nikolaos Bassiliades, and Grigorios Tsoumakas, '
          <article-title>LioNets: Local interpretation of neural networks through penultimate layer decoding'</article-title>
          ,
          <source>in ECML PKDD 2019 AIMLAI XKDD Workshop</source>
          . Würzburg, Germany, (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Moore</surname>
          </string-name>
          , Vanessa Murdock, Yaxiong Cai, and Kristine Jones, '
          <article-title>Transparent tree ensembles'</article-title>
          ,
          <source>in The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval</source>
          , pp.
          <fpage>1241</fpage>
          -
          <lpage>1244</lpage>
          . ACM, (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and E. Duchesnay, '
          <article-title>Scikit-learn: Machine learning in Python'</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          , (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          General Data Protection Regulation, '
          <article-title>Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46'</article-title>
          ,
          <source>Official Journal of the European Union (OJ)</source>
          ,
          <volume>59</volume>
          (
          <fpage>1</fpage>
          -
          <lpage>88</lpage>
          ),
          294
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Marco Tulio</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sameer</given-names>
            <surname>Singh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Guestrin</surname>
          </string-name>
          , '
          <article-title>Why should I trust you?: Explaining the predictions of any classifier'</article-title>
          ,
          <source>in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</source>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          . ACM, (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Marco Tulio</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sameer</given-names>
            <surname>Singh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Guestrin</surname>
          </string-name>
          , '
          <article-title>Anchors: High-Precision Model-Agnostic Explanations'</article-title>
          ,
          <source>in Thirty-Second AAAI Conference on Artificial Intelligence</source>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Wojciech</given-names>
            <surname>Samek</surname>
          </string-name>
          , Thomas Wiegand, and
          <string-name>
            <given-names>Klaus-Robert</given-names>
            <surname>Müller</surname>
          </string-name>
          , '
          <article-title>Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models'</article-title>
          ,
          <source>arXiv preprint arXiv:1708.08296</source>
          , (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Gabriele</given-names>
            <surname>Tolomei</surname>
          </string-name>
          , Fabrizio Silvestri,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Haines</surname>
          </string-name>
          , and Mounia Lalmas, '
          <article-title>Interpretable predictions of tree-based ensembles via actionable feature tweaking'</article-title>
          ,
          <source>in Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining</source>
          , pp.
          <fpage>465</fpage>
          -
          <lpage>474</lpage>
          . ACM, (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>James</given-names>
            <surname>Vincent</surname>
          </string-name>
          .
          <article-title>This colorful printed patch makes you pretty much invisible to AI</article-title>
          . https://cutt.ly/TeHQJHU,
          <year>2019</year>
          . Accessed: 2019-11-18.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Von Eye</surname>
          </string-name>
          and
          <string-name>
            <given-names>Clifford C</given-names>
            <surname>Clogg</surname>
          </string-name>
          ,
          <article-title>Categorical variables in developmental research: Methods of analysis</article-title>
          ,
          <source>Elsevier</source>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>Xun</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yanhong</given-names>
            <surname>Wu</surname>
          </string-name>
          , Dik Lun Lee, and Weiwei Cui, '
          <article-title>iForest: Interpreting random forests via visual analytics'</article-title>
          ,
          <source>IEEE Transactions on Visualization and Computer Graphics</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ),
          <fpage>407</fpage>
          -
          <lpage>416</lpage>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Yichen</given-names>
            <surname>Zhou</surname>
          </string-name>
          and Giles Hooker, '
          <article-title>Interpreting models via single tree approximation'</article-title>
          ,
          <source>arXiv preprint arXiv:1610.09036</source>
          , (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Ziegler</surname>
          </string-name>
          and
          <string-name>
            <given-names>Inke R</given-names>
            <surname>König</surname>
          </string-name>
          , '
          <article-title>Mining data with random forests: current options for real-world applications'</article-title>
          ,
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <fpage>55</fpage>
          -
          <lpage>63</lpage>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>