<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Guided-LIME: Structured Sampling based Hybrid Approach towards Explaining Blackbox Machine Learning Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amit Sangroya</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mouli Rastogi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Anantaram</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lovekesh Vig</string-name>
        </contrib>
        <aff>TCS Innovation Labs, Tata Consultancy Services Ltd., Delhi, India</aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>70</volume>
      <fpage>123</fpage>
      <lpage>128</lpage>
      <abstract>
        <p>Many approaches to explain machine learning models and interpret their results have been proposed. These include shadow model approaches, like LIME and SHAP; model inspection approaches, like Grad-CAM; and data-based approaches, like Formal Concept Analysis (FCA). Explaining the decisions of a blackbox ML model using any one of these approaches has limitations, as the underlying model is rather complex, and running an explanation model for each sample is not cost-efficient. This motivates the design of a hybrid approach for evaluating the interpretability of blackbox ML models. One of the major limitations of the widely-used LIME explanation framework is the sampling criterion employed in the SP-LIME algorithm for generating a global explanation of the model. In this work, we investigate a hybrid approach based on LIME that uses FCA for structured sampling of instances. The approach combines the benefits of a data-based approach (FCA) and a proxy model-based approach (LIME). We evaluate these models on three real-world datasets: IRIS, Heart Disease and Adult Earning. We evaluate our approach on two parameters: 1) the prominent features in the explanations, and 2) the proximity of the proxy model to the original blackbox ML model. We use the calibration error metric to measure the closeness between the blackbox ML model and the proxy model.</p>
      </abstract>
      <kwd-group>
        <kwd>Interpretability</kwd>
        <kwd>Explainability</kwd>
        <kwd>Blackbox Models</kwd>
        <kwd>Deep Neural Network</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Formal Concept Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Explainability is an important aspect of an AI system in order to increase the trustworthiness of its decision-making process. Many blackbox deep learning models are being developed and deployed for real-world use (an example is Google’s Diabetic Retinopathy System [1]). For such blackbox systems, neither the model details nor the training dataset is made publicly available. Explaining the predictions made by such blackbox systems has been a great challenge.</p>
      <p>Apart from post-hoc visualization techniques [2] (e.g., feature dependency plots) and feature importance techniques based on sensitivity analysis, there have been three main approaches to the explainability of AI systems: i) proxy or shadow model approaches such as LIME and SHAP; ii) model inspection approaches such as Class Activation Maps (CAM), Grad-CAM, Smooth-Grad-CAM, etc.; and iii) data-based approaches such as decision sets and Formal Concept Analysis [3, 4, 5, 6, 7]. Most of the research work on explainability has followed one of the above approaches [8]. However, each of these approaches has limitations in the way the explanations are generated. In the proxy model approach, a data corpus needs to be created by perturbing the inputs of the target blackbox model before an interpretable shadow model is built; in the model inspection approach, the model architecture needs to be available for inspection to determine the activations; and in the data-based approach, the training data needs to be available.</p>
      <p>Local shadow models are interpretable models that are used to explain individual predictions of blackbox machine learning models. LIME (Local Interpretable Model-agnostic Explanations [9]) is a well-known approach where shadow models are trained to approximate the predictions of the underlying blackbox model. LIME focuses on training local shadow models to explain individual predictions: a prediction of interest of the target blackbox deep learning model is considered, and its related input features are perturbed within a neighborhood proximity to measure the changes in predictions. Based on a reasonable sample of such perturbations, a dataset is created and a locally linear explainable model is constructed. To cover the decision-making space of the target model, Submodular Pick-LIME (SP-LIME) [9] generates global explanations by finding a set of points whose explanations (generated by LIME) are varied in their selected features and in their dependence on those features. SP-LIME proposes a sampling strategy based on sub-modular picks to select instances such that the interpretable features have higher importance.</p>
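      <p>To make the LIME and SP-LIME workflow concrete, the following minimal sketch uses the open-source lime package with a scikit-learn random forest on the IRIS data; the parameter values (sample_size, num_features, num_exps_desired) are our illustrative choices rather than settings used in this paper.</p>
      <preformat><![CDATA[
# Minimal sketch: LIME local explanations plus SP-LIME submodular picks.
# Assumes the open-source `lime` package and a scikit-learn classifier.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer
from lime.submodular_pick import SubmodularPick

data = load_iris()
train_X, train_y = data.data, data.target
clf = RandomForestClassifier(random_state=0).fit(train_X, train_y)

explainer = LimeTabularExplainer(
    train_X,
    feature_names=data.feature_names,
    class_names=data.target_names,
    discretize_continuous=True,
)

# Local explanation for a single prediction of interest.
exp = explainer.explain_instance(train_X[0], clf.predict_proba, num_features=4)
print(exp.as_list())

# SP-LIME: pick a small, non-redundant set of explanations that together
# cover the model's decision-making space.
sp = SubmodularPick(explainer, train_X, clf.predict_proba,
                    sample_size=20, num_features=4, num_exps_desired=5)
for e in sp.sp_explanations:
    print(e.as_list())
]]></preformat>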
      <p>Figure 1: Example output of LIME after adding noisy features to the Heart Disease dataset.</p>
      <p>Figure 1 shows a sample explanation output of LIME for a binary classification problem on the Heart Disease dataset. The prediction probabilities are shown on the left using different colors, and the prominent features that are important for the classification decision are shown on the right. Important features are presented in sorted order based on their relevance. Note that some noisy features are also injected into the dataset and are therefore present in the explanation as well (af1, af2, af3 and af4). In an ideal scenario, noisy features should not be the most relevant features for any ML model and therefore should be least important from an explanation point of view. However, due to proxy model inaccuracies and unreliability, these noisy features can sometimes appear among the most relevant features in explanations. In Figure 2, we show an example scenario that compares the calibration level of two proxy models with a machine learning model. The x axis in this figure is the confidence of the model and the y axis is the accuracy. Assuming that we have a blackbox machine learning model and a proxy model that explains this model, we argue that these models should be close to each other in terms of their calibration levels.</p>
      <p>Ideally, a proxy model which is used for explaining a machine learning model should be as close as possible to the original model.</p>
      <p>Motivated by the design of an optimized explanation model, we design a hybrid approach where we combine the shadow model approach proposed by LIME with the data-based approach of Formal Concept Analysis to explain the outcomes of a machine learning model. We use LIME to interpret locally by means of a linear shadow model of the blackbox model, use Formal Concept Analysis to construct a concept lattice of the training dataset, and then extract implication rules among the features. Based on the implication rules, we select relevant samples as the global instances that we feed to SP-LIME. Therefore, rather than using all instances (which is very costly for deep networks) or random sampling (which never guarantees optimal behavior), we use an FCA-guided approach for selecting the instances. Accordingly, we call our framework Guided-LIME.</p>
      <p>We show that Guided-LIME results in better coverage of the explanation space as compared to SP-LIME. Our main contributions in this paper are as follows:</p>
      <p>• We propose a hybrid approach based on LIME and FCA for generating explanations by exploiting the structure in the training data. We demonstrate how FCA helps in structured sampling of instances for generating global explanations.</p>
      <p>• Using the structured sampling, we can choose optimal instances, both in terms of quantity and quality, to generate explanations and interpret the outcomes. Thereafter, using the calibration error metric, we show that Guided-LIME is a closer approximation of the original blackbox ML model.</p>
    </sec>
    <sec>
      <title>2. Background and Preliminaries</title>
      <sec>
        <title>2.1. Blackbox Model Outcome Explanation</title>
        <p>A blackbox is a model whose internals are either unknown to the observer or are known but uninterpretable by humans. Given a blackbox model solving a classification problem, the blackbox outcome explanation problem consists of providing an interpretable explanation for the outcome of the blackbox. In other words, the interpretable model must return the prediction together with an explanation of the reasons for that prediction. In this context, local interpretability refers to understanding only the reasons for a specific decision; in this case, only the single prediction/decision is interpretable. On the other hand, a model is completely interpretable when we are able to understand its global prediction behavior (the different possible outcomes of various test predictions).</p>
      </sec>
      <sec id="sec-1-1">
        <title>2.2. LIME Approach for Global</title>
      </sec>
      <sec id="sec-1-2">
        <title>Explanations</title>
        <p>The SP-LIME algorithm provides a global understanding of the machine learning model by explaining a set of individual instances. Ribeiro et al. [9] propose a budget that denotes the number of explanations to be generated. Thereafter, they use a pick step to select instances for the user to inspect. The aim is to obtain non-redundant explanations that represent how the model behaves globally, which is done by avoiding instances with similar explanations. However, this algorithm has some limitations [10]:</p>
        <sec id="sec-1-2-1">
          <title>Data points are sampled from a Gaussian distribu</title>
          <p>tion, ignoring the correlation between features. This
can lead to unlikely data points which can then be
used to learn local explanation models. In [11], au- 
thors study the stability of the explanations given by
LIME. They showed that the explanations of two very
close points varied greatly in a simulated setting. This
instability decreases the trust in the produced
explanations. The correct definition of the neighborhood
is also an unsolved problem when using LIME with
tabular data. Local surrogate models e.g. LIME is a
concrete and very promising implementation. But the
method is still in development phase and many
problems need to be solved before it can be safely applied.</p>
        <p>• The SP-LIME algorithm is based on a greedy approach which does not guarantee an optimal solution.</p>
        <p>• The algorithm runs the model on all instances to maximize the coverage function.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2.3. Formal Concept Analysis</title>
        <p>Formal Concept Analysis (FCA) is a data mining model that presents the relations among attributes in a visual form. It was introduced in the early 1980s by Wille (1982) to study how objects can be hierarchically grouped together according to their common attributes. FCA deals with the formalization of concepts and has been applied in many disciplines, such as software engineering, machine learning, knowledge discovery and ontology construction, over the last 20-25 years. A formal context consists of a set of objects, a set of attributes, and a binary relation recording which objects have which attributes. A formal concept of a formal context is a pair of an object set and an attribute set such that the objects are exactly those sharing all the attributes, and the attributes are exactly those shared by all the objects. The set of all formal concepts of a context, together with the subconcept order relation, forms a complete lattice, called the concept lattice of the context.</p>
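        <p>The two derivation operators behind this definition are straightforward to compute on a small binary context. The toy sketch below is our own illustration (the objects and attributes are made up, not those of Figure 3): it enumerates the formal concepts of a context by closing each object set against the shared attributes.</p>
        <preformat><![CDATA[
# Toy formal-concept computation for a small binary context (illustrative
# objects/attributes). A concept is a pair (extent, intent) such that the
# shared attributes of the extent equal the intent and vice versa.
from itertools import combinations

context = {                      # object -> set of attributes it has
    "o1": {"petal_short", "sepal_wide"},
    "o2": {"petal_short"},
    "o3": {"petal_long", "sepal_wide"},
}
objects = set(context)
attributes = set().union(*context.values())

def common_attributes(objs):     # derivation: objects -> shared attributes
    return set(attributes) if not objs else set.intersection(*(context[o] for o in objs))

def common_objects(attrs):       # derivation: attributes -> objects having all of them
    return {o for o in objects if attrs <= context[o]}

concepts = set()
for r in range(len(objects) + 1):
    for objs in combinations(sorted(objects), r):
        intent = common_attributes(set(objs))
        extent = common_objects(intent)       # closure of the object set
        concepts.add((frozenset(extent), frozenset(intent)))

for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), "<->", sorted(intent))
]]></preformat>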
        <p>Figures 3 and 4 are examples from the IRIS dataset (more details in Section 4). In Figure 3, we show a collection of some objects and their attributes. For simplicity, we choose only those objects where a particular attribute is either present or absent; in real-world data, objects can have very complex relationships with fuzzy values. Figure 4 is an example concept lattice generated using this sample data.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Guided-LIME Framework:</title>
    </sec>
    <sec id="sec-3">
      <title>Guiding sampling in SP-LIME using FCA extracted concepts</title>
      <p>uses these instances to generate a set of local
explanation models and covers the overall decision-making
space. FCA provides a useful means for discovering
implicational dependencies in complex data [12, 13].</p>
      <p>In previous work, FCA-based mechanism has been
In [9] SP-LIME has been used to generate global ex- used as an approach to explain the outcome of a
blackplanations of a blackbox model. SP-LIME carries out box machine learning model through the construction
submodular picks from a set of explanations generated of lattice structure of the training data and then using
for a given set X of individual data instances. The SP- that lattice structure to explain the features of
predicLIME algorithm picks out explanations based on fea- tions made on test data [4]. In this proposed hybrid
ture importances across generated explanations. How- approach, we use the power of FCA to determine
imever, the data instances X from which explanations plication rules among features and using that to guide
are generated, are either the full dataset (called Full- the submodular picks for LIME in order to generate
LIME) or data points sampled from a Gaussian distri- local explanations. It provides the benefits of using
bution (SP-LIME random), and ignore the correlation data-based approach and proxy model based approach
between features in the dataset. Carrying out SP-LIME in a unified framework.
for the full dataset (Full-LIME) is very time consuming
especially when the dataset is large. Carrying out SP- 3.1. FCA-based selection of Instances
LIME random on the dataset may end up considering
data points that are implied by other data points in the The goal of our FCA-based instances selection is to
explanation space. Thus it is important to analyze the take advantage of the underlying structure of data to
full data set and choose only those points for SP-LIME build a concise and non-redundant set of instances.
such that the selected data points are representative of We hypothesize that most of the state-of-the-art
apthe data space. In this work, we propose a mechanism proaches do not consider this information (to the best
to determine the implication of features to guide the of our knowledge). We shortlist sample instances
usselection of the instances X from the training dataset. ing the following process:
We use Formal Concept Analysis (FCA) to analyze the
training data and discover feature implication rules. 1. We first binarize the training data in an ad-hoc
Using these feature implication rules, we pick appro- way. The binarization technique is applied to
priate instances to feed into SP-LIME. SP-LIME then discretize the continuous attribute values into
3.1.1. Generating Implication Rules from</p>
      <p>Training Data</p>
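        <p>A minimal sketch of the ad-hoc binarization in step 1 is shown below; thresholding each continuous attribute at its median is our illustrative choice, and a method such as ChiMerge [14] could be substituted.</p>
        <preformat><![CDATA[
# Ad-hoc binarization of continuous attributes (step 1 above): each numeric
# column is turned into a 0/1 feature by thresholding at its median.
# The threshold choice is an illustrative assumption.
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
X = iris.data                                    # continuous attributes

binary_context = (X > X.median()).astype(int)    # objects x binary attributes
binary_context.columns = [f"{c}_high" for c in X.columns]
print(binary_context.head())
]]></preformat>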
      <sec id="sec-3-1">
        <title>In order to find an optimal subset of samples, we gen</title>
        <p>erate implication rules from the given training data.
One of the challenge in generating implication rules is
that for a given domain and training data, the number
of rules can be very large. Therefore, we shortlist rules
based on their expressiveness e.g. we select the subset
of rules that have the highest coverage and lowest
redundancy.</p>
          <p>When we generate association rules from the dataset, the conclusion does not necessarily hold for all objects; it holds for some stated percentage of the objects that satisfy the premise of the rule. We sort the rules by this percentage and select the top-ranked rules; how many rules to keep is calculated empirically for a given domain.</p>
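          <p>A small sketch of this ranking step is given below; it is our own illustration, assuming rules are represented as premise/conclusion pairs over the binarized attributes and that the stated percentage is the usual rule confidence.</p>
          <preformat><![CDATA[
# Rank candidate implication/association rules by confidence, i.e. the
# percentage of objects satisfying the premise that also satisfy the
# conclusion, and keep the top-ranked rules. Rule mining itself (from the
# lattice or an association-rule miner) is assumed to have happened already.
import pandas as pd

def confidence(df, premise, conclusion):
    """df: binary object-attribute table; premise/conclusion: attribute lists."""
    covers_premise = df[premise].all(axis=1)
    if covers_premise.sum() == 0:
        return 0.0
    return float(df.loc[covers_premise, conclusion].all(axis=1).mean())

def top_rules(df, candidate_rules, k):
    """candidate_rules: list of (premise, conclusion) pairs; keep the k best."""
    scored = [(confidence(df, p, c), p, c) for p, c in candidate_rules]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]
]]></preformat>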
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Generating Lattice Structure and Selecting Instances</title>
          <p>Using the lattice structure and implication rules, we select instances for guiding SP-LIME. We identify all the instances that follow the implication rules. For each rule in the “implication rules list”, we check whether a given sample “passes” or “fails” the criterion, i.e. whether that sample satisfies the implication rule or not. Finally, we produce a sorted list of the instances that are deemed most likely to maximize coverage while remaining non-redundant.</p>
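          <p>Continuing under the same assumptions, the pass/fail check and the resulting ranking might be sketched as follows; scoring by the number of satisfied rules is our simplification of the coverage and redundancy criteria.</p>
          <preformat><![CDATA[
# Score every training sample against the shortlisted implication rules:
# a sample "passes" a rule if it satisfies both the premise and the
# conclusion. Samples are ranked by how many rules they pass, and exact
# duplicates are dropped as redundant.
import pandas as pd

def passes(row, rule):
    premise, conclusion = rule
    return all(row[a] == 1 for a in premise) and all(row[a] == 1 for a in conclusion)

def rank_samples(df, rules):
    """df: binarized training data; rules: list of (premise, conclusion) pairs."""
    scores = df.apply(lambda row: sum(passes(row, r) for r in rules), axis=1)
    ranked = df.assign(score=scores).sort_values("score", ascending=False)
    return ranked.drop_duplicates(subset=list(df.columns))   # non-redundancy
]]></preformat>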
        <sec id="sec-3-1-1">
          <title>3.2. Guided-LIME for Global</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>Explanations</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>We propose structured data sampling based approach</title>
        <p>Guided-LIME towards a hybrid framework extending
SP-LIME. SP-LIME normally has two methods for
sampling: random and full. In the random approach,
samonly of two values, 0 or 1. The binarization pro- ples are chosen randomly using a Gaussian
distribucess can be done in a more formal manner e.g. tion. On the other hand, full approach make use of all
chiMerge algorithm [14] which ensures that bi- the instances. We extend the LIME implementation to
narization method does not corrupt the gener- integrate another method “FCA" that takes the samples
ated lattice. In the scope of current work, we generated using lattice and implication rules.
keep this process simple enough. Thereafter, we Algorithm 1 explains the steps to perform structured
generate concept lattice using standard FCA-based sampling using training data and pass to SP-LIME for
approach. Each concept in the lattice represents generating global explanations. The input to
Guidedthe objects sharing some set of properties; and LIME is training data used to train the blackbox ML
each sub-concept in the lattice represents a sub- model. Data processing for finding the best samples
set of the objects. for Guided-LIME involves binarization of data.
There2. We use ConExp concept explorer tool to gener- after, a concept lattice is created based on FCA
apate lattice from the training data [15]. proach [4]. Using the concept lattice, we derive
implication rules. These rules are then used to select test
instances for Guided-LIME.</p>
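        <p>The integration point with LIME is small: instead of letting SP-LIME draw random samples, the FCA-ranked instances are handed to the submodular pick step. The sketch below uses the open-source lime package; select_instances_with_fca is a placeholder name for the pipeline of Section 3.1, not an existing API.</p>
        <preformat><![CDATA[
# Guided-LIME sketch: SP-LIME is run only on the FCA-selected instances
# rather than on random draws or the full dataset. `select_instances_with_fca`
# stands in for the binarize -> lattice -> implication-rules -> ranking
# pipeline sketched in Section 3.1.
from lime.lime_tabular import LimeTabularExplainer
from lime.submodular_pick import SubmodularPick

def guided_lime(train_X, feature_names, class_names, predict_proba,
                select_instances_with_fca, num_exps_desired=5):
    explainer = LimeTabularExplainer(train_X,
                                     feature_names=feature_names,
                                     class_names=class_names)
    guided_X = select_instances_with_fca(train_X)      # structured sampling
    sp = SubmodularPick(explainer, guided_X, predict_proba,
                        method="full",                  # explain every guided sample
                        num_features=5,
                        num_exps_desired=num_exps_desired)
    return sp.sp_explanations
]]></preformat>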
      </sec>
      <sec id="sec-3-3">
        <title>Algorithm 1: Sample selection using FCA for Guided-LIME</title>
        <p>Require: the training dataset. Ensure: selected samples and their ranking.</p>
        <p>For the given training dataset consisting of data samples: binarize the numeric features; generate the concept lattice using FCA; find the implication rules; generate candidate samples and their ranking; and select the top samples for each rule.</p>
        <p>For all top samples from each rule: select the final samples using the redundancy and coverage criteria.</p>
        <p>As we mentioned previously, there are various examples of using a single approach for explanation. This can be done using any of the proposed techniques, i.e. a proxy model, activation-based or perturbation-based approach. However, we argue that none of these approaches provides a holistic view in terms of outcome explanation, whereas a hybrid approach, such as a combination of a proxy model and a data-based approach, can provide a better explanation at a much reduced cost.</p>
        <p>One question that arises for our hybrid approach is whether it is still model agnostic, like LIME. We argue that the sampling step does not affect model agnosticism in any manner; it merely adds a sampling step which helps in choosing the samples in a systematic manner.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <sec id="sec-4-1-1">
          <title>We use the following publicly available datasets to eval</title>
          <p>uate the proposed framework: IRIS, Heart Disease and
Adult Earning dataset (See Table 1). IRIS dataset
contains 3 classes of 50 instances each, where each class
refers to a type of iris plant [16]. There are a total
of 150 samples with 5 attributes each: sepal length,
sepal width, petal length, petal width, class (Iris
Setosa, Iris Versicolor, Iris Virginica). Similarly, Heart
Disease dataset contains 14 attributes; 303 samples and
two classes [17]. Adult Earning dataset contains 48000
samples, 14 features across two classes. The machine
learning task for all three datasets is classification. We
use random forest blackbox machine learning model
in all our experiments.</p>
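        <p>The blackbox setup can be reproduced along the following lines; this is a sketch for the IRIS dataset with scikit-learn defaults, the 25% fraction and the af naming of the injected noisy features follow the description in Section 4.2, and the random seed is our own choice.</p>
        <preformat><![CDATA[
# Experimental setup sketch: train a random forest blackbox on IRIS after
# appending 25% artificial "noisy" features (af1, ...) with random values.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
data = load_iris()
X, y = data.data, data.target

n_noisy = max(1, int(0.25 * X.shape[1]))             # 25% extra noisy features
noisy = rng.random((X.shape[0], n_noisy))
X_aug = np.hstack([X, noisy])
feature_names = list(data.feature_names) + [f"af{i+1}" for i in range(n_noisy)]

blackbox = RandomForestClassifier().fit(X_aug, y)     # default scikit-learn params
print(dict(zip(feature_names, blackbox.feature_importances_.round(3))))
]]></preformat>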
        <table-wrap id="tbl-1">
          <label>Table 1</label>
          <caption>
            <p>Datasets used for evaluation and their features.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Dataset</th>
                <th>Samples</th>
                <th>Classes</th>
                <th>Features</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>IRIS</td>
                <td>150</td>
                <td>3</td>
                <td>sepal length, sepal width, petal length, petal width</td>
              </tr>
              <tr>
                <td>Heart Disease</td>
                <td>303</td>
                <td>2</td>
                <td>age of patient, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting ECG, maximum heart rate achieved, exercise induced angina, ST depression induced by exercise relative to rest, peak exercise ST segment, number of major vessels colored by fluoroscopy, Thal, diagnosis of heart disease</td>
              </tr>
              <tr>
                <td>Adult Earning</td>
                <td>48000</td>
                <td>2</td>
                <td>age, workclass, fnlwgt, education, education-num, marital status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec>
        <title>4.2. Results</title>
        <p>[Figures 6-8: count of artificial (noisy) features (y axis: # Artificial Feature Count) for SP-LIME vs. Guided-LIME.]</p>
        <p>The goal of this experiment is to compare the proposed Guided-LIME approach with the random sampling of SP-LIME. In the scope of this work, we do not compare the proposed hybrid approach with the full sampling of SP-LIME. We perform a case study to find out which approach is better at selecting important features for a given blackbox model. As shown in Table 1, we maintain a ground truth oracle of important features as domain knowledge [18, 19]. We train a random forest classifier with the default parameters of scikit-learn. In this experiment, we add 25% artificially “noisy” features to the training data. The values of these features are chosen randomly. In order to evaluate the effectiveness of the approach, we use the FDR (false discovery rate) metric, which is defined as the total number of noisy features selected as important features in the explanation.</p>
        <p>We calculate the occurrence of noisy features in the generated explanations. Ideally, the noisy features should not occur among the important features; therefore a lower FDR suggests a better approach to explanation. We present the number of noisy features discovered per explanation, averaged over 100 runs. Each explanation consists of a feature importance vector that shows the importance of each feature. As seen in Figures 6, 7, and 8, the y axis is the number of noisy features and the x axis is the index of the noisy feature. We include the cases where a noisy feature is in first or second place in the feature importance vector: AF-1_Imp-1 represents an artificial/noisy feature occurring in first place in the feature importance vector, whereas AF-1_Imp-2 represents an artificial/noisy feature occurring in second place. The Guided-LIME sampling approach is consistently better than basic SP-LIME.</p>
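        <p>The FDR-style bookkeeping can be expressed compactly. The helper below is our illustration: it counts how often the injected noisy features (af1, af2, ...) appear among the top entries of each LIME feature-importance vector, mirroring the AF-1_Imp-1 and AF-1_Imp-2 counts.</p>
        <preformat><![CDATA[
# Count how often injected noisy features (names containing "af") appear
# among the top positions of LIME feature-importance vectors; the total is
# the FDR-style score used to compare SP-LIME and Guided-LIME.
def noisy_feature_hits(explanations, top_k=2, noisy_marker="af"):
    """explanations: list of lime Explanation objects from one run."""
    hits = 0
    for exp in explanations:
        ranked = sorted(exp.as_list(), key=lambda kv: abs(kv[1]), reverse=True)
        top_features = [name for name, _ in ranked[:top_k]]
        hits += sum(noisy_marker in name for name in top_features)
    return hits
]]></preformat>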
      </sec>
      <sec id="sec-4-2">
        <title>4.3. Validating Guided-LIME using calibration level</title>
        <p>The objective of this experiment is to validate which proxy model is a closer approximation of the original blackbox model with respect to the prediction probabilities of each model. In order to measure this closeness, various distance metrics can be used, e.g. KL divergence, cross entropy, etc. We use the well-established ECE (expected calibration error) and MCE (maximum calibration error) as the underlying metrics to detect the calibration of both models [20]. Calibration errors provide a better estimate of the reliability of ML models [21, 22]. Moreover, the focus of our experiment is to estimate the proximity of the shadow model w.r.t. the original blackbox model. Calibration error values are therefore used to compare which model is the better approximation of the original model. We hypothesize that the proxy model with an ECE closer to that of the original blackbox ML model is the closer approximation.</p>
        <p>We perform the experiment in two settings: 1) with the original data and 2) after adding noisy features to the data. As shown in Tables 2 and 3, in both scenarios the ECE and MCE of Guided-LIME are closer to those of the original blackbox ML model in comparison to random SP-LIME. This justifies the benefit of structured sampling. We also run experiments with the full samples of LIME. Although this can be a better approximation of the original model, taking all the samples in the proxy model is not a practical and economical choice for huge real-world datasets. Guided-LIME has a closer ECE to the original blackbox model. Hence, Guided-LIME is a better choice as a proxy model to explain the original ML model.</p>
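        <p>For reference, the binned definitions of ECE and MCE used here can be computed as follows; this is a standard sketch, and the number of bins is our choice.</p>
        <preformat><![CDATA[
# Expected and maximum calibration error over equally spaced confidence bins:
# ECE is the accuracy-vs-confidence gap weighted by bin population, MCE is the
# largest gap over the bins. Both the blackbox and the proxy model are scored
# the same way and their values compared.
import numpy as np

def calibration_errors(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece, mce = 0.0, 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
        ece += in_bin.mean() * gap
        mce = max(mce, gap)
    return ece, mce
]]></preformat>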
      </sec>
    </sec>
    <sec>
      <title>5. Related Work</title>
      <p>Various approaches for the explainability of blackbox models have been proposed [8]. Broadly, the existing techniques can be classified into model explanation approaches, outcome explanation approaches, and model inspection approaches. There are also examples of works that focus on the transparent design of models.</p>
        <p>In this work, we focus only on the outcome
explanation approaches. In the category of outcome
explanation, CAM, Grad-CAM, Smooth Grad-CAM++, SHAP,
DeepLIFT, LRP and LIME are the main approaches [23,
24, 25, 9, 26, 27, 28]. These methods provide a locally
interpretable shadow model which is able to explain
the prediction of the blackbox in understandable terms
for humans.</p>
      <p>The most popular shadow model approaches for blackbox ML model explanations are Local Interpretable Model-Agnostic Explanations (LIME) and SHAP. LIME can explain the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction. In order to make the predictions easily interpretable, LIME has two design goals: ease of interpretation and local fidelity. This means that the outcomes of the shadow model are easily interpretable and that the explanations for individual predictions are locally faithful, i.e. they correspond to how the model behaves in the vicinity of the individual observation being predicted.</p>
      <p>Feature importance is a well-known approach to explain blackbox models. More recently, instance-wise feature selection methods have been proposed to extract a subset of features that are most informative for each given example in a deep learning network [29]. In [30], the authors make use of a combination of neural networks to identify prominent features that impact the model accuracy. These approaches are based on subset sampling through back-propagation.</p>
      <p>Ribeiro et al. [9] present the Local Interpretable Model-agnostic Explanations (LIME) approach, which does not depend on the type of data nor on the type of blackbox to be opened. In other words, LIME can return an understandable explanation for the prediction obtained by any blackbox. The main intuition of LIME is that the explanation may be derived locally from records generated randomly in the neighborhood of the record to be explained. As blackboxes, the following classifiers are tested: decision trees, logistic regression, nearest neighbors, SVM and random forest.</p>
      <p>In contrast, SHAP (SHapley Additive exPlanations) is distinctly built on the Shapley value. The Shapley value is the average of the marginal contributions across all permutations. The Shapley values consider all possible permutations; thus SHAP is a unified approach that provides global and local consistency and interpretability. However, its cost is time: it has to compute all permutations in order to give the results. The SHAP approach has speed limitations, as it has to compute all permutations globally to get local accuracy, whereas LIME perturbs data around an individual prediction to build a model. For generating a global explanation, SHAP needs to be run for every instance. This generates a matrix of Shapley values which has one row per data instance and one column per feature. We can interpret the entire model by analyzing the Shapley values in this matrix.</p>
      <p>In the CAM and Grad-CAM approaches, an explanation is provided by using a Saliency Mask (SM), i.e. a subset of the original record which is mainly responsible for the prediction. For example, as a saliency mask we can consider part of an image or a sentence in a text. A saliency image summarizes where a DNN looks in an image when making its predictions. Although these solutions are not limited/agnostic to blackbox neural networks, they require specific architectural modifications.</p>
      <p>In [31], the authors find the global importance introduced by Local Interpretable Model-agnostic Explanations (LIME) unreliable and present an approach based on global aggregations of local explanations, with the objective of providing insight into a model’s global decision-making process. This work reveals that the choice of aggregation matters for the ability to gain reliable and useful global insights on a blackbox model. We take this work as motivation to propose a hybrid approach where aggregations can be generated using knowledge of the data through an FCA-based system.</p>
      <p>In contrast to model explanation approaches such as LIME and SHAP [9, 26], our approach is complementary and can guide these approaches in selecting the optimal instances for explanation. Extracting rules from neural networks is also a well-studied problem [32]. These approaches depend on various factors such as the quality of the rules extracted, algorithmic complexity, the expressive power of the extracted rules, and the portability of the rule extraction technique. Our approach also uses knowledge of the structure in the data; however, it is not dependent on the blackbox model.</p>
      <p>Moreover, formal concept analysis gives this data analysis a solid theoretical basis.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions and Future Work</title>
      <sec id="sec-5-1">
        <title>In this paper,we proposed a hybrid approach for eval</title>
        <p>uating interpretability of blackbox ML systems.
Although Guided-LIME do not guarantee an optimal
solution, yet we observe that a single approach like LIME
is not suficient to explain the AI system thoroughly.
There are limitations of deciding an optimal sampling
criteria in SP-Lime algorithm. Our approach combines
the benefits of using a data-based approach (FCA) and
proxy model based approach (LIME). Overall, our
approach is complementary to SP-LIME as we provided
a structured way of selecting right instances for global
explanations. Our results on real world datasets shows
that false discovery rate is much lower with
GuidedLIME in comparison to random SP-LIME. Moreover,
Guided-LIME has a closer ECE and MCE to the
original blackbox model. In future, we would like to
perform extensive experiments with diverse datasets and
complex deep learning models.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>