Surrogate Decision Tree Visualization
Interpreting and Visualizing Black-Box Classification Models with Surrogate Decision Tree

Federica Di Castro
Università di Roma La Sapienza
Rome, Italy
dicastro.1597561@studenti.uniroma1.it

Enrico Bertini
New York University Tandon School of Engineering
Brooklyn, New York
enrico.bertini@nyu.edu




Figure 1: Understanding the behavior of a black-box Machine Learning model using the explanatory visual interface of our proposed technique. Using the User Interaction Panel (A), a user chooses the combination of model and dataset they wish to analyze. The resulting model is a surrogate decision tree whose structure can be explored in the Tree Panel (B); the details of its leaves are shown in the Rule Panel (C). The user can interact with the tree, collapsing any node they see fit, which automatically updates the Performance Overview (D).
ABSTRACT
With the growing interest in applying Machine Learning techniques to many application domains, the need for transparent and interpretable ML is getting stronger. Visualization methods can help model developers understand and refine ML models by making the logic of a given model visible and interactive. In this paper we describe a visual analytics tool we developed to support developers and domain experts (with little to no expertise in ML) in understanding the logic of a ML model without having access to the internal structure of the model (i.e., a model-agnostic method). The method is based on the creation of a "surrogate" decision tree which simulates the behavior of the black-box model of interest and presents readable rules to the end-users. We evaluate the effectiveness of the method with a preliminary user study and an analysis of the level of fidelity the surrogate decision tree can reach with respect to the original model.

CCS CONCEPTS
• Human-centered computing → Dendrograms; User interface toolkits; • Information systems → Data analytics.

ACM Reference Format:
Federica Di Castro and Enrico Bertini. 2019. Surrogate Decision Tree Visualization: Interpreting and Visualizing Black-Box Classification Models with Surrogate Decision Tree. In Proceedings of (IUI Workshops'19). ACM, New York, NY, USA, 5 pages.

Author Keywords
Machine Learning; Interpretability; Classification; Explanation; Visual Analytics; Decision Tree; Dendrograms; User Interface

ACM Classification Keywords
Information interfaces and presentation: User Interfaces. Software: Design Tools and Techniques.

IUI Workshops'19, March 20, 2019, Los Angeles, USA. Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
INTRODUCTION
In this paper we propose an interactive visualization based on decision rules: treating every model as an unknown black box, we use a decision tree to replicate the predictions made by the model in a classification problem, and we visualize it with the purpose of using the rules of the decision tree to propose simple yet effective explanations of the logic the model adopts for its classification.
   With the growing adoption of machine learning techniques, there is an increasing demand for research towards making machine learning models transparent and interpretable [7], especially in critical areas such as medicine [1], security, and law.
   In this paper we follow two definitions of interpretability: (1) interpretability is the degree to which a human can understand the cause of a decision [4], and (2) interpretability is the degree to which a human can consistently predict the model's result [3]. These definitions provide an intuition regarding the types of users who may need interpretability methods: data scientists or developers, for model debugging and validation; end-users (typically domain experts), for understanding and gaining trust in the model in the process of decision making; and regulators and lawmakers, for making sure a given system is fair and transparent.
RELATED WORK
In this section we discuss related work that shares some of our goals through different techniques, both in the approach to generating rules that describe a model and in the need for interpretability in Machine Learning.

Rule Generation and Visualization
Many attempts have been made at summarizing a model through simple and effective rules: rule lists, rule tables, and decision rules have all been used in the community to describe ML models. Also well established in the community is LIME (Local Interpretable Model-agnostic Explanations), which creates a local surrogate model and computes local weights that one can use to interpret single instances [7]. LIME provides short, simple, human-friendly explanations that can help any user gain insights into how the model computes the prediction of a specific instance. The same authors later developed Anchors [8], an improved version that computes local explanatory rules instead of weights.
   Methods also exist for global explanations of models. One current method learns if-then rules that globally explain the behavior of black-box models by first gathering conditions that are important at the instance level and then generalizing them into rules that are meant to be descriptive of the overall behavior of the model [6]. Another project that has much in common with this proposal is RuleMatrix by Ming et al. [5], which derives surrogate rules from an existing black-box model and visualizes them with a custom matrix visualization. Our solution is a follow-up of RuleMatrix with an additional innovation: the use of a decision tree structure to compute and visualize the rules. The tree structure, which is explicitly visualized, helps navigate the rules in a hierarchical fashion and as such makes it easier to spot rules of interest.

Interpretable Machine Learning
Understanding a computer-induced model is often a prerequisite for users to trust the model's predictions and follow the recommendations associated with those predictions [2]. In order for a user to trust a model in the process of decision making, it is necessary that the model be transparent or that methods are used to enable its users to verify and understand its behavior. A clear example of the necessity of interpretability is presented in [9][1], where an interpretability method enabled a group of experts to identify a major fault in a model used for medical predictions.
   Ilknur Kaynar Kabul, a Senior Manager in the SAS Advanced Analytics division, describes in a post about interpretability the desirable characteristics of an interpretable model: Transparent, it can explain how it works and/or why it gives certain predictions; Trustworthy, it can handle different scenarios in the real world without continuous control; Explainable, it can convey useful information about its inner workings, the patterns that it learns, and the results that it gives. These are goals we took into consideration when building our Surrogate Tree Visualization.

BUILDING SURROGATE TREES
In the following section we introduce the steps in creating our Surrogate Decision Tree Visualization.

Goals and Target Users
We target as potential users of our tool not only model developers but also domain experts impacted by machine learning techniques (e.g., in health care, finance, security, and policymaking). Model developers use interpretability with the goal of model debugging: understanding a model with the final goal of refining and improving its classification. Domain experts, who may have little to no knowledge of ML, have the goal of understanding how the model behaves and what conclusions it draws when making its classification. In both cases, there is a need for a profound and deep understanding of what the model does.
   Our tool aims to facilitate the answer to the following questions:
   Q1 What rules did the model learn?
   Q2 Which of these rules can be considered descriptive of the model?
   Q3 What are the behaviors of the model that the surrogate is not able to simulate?
   Q4 What are the most important features used by the model?

Decision Trees
A decision tree is a simple recursive structure that expresses a sequential process of classification. Every tree-based model splits the data multiple times according to multiple threshold values of the features. At each node a split of the dataset occurs: going forward, the dataset keeps getting split into multiple subsets until, if every leaf in the tree is pure, each subset contains instances from one class only.
   The reason we chose a decision tree as the surrogate model is the simplicity of its rules and the natural tree-based visual representation one can build with it. Starting from the root node, one can check the next nodes and trace the path down to a leaf to form a rule.
   The following formula describes the relationship between outcome $\hat{y}$ and the features $x$:
$$\hat{y}_i = \hat{f}(x_i) = \sum_{j=1}^{N} c_j \, I\{x_i \in R_j\}$$
Each instance $x_i$ reaches exactly one leaf node, which can be described as a subset $R_j$ of the dataset. The indicator function $I\{\cdot\}$ represents the combination of rules at each of the internal nodes.
   It is important to clarify that we use decision trees as a way to simulate a black-box model. To achieve this purpose we do not train the tree using the original data, but rather use the labels obtained from the original model as training data for the decision tree. This, in turn, allows us to build a tree whose rules simulate the original model.
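A minimal sketch of this procedure, assuming a scikit-learn implementation (the paper does not publish its code, and the dataset and model chosen here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Any fitted classifier can play the black-box role; KNN is one of the
# models used in the evaluation below.
black_box = KNeighborsClassifier().fit(X, y)

# Key idea: fit the surrogate on the model's predictions rather than the
# original labels, so its rules approximate the model, not the data.
y_model = black_box.predict(X)
surrogate = DecisionTreeClassifier(max_depth=8).fit(X, y_model)
```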
                                                                          with different models and values of max Depths
Feature Importance. The overall importance of a feature in a decision tree can be computed by going through all the splits for which the feature was used and adding up how much each split improved the predictions in the child nodes compared to the parent node (e.g., measured as the decrease in Gini index). The sum of all importance values is scaled to 100, so that each feature importance can be interpreted as a percentage of the overall importance.
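In scikit-learn, continuing the sketch above, `feature_importances_` already aggregates each feature's impurity decrease over all of its splits and normalizes the result to sum to 1, so the percentage view described here is a single rescaling:

```python
from sklearn.datasets import load_iris

# Rescale the normalized importances so each reads as a percentage.
importance_pct = 100 * surrogate.feature_importances_
for name, pct in zip(load_iris().feature_names, importance_pct):
    print(f"{name}: {pct:.1f}%")
```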
Rule Presence. In addition to feature importance we compute a second metric that we call rule presence. The purpose of this metric is to give more weight to features that appear more often in the tree (in multiple splits). The metric is computed as follows:

$$RP_{feat_i} = \frac{\text{Number of nodes involving } feature_i}{\text{Number of internal nodes}}$$
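One possible implementation of this ratio, continuing the same sketch (scikit-learn stores the split feature of every node in `tree_.feature`, with a negative sentinel marking leaves):

```python
import numpy as np

split_features = surrogate.tree_.feature
internal = split_features[split_features >= 0]   # internal nodes only

# Rule presence: share of internal nodes that split on each feature.
rule_presence = {
    i: np.mean(internal == i) for i in range(surrogate.n_features_in_)
}
```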
Disadvantages
Decision trees have a number of disadvantages as model interpretation tools. First, the number of nodes can grow exponentially with depth; the more terminal nodes there are, the more difficult it becomes to understand the decision rules of the tree. Even with a moderate number of features it is not unlikely to obtain trees with hundreds of nodes and links. Second, the same feature may occur multiple times at different levels in the tree, making it hard for a viewer to understand how a feature is used by the model across all the rules it generates.
   In our solution, we provide two techniques to mitigate this issue: (1) we enable the user to interactively contract and expand the tree at different levels of granularity; (2) we provide a coordinated supplementary view which visualizes the rules generated by the tree in a tabular format. As described in the section on our visualization, our design aligns rules so that a viewer can see how a given feature is used across the whole set of rules.
Performance Evaluation
There are three main aspects we take into account when evaluating the performance of a surrogate model:
   • Fidelity. The accuracy with which the tree can simulate the original black-box model;
   • Speed. The time needed to generate the tree, as well as the time performance of the interactive functions (to explore the tree interactively);
   • Complexity. The overall complexity of the surrogate tree, measured as the number of nodes in the tree.
   The overall fidelity of the tree is computed as the ratio of samples for which the tree predicts the outcome of the simulated model correctly. The fidelity of a single node is the same measure restricted to the samples that fall into that node.
   Time performance is calculated as the combination of the time necessary to fit the model and the surrogate, plus the time necessary to explore the tree and gather the information needed for the visualization.
   Complexity is measured as the number of nodes in the tree: this impacts not only the time needed to generate the tree data, but also the amount of information displayed in the visualization. As we will see in the following discussion, there is a trade-off between complexity and fidelity.
   We tested our tool with 6 datasets: Iris Flower, Cancer, Car Safety, Housing Price, Demographic Life Quality, and FICO Risk. We also used 5 distinct types of models to simulate as black boxes: KNN, Linear Discriminant Analysis, MultiLayer Perceptron, AdaBoost, and Support Vector Machine. Table 1 and Table 2 show the results of our tests.

Table 1: Mean time and complexity requirements to reach maximum fidelity with the available datasets.

    Dataset        Mean Time (s)   Mean Nodes
    Iris           0.131           13
    Fico           22.841          190
    Housing        13.818          200
    Demographic    25.262          319
    Car            19.141          238
    Cancer         1.128           32

Table 2: Fidelity, time, and complexity for the FICO dataset with different models and values of max depth.

    Model   maxDepth   fidelity   time (s)   nNodes
    KNN     6          0.940      13.01      111
    KNN     8          0.976      27.71      235
    KNN     10         0.996      39.62      337
    KNN     12         1.000      41.57      353
    LDA     6          0.954      13.08      113
    LDA     8          0.988      26.22      221
    LDA     10         0.998      31.59      259
    LDA     13         1.000      31.95      271
    MLPC    6          0.942      12.02      101
    MLPC    8          0.977      23.99      203
    MLPC    10         0.993      34.49      291
    MLPC    12         0.999      37.73      319
    MLPC    14         1.000      39.59      325

   As we can see from Table 1, it is possible, with reasonable complexity and time requirements, to reach maximum fidelity with any combination of model and dataset.
   What is more interesting is that a user who is willing to trade off fidelity in order to obtain a less complex visualization can do so. In Table 2, we analyze the FICO dataset (one of the most complex datasets used in our evaluation). As one can see, for almost every model it is possible to save about 20-30 seconds and 160-240 nodes of complexity by reducing fidelity by only 4%-6%. Therefore, depending on the needs of the user, there may be plenty of room for compromising on fidelity without reaching values that are too low.
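In code, the overall fidelity defined above is simply the agreement rate between surrogate and black box on the same samples; a one-line sketch reusing the objects from the earlier snippets:

```python
import numpy as np

# Fraction of samples where the surrogate reproduces the model's output.
fidelity = np.mean(surrogate.predict(X) == black_box.predict(X))
```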
VISUALIZING SURROGATE TREES
To gather all the data necessary for the visualization, the tool follows a specific pipeline (see Fig. 2) with the purpose of obtaining both information about the performance of the surrogate in replicating the model (Fidelity Evaluation) and the data contained in each node of the tree, necessary for the Visual Analytics step. The visualization itself is made of 4 components (see Fig. 1): a user interaction panel (A), a tree panel (B), a rule panel (C), and a performance overview (D).

Figure 2: The pipeline the tool follows once a dataset and a model have been chosen: a data pre-processing step (A) is followed by the fitting of the model that needs to be simulated (B). Once the model is fitted, its predictions are given as input to the surrogate model block (C) and to the Fidelity Evaluation block (D1). Once the surrogate has been fitted, its predictions reach the Fidelity Evaluation block and are compared with the output of the model it is meant to replicate. Simultaneously, the tree and its data are visualized in block (D2).

User Interaction Panel
The interaction panel allows the user to select the combination of model and dataset to use, as well as parameters such as the preferred value of max depth for the decision tree. Being able to choose the max depth in advance is helpful both for users who want to guarantee the maximum possible fidelity and for users who want to start with a small tree. Once the surrogate is generated, the user can use the fidelity slider to automatically cut the tree at the level that provides the desired fidelity.
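The paper does not detail how the slider is implemented; one plausible sketch (a hypothetical helper that refits at increasing depth rather than pruning in place, reusing `X` and `y_model` from the earlier snippets) searches for the smallest depth that meets the requested fidelity:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def depth_for_fidelity(X, y_model, target=0.95, depth_cap=20):
    """Smallest max_depth whose surrogate reaches the requested fidelity."""
    for depth in range(1, depth_cap + 1):
        tree = DecisionTreeClassifier(max_depth=depth).fit(X, y_model)
        if np.mean(tree.predict(X) == y_model) >= target:
            return depth, tree
    return depth_cap, tree
```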
Performance Overview
The two bars in the top right keep track of the updates performed by the user when interacting with the tree. Surrogate Overall Fidelity provides the measure of the starting fidelity of the surrogate: this is the maximum value that can be reached; in fact, any interaction with the tree itself that does not involve an increase of max depth can only negatively affect the overall fidelity. Tree Dimension shows the current proportion of the tree that is shown with respect to its original dimension. When the tree is first loaded, the tree dimension is 100%, but any toggling of the tree decreases it to represent the current proportion. The combination of the two bars is meant to help users find a good balance between tree size (i.e., complexity) and fidelity.
Tree Panel
The tree panel shows the tree as a node-link diagram. Each node is depicted by a pie chart whose size encodes the number of instances and whose segments encode the proportion of labels. Each edge has a width proportional to the number of instances that follow that path. Hovering over a node, one can see its details in a tooltip, which includes the number of samples, the fidelity, and the rule that characterizes the node. When a node is clicked, it collapses and becomes a leaf node, generating an update of our surrogate model. As a consequence, the performance numbers, as well as the panel that shows them, update to reflect the change.
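The per-node sample counts and fidelity shown in the tooltip can be derived from each sample's decision path through the tree; a sketch under the same scikit-learn assumptions as the earlier snippets:

```python
import numpy as np

# Sparse indicator of which tree nodes each sample passes through.
paths = surrogate.decision_path(X)            # shape (n_samples, n_nodes)
agree = (surrogate.predict(X) == black_box.predict(X)).astype(float)

samples_per_node = np.asarray(paths.sum(axis=0)).ravel()
# Node fidelity: agreement rate restricted to the samples reaching a node.
node_fidelity = paths.T.dot(agree) / samples_per_node
```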
Rule Panel
The rule panel is structured as a table: each column is a feature of the dataset and each row is a leaf of the tree and, as such, a rule. Reading the table by row, it is possible to identify the rules that describe the model in its entirety, thus providing an answer to question Q1.
   The predicates of the rules are depicted as rectangular ranges which show, for a given rule-feature combination, what the constraints of the rule are; that is, the range depicts constraints of the type $lb \le feature \le ub$.
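These ranges can be computed by walking the surrogate once and accumulating, for every leaf, the per-feature interval along its path; a sketch with a hypothetical helper name, using scikit-learn's tree internals as above:

```python
import numpy as np

def leaf_rules(tree, n_features):
    """Map each leaf id to per-feature (lb, ub) bounds along its path."""
    t = tree.tree_
    rules = {}

    def walk(node, lo, hi):
        if t.children_left[node] == -1:                    # leaf reached
            rules[node] = list(zip(lo, hi))
            return
        f, thr = t.feature[node], t.threshold[node]
        left_hi = hi.copy(); left_hi[f] = min(hi[f], thr)
        walk(t.children_left[node], lo.copy(), left_hi)    # feature <= thr
        right_lo = lo.copy(); right_lo[f] = max(lo[f], thr)
        walk(t.children_right[node], right_lo, hi.copy())  # feature > thr

    walk(0, np.full(n_features, -np.inf), np.full(n_features, np.inf))
    return rules

rules = leaf_rules(surrogate, surrogate.n_features_in_)
```

Each entry then corresponds to one row of the rule panel; infinite bounds simply mean the rule does not constrain that feature.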
   For each row/leaf we also provide details of the distribution of instances that satisfy the rule, including information about rule cardinality and distribution across the existing classes, the prediction of the rule, and its fidelity. These details are crucial for judging the fidelity and relevance of the rules: questions Q2 and Q3.
   Finally, next to every feature we provide details about feature relevance using the two metrics feature importance and rule presence, which characterize each feature in terms of how relevant it is (according to relevance computed from the decision tree) and in how many rules it is used. These details provide the user with the information necessary to understand the role of each feature, which covers the needs expressed in question Q4.
                                                                          Visualization. We chose expert users to target the population who
USABILITY STUDY
In order to provide preliminary evidence of the utility and usability of our proposed solution, we conducted a small user study. The purpose of the study was to verify whether people would be able to understand the interface and solve the most basic interpretability problems it is designed to support.

Which Users
We selected 5 users, ranging from 25 to 35 years of age, with extended experience in Machine Learning: Master's degree students in Data Science and PhD students in Computer Engineering and Data Visualization. We chose expert users to target the population who would be interested in using our method for model verification and debugging.

Tasks and Results
We divided the study into two phases: a training phase in which users were shown the Iris flower dataset and asked a series of questions, and a testing phase in which users were asked only three questions to test the usability of the system.
   The questions were aimed at testing whether the users were able to formally express the rules, by reading the table, and to evaluate their fidelity.
   The training phase was meant to introduce the interface to the participants to make sure they were ready to use it in the test phase.
An important goal of this first phase was to make sure that every component of the tool was readable and that the participants understood the need to find a balance between complexity and fidelity. The participants were asked specific questions on how to read the rules in the rule panel and on their characteristics. All of them provided correct answers to these questions, confirming they had understood how to use it.
   In the testing phase we gave our participants time to load the FICO dataset (a much more complex one than the previously shown one) and to interact with the tool for as long as they felt necessary to provide a description of the model. Two of the subjects spent almost an hour observing and analyzing the dataset, and they were able to draw many significant conclusions regarding the relationships between some of the features and the classification performed on the dataset. One user in particular was able to spot many outliers by observing the pie chart that summarizes each leaf and the features involved in the corresponding rules.

CONCLUSIONS AND FUTURE WORKS
We presented a description and a preliminary evaluation of our sur-
rogate tree visualization. This visualization can be used by domain
experts as well as data scientists to gain an understanding of model
behavior, including what the model does right and where issues
may arise. Applications of such a method include healthcare professionals who want to understand ML models for diagnostics and other medical outcomes; finance experts who use models for prediction or credit scoring; or even security analysts who want to understand how a model detects fraudulent transactions or other illegal activities.
   The work presented here is very preliminary. We intend to extend
the tool further in many ways. We want to provide a tighter link
between the tree and the table and find a way to make it scale
to a higher number of features. We also need to test the methods
with a much larger number of data sets and models. In particular,
we need to verify the extent to which decision trees can simulate
more complex models and how fidelity changes when some of the
parameters we use change (e.g., tree complexity).
   The most important aspect that we will be working on in the future is a wider testing of our tool. We need to test the method more formally to better understand the extent to which it helps answer the questions we outlined in the introduction. More importantly,
we need to better understand how interpretability is affected by vi-
sual representation. For this purpose we plan to develop controlled
experiments to compare our alternative visual representations. In
particular, we deem it important to better understand how the tree
representation and the tabular representation of the rules compare
and how they score in comparison to a simple textual list of rules.

REFERENCES
[1] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. 2015. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15). ACM, New York, NY, USA, 1721–1730. https://doi.org/10.1145/2783258.2788613
[2] Alex A. Freitas. 2014. Comprehensible Classification Models: A Position Paper. SIGKDD Explor. Newsl. 15, 1 (March 2014), 1–10. https://doi.org/10.1145/2594473.2594475
[3] Been Kim, Rajiv Khanna, and Oluwasanmi O. Koyejo. 2016. Examples are not enough, learn to criticize! Criticism for Interpretability. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 2280–2288. http://papers.nips.cc/paper/6300-examples-are-not-enough-learn-to-criticize-criticism-for-interpretability.pdf
[4] Tim Miller. 2017. Explanation in Artificial Intelligence: Insights from the Social Sciences. CoRR abs/1706.07269 (2017). arXiv:1706.07269 http://arxiv.org/abs/1706.07269
[5] Yao Ming, Huamin Qu, and Enrico Bertini. 2018. RuleMatrix: Visualizing and Understanding Classifiers with Rules. CoRR abs/1807.06228 (2018). arXiv:1807.06228 http://arxiv.org/abs/1807.06228
[6] Nikaash Puri, Piyush Gupta, Pratiksha Agarwal, Sukriti Verma, and Balaji Krishnamurthy. 2017. MAGIX: Model Agnostic Globally Interpretable Explanations. CoRR abs/1706.07160 (2017). arXiv:1706.07160 http://arxiv.org/abs/1706.07160
[7] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. CoRR abs/1602.04938 (2016). arXiv:1602.04938 http://arxiv.org/abs/1602.04938
[8] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Anchors: High-Precision Model-Agnostic Explanations. In AAAI Conference on Artificial Intelligence (AAAI).
[9] G. Richards, V.J. Rayward-Smith, P.H. Sönksen, S. Carey, and C. Weng. 2001. Data mining for indicators of early mortality in a database of clinical records. Artificial Intelligence in Medicine 22, 3 (2001), 215–231. https://doi.org/10.1016/S0933-3657(00)00110-X