An Axiomatic Approach to Linear Explanations in Data Classification

Jakub Sliwinski (ETH Zurich, Zurich, Switzerland), jakvbs@gmail.com
Martin Strobel (National Univ. of Singapore, Singapore), mstrobel@comp.nus.edu.sg
Yair Zick (National Univ. of Singapore, Singapore), dcsyaz@nus.edu.sg

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. ExSS ’18, March 11, Tokyo, Japan.

ABSTRACT

In this work, we focus on local explanations for data analytics; in other words: given a datapoint ~x, how important was the i-th feature in determining the outcome for ~x? The literature has seen a recent emergence of various analytical answers to this question. We argue for a linear influence measure explanation: given a datapoint ~x, assign a value φ_i(~x) to every feature i, which roughly corresponds to feature i's importance in determining the outcome for ~x. We present a family of measures called MIM (monotone influence measures), that are uniquely derived from a set of axioms: desirable properties that any reasonable influence measure should satisfy. Departing from prior work on influence measures, we assume no knowledge of, or access to, the underlying classifier labeling the dataset. In other words, our influence measures are based on the dataset alone and do not make any queries to the classifier. We compare MIM to other linear explanation models in the literature and discuss their underlying assumptions, merits, and limitations.

ACM Classification Keywords

H.5.2 Information Interfaces and Presentation: User Interfaces—Theory and methods

Author Keywords

Influence Measures, Explainable ML, Algorithmic Transparency

INTRODUCTION

An individual is denied a bank loan; knowing that they are in good financial standing, they demand that the bank explain its decision. However, the bank uses an ML algorithm that automatically rejected the loan application. How should the bank explain its decision? This example is more than anecdotal; recent years have seen the widespread implementation of data-driven algorithms making decisions in increasingly high-stakes domains, such as healthcare, transportation, and public safety. Using novel ML techniques, algorithms are able to process massive amounts of data and make highly accurate predictions; however, their inherent complexity makes it increasingly difficult for humans to understand why certain decisions were made. By obfuscating the underlying decision-making processes, such algorithms potentially expose human stakeholders to risks. These risks could include incorrect decisions (e.g. Alice's application was wrongly rejected due to a system bug), information leaks (e.g. the algorithm was inadvertently given information about Alice that it should not have seen), or discrimination (e.g. the algorithm is biased against female applicants). Indeed, government bodies and regulatory authorities have recently begun calling for algorithmic transparency: providing human-interpretable explanations of the underlying reasoning behind large-scale decision-making algorithms. Our work represents a first formal axiomatic analysis of automatically generated explanations of black-box classifiers.

Our Proposal

We propose utilizing simple mathematical frameworks for explanation via influence measures: these are functions that, given a dataset, assign a value to every feature; this value should roughly correspond to the feature's importance in affecting the classification outcome for individual data points. Slightly more formally, we are given a dataset X containing n-dimensional vectors, whose data points are labeled by a binary classifier c, such that c(~y) = ±1 for all ~y ∈ X; now, given a point of interest ~x ∈ X, we wish to identify the features in ~x that are 'responsible' for it being labeled the way it was. This is done via a mapping φ whose input is the dataset X, its labels (given by c), and the point of interest ~x; its output is a vector φ(~x) ∈ R^n, where φ_i(~x) corresponds to the influence of feature i on the label of ~x. Intuitively, a large positive value of φ_i(~x) should mean that feature i was highly important in determining the label of ~x; a large negative value of φ_i(~x) should mean that ~x was assigned its label despite the value of feature i at ~x.
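For concreteness, the mapping described above can be sketched in a few lines of Python. This is a minimal illustration only; the names InfluenceMeasure and linear_explanation are ours and are not part of the formal development. An influence measure is any function from a point of interest, the dataset, and its labels to a vector of per-feature scores, and a linear explanation simply pairs those scores with feature names.

```python
import numpy as np
from typing import Callable, Dict, List

# An influence measure sees only the dataset X, its labels c, and the point of
# interest x; it returns one real-valued score per feature (the vector phi(x)).
InfluenceMeasure = Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]

def linear_explanation(phi: InfluenceMeasure, x: np.ndarray, X: np.ndarray,
                       labels: np.ndarray, feature_names: List[str]) -> Dict[str, float]:
    """Pair the influence scores phi_i(x) with their feature names."""
    scores = phi(x, X, labels)
    return {name: float(score) for name, score in zip(feature_names, scores)}
```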
This approach carries several important benefits. First of all, it is completely generic, requiring no assumptions on the underlying classification model; secondly, linear explanation models are simple and straightforward, even for a layperson to understand (e.g. 'Alice was denied her loan because of the high importance the algorithm placed on her low monthly income, and despite her never having filed for bankruptcy'). The appeal of linear explanations has been recognized by the research community; recent years have seen a moderate boom of papers proposing linear explanations in data-driven domains (see Related Work). However, this poses a new problem for end users who wish to apply these methodologies: which linear explanation is the 'right' one to choose? In other words, which linear explanations are guaranteed to satisfy certain desirable properties?

We argue for an axiomatization of influence measures in classification domains. The axiomatic approach is common in the economics literature: first, one reasons about simple, reasonable properties (axioms) which should be satisfied by any function (say, methods for dividing revenue amongst collaborators, or agreeing on an election winner given voters' preferences); next, one proves that there exists a unique function satisfying these simple mathematical properties. The axiomatic approach allows one to rigorously reason about the types of influence measures one should use in a given setting: if the axioms set forth make sense in this setting, there is but one method of assigning influence in the given domain. It is, in some sense, an explanation of an explanation method, a provable guarantee that the method is sound; in fact, uniqueness implies that it is the only sound method one can reasonably use in a domain.

In a recent line of work, we identify specific properties that any reasonable influence measure should satisfy (see the Axioms section below); using these axioms, we mathematically derive a class of influence measures, dubbed monotone influence measures (MIM), which uniquely satisfy these axioms (see the Characterization section below). Unlike most existing influence measures in the literature, we assume neither knowledge of the underlying decision-making algorithm, nor of its behavior on points outside the dataset. Indeed, some methodologies (see Related Work) are heavily reliant on having access to counterfactual information: what would the classifier have done if some features were changed? This is a rather strong assumption, as it assumes not only access to the classifier but also the potential ability to use it on nonsensical data points (for example, if the dataset consists of medical records of men and women, the classifier might need to answer how it would handle pregnant men). By making no such assumptions, we are able to provide a far more general methodology for measuring influence; indeed, many of the tools described in the Related Work section will simply not be usable when queries to the classifier are not available, or when the underlying classification algorithm is not known. Finally, grounding the measure in the dataset ensures that the distribution of data is accounted for, rather than explaining the classification in terms of arbitrarily chosen data points. Such points can be very unlikely or impossible to occur in practice, and using them can demonstrate behavior the algorithm will never exhibit in its actual domain. Despite their rather limiting conceptual framework, our influence measures do surprisingly well on a sparse image dataset. We show that the outputs of our influence measure are comparable to those of other measures, and provide interpretable results.

Related Work

Axiomatic approaches for influence measurement are common in economic domains. Of particular note are axiomatic approaches in cooperative game theory [9, 12, 3].

The first axiomatic characterization of an influence measure for datasets is provided in [4]; however, they interpret influence as a global measure (e.g., what is the overall importance of gender for decision making). Moreover, one of the axioms proposed in [4] turned out to be too strong, severely limiting the explanation power of the resulting measure. Indeed, as [6] show, the measure proposed by [4] outputs undesirable values (e.g. zero influence) in many real instances. [1] propose an empirical influence measure that relies on a potential-like approach. However, as we show, their methodology fails to satisfy reasonable properties even on simple datasets. Other approaches in the literature either rely on black-box access to the classifier [6, 8], or assume domain knowledge (e.g. that the classifier is a neural network whose layers are observable) [11]. Another notable axiomatic treatment of influence in data-driven domains appears in [6]; in that work, it is shown that a Shapley value based approach is the only way influence can be measured when one assumes counterfactual access to the black-box classifier. This result is confirmed in [7].
THE FORMAL MODEL

A dataset X = ⟨~x_1, ..., ~x_m⟩ is given as a list of vectors in R^n (each dimension i ∈ [n] is a feature), where every ~x_j ∈ X has a unique label c_j ∈ {−1, 1}; given a vector ~x ∈ X, we often refer to the label of ~x as c(~x). For example, X can be a dataset of bank loan applications, with ~x describing the applicant profile (age, gender, credit score etc.), and c(~x) being a binary decision (accepted/rejected). An influence measure is simply a function φ whose input is a dataset X, the labels of the vectors in X denoted by c, and a specific point ~x ∈ X; its output is a value φ_i(~x, X, c) ∈ R; we often omit the inputs X and c when they are clear from context. The value φ_i(~x) should roughly correspond to the importance of the i-th feature in determining the outcome c(~x) for ~x.

AXIOMS FOR EMPIRICAL INFLUENCE MEASUREMENT

We are now ready to define our axioms; these are simple properties that we believe any reasonable influence measure should satisfy. We take a geometric interpretation of the dataset X; thus, several of our axioms are phrased in terms of geometric operations on X.

1. Shift Invariance: let X + ~b be the dataset resulting from adding the vector ~b ∈ R^n to every vector in X (not changing the labels). An influence measure φ is said to be shift invariant if for any vector ~b ∈ R^n, any i ∈ [n] and any ~x ∈ X, φ_i(~x, X) = φ_i(~x + ~b, X + ~b). In other words, shifting the entire dataset by some vector ~b should not affect feature importance.

2. Rotation and Reflection Faithfulness: let A be a rotation (or reflection) matrix, i.e. an n × n orthogonal matrix with det(A) = ±1; let AX be the dataset resulting from taking every point ~x in X and replacing it with A~x. An influence measure φ is said to be faithful to rotation and reflection if for any such matrix A, and any point ~x ∈ X, we have Aφ(~x, X) = φ(A~x, AX). In other words, rotating or reflecting the entire dataset results in the influence vector rotating in the same manner.

3. Continuity: an influence measure φ is said to be continuous if it is a continuous function of X.

4. Flip Invariance: let −c be the labeling resulting from replacing every label c(~x) with −c(~x). An influence measure is flip invariant if for every point ~x ∈ X and every i ∈ [n] we have φ_i(~x, X, c) = φ_i(~x, X, −c).

5. Monotonicity: a point ~y ∈ R^n is said to strengthen the influence of feature i with respect to ~x ∈ X if c(~x) = c(~y) and y_i > x_i; similarly, a point ~y ∈ R^n is said to weaken the influence of i with respect to ~x ∈ X if y_i > x_i and c(~x) ≠ c(~y). An influence measure φ is said to be monotonic if for any dataset X, any feature i and any data point ~x ∈ X we have φ_i(~x, X) ≤ φ_i(~x, X ∪ {~y}) whenever ~y strengthens i w.r.t. ~x, and φ_i(~x, X) ≥ φ_i(~x, X ∪ {~y}) whenever ~y weakens i w.r.t. ~x.

6. Random Labels: an influence measure φ is said to satisfy the random labels axiom if, whenever all labels of a dataset X are assigned i.i.d. uniformly at random (i.e. Pr[c(~x) = 1] = Pr[c(~x) = −1] for all ~x ∈ X; we call this label distribution U), then for all ~x ∈ X and all i we have E_{c∼U}[φ_i(~x, X, c) | c(~x) = 1] = E_{c∼U}[φ_i(~x, X, c) | c(~x) = −1] = 0. In other words, when we fix the label of ~x and randomize all other labels, the expected influence of all features is 0.

Let us briefly discuss the latter two axioms. Monotonicity is key in defining what influence means: intuitively, if one is to argue that Alice's old age caused her loan rejection, then finding older persons whose loans were similarly rejected should strengthen this argument; however, finding older persons whose loans were not rejected should weaken the argument. The Random Labels axiom states that when labels are randomly generated, no feature should have any influence in expectation; any influence measure that fails this test is inherently biased towards assigning influence to some features, even when labels are completely unrelated to the data.
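Several of these axioms also lend themselves to simple numerical sanity checks. The sketch below (Python; `measure` stands for any candidate influence measure with the signature used above, and the function name is ours) probes Shift Invariance and Flip Invariance on randomly sampled shifts; such a test can only refute an axiom on the sampled instances, not prove it in general.

```python
import numpy as np

def check_shift_and_flip_invariance(measure, X, labels, point_index, trials=5, tol=1e-8):
    """Numerically probe Axioms 1 (shift invariance) and 4 (flip invariance) for a
    candidate influence measure. `measure(x, X, labels)` is assumed to return the
    vector phi(x, X, c); this is a sampling-based sanity check, not a proof."""
    rng = np.random.default_rng(0)
    x = X[point_index]
    base = measure(x, X, labels)
    for _ in range(trials):
        b = rng.normal(size=X.shape[1])          # random shift vector b
        shifted = measure(x + b, X + b, labels)  # Axiom 1: phi(x, X) = phi(x+b, X+b)
        if not np.allclose(base, shifted, atol=tol):
            return False
    flipped = measure(x, X, -labels)             # Axiom 4: phi(x, X, c) = phi(x, X, -c)
    return np.allclose(base, flipped, atol=tol)
```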
CHARACTERIZING MONOTONE INFLUENCE MEASURES

Influence measures satisfying the axioms above must follow a simple formula, described in Theorem 4.1; the full proof of Theorem 4.1 appears in the full version of this work (currently under review). Below, 1(p) is a {1, −1}-valued indicator (i.e. 1 if p is true and −1 otherwise), and ‖~x‖_2 is the Euclidean length of ~x; note that we can admit other distances over R^n, but stick with ‖·‖_2 for concreteness.

THEOREM 4.1. Axioms 1 to 6 are satisfied iff φ is of the form

φ(~x, X) = ∑_{~y ∈ X \ {~x}} (~y − ~x) α(‖~y − ~x‖_2) 1(c(~x) = c(~y))    (1)

where α is any non-negative-valued function.

We refer to measures satisfying Equation (1) as monotone influence measures (MIM). MIM uniquely satisfy a set of reasonable axioms; moreover, they maximize the total cosine similarity objective function. Intuitively, given a vector ~x ∈ X, an MIM vector φ(~x, X) will point in the direction that has the 'most' vectors in X sharing a label with ~x. The value ‖φ‖_2 can be thought of as one's confidence in the direction: if ‖φ‖_2 is high, this means that one is fairly certain where other vectors sharing a label with ~x are (and, correspondingly, that there are at least some highly influential features identified by φ); a small value of ‖φ‖_2 implies low explanation strength.
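A direct, unoptimized reading of Equation (1) in Python is given below (an illustrative sketch, not the authors' implementation; α defaults to a constant weight, and the helper assumes the point of interest is a row of X).

```python
import numpy as np

def mim(x, X, labels, alpha=lambda d: 1.0):
    """Monotone influence measure following Equation (1).
    x: point of interest (a row of X); X: (m, n) array of data points;
    labels: array of +/-1 labels for the rows of X;
    alpha: non-negative weighting of the distance ||y - x||_2."""
    x = np.asarray(x, dtype=float)
    row = np.flatnonzero((X == x).all(axis=1))[0]    # locate x in X to read its label c(x)
    cx = labels[row]
    phi = np.zeros_like(x)
    for j, (y, cy) in enumerate(zip(X, labels)):
        if j == row:
            continue                                 # the sum runs over X \ {x}
        sign = 1.0 if cy == cx else -1.0             # the {1,-1}-valued indicator 1(c(x)=c(y))
        phi += (y - x) * alpha(np.linalg.norm(y - x)) * sign
    return phi
```

With α ≡ 1, φ(~x, X) is simply the sum of displacement vectors towards points that share ~x's label, minus those towards points with the opposite label, which matches the geometric intuition above.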
EXISTING MEASURES

In this section, we provide an overview of some existing methodologies for measuring influence in data domains and compare them to MIM.

Parzen

The main idea behind the approach followed by [1] is to approximate the labeled dataset with a potential function and then use the derivative of this function to locally assign influence to features. Parzen satisfies Axioms 1 to 4. However, it is neither monotonic nor can it reliably detect random labels.

LIME

The measure in [8] is based on the idea of finding a best local fit for the classifier in a region around ~x. At its core, LIME fits a classifier by minimizing the mean-squared error, whereas MIM maximizes cosine similarity.

The Counterfactual Influence Measure

[4] initiated the axiomatic treatment of influence in data analysis; they propose a counterfactual aggregate influence measure for black-box data domains. Unlike other measures in this section, [4] do not measure local feature influence; rather, they measure the overall influence of a feature for a given dataset. The measure proposed by [4] does the following: when measuring the influence of the i-th feature, it counts, for every point ~x ∈ X, the number of points in X that differ from ~x only in the i-th feature and in their classification outcome. Given its rather restrictive notion of influence, this methodology only measures non-zero influence in very specific types of datasets: it assigns zero influence to all features in datasets that do not contain data points that differ from one another by only one feature; moreover, it only measures influence when a change in the state of a single feature changes the classification outcome.
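The counting rule can be sketched as follows (our reading of the description above, for illustration only; it is not the reference implementation of [4]).

```python
import numpy as np

def counterfactual_influence(i, X, labels):
    """For feature i, count ordered pairs of data points that differ in feature i
    only and receive different labels (an aggregate, dataset-level count)."""
    count = 0
    for a in range(len(X)):
        for b in range(len(X)):
            if a == b:
                continue
            diff = X[a] != X[b]
            if diff[i] and diff.sum() == 1 and labels[a] != labels[b]:
                count += 1
    return count
```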
Quantitative Input Influence

[6] propose a general framework for influence measurement in datasets, generalizing counterfactual influence. Instead of measuring the effect of changing a single feature on a point ~x ∈ X, they examine the expected effect of changing a set of features. The resulting measure, named QII (Quantitative Input Influence), is based on the Shapley value [9], a method of measuring the importance of individuals in collaborative environments. QII relies on access to counterfactual information; moreover, it is computationally intensive in practice, and under its current implementation, will not scale to domains having more than a few dozen features.

Black-Box Access Vs. Data-Driven Approaches

Some measures above assume black-box access to the classifier (e.g. QII and LIME); others (e.g. Parzen and MIM) make no such assumption. Is it valid to assume black-box access to a classifier? This depends on the implementation domain one has in mind and the strength of explanations that one wishes to arrive at. On the one hand, by exploiting this additional access, measures such as QII and LIME can offer better explanations in a sparse data domain; however, they are essentially unusable when one does not have access to the underlying classifier. Data-driven approaches such as MIM, the counterfactual measure, and Parzen are more generic and can be applied to any given dataset; however, they will naturally not be particularly informative in sparse regions of the dataset.

DISCUSSION AND FUTURE WORK

In this paper, we argue for the axiomatic treatment of linear influence measurement. We present a measure uniquely derived from a set of reasonable properties which also optimizes a natural objective function. Our characterization subsumes known influence measures proposed in the literature. In particular, MIM becomes the Banzhaf index in cooperative games and is also related to formal models of causality. Furthermore, MIM generalizes the measure proposed by [2] for measuring influence in a data-dependent cooperative game setting. Taking a broader perspective, axiomatic influence analysis in data domains is an important research direction: it allows us to rigorously discuss the underlying desirable norms we would like to see in our explanations. Indeed, an alternative set of axioms is likely to result in other novel measures that satisfy other desirable properties. Being able to mathematically justify one's choice of influence measures is important from a legal/ethical perspective as well: when explaining the behavior of classifiers in high-stakes domains, having provably sound measures offers mathematical backing to those using them.

While MIM offers an interesting perspective on influence measurement, it is but a first step. There are several interesting directions for future work; first, our analysis is currently limited to binary classification domains. It is possible to naturally extend our results to regression domains, e.g. by replacing the value 1(c(~x) = c(~y)) with c(~x) − c(~y); however, it is not entirely clear how one might define influence measures for multiclass domains. It is still possible to retain 1(c(~x) = c(~y)) as the measure of 'closeness' between classification outputs (i.e. all points that share ~x's output offer positive influence, and all those that do not offer negative influence), but we believe that this may result in a somewhat coarse influence analysis. This is especially true in cases where there is a large number of possible output labels. One possible solution for the multiclass case would be to define a distance metric over output labels; however, the choice of metric would greatly impact the outputs of MIM (or any other influence measure).
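As a sketch of the regression extension suggested above (hypothetical; how to weight or normalize the target differences is left open), one simply replaces the indicator in Equation (1) with the difference of target values:

```python
import numpy as np

def mim_regression(x, X, targets, alpha=lambda d: 1.0):
    """Hypothetical regression analogue of Equation (1): the indicator 1(c(x)=c(y))
    is replaced by the difference c(x) - c(y), as suggested in the text."""
    x = np.asarray(x, dtype=float)
    row = np.flatnonzero((X == x).all(axis=1))[0]  # locate x to read its target c(x)
    cx = targets[row]
    phi = np.zeros_like(x)
    for j, (y, cy) in enumerate(zip(X, targets)):
        if j == row:
            continue                               # the sum runs over X \ {x}
        phi += (y - x) * alpha(np.linalg.norm(y - x)) * (cx - cy)
    return phi
```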
Another major issue with MIM (and several other measures) is that their explanations are limited to the influence of individual features; they do not capture joint effects, let alone more complex synergistic effects of features on outputs (the only exception to this is LIME, which, at least in theory, allows fitting non-linear classifiers in the local region of the point of interest). It would be a major theoretical challenge to axiomatize and design 'good' methods for measuring the effect of pairwise (or k-wise) interactions amongst features. This would also allow a natural tradeoff between the accuracy and interpretability of a given explanation. A linear explanation (e.g. LIME, QII, or this work) is easy to understand: each feature is assigned a number that corresponds to its positive or negative effect on the output of ~x; a measure that captures k-wise interactions would be able to explain much more of the underlying feature interactions, but would naturally be less human interpretable. Indeed, a measure that captures all levels of feature interactions would be equivalent to a local approximation of the original classifier, which may not be feasible to achieve, nor easy to interpret. A better understanding of this behavior would be an important step in the design of influence measures.

Finally, it is important to translate our numerical measure into an actual human-readable report. [6] propose using linear explanations as transparency reports; however, more advanced methods which assume access to the classifier source code propose mapping back to specific subroutines for explanations [5, 10]. Indeed, while the transition from data to numerical explanations is an important step, mapping these to actual human-interpretable explanations is an open problem.

REFERENCES

1. D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K.-R. Müller. 2010. How to explain individual classification decisions. Journal of Machine Learning Research 11 (2010), 1803–1831.

2. E. Balkanski, U. Syed, and S. Vassilvitskii. 2017. Statistical Cost Sharing. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS). 6222–6231.

3. J.F. Banzhaf. 1965. Weighted Voting Doesn't Work: a Mathematical Analysis. Rutgers Law Review 19 (1965), 317–343.

4. A. Datta, A. Datta, A. D. Procaccia, and Y. Zick. 2015. Influence in Classification via Cooperative Game Theory. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI).

5. A. Datta, M. Fredrikson, G. Ko, P. Mardziel, and S. Sen. 2017. Proxy Non-Discrimination in Data-Driven Systems. CoRR abs/1707.08120 (2017).

6. A. Datta, S. Sen, and Y. Zick. 2016. Algorithmic Transparency via Quantitative Input Influence. In Proceedings of the 37th IEEE Conference on Security and Privacy (Oakland).

7. S.M. Lundberg and S. Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS). 4768–4777.

8. M. T. Ribeiro, S. Singh, and C. Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining (KDD). 1513–1522.

9. L.S. Shapley. 1953. A Value for n-Person Games. In Contributions to the Theory of Games, vol. 2. Princeton University Press, 307–317.

10. S. Singh, M. T. Ribeiro, and C. Guestrin. 2016. Programs as Black-Box Explanations. CoRR abs/1611.07579 (2016).

11. M. Sundararajan, A. Taly, and Q. Yan. 2017. Axiomatic Attribution for Deep Networks. arXiv preprint arXiv:1703.01365 (2017).

12. H.P. Young. 1985. Monotonic solutions of cooperative games. International Journal of Game Theory 14, 2 (1985), 65–72.