<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Axiomatic Approach to Linear Explanations in Data Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jakub Sliwinski</string-name>
          <email>jakvbs@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Strobel</string-name>
          <email>mstrobel@comp.nus.edu.sg</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yair Zick</string-name>
          <email>dcsyaz@nus.edu.sg</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ETH Zurich</institution>
          ,
          <addr-line>Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Univ. of Singapore</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this work, we focus on local explanations for data analytics; in other words: given a datapoint ~x, how important was the i-th feature in determining the outcome for ~x? The literature has seen a recent emergence of various analytical answers to this question. We argue for a linear influence measure explanation: given a datapoint ~x, assign a value fi(~x) to every feature i, which roughly corresponds to feature i's importance in determining the outcome for ~x. We present a family of measures called MIM (monotone influence measures) that are uniquely derived from a set of axioms: desirable properties that any reasonable influence measure should satisfy. Departing from prior work on influence measures, we assume no knowledge of, or access to, the underlying classifier labeling the dataset. In other words, our influence measures are based on the dataset alone and do not make any queries to the classifier. We compare MIM to other linear explanation models in the literature and discuss their underlying assumptions, merits, and limitations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>INTRODUCTION
An individual is denied a bank loan; knowing that they are in
good financial standing, they demand that the bank explain
its decision. However, the bank uses an ML algorithm that
automatically rejected the loan application. How should the
bank explain its decision? This example is more than
anecdotal; recent years have seen the widespread implementation
of data-driven algorithms making decisions in increasingly
high-stakes domains, such as healthcare, transportation, and
public safety. Using novel ML techniques, algorithms are able
to process massive amounts of data and make highly
accurate predictions; however, their inherent complexity makes it
increasingly difficult for humans to understand why certain
decisions were made. By obfuscating the underlying
decision-making processes, such algorithms potentially expose human
stakeholders to risks. These risks could include incorrect
decisions (e.g. Alice’s application was wrongly rejected due to a
system bug), information leaks (e.g. the algorithm was
inadvertently given information about Alice that it should not have
seen), or discrimination (e.g. the algorithm is biased against
female applicants). Indeed, government bodies and regulatory
authorities have recently begun calling for algorithmic
transparency: providing human-interpretable explanations of the
underlying reasoning behind large-scale decision-making
algorithms. Our work represents a first formal axiomatic analysis
of automatically generated explanations of black-box
classifiers.</p>
      <p>Our Proposal
We propose utilizing simple mathematical frameworks for
an explanation via influence measures: these are functions
that, given a dataset, assign a value to every feature; this
value should roughly correspond to the feature’s importance in
affecting the classification outcome for individual data points.
Slightly more formally, we are given a dataset X containing
n-dimensional vectors, whose data points are labeled by a binary
classifier c, such that c(~y) ∈ {−1, 1} for all ~y ∈ X; now, given a
point of interest ~x ∈ X, we wish to identify the features in ~x
that are ‘responsible’ for it being labeled the way it was. This
is done via a mapping f whose input is the dataset X, its
labels (given by c), and the point of interest ~x; its output is a
vector f(~x) ∈ R^n, where fi(~x) corresponds to the influence of
feature i on the label of ~x. Intuitively, a large positive value
of fi(~x) should mean that feature i was highly important in
determining the label of ~x; a large negative value for fi(~x)
should mean that despite the value of i at ~x, ~x was assigned this
label. This approach carries several important benefits. First of
all, it is completely generic, requiring no assumptions on the
underlying classification model; secondly, linear explanation
models are simple and straightforward, even for a layperson
to understand (e.g. ‘Alice was denied her loan because of
the high importance the algorithm placed on her low monthly
income, and despite her never having to file for bankruptcy’).
The appeal of linear explanations has been recognized by the
research community; recent years have seen a moderate boom
of papers proposing linear explanations in data-driven domains
(see Section 1.2). However, this poses a new problem for end
users that wish to apply these methodologies: which linear
explanation is the ‘right’ one to choose? In other words,
which linear explanations are guaranteed to satisfy
certain desirable properties?
We argue for an axiomatization of influence measures in
classification domains. The axiomatic approach is common in the
economics literature: first one reasons about simple,
reasonable properties (axioms) which should be satisfied by any
function (say, methods for dividing revenue amongst collaborators,
or agreeing on an election winner given voters’ preferences);
next, one should prove that there exists a unique function
satisfying these simple mathematical properties. The axiomatic
approach allows one to rigorously reason about the types of
influence measures one should use in a given setting: if the
axioms set forth make sense in this setting, there is but one
method of assigning influence in the given domain. It is, in
some sense, an explanation of an explanation method, a
provable guarantee that the method is sound; in fact, uniqueness
implies that it is the only sound method one can reasonably
use in a domain.</p>
      <p>
        In a recent line of work, we identify specific properties that
any reasonable influence measure should satisfy (Section 3);
using these axioms, we mathematically derive a class of
influence measures, dubbed monotone influence measures (MIM),
which uniquely satisfy these axioms (Section 4). Unlike most
existing influence measures in the literature, we assume
neither knowledge of the underlying decision-making algorithm,
nor of its behavior on points outside the dataset. Indeed, some
methodologies (see Related Work in Section 1.2) are heavily
reliant on having access to counterfactual information: what
would the classifier have done if some features were changed?
This is a rather strong assumption, as it assumes not only
access to the classifier but also the potential ability to use it on
nonsensical data points (for example, if the dataset consists of medical
records of men and women, the classifier might need to answer how it
would handle pregnant men). By making no such assumptions,
we are able to provide a far more general methodology for
measuring influence; indeed, many of the tools described in
Section 1.2 will simply not be usable when queries to the
classifier are not available, or when the underlying classification
algorithm is not known. Finally, grounding the measure in the
dataset ensures the distribution of data is accounted for, rather
than explaining the classification in terms of arbitrarily chosen
data points. The points can be very unlikely or impossible to
occur in practice, and using them can demonstrate a behavior
the algorithm will never exhibit in its actual domain. Despite
their rather limiting conceptual framework, our influence
measures do surprisingly well on a sparse image dataset. We show
that the outputs of our influence measure are comparable to
those of other measures, and provide interpretable results.
Related Work
Axiomatic approaches for influence measurement are
common in economic domains. Of particular note are axiomatic
approaches in cooperative game theory [
        <xref ref-type="bibr" rid="ref12 ref3 ref9">9, 12, 3</xref>
        ].
      </p>
      <p>
        The first axiomatic characterization of an influence measure
for datasets is provided in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; however, they interpret
influence as a global measure (e.g., what is the overall importance
of gender for decision making). Moreover, one of the axioms
proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] turned out to be too strong, severely limiting
the explanation power of the resulting measure. Indeed, as
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] show, the measure proposed by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] outputs undesirable
values (e.g. zero influence) in many real instances. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] propose
an empirical influence measure that relies on a potential-like
approach. However, as we show, their methodology fails to
satisfy reasonable properties even on simple datasets. Other
approaches in the literature either rely on black-box access to
the classifier [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ], or assume domain knowledge (e.g. that
the classifier is a neural network whose layers are
observable) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Another notable axiomatic treatment of influence
in data-driven domains appears in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]; in this work, it is shown
that a Shapley value based approach is the only way influence
can be measured when one assumes counterfactual access to
the black-box classifier. This result is confirmed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
THE FORMAL MODEL
A dataset X = ⟨~x1, …, ~xm⟩ is given as a list of vectors in R^n
(each dimension i ∈ [n] is a feature), where every ~xj ∈ X
has a label cj ∈ {−1, 1}; given a vector ~x ∈ X, we
often refer to the label of ~x as c(~x). For example, X can
be a dataset of bank loan applications, with ~x describing the
applicant profile (age, gender, credit score, etc.), and c(~x) being
a binary decision (accepted/rejected). An influence measure
is simply a function f whose input is a dataset X, the labels
of the vectors in X denoted by c, and a specific point ~x ∈ X;
its output is a value fi(~x; X; c) ∈ R; we often omit the inputs
X and c when they are clear from context. The value fi(~x)
should roughly correspond to the importance of the i-th feature
in determining the outcome c(~x) for ~x.
      </p>
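      <p>To make the formal model concrete, the following is a minimal sketch (our own illustration, assuming a numpy-based encoding; none of the names below come from the paper) of how a dataset, its labels, and the signature of an influence measure could be represented.</p>
      <preformat>
import numpy as np

# A dataset X is a list of m vectors in R^n; each row is one data point,
# each column i is a feature, and every point carries a label in {-1, +1}.
X = np.array([
    [35.0, 650.0],   # e.g. (age, credit score) of one applicant
    [52.0, 720.0],
    [29.0, 540.0],
    [61.0, 480.0],
])
c = np.array([1, 1, -1, -1])   # binary decisions for the rows of X

def influence(x, X, c):
    """Signature of an influence measure f: given the point of interest x,
    the dataset X and its labels c, return a vector in R^n whose i-th entry
    fi(x; X; c) is the influence of feature i on the label of x.
    (Placeholder body; a concrete instance appears in the MIM sketch below.)"""
    raise NotImplementedError
      </preformat>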
      <p>AXIOMS FOR EMPIRICAL INFLUENCE MEASUREMENT
We are now ready to define our axioms; these are simple
properties that we believe any reasonable influence measure should
satisfy. We take a geometric interpretation of the dataset X ;
thus, several of our axioms are phrased in terms of geometric
operations on X .
1. Shift Invariance: let X + ~b be the dataset resulting from
adding the vector ~b ∈ R^n to every vector in X (not changing
the labels). An influence measure f is said to be shift invariant
if for any vector ~b ∈ R^n, any i ∈ [n] and any ~x ∈ X,
fi(~x; X) = fi(~x + ~b; X + ~b).
In other words, shifting the entire dataset by some vector ~b
should not affect feature importance.
2. Rotation and Reflection Faithfulness: let A be a rotation
(or reflection) matrix, i.e. an n × n orthogonal matrix with det(A) ∈ {−1, 1};
let AX be the dataset resulting from taking every point ~x in
X and replacing it with A~x. An influence measure f is said to
be faithful to rotation and reflection if for any rotation matrix
A, and any point ~x ∈ X, we have Af(~x; X) = f(A~x; AX).
In other words, rotating or reflecting the entire dataset results
in the influence vector rotating in the same manner.
3. Continuity: an influence measure f is said to be continuous
if it is a continuous function of X.
4. Flip Invariance: let −c be the labeling resulting from
replacing every label c(~x) with −c(~x). An influence measure is
flip invariant if for every point ~x ∈ X and every i ∈ [n] we
have fi(~x; X; c) = fi(~x; X; −c).
5. Monotonicity: a point ~y ∈ R^n is said to strengthen the
influence of feature i with respect to ~x ∈ X if c(~x) = c(~y)
and yi &gt; xi; similarly, a point ~y ∈ R^n is said to weaken the
influence of i with respect to ~x ∈ X if yi &gt; xi and c(~x) ≠ c(~y).
An influence measure f is said to be monotonic if for any
dataset X, any feature i and any data point ~x ∈ X we have
fi(~x; X) ≤ fi(~x; X ∪ {~y}) whenever ~y strengthens i w.r.t. ~x,
and fi(~x; X) ≥ fi(~x; X ∪ {~y}) whenever ~y weakens i w.r.t. ~x.
6. Random Labels: an influence measure f is said to satisfy
the random labels axiom if the following holds: for any dataset X,
suppose all labels are assigned i.i.d. uniformly at random (i.e. for all ~x ∈ X,
Pr[c(~x) = 1] = Pr[c(~x) = −1]); we call this label distribution
U. Then, for all ~x ∈ X and all i we have</p>
      <p>E_{c∼U}[fi(~x; X; c) | c(~x) = 1] = E_{c∼U}[fi(~x; X; c) | c(~x) = −1] = 0.
In other words, when we fix the label of ~x and randomize all
other labels, the expected influence of all features is 0.
Let us briefly discuss the latter two axioms. Monotonicity is
key in defining what influence means: intuitively, if one is
to argue that Alice’s old age caused her loan rejection, then
finding older persons whose loans were similarly rejected
should strengthen this argument; however, finding older
persons whose loans were not rejected should weaken the
argument. The Random Labels axiom states that when labels are
randomly generated, no feature should have any influence in
expectation; any influence measure that fails this test is
inherently biased towards assigning influence to some features,
even when labels are completely unrelated to the data.
CHARACTERIZING MONOTONE INFLUENCE MEASURES
Influence measures satisfying the Axioms in Section 3 must
follow a simple formula, described in Theorem 4.1; the full
proof of Theorem 4.1 appears in a full version of this work
(currently under review). Below, 1(p) is a {−1, 1}-valued indicator (i.e. 1 if p is true
and −1 otherwise), and ‖~x‖2 is the Euclidean length of ~x; note
that we can admit other distances over R^n, but stick with ‖·‖2
for concreteness.</p>
      <p>THEOREM 4.1. Axioms 1 to 6 are satisfied iff f is of the
form
f(~x; X) = ∑_{~y ∈ X \ {~x}} (~y − ~x) α(‖~y − ~x‖2) 1(c(~x) = c(~y))    (1)
where α is any non-negative-valued function.</p>
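      <p>As a concrete illustration of Equation (1), here is a short numpy sketch (our own code, not from the paper), using the constant weighting α ≡ 1; any other non-negative function of the distance, e.g. a Gaussian kernel, drops into the same template.</p>
      <preformat>
import numpy as np

def mim_influence(x, cx, X, c, alpha=lambda d: 1.0):
    """Monotone influence measure of Equation (1).

    x     : point of interest, shape (n,)
    cx    : label of x, in {-1, +1}
    X, c  : dataset of shape (m, n) and its labels in {-1, +1}
    alpha : any non-negative function of the Euclidean distance
    """
    f = np.zeros_like(x, dtype=float)
    for y, cy in zip(X, c):
        if np.allclose(y, x):
            continue                       # the sum ranges over X \ {x}
        diff = y - x
        sign = 1.0 if cy == cx else -1.0   # {-1, 1}-valued indicator 1(c(x) = c(y))
        f += diff * alpha(np.linalg.norm(diff)) * sign
    return f
      </preformat>
      <p>With the constant α every point in the dataset is weighted equally; a choice such as alpha=lambda d: np.exp(-d**2) instead downweights far-away points while remaining within the family of Equation (1).</p>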
      <p>We refer to measures satisfying Equation (1) as monotone
influence measures (MIM). MIM uniquely satisfy a set of
reasonable axioms; moreover, they maximize the total cosine
similarity objective function. Intuitively, given a vector ~x ∈ X,
an MIM vector f(~x; X) will point in the direction that has the
‘most’ vectors in X sharing a label with ~x. The value ‖f‖2
can be thought of as one’s confidence in the direction: if ‖f‖2
is high, this means that one is fairly certain where other vectors
sharing a label with ~x are (and, correspondingly, this means
that there are at least some highly influential features identified
by f); a small value of ‖f‖2 implies low explanation strength.</p>
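      <p>Continuing the sketch above (again our own illustration), a toy two-dimensional dataset shows the directional reading of f and its length as explanation strength, and numerically spot-checks Shift Invariance (Axiom 1) and Flip Invariance (Axiom 4).</p>
      <preformat>
import numpy as np

# Toy data: points labeled +1 sit to the right of points labeled -1.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
c = np.array([1, 1, -1, -1])
x, cx = X[0], c[0]

f = mim_influence(x, cx, X, c)            # uses the function sketched above
print("influence vector:", f)             # points towards the same-labeled mass
print("explanation strength:", np.linalg.norm(f))

# Axiom 1 (Shift Invariance): shifting the whole dataset leaves f unchanged.
b = np.array([10.0, -5.0])
assert np.allclose(f, mim_influence(x + b, cx, X + b, c))

# Axiom 4 (Flip Invariance): flipping every label leaves f unchanged.
assert np.allclose(f, mim_influence(x, -cx, X, -c))
      </preformat>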
      <p>EXISTING MEASURES
In this section, we provide an overview of some existing
methodologies for measuring influence in data domains and
compare them to MIM.</p>
      <p>
        Parzen
The main idea behind the approach followed by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is to
approximate the labeled dataset with a potential function and
then use the derivative of this function to locally assign
influence to features. Parzen satisfies Axioms 1 to 4. However, it is
neither monotonic nor does it satisfy the Random Labels axiom.
LIME
The measure in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is based on the idea of finding a best local
fit for the classifier in a region around ~x. At its core, LIME fits
a classifier by minimizing the mean-squared error, whereas
MIM maximizes cosine similarity.
      </p>
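      <p>To make the contrast concrete, here is a bare-bones rendering of the local-fit idea (a generic distance-weighted least-squares surrogate, not the actual LIME implementation): perturb the point of interest, query a black-box classifier, and read the fitted linear coefficients as feature influences. The function black_box is a hypothetical stand-in for whatever model one has query access to.</p>
      <preformat>
import numpy as np

def local_linear_fit(x, black_box, n_samples=500, scale=0.5, seed=0):
    """Distance-weighted least-squares fit of a linear surrogate around x.
    black_box maps an array of shape (k, n) to labels in {-1, +1}; it is a
    hypothetical stand-in, since this approach needs query access to the model."""
    rng = np.random.default_rng(seed)
    Z = x + scale * rng.standard_normal((n_samples, x.shape[0]))  # local perturbations
    y = black_box(Z).astype(float)
    w = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2)               # proximity weights
    A = np.hstack([Z, np.ones((n_samples, 1))])                   # intercept column
    coef, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * A, np.sqrt(w) * y, rcond=None)
    return coef[:-1]                                              # per-feature weights

# Example: a black box that only looks at the first feature.
weights = local_linear_fit(np.array([1.0, 0.0]), lambda Z: np.sign(Z[:, 0]))
      </preformat>
      <p>The surrogate is chosen by minimizing a (weighted) mean-squared error, which is exactly where this style of explanation departs from MIM’s cosine-similarity objective.</p>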
      <p>
        The Counterfactual Influence Measure
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] initiated the axiomatic treatment of influence in data
analysis; they propose a counterfactual aggregate influence measure
for black-box data domains. Unlike other measures in this
section, [
        <xref ref-type="bibr" rid="ref4">4</xref>
] do not measure local feature influence; rather, they
measure the overall influence of a feature for a given dataset.
The measure proposed by [
        <xref ref-type="bibr" rid="ref4">4</xref>
] does the following: when
measuring the influence of the i-th feature, for every point ~x ∈ X
it counts the number of points in X that differ from ~x in only
the i-th feature and in their classification outcome. Given its
rather restrictive notion of influence, this methodology only
measures non-zero influence in very specific types of datasets:
it assigns zero influence to all features in datasets that do not
contain data points that differ from one another by only one
feature; moreover, it only measures influence when a change in
the state of a single feature changes the classification outcome.
Quantitative Input Influence
[
        <xref ref-type="bibr" rid="ref6">6</xref>
] propose a general framework for influence measurement in
datasets, generalizing counterfactual influence. Instead of
measuring the effect of changing a single feature on point
~x ∈ X, they examine the expected effect of changing a set
of features. The resulting measure, named QII (Quantitative
Input Influence), is based on the Shapley value [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a method
of measuring the importance of individuals in collaborative
environments. QII allows access to counterfactual information;
moreover, it is computationally intensive in practice, and under
its current implementation, will not scale to domains having
more than a few dozen features.
      </p>
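      <p>For intuition, the following is a generic Monte-Carlo sketch of the Shapley-value idea underlying QII (our own code, not the QII implementation itself): the contribution of feature i is its average marginal effect over random orderings, where an ‘absent’ feature is filled in from a randomly drawn data point, and black_box is again a hypothetical stand-in for a model with query access.</p>
      <preformat>
import numpy as np

def shapley_sampling(x, X, black_box, n_perms=200, seed=0):
    """Monte-Carlo estimate of the Shapley value of each feature for the
    outcome black_box(x). 'Absent' features are resampled from the dataset X.
    A generic sketch of the idea behind QII, not the paper's implementation."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    phi = np.zeros(n)
    for _ in range(n_perms):
        order = rng.permutation(n)
        z = X[rng.integers(len(X))].copy()   # start from a random reference point
        prev = black_box(z[None, :])[0]
        for i in order:                      # reveal features of x one at a time
            z[i] = x[i]
            curr = black_box(z[None, :])[0]
            phi[i] += curr - prev            # marginal contribution of feature i
            prev = curr
    return phi / n_perms
      </preformat>
      <p>Each sampled permutation issues on the order of n queries to the classifier, which is the source of both the counterfactual-access requirement and the scaling concerns noted above.</p>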
      <p>Black-Box Access Vs. Data-Driven Approaches
Some measures above assume black-box access to the
classifier (e.g. QII and LIME); others (e.g. Parzen and MIM)
make no such assumption. Is it valid to assume black-box
access to a classifier? This depends on the implementation
domain one has in mind and the strength of explanations that
one wishes to arrive at. On the one hand, having more access,
measures such as QII and LIME can offer better explanations
in a sparse data domain; however, they are essentially
unusable when one does not have access to the underlying classifier.
Data-driven approaches such as MIM, the counterfactual
measure, and Parzen are more generic and can be applied to any
given dataset; however, they will naturally not be particularly
informative in sparse regions of the dataset.</p>
      <p>
        DISCUSSION AND FUTURE WORK
In this paper, we argue for the axiomatic treatment of linear
influence measurement. We present a measure uniquely
derived from a set of reasonable properties which also optimizes
a natural objective function. Our characterization subsumes
known influence measures proposed in the literature. In
particular, MIM becomes the Banzhaf index in cooperative games
and is also related to formal models of causality. Furthermore,
MIM generalizes the measure proposed by [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for measuring
influence in a data-dependent cooperative game setting.
Taking a broader perspective, axiomatic influence analysis in data
domains is an important research direction: it allows us to
rigorously discuss the underlying desirable norms we’d like
to see in our explanations. Indeed, an alternative set of axioms
is likely to result in other novel measures, that satisfy other
desirable properties. Being able to mathematically justify one’s
choice of influence measures is important from a legal/ethical
perspective as well: when explaining the behavior of
classifiers in high-stakes domains, having provably sound measures
offers mathematical backing to those using them.
      </p>
      <p>
        While MIM offers an interesting perspective on influence
measurement, it is but a first step. There are several interesting
directions for future work; first, our analysis is currently
limited to binary classification domains. It is possible to naturally
extend our results to regression domains, e.g. by replacing
the value 1(c(~x) = c(~y)) with the product c(~x) · c(~y); however, it is not
entirely clear how one might define influence measures for
multiclass domains. It is still possible to retain 1(c(~x) = c(~y))
as the measure of ‘closeness’ between classification outputs
— i.e. all points that share ~x’s output offer positive influence,
and all those that do not offer negative influence — but we
believe that this may result in a somewhat coarse influence
analysis. This is especially true in cases where there is a large
number of possible output labels. One possible solution for
the multiclass case would be to define a distance metric over
output labels; however, the choice of metric would greatly
impact the outputs of MIM (or any other influence measure).
Another major issue with MIM (and several other measures)
is that their explanations are limited to the influence of
individual features; they do not capture joint effect, let alone more
complex synergistic effects of features on outputs (the only
exception to this is LIME, which, at least in theory, allows
fitting non-linear classifiers in the local region of the point of
interest). It would be a major theoretical challenge to
axiomatize and design ‘good’ methods for measuring the effect of
pairwise (or k-wise) interactions amongst features. This would
also allow for a natural tradeoff between the accuracy and
interpretability of a given explanation. A linear explanation
(e.g. LIME, QII, or this work) is easy to understand: each
feature is assigned a number that corresponds to its positive
or negative effect on the output of ~x; a measure that captures
k-wise interactions would be able to explain much more of the
underlying feature interactions, but would naturally be less
human interpretable. Indeed, a measure that captures all levels
of feature interactions would be equivalent to a local
approximation of the original classifier, which may not be feasible to
achieve, nor easy to interpret. A better understanding of this
behavior would be an important step in the design of influence
measures. Finally, it is important to translate our numerical
measure to an actual human-readable report. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] propose using
linear explanations as transparency reports; however, more
advanced methods which assume access to the classifier source
code propose mapping back to specific subroutines for
explanations [
        <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
        ]. Indeed, while the transition from data to
numerical explanations is an important step, mapping these to
actual human-interpretable explanations is an open problem.
      </p>
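      <p>As a small illustration of the regression extension suggested above (our reading, under the same assumptions as the earlier MIM sketch): for binary labels the {−1, 1}-valued indicator 1(c(~x) = c(~y)) equals the product c(~x) · c(~y), so substituting the product admits real-valued outputs directly.</p>
      <preformat>
import numpy as np

def mim_regression(x, cx, X, c, alpha=lambda d: 1.0):
    """MIM-style measure for real-valued outputs: the label-agreement
    indicator is replaced by the product c(x) * c(y), which coincides with
    the {-1, +1} indicator whenever the labels are binary."""
    f = np.zeros_like(x, dtype=float)
    for y, cy in zip(X, c):
        if np.allclose(y, x):
            continue
        diff = y - x
        f += diff * alpha(np.linalg.norm(diff)) * (cx * cy)
    return f
      </preformat>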
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Baehrens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schroeter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Harmeling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kawanabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hansen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.-R.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>How to explain individual classification decisions</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>11</volume>
          (
          <year>2010</year>
          ),
          <fpage>1803</fpage>
          -
          <lpage>1831</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>E.</given-names>
            <surname>Balkanski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Syed</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vassilvitskii</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Statistical Cost Sharing</article-title>
          .
          <source>In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS)</source>
          .
          <fpage>6222</fpage>
          -
          <lpage>6231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.F.</given-names>
            <surname>Banzhaf</surname>
          </string-name>
          .
          <year>1965</year>
          .
          <article-title>Weighted Voting Doesn't Work: a Mathematical Analysis</article-title>
          .
          <source>Rutgers Law Review</source>
          <volume>19</volume>
          (
          <year>1965</year>
          ),
          <fpage>317</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Procaccia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zick</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Influence in Classification via Cooperative Game Theory</article-title>
          .
          <source>In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI).</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fredrikson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mardziel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sen</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Proxy Non-Discrimination in Data-Driven Systems</article-title>
          .
          <source>CoRR abs/1707.08120</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zick</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Algorithmic Transparency via Quantitative Input Influence</article-title>
          .
          <source>In Proceedings of the 37th IEEE Conference on Security and Privacy (Oakland).</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>S.M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A Unified Approach to Interpreting Model Predictions</article-title>
          .
          <source>In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS)</source>
          .
          <fpage>4768</fpage>
          -
          <lpage>4777</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>“Why Should I Trust You?”: Explaining the Predictions of Any Classifier</article-title>
          .
          <source>In Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining (KDD)</source>
          .
          <fpage>1513</fpage>
          -
          <lpage>1522</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>L.S.</given-names>
            <surname>Shapley</surname>
          </string-name>
          .
          <year>1953</year>
          .
          <article-title>A Value for n-Person Games</article-title>
          . In
          <source>Contributions to the Theory of Games</source>
          , vol.
          <volume>2</volume>
          . Princeton University Press,
          <fpage>307</fpage>
          -
          <lpage>317</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Programs as Black-Box Explanations</article-title>
          .
          <source>CoRR abs/1611.07579</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taly</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yan</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Axiomatic Attribution for Deep Networks</article-title>
          .
          <source>arXiv preprint arXiv:1703.01365</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>H.P.</given-names>
            <surname>Young</surname>
          </string-name>
          .
          <year>1985</year>
          .
          <article-title>Monotonic solutions of cooperative games</article-title>
          .
          <source>International Journal of Game Theory</source>
          <volume>14</volume>
          ,
          <issue>2</issue>
          (
          <year>1985</year>
          ),
          <fpage>65</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>