An Axiomatic Approach to Linear Explanations in Data Classification

Jakub Sliwinski (ETH Zurich, Zurich, Switzerland), jakvbs@gmail.com
Martin Strobel (National Univ. of Singapore, Singapore), mstrobel@comp.nus.edu.sg
Yair Zick (National Univ. of Singapore, Singapore), dcsyaz@nus.edu.sg

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. ExSS ’18, March 11, Tokyo, Japan.

ABSTRACT

In this work, we focus on local explanations for data analytics; in other words: given a datapoint ~x, how important was the i-th feature in determining the outcome for ~x? The literature has seen a recent emergence of various analytical answers to this question. We argue for a linear influence measure explanation: given a datapoint ~x, assign a value φ_i(~x) to every feature i, which roughly corresponds to feature i's importance in determining the outcome for ~x. We present a family of measures called MIM (monotone influence measures), that are uniquely derived from a set of axioms: desirable properties that any reasonable influence measure should satisfy. Departing from prior work on influence measures, we assume no knowledge of, or access to, the underlying classifier labeling the dataset. In other words, our influence measures are based on the dataset alone and do not make any queries to the classifier. We compare MIM to other linear explanation models in the literature and discuss their underlying assumptions, merits, and limitations.

ACM Classification Keywords

H.5.2 Information Interfaces and Presentation: User Interfaces—Theory and methods

Author Keywords

Influence Measures, Explainable ML, Algorithmic Transparency

INTRODUCTION

An individual is denied a bank loan; knowing that they are in good financial standing, they demand that the bank explain its decision. However, the bank uses an ML algorithm that automatically rejected the loan application. How should the bank explain its decision? This example is more than anecdotal; recent years have seen the widespread implementation of data-driven algorithms making decisions in increasingly high-stakes domains, such as healthcare, transportation, and public safety. Using novel ML techniques, algorithms are able to process massive amounts of data and make highly accurate predictions; however, their inherent complexity makes it increasingly difficult for humans to understand why certain decisions were made. By obfuscating the underlying decision-making processes, such algorithms potentially expose human stakeholders to risks. These risks could include incorrect decisions (e.g. Alice's application was wrongly rejected due to a system bug), information leaks (e.g. the algorithm was inadvertently given information about Alice that it should not have seen), or discrimination (e.g. the algorithm is biased against female applicants). Indeed, government bodies and regulatory authorities have recently begun calling for algorithmic transparency: providing human-interpretable explanations of the underlying reasoning behind large-scale decision-making algorithms. Our work represents a first formal axiomatic analysis of automatically generated explanations of black-box classifiers.

Our Proposal

We propose utilizing simple mathematical frameworks for explanation via influence measures: these are functions that, given a dataset, assign a value to every feature; this value should roughly correspond to the feature's importance in affecting the classification outcome for individual data points. Slightly more formally, we are given a dataset X containing n-dimensional vectors, whose data points are labeled by a binary classifier c, such that c(~y) = ±1 for all ~y ∈ X; now, given a point of interest ~x ∈ X, we wish to identify the features in ~x that are 'responsible' for it being labeled the way it was. This is done via a mapping φ whose input is the dataset X, its labels (given by c), and the point of interest ~x; its output is a vector φ(~x) ∈ R^n, where φ_i(~x) corresponds to the influence of feature i on the label of ~x. Intuitively, a large positive value of φ_i(~x) should mean that feature i was highly important in determining the label of ~x; a large negative value of φ_i(~x) should mean that ~x was assigned its label despite the value of feature i at ~x.
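For concreteness, the mapping described above can be sketched in a few lines of Python. This is a minimal illustration only; the names InfluenceMeasure and linear_explanation are ours and are not part of the formal development. An influence measure is any function from a point of interest, the dataset, and its labels to a vector of per-feature scores, and a linear explanation simply pairs those scores with feature names.

```python
import numpy as np
from typing import Callable, Dict, List

# An influence measure sees only the dataset X, its labels c, and the point of
# interest x; it returns one real-valued score per feature (the vector phi(x)).
InfluenceMeasure = Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]

def linear_explanation(phi: InfluenceMeasure, x: np.ndarray, X: np.ndarray,
                       labels: np.ndarray, feature_names: List[str]) -> Dict[str, float]:
    """Pair the influence scores phi_i(x) with their feature names."""
    scores = phi(x, X, labels)
    return {name: float(score) for name, score in zip(feature_names, scores)}
```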
This approach carries several important benefits. First of all, it is completely generic, requiring no assumptions on the underlying classification model; secondly, linear explanation models are simple and straightforward, even for a layperson to understand (e.g. 'Alice was denied her loan because of the high importance the algorithm placed on her low monthly income, and despite her never having filed for bankruptcy'). The appeal of linear explanations has been recognized by the research community; recent years have seen a moderate boom of papers proposing linear explanations in data-driven domains (see Related Work). However, this poses a new problem for end users who wish to apply these methodologies: which linear explanation is the 'right' one to choose? In other words, which linear explanations are guaranteed to satisfy certain desirable properties?

We argue for an axiomatization of influence measures in classification domains. The axiomatic approach is common in the economics literature: first, one reasons about simple, reasonable properties (axioms) which should be satisfied by any function (say, methods for dividing revenue amongst collaborators, or agreeing on an election winner given voters' preferences); next, one proves that there exists a unique function satisfying these simple mathematical properties. The axiomatic approach allows one to rigorously reason about the types of influence measures one should use in a given setting: if the axioms set forth make sense in this setting, there is but one method of assigning influence in the given domain. It is, in some sense, an explanation of an explanation method, a provable guarantee that the method is sound; in fact, uniqueness implies that it is the only sound method one can reasonably use in a domain.

In a recent line of work, we identify specific properties that any reasonable influence measure should satisfy (see the Axioms section below); using these axioms, we mathematically derive a class of influence measures, dubbed monotone influence measures (MIM), which uniquely satisfy these axioms (see the Characterization section below). Unlike most existing influence measures in the literature, we assume neither knowledge of the underlying decision-making algorithm, nor of its behavior on points outside the dataset. Indeed, some methodologies (see Related Work) are heavily reliant on having access to counterfactual information: what would the classifier have done if some features were changed? This is a rather strong assumption, as it assumes not only access to the classifier but also the potential ability to use it on nonsensical data points (for example, if the dataset consists of medical records of men and women, the classifier might need to answer how it would handle pregnant men). By making no such assumptions, we are able to provide a far more general methodology for measuring influence; indeed, many of the tools described in the Related Work section will simply not be usable when queries to the classifier are not available, or when the underlying classification algorithm is not known. Finally, grounding the measure in the dataset ensures that the distribution of data is accounted for, rather than explaining the classification in terms of arbitrarily chosen data points. Such points can be very unlikely or impossible to occur in practice, and using them can demonstrate behavior the algorithm will never exhibit in its actual domain. Despite their rather limiting conceptual framework, our influence measures do surprisingly well on a sparse image dataset. We show that the outputs of our influence measure are comparable to those of other measures, and provide interpretable results.

Related Work

Axiomatic approaches for influence measurement are common in economic domains. Of particular note are axiomatic approaches in cooperative game theory [9, 12, 3].

The first axiomatic characterization of an influence measure for datasets is provided in [4]; however, they interpret influence as a global measure (e.g., what is the overall importance of gender for decision making). Moreover, one of the axioms proposed in [4] turned out to be too strong, severely limiting the explanation power of the resulting measure. Indeed, as [6] show, the measure proposed by [4] outputs undesirable values (e.g. zero influence) in many real instances. [1] propose an empirical influence measure that relies on a potential-like approach. However, as we show, their methodology fails to satisfy reasonable properties even on simple datasets. Other approaches in the literature either rely on black-box access to the classifier [6, 8], or assume domain knowledge (e.g. that the classifier is a neural network whose layers are observable) [11]. Another notable axiomatic treatment of influence in data-driven domains appears in [6]; in that work, it is shown that a Shapley value based approach is the only way influence can be measured when one assumes counterfactual access to the black-box classifier. This result is confirmed in [7].
THE FORMAL MODEL

A dataset X = ⟨~x_1, ..., ~x_m⟩ is given as a list of vectors in R^n (each dimension i ∈ [n] is a feature), where every ~x_j ∈ X has a unique label c_j ∈ {−1, 1}; given a vector ~x ∈ X, we often refer to the label of ~x as c(~x). For example, X can be a dataset of bank loan applications, with ~x describing the applicant profile (age, gender, credit score etc.), and c(~x) being a binary decision (accepted/rejected). An influence measure is simply a function φ whose input is a dataset X, the labels of the vectors in X denoted by c, and a specific point ~x ∈ X; its output is a value φ_i(~x, X, c) ∈ R; we often omit the inputs X and c when they are clear from context. The value φ_i(~x) should roughly correspond to the importance of the i-th feature in determining the outcome c(~x) for ~x.

AXIOMS FOR EMPIRICAL INFLUENCE MEASUREMENT

We are now ready to define our axioms; these are simple properties that we believe any reasonable influence measure should satisfy. We take a geometric interpretation of the dataset X; thus, several of our axioms are phrased in terms of geometric operations on X.

1. Shift Invariance: let X + ~b be the dataset resulting from adding the vector ~b ∈ R^n to every vector in X (not changing the labels). An influence measure φ is said to be shift invariant if for any vector ~b ∈ R^n, any i ∈ [n] and any ~x ∈ X, φ_i(~x, X) = φ_i(~x + ~b, X + ~b). In other words, shifting the entire dataset by some vector ~b should not affect feature importance.

2. Rotation and Reflection Faithfulness: let A be a rotation (or reflection) matrix, i.e. an n × n orthogonal matrix with det(A) = ±1; let AX be the dataset resulting from taking every point ~x in X and replacing it with A~x. An influence measure φ is said to be faithful to rotation and reflection if for any such matrix A, and any point ~x ∈ X, we have Aφ(~x, X) = φ(A~x, AX). In other words, rotating or reflecting the entire dataset results in the influence vector rotating in the same manner.

3. Continuity: an influence measure φ is said to be continuous if it is a continuous function of X.

4. Flip Invariance: let −c be the labeling resulting from replacing every label c(~x) with −c(~x). An influence measure is flip invariant if for every point ~x ∈ X and every i ∈ [n] we have φ_i(~x, X, c) = φ_i(~x, X, −c).

5. Monotonicity: a point ~y ∈ R^n is said to strengthen the influence of feature i with respect to ~x ∈ X if c(~x) = c(~y) and y_i > x_i; similarly, a point ~y ∈ R^n is said to weaken the influence of i with respect to ~x ∈ X if y_i > x_i and c(~x) ≠ c(~y). An influence measure φ is said to be monotonic if for any dataset X, any feature i and any data point ~x ∈ X we have φ_i(~x, X) ≤ φ_i(~x, X ∪ {~y}) whenever ~y strengthens i w.r.t. ~x, and φ_i(~x, X) ≥ φ_i(~x, X ∪ {~y}) whenever ~y weakens i w.r.t. ~x.

6. Random Labels: an influence measure φ is said to satisfy the random labels axiom if, whenever all labels of a dataset X are assigned i.i.d. uniformly at random (i.e. Pr[c(~x) = 1] = Pr[c(~x) = −1] for all ~x ∈ X; we call this label distribution U), then for all ~x ∈ X and all i we have E_{c∼U}[φ_i(~x, X, c) | c(~x) = 1] = E_{c∼U}[φ_i(~x, X, c) | c(~x) = −1] = 0. In other words, when we fix the label of ~x and randomize all other labels, the expected influence of all features is 0.

Let us briefly discuss the latter two axioms. Monotonicity is key in defining what influence means: intuitively, if one is to argue that Alice's old age caused her loan rejection, then finding older persons whose loans were similarly rejected should strengthen this argument; however, finding older persons whose loans were not rejected should weaken the argument. The Random Labels axiom states that when labels are randomly generated, no feature should have any influence in expectation; any influence measure that fails this test is inherently biased towards assigning influence to some features, even when labels are completely unrelated to the data.
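Several of these axioms also lend themselves to simple numerical sanity checks. The sketch below (Python; `measure` stands for any candidate influence measure with the signature used above, and the function name is ours) probes Shift Invariance and Flip Invariance on randomly sampled shifts; such a test can only refute an axiom on the sampled instances, not prove it in general.

```python
import numpy as np

def check_shift_and_flip_invariance(measure, X, labels, point_index, trials=5, tol=1e-8):
    """Numerically probe Axioms 1 (shift invariance) and 4 (flip invariance) for a
    candidate influence measure. `measure(x, X, labels)` is assumed to return the
    vector phi(x, X, c); this is a sampling-based sanity check, not a proof."""
    rng = np.random.default_rng(0)
    x = X[point_index]
    base = measure(x, X, labels)
    for _ in range(trials):
        b = rng.normal(size=X.shape[1])          # random shift vector b
        shifted = measure(x + b, X + b, labels)  # Axiom 1: phi(x, X) = phi(x+b, X+b)
        if not np.allclose(base, shifted, atol=tol):
            return False
    flipped = measure(x, X, -labels)             # Axiom 4: phi(x, X, c) = phi(x, X, -c)
    return np.allclose(base, flipped, atol=tol)
```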
CHARACTERIZING MONOTONE INFLUENCE MEASURES

Influence measures satisfying the axioms above must follow a simple formula, described in Theorem 4.1; the full proof of Theorem 4.1 appears in the full version of this work (currently under review). Below, 1(p) is a {1, −1}-valued indicator (i.e. 1 if p is true and −1 otherwise), and ‖~x‖_2 is the Euclidean length of ~x; note that we can admit other distances over R^n, but stick with ‖·‖_2 for concreteness.

THEOREM 4.1. Axioms 1 to 6 are satisfied iff φ is of the form

φ(~x, X) = ∑_{~y ∈ X \ {~x}} (~y − ~x) α(‖~y − ~x‖_2) 1(c(~x) = c(~y))    (1)

where α is any non-negative-valued function.

We refer to measures satisfying Equation (1) as monotone influence measures (MIM). MIM uniquely satisfy a set of reasonable axioms; moreover, they maximize the total cosine similarity objective function. Intuitively, given a vector ~x ∈ X, an MIM vector φ(~x, X) will point in the direction that has the 'most' vectors in X sharing a label with ~x. The value ‖φ‖_2 can be thought of as one's confidence in the direction: if ‖φ‖_2 is high, this means that one is fairly certain where other vectors sharing a label with ~x are (and, correspondingly, that there are at least some highly influential features identified by φ); a small value of ‖φ‖_2 implies low explanation strength.
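A direct, unoptimized reading of Equation (1) in Python is given below (an illustrative sketch, not the authors' implementation; α defaults to a constant weight, and the helper assumes the point of interest is a row of X).

```python
import numpy as np

def mim(x, X, labels, alpha=lambda d: 1.0):
    """Monotone influence measure following Equation (1).
    x: point of interest (a row of X); X: (m, n) array of data points;
    labels: array of +/-1 labels for the rows of X;
    alpha: non-negative weighting of the distance ||y - x||_2."""
    x = np.asarray(x, dtype=float)
    row = np.flatnonzero((X == x).all(axis=1))[0]    # locate x in X to read its label c(x)
    cx = labels[row]
    phi = np.zeros_like(x)
    for j, (y, cy) in enumerate(zip(X, labels)):
        if j == row:
            continue                                 # the sum runs over X \ {x}
        sign = 1.0 if cy == cx else -1.0             # the {1,-1}-valued indicator 1(c(x)=c(y))
        phi += (y - x) * alpha(np.linalg.norm(y - x)) * sign
    return phi
```

With α ≡ 1, φ(~x, X) is simply the sum of displacement vectors towards points that share ~x's label, minus those towards points with the opposite label, which matches the geometric intuition above.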
EXISTING MEASURES

In this section, we provide an overview of some existing methodologies for measuring influence in data domains and compare them to MIM.

Parzen

The main idea behind the approach followed by [1] is to approximate the labeled dataset with a potential function and then use the derivative of this function to locally assign influence to features. Parzen satisfies Axioms 1 to 4. However, it is neither monotonic nor can it reliably detect random labels.

LIME

The measure in [8] is based on the idea of finding a best local fit for the classifier in a region around ~x. At its core, LIME fits a classifier by minimizing the mean-squared error, whereas MIM maximizes cosine similarity.

The Counterfactual Influence Measure

[4] initiated the axiomatic treatment of influence in data analysis; they propose a counterfactual aggregate influence measure for black-box data domains. Unlike other measures in this section, [4] do not measure local feature influence; rather, they measure the overall influence of a feature for a given dataset. The measure proposed by [4] does the following: when measuring the influence of the i-th feature, it counts, for every point ~x ∈ X, the number of points in X that differ from ~x only in the i-th feature and in their classification outcome. Given its rather restrictive notion of influence, this methodology only measures non-zero influence in very specific types of datasets: it assigns zero influence to all features in datasets that do not contain data points that differ from one another by only one feature; moreover, it only measures influence when a change in the state of a single feature changes the classification outcome.
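The counting rule can be sketched as follows (our reading of the description above, for illustration only; it is not the reference implementation of [4]).

```python
import numpy as np

def counterfactual_influence(i, X, labels):
    """For feature i, count ordered pairs of data points that differ in feature i
    only and receive different labels (an aggregate, dataset-level count)."""
    count = 0
    for a in range(len(X)):
        for b in range(len(X)):
            if a == b:
                continue
            diff = X[a] != X[b]
            if diff[i] and diff.sum() == 1 and labels[a] != labels[b]:
                count += 1
    return count
```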
Quantitative Input Influence

[6] propose a general framework for influence measurement in datasets, generalizing counterfactual influence. Instead of measuring the effect of changing a single feature on a point ~x ∈ X, they examine the expected effect of changing a set of features. The resulting measure, named QII (Quantitative Input Influence), is based on the Shapley value [9], a method of measuring the importance of individuals in collaborative environments. QII relies on access to counterfactual information; moreover, it is computationally intensive in practice, and under its current implementation, will not scale to domains having more than a few dozen features.

Black-Box Access Vs. Data-Driven Approaches

Some measures above assume black-box access to the classifier (e.g. QII and LIME); others (e.g. Parzen and MIM) make no such assumption. Is it valid to assume black-box access to a classifier? This depends on the implementation domain one has in mind and the strength of explanations that one wishes to arrive at. On the one hand, by exploiting this additional access, measures such as QII and LIME can offer better explanations in a sparse data domain; however, they are essentially unusable when one does not have access to the underlying classifier. Data-driven approaches such as MIM, the counterfactual measure, and Parzen are more generic and can be applied to any given dataset; however, they will naturally not be particularly informative in sparse regions of the dataset.

DISCUSSION AND FUTURE WORK

In this paper, we argue for the axiomatic treatment of linear influence measurement. We present a measure uniquely derived from a set of reasonable properties which also optimizes a natural objective function. Our characterization subsumes known influence measures proposed in the literature. In particular, MIM becomes the Banzhaf index in cooperative games and is also related to formal models of causality. Furthermore, MIM generalizes the measure proposed by [2] for measuring influence in a data-dependent cooperative game setting. Taking a broader perspective, axiomatic influence analysis in data domains is an important research direction: it allows us to rigorously discuss the underlying desirable norms we would like to see in our explanations. Indeed, an alternative set of axioms is likely to result in other novel measures that satisfy other desirable properties. Being able to mathematically justify one's choice of influence measures is important from a legal/ethical perspective as well: when explaining the behavior of classifiers in high-stakes domains, having provably sound measures offers mathematical backing to those using them.

While MIM offers an interesting perspective on influence measurement, it is but a first step. There are several interesting directions for future work; first, our analysis is currently limited to binary classification domains. It is possible to naturally extend our results to regression domains, e.g. by replacing the value 1(c(~x) = c(~y)) with c(~x) − c(~y); however, it is not entirely clear how one might define influence measures for multiclass domains. It is still possible to retain 1(c(~x) = c(~y)) as the measure of 'closeness' between classification outputs (i.e. all points that share ~x's output offer positive influence, and all those that do not offer negative influence), but we believe that this may result in a somewhat coarse influence analysis. This is especially true in cases where there is a large number of possible output labels. One possible solution for the multiclass case would be to define a distance metric over output labels; however, the choice of metric would greatly impact the outputs of MIM (or any other influence measure).
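As a sketch of the regression extension suggested above (hypothetical; how to weight or normalize the target differences is left open), one simply replaces the indicator in Equation (1) with the difference of target values:

```python
import numpy as np

def mim_regression(x, X, targets, alpha=lambda d: 1.0):
    """Hypothetical regression analogue of Equation (1): the indicator 1(c(x)=c(y))
    is replaced by the difference c(x) - c(y), as suggested in the text."""
    x = np.asarray(x, dtype=float)
    row = np.flatnonzero((X == x).all(axis=1))[0]  # locate x to read its target c(x)
    cx = targets[row]
    phi = np.zeros_like(x)
    for j, (y, cy) in enumerate(zip(X, targets)):
        if j == row:
            continue                               # the sum runs over X \ {x}
        phi += (y - x) * alpha(np.linalg.norm(y - x)) * (cx - cy)
    return phi
```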
Another major issue with MIM (and several other measures) is that their explanations are limited to the influence of individual features; they do not capture joint effects, let alone more complex synergistic effects of features on outputs (the only exception to this is LIME, which, at least in theory, allows fitting non-linear classifiers in the local region of the point of interest). It would be a major theoretical challenge to axiomatize and design 'good' methods for measuring the effect of pairwise (or k-wise) interactions amongst features. This would also allow a natural tradeoff between the accuracy and interpretability of a given explanation. A linear explanation (e.g. LIME, QII, or this work) is easy to understand: each feature is assigned a number that corresponds to its positive or negative effect on the output of ~x; a measure that captures k-wise interactions would be able to explain much more of the underlying feature interactions, but would naturally be less human interpretable. Indeed, a measure that captures all levels of feature interactions would be equivalent to a local approximation of the original classifier, which may not be feasible to achieve, nor easy to interpret. A better understanding of this behavior would be an important step in the design of influence measures.

Finally, it is important to translate our numerical measure into an actual human-readable report. [6] propose using linear explanations as transparency reports; however, more advanced methods which assume access to the classifier source code propose mapping back to specific subroutines for explanations [5, 10]. Indeed, while the transition from data to numerical explanations is an important step, mapping these to actual human-interpretable explanations is an open problem.

REFERENCES

1. D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K.-R. Müller. 2010. How to explain individual classification decisions. Journal of Machine Learning Research 11 (2010), 1803–1831.

2. E. Balkanski, U. Syed, and S. Vassilvitskii. 2017. Statistical Cost Sharing. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS). 6222–6231.

3. J.F. Banzhaf. 1965. Weighted Voting Doesn't Work: a Mathematical Analysis. Rutgers Law Review 19 (1965), 317–343.

4. A. Datta, A. Datta, A. D. Procaccia, and Y. Zick. 2015. Influence in Classification via Cooperative Game Theory. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI).

5. A. Datta, M. Fredrikson, G. Ko, P. Mardziel, and S. Sen. 2017. Proxy Non-Discrimination in Data-Driven Systems. CoRR abs/1707.08120 (2017).

6. A. Datta, S. Sen, and Y. Zick. 2016. Algorithmic Transparency via Quantitative Input Influence. In Proceedings of the 37th IEEE Conference on Security and Privacy (Oakland).

7. S.M. Lundberg and S. Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS). 4768–4777.

8. M. T. Ribeiro, S. Singh, and C. Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining (KDD). 1513–1522.

9. L.S. Shapley. 1953. A Value for n-Person Games. In Contributions to the Theory of Games, vol. 2. Princeton University Press, 307–317.

10. S. Singh, M. T. Ribeiro, and C. Guestrin. 2016. Programs as Black-Box Explanations. CoRR abs/1611.07579 (2016).

11. M. Sundararajan, A. Taly, and Q. Yan. 2017. Axiomatic Attribution for Deep Networks. arXiv preprint arXiv:1703.01365 (2017).

12. H.P. Young. 1985. Monotonic solutions of cooperative games. International Journal of Game Theory 14, 2 (1985), 65–72.