=Paper=
{{Paper
|id=Vol-2741/paper-04
|storemode=property
|title=Explaining Explanation Methods
|pdfUrl=https://ceur-ws.org/Vol-2741/paper-04.pdf
|volume=Vol-2741
|authors=Riccardo Guidotti
|dblpUrl=https://dblp.org/rec/conf/sigir/Guidotti20
}}
==Explaining Explanation Methods==
Riccardo Guidotti [0000-0002-2827-7613], University of Pisa, Italy, riccardo.guidotti@unipi.it

Abstract. The most effective Artificial Intelligence (AI) systems exploit complex machine learning models to fulfill their tasks due to their high performance. Unfortunately, the most effective machine learning models use for their decision processes a logic that is not understandable by humans, which makes them real black-box models. The lack of transparency on how AI systems make decisions is a clear limitation to their adoption in safety-critical and socially sensitive contexts. Consequently, since the applications in which AI is employed are various, research in eXplainable AI (XAI) has recently attracted much attention, with specific, distinct requirements for different types of explanations for different users. In this paper, we briefly present the existing explanation problems, the main strategies adopted to solve them, and the desiderata for XAI methods. Finally, the most common types of explanations are illustrated with references to state-of-the-art explanation methods able to retrieve them.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). BIRDS 2020, 30 July 2020, Xi’an, China (online).

===1 Introduction===

Nowadays, Artificial Intelligence is one of the most important scientific and technological areas, with a huge socio-economic impact and a pervasive adoption in every field of modern society. High-profile applications such as autonomous vehicles, medical diagnosis, spam filtering, image recognition, and voice assistants are based on Artificial Intelligence (AI) systems. Modern AI is mainly based on Machine Learning (ML) models that allow AI systems to reach impressive performance in emulating human behavior. The most effective ML models are black-box models [18], i.e., obscure decision-making or predictive methods that “hide” the logic of their internal decision processes from humans, either because it is not human-understandable or because it is not directly accessible. Examples of black-box models include Neural Networks and Deep Neural Networks, SVMs, and ensemble classifiers such as Random Forests, but also compositions of expert systems, data mining, and hard-coded software. The adoption of these obscure models is driven by their high performance in terms of accuracy [36]. As a consequence, the last decade has witnessed the rise of a black-box society [27].

The lack of explanations of how these black-box models make decisions is a restriction for their adoption in safety-critical contexts and socially sensitive domains such as healthcare or law. Moreover, the problem is not only the lack of transparency but also the possible biases inherited by black-box models from artifacts and preconceptions hidden in the training data of the ML algorithms. Predictive ML models learned on biased datasets may inherit such biases, possibly leading to unfair and wrong decisions. The consequences of biased misclassifications can damage decision-makers and put certain societal groups at risk [9, 28, 39]. For instance, the AI software used by Amazon to determine the areas of the US to which Amazon would offer free same-day delivery unintentionally restricted minority neighborhoods from participating in the program, often when every surrounding neighborhood was allowed (see http://www.techinsider.io/how-algorithms-can-be-racist-2016-4). Another example is related to propublica.org.
Their journalists have shown that the COMPAS score, a predictive model for the “risk of crime recidivism” (a proprietary secret of Northpointe), has a strong ethnic bias. Indeed, according to this score, black defendants who did not re-offend were classified as “high risk” twice as often as white defendants who did not re-offend. On the other hand, white repeat offenders were classified as “low risk” twice as often as black repeat offenders (see http://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing). In [9] it is shown that a neural network trained on English-language words was encoding gender biases and stereotypes. The authors show that for the analogy “Man is to computer programmer as woman is to X”, the variable X was completed with “homemaker” by the neural network. Consequently, research in eXplainable AI (XAI) and on the study of explanation methods for obscure ML models has recently attracted much attention [1, 5, 18, 26, 40].

In addition, an innovative aspect of the General Data Protection Regulation (GDPR), promulgated by the European Parliament and which became law in May 2018, is its set of clauses on automated decision-making. The GDPR, for the first time, introduces, to some extent, a right of explanation for all individuals to obtain “meaningful explanations of the logic involved” when automated decision making takes place. Despite conflicting opinions among legal scholars regarding the real scope of these clauses [15, 24, 37], there is broad agreement that the implementation of such a principle is imperative and that it represents a huge open scientific challenge today. However, without technology capable of explaining the logic of black boxes, the right to explanation will remain a “dead letter”. How can companies trust their AI services without understanding and validating the underlying rationale of their ML components? And, in turn, how can users trust AI services? It will be impossible to increase the trust of people in AI without explaining the rationale followed by these models. These are the reasons why explanation is now at the heart of responsible, open data science across multiple industry sectors and scientific disciplines.

===2 Explanation Methods===

A black-box predictor is an obscure ML model whose internals are either unknown to the observer, or known but uninterpretable by humans [18]. Therefore, in ML, to interpret means to give or provide the meaning, or to explain in understandable terms, the predictive process of a model to a human [5, 13]. It is assumed that the concepts composing an explanation are self-contained and do not need further explanations [18]. The most widely used approach to explain black-box models and return interpretations is a sort of reverse engineering: the explanation is learned by observing the changes in the black-box output when varying the input.
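To make the reverse-engineering idea just mentioned concrete, the following is a minimal sketch (my own illustration, not code from the paper): it probes a black-box classifier by perturbing one feature at a time of a given instance and observing how the predicted probability changes. The scikit-learn models and the synthetic data are placeholder assumptions.

<pre>
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A stand-in black box trained on synthetic tabular data (placeholder assumption).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0].copy()                                  # instance to probe
base = black_box.predict_proba([x])[0, 1]        # original black-box output

# Vary each feature in turn and observe the change in the black-box output.
for j in range(X.shape[1]):
    x_pert = x.copy()
    x_pert[j] += X[:, j].std()                   # shift feature j by one standard deviation
    delta = black_box.predict_proba([x_pert])[0, 1] - base
    print(f"feature {j}: change in P(class 1) = {delta:+.3f}")
</pre>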
In the following, we identify a set of dimensions along which ML interpretability and explanation methods can be analyzed and, in turn, reflect on the existing types of explanations.

Explanation Problems. In the literature, we recognize two types of problems: black-box explanation and explanation by design [18]. The black-box explanation idea is to couple a ML black-box model with an explanation method able to interpret the black-box decisions. The underlying strategy is to maintain the high performance of the obscure model and to use an explanation method to retrieve the explanations [12, 23, 29]. Explanation methods generally try to approximate the black-box behavior with an interpretable predictor, also named a surrogate model. This is the approach most commonly addressed nowadays in the XAI research field. On the other hand, explanation by design consists of directly designing a transparent model that is interpretable by design and aims at replacing the obscure ML model with the new transparent one [32, 33]. In the literature, various models are recognized to be interpretable. Examples are decision trees, decision rules, and linear models [14]. These models are considered easily understandable and interpretable by humans. They sacrifice performance for interpretability. Besides, most of them cannot be applied to data types such as images or text, but only to tabular data.

Explanation Targets and Strategy. We recognize global and local explanation methods depending on the target of the explanation. A global explanation allows understanding the whole logic of a black-box model and interpreting any possible decision. Global explanations are difficult to achieve and, in the literature, are provided only for tabular data. On the other hand, a local explanation consists in retrieving the reasons for the prediction returned by a black-box model for a specific case. While for a global explanation the interpretable surrogate approximates the whole black box, for a local explanation the interpretable surrogate model is used to approximate the black-box behavior only in a “neighborhood” of the instance analyzed. The idea is that, in such a neighborhood, it is easier to explain the decision boundary [29].

In addition, we distinguish between model-specific and model-agnostic explanation methods depending on the strategy adopted. An explanation method is model-specific, or not generalizable [25], if it can be used to interpret only particular types of black-box models. If an explanation method is designed to interpret a Random Forest [36] and internally uses a distance between trees, such a method cannot be used to explain the predictions of a Neural Network. On the other hand, a generalizable or model-agnostic explanation method can be used independently of the black-box model being explained, because the internal characteristics of the black box are not exploited to retrieve the explanation [29].

Desiderata of Explainable Methods. A set of desiderata should be considered when designing and using explanation methods [14]. The interpretability aspect should measure to what extent a given explanation is human-understandable. Interpretability is generally evaluated through the complexity of the interpretable surrogate model. For example, the complexity of a rule can be measured with the number of clauses in its condition, for linear models with the number of non-zero weights, and for decision trees with the depth of the tree. The performance of the interpretable surrogate model from which explanations are extracted is generally called fidelity and measures to what extent it accurately imitates the black-box prediction. Fidelity is practically measured in terms of accuracy, F1-score, etc. [36] with respect to the predictions of the black-box model.
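As a minimal sketch of the global-surrogate strategy and of the fidelity desideratum just described (my own illustration under placeholder assumptions, not code from the cited methods), a shallow decision tree is trained on the predictions of a black box, and fidelity is measured as the agreement with those predictions.

<pre>
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score, f1_score

# Placeholder black box and synthetic data.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Train the interpretable surrogate on the black-box predictions, not on the true labels.
y_bb = black_box.predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

# Fidelity: how faithfully the surrogate imitates the black box.
y_sur = surrogate.predict(X)
print("fidelity (accuracy):", accuracy_score(y_bb, y_sur))
print("fidelity (F1):", f1_score(y_bb, y_sur))

# Interpretability proxy: the surrogate is a depth-3 tree that can be printed and read.
print(export_text(surrogate))
</pre>

In a local explanation setting, the same fidelity measure would be computed only on a neighborhood of the instance analyzed.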
Moreover, an interpretable model should guarantee fairness, by protecting minorities against discrimination [31], and privacy, by not revealing sensitive information [2]. Also, an explanation method must return robust and stable explanations: similar instances should have similar explanations for a given black-box model [19]. In addition, since the meaningfulness of an explanation depends on the stakeholder [7], the explanation returned must consider the user background: common users require simple clarifications, while domain experts can understand complex explanations. Finally, the time that a user is allowed to spend on understanding an explanation is another crucial aspect. In contexts where the decision time is not a constraint, one might prefer a more exhaustive explanation, while when the user needs to make a decision quickly, it is preferable to have an explanation that is “easy to read”. Thus, an explanation method must consider time limitations.

Types of Explanations. Research on XAI is producing various alternatives. Explanation methods differ from one another depending on the type of explanation returned. In the following, we illustrate the most used types of explanations and highlight how explanation methods build them.

– List of Rules. An explanation returned in the form of a list of rules implies that rules are read one after the other, and the first rule for which the conditions are verified is used for prediction. Rules are of the if-then form: if conditions, then consequent. The consequent corresponds to the prediction, while the conditions explain the factual reasons for the consequent. The CORELS method [3] is a transparent-by-design method able to build a list of rules with the aim of globally replacing the black-box model. A compact set of rules is returned by the transparent predictive method proposed in [21].

– Single Tree Approximation. The black-box predictor is approximated with a decision tree that represents all the possible decisions. The TREPAN explanation method [12] allows globally exploring a Neural Network through a tree structure that, starting from a root, shows for every path the conditions driving the decision process. TREPAN retrieves the decision tree by maximizing a gain ratio [36] calculated on the fidelity with respect to the predictions of an obscure Neural Network.

– Rule-based Explanation. A single if-then rule is used for local explanations. The conditions of the rule explain the factual reasons for the prediction. The LORE explanation method [17] builds a local decision tree in the neighborhood of the instance analyzed, and then extracts from the tree a single rule revealing the reasons for the decision on the specific instance. The ANCHOR method [30] returns if-then rules called anchors. An anchor contains a set of attributes with the values that are fundamental for obtaining a certain prediction.

– Features Importance. A feature importance-based local explanation consists of attributes equipped with positive and negative values. The explanation consists of both the sign and the magnitude of the contribution of the attributes to a specific prediction. If the value is positive, the attribute contributes by increasing the model’s output; if the sign is negative, it decreases the output of the model. LIME [29] adopts a linear model as the interpretable local surrogate and returns the importance of the features as an explanation by exploiting the regression coefficients (a rough sketch in this spirit is given after this list). SHAP [23] provides local, unique, additive feature importances for a specific record by exploiting Shapley values.

– Saliency Maps. In image processing, typical explanations consist of saliency maps, i.e., images that show the positive (or negative) contribution of each pixel to the black-box prediction. Saliency maps are built for locally explaining DNN models by gradient-based [34, 35] and perturbation-based [6] attribution methods. These explanation methods assign a score to each pixel such that the probability of returning the same answer without considering irrelevant pixels is maximized. Under appropriate image transformations that exploit the concept of “superpixels”, methods such as LORE and LIME can also be employed to explain black boxes working on images.

– Prototype-based Explanations. An explanation based on prototypes returns specimens similar to the instance analyzed, which make clear the reasons for the prediction. Prototype-based explanations can refer to any type of data. In [11, 22], image prototypes are used as the foundation of the concept for interpretability [8]. In [20], the concept of counter-prototypes, called criticisms, is discussed for tabular data, i.e., prototypes showing what should be different to obtain another decision. Exemplar and counter-exemplar synthetic images are generated by the ABELE explanation method [16] to augment the interpretability of local saliency maps.

– Counterfactual Explanations. A counterfactual explanation shows what should have been different to change the prediction of the black-box model. Counterfactuals help people in reasoning on the cause-effect relations between observed features and classification outcomes [4, 10] and reveal what should change in a given instance to obtain a different prediction [37]. The explanation method proposed in [38] returns counterfactual explanations that describe the smallest change that can be made to a given instance to obtain a certain outcome by solving an optimization problem. The aforementioned LORE [17], besides a factual explanation rule, also provides a set of counterfactual rules extracted from the local decision tree, while ABELE [16] returns synthetically generated counter-exemplar images.
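Returning to the Features Importance item above: as a rough sketch in the spirit of LIME (not the actual implementation of [29]), one can sample perturbed records around the instance to explain, label them with the black box, and fit a proximity-weighted linear surrogate whose coefficients serve as signed importances. All names, parameters, and data below are illustrative assumptions.

<pre>
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Placeholder black box and synthetic data.
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0]                                            # instance to explain
rng = np.random.default_rng(0)
Z = x + rng.normal(scale=0.5, size=(500, x.size))   # neighborhood of x
p = black_box.predict_proba(Z)[:, 1]                # black-box output to imitate

# Closer samples get larger weights, so the surrogate is only locally faithful.
weights = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2)

local_model = Ridge(alpha=1.0).fit(Z, p, sample_weight=weights)
for j, coef in enumerate(local_model.coef_):
    sign = "increases" if coef > 0 else "decreases"
    print(f"feature {j}: weight {coef:+.3f} ({sign} the predicted probability)")
</pre>

A SHAP-style explanation would instead distribute the prediction among features using Shapley values, but the output format, signed per-feature contributions, is analogous.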
===3 Conclusion===

AI systems based on obscure ML models cannot be the long-term solution for any real application, especially those in which the final predictions involve humans. Research on XAI has strong ethical motivations aimed at empowering users against undesired, possibly illegal, effects of black-box automated decision-making systems. Different types of explanations, and different explanation methods, permit retrieving the logic of machines, which can be completely different from the logic of humans, and help resolve unexpected bugs and issues.

However, despite recent developments in XAI, some questions remain open. Are the existing explanation methods useful for the realization of the right of explanation declared in the GDPR? Can current explanation methods effectively be exploited by business companies for the industrial development of explainable AI services and products? Are explanation methods able to reveal forms of discrimination towards vulnerable social groups, and are they immune from other algorithmic biases and artifacts in the data? Only when these questions have a positive answer will the research on explanation methods have reached a satisfactory level.

===Acknowledgment===

This work is partially supported by the European Community H2020 programme under the funding schemes H2020-INFRAIA-2019-1: Res. Infr. G.A. 871042 SoBigData++ (sobigdata.eu) and G.A. 952026 Humane AI-Net (humane-ai.eu).
===References===

1. A. Adadi and M. Berrada. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6:52138–52160, 2018.
2. Y. A. A. S. Aldeen, M. Salleh, and M. A. Razzaque. A comprehensive review on privacy preserving data mining. SpringerPlus, 4(1):694, 2015.
3. E. Angelino, N. Larus-Stone, D. Alabi, M. Seltzer, and C. Rudin. Learning certifiably optimal rule lists. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 35–44. ACM, 2017.
4. A. Apicella, F. Isgrò, R. Prevete, and G. Tamburrini. Contrastive explanations to classification systems using sparse dictionaries. In International Conference on Image Analysis and Processing, pages 207–218. Springer, 2019.
5. A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58:82–115, 2020.
6. S. Bach, A. Binder, et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7):e0130140, 2015.
7. U. Bhatt, A. Xiang, S. Sharma, A. Weller, A. Taly, Y. Jia, et al. Explainable machine learning in deployment. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 648–657, 2020.
8. J. Bien and R. Tibshirani. Prototype selection for interpretable classification. The Annals of Applied Statistics, 5(4):2403–2424, 2011.
9. T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357, 2016.
10. R. M. Byrne. Counterfactuals in explainable artificial intelligence (XAI): Evidence from human reasoning. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 6276–6282, 2019.
11. C. Chen, O. Li, A. Barnett, J. Su, and C. Rudin. This looks like that: Deep learning for interpretable image recognition. arXiv:1806.10574, 2018.
12. M. Craven and J. W. Shavlik. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems, pages 24–30, 1996.
13. F. Doshi-Velez and B. Kim. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, 2017.
14. A. A. Freitas. Comprehensible classification models: A position paper. ACM SIGKDD Explorations Newsletter, 15(1):1–10, 2014.
15. B. Goodman and S. Flaxman. EU regulations on algorithmic decision-making and a "right to explanation". In ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY. http://arxiv.org/abs/1606.08813v1, 2016.
16. R. Guidotti, A. Monreale, et al. Black box explanation by learning image exemplars in the latent feature space. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 189–205. Springer, 2019.
17. R. Guidotti, A. Monreale, F. Giannotti, D. Pedreschi, S. Ruggieri, and F. Turini. Factual and counterfactual explanations for black box decision making. IEEE Intelligent Systems, 2019.
18. R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5):1–42, 2018.
19. R. Guidotti and S. Ruggieri. On the stability of interpretable models. In 2019 International Joint Conference on Neural Networks, pages 1–8. IEEE, 2019.
20. B. Kim, O. O. Koyejo, and R. Khanna. Examples are not enough, learn to criticize! Criticism for interpretability. In Advances in Neural Information Processing Systems, pages 2280–2288, 2016.
21. H. Lakkaraju et al. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1675–1684. ACM, 2016.
22. O. Li, H. Liu, C. Chen, and C. Rudin. Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
23. S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774, 2017.
24. G. Malgieri and G. Comandé. Why a right to legibility of automated decision-making exists in the General Data Protection Regulation. International Data Privacy Law, 7(4):243–265, 2017.
25. D. Martens, B. Baesens, T. Van Gestel, and J. Vanthienen. Comprehensible credit scoring models using rule extraction from support vector machines. European Journal of Operational Research, 183(3):1466–1476, 2007.
26. T. Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.
27. F. Pasquale. The Black Box Society. Harvard University Press, 2015.
28. D. Pedreschi, F. Giannotti, R. Guidotti, A. Monreale, S. Ruggieri, and F. Turini. Meaningful explanations of black box AI decision systems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 9780–9784, 2019.
29. M. T. Ribeiro et al. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
30. M. T. Ribeiro, S. Singh, and C. Guestrin. Anchors: High-precision model-agnostic explanations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018.
31. A. Romei and S. Ruggieri. A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, 29(5):582–638, 2014.
32. C. Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
33. C. Rudin and J. Radin. Why are we using black box models in AI when we don't need to? A lesson from an explainable AI competition. Harvard Data Science Review, 1(2), 2019.
34. A. Shrikumar et al. Not just a black box: Learning important features through propagating activation differences. arXiv:1605.01713, 2016.
35. K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
36. P.-N. Tan et al. Introduction to Data Mining. Pearson Education India, 2006.
37. S. Wachter, B. Mittelstadt, and L. Floridi. Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation. International Data Privacy Law, 7(2):76–99, 2017.
38. S. Wachter, B. Mittelstadt, and C. Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31:841, 2017.
39. Y. Wang and M. Kosinski. Deep neural networks are more accurate than humans at detecting sexual orientation from facial images. Journal of Personality and Social Psychology, 114(2):246, 2018.
40. Y. Zhang and X. Chen. Explainable recommendation: A survey and new perspectives. arXiv preprint arXiv:1804.11192, 2018.