Towards a Formalization of Explanations for Robots' Actions and Beliefs¹

Felix LINDNER
Institute of Artificial Intelligence, Ulm University

Abstract. A robot's capacity to self-explain its behavior is a means of ensuring trust. This work presents a preliminary formal characterization of explanations embracing the distinction between explanations based on counterfactuality and those based on regularity. It also distinguishes generative and instrumental explanations. The formalization will guide future work on explanation generation, explanation sharing, and explanation understanding in human-robot interaction.

Keywords. Explainable AI, Ontology, Human-Robot Interaction

¹ Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Robots are complex systems composed of software components for perception, motion control, and decision making. Humans often fail to correctly interpret the behavior of robotic systems, cf. [1]. This lack of interpretability of robot behavior may lead to a loss of trust and thus hinders the acceptance and adoption of new technology. One way of mitigating trust loss is explanations [1,2]: Robotic systems should be able to self-explain their beliefs and behavior and to self-introduce their capabilities.

Current approaches to explanation generation in robotics are often tailored to specific AI components. For example, kinematic capabilities are explained by reference to sensor and motor components [3], image classifications are explained by highlighting important portions of the image [4], actions are explained by their contribution to maximizing some metric [5,6] or to the achievement of specific goals [7], and a robot's ethical judgments can be explained by morally relevant aspects of an ethical problem [8].

The emerging field of explainable AI is developing fast and generates a highly diverse landscape of solutions. A much-needed part of the research agenda is conceptual work on systematizing and unifying the concept of explanation, eventually arriving at a format for formally representing individual explanations for the sake of communication between systems, as well as between systems and humans.

For the sake of this paper, I will focus on explanations for why a robot executes an action and for why a robot maintains some belief. I anticipate that a user of a robotic system will be interested in knowing why a robot exhibits some observable behavior. The explanation will most likely consist in statements of belief, such as "because the robot believes doing so is morally permissible", "because the robot believes this is the shortest plan", or "because the robot believes you want something to drink", and the user will sometimes be interested in explanations for why the robot holds these beliefs. Generally, there are also explanations for how something works. Explanations in response to how-questions are relevant to human-robot interaction as well: When the robot explains that it can bring about some state of affairs, the user might ask for an explanation of how the robot is going to achieve that. For the time being, however, I will only attempt to formally characterize explanations for why-questions.
2. Background and Related Work

There is a significant body of research on explanations in disciplines as diverse as philosophy of science, psychology, linguistics, and artificial intelligence. Given limited space, I will concentrate on recent work on explanation in AI.

Research on ontology modeling is interested in supporting modelers by automatically providing explanations for why the modeled ontology entails a formula (the explanandum). The solution amounts to identifying a minimal subset (the explanans) of the set of axioms in the ontology such that the explanans entails the explanandum [9]. The explanans is also called a justification. The concept of a repair is dual to justification and denotes subsets of the ontology to be removed such that the explanandum is not entailed by the altered ontology. Hence, justifications are sufficient conditions for the explanandum and repairs are necessary ones.

In the explainable AI planning community [5,7], the idea that goals and performance metrics are explanations of actions is prevalent: According to this view, explaining why an action (explanandum) was executed amounts to identifying the goals (explanans) the action contributed to. Alternatively, performance metrics under which the action was preferred, or prior beliefs that necessitate the execution of the action, can serve as explanans. In machine ethics, one is interested in moral reasons for actions [8]. Here the goal is to identify aspects of a situation that are necessary or sufficient conditions for an action to be rendered permissible by an ethical principle. In machine learning, explanation generation methods are designed to explain black-box classifiers. These approaches generate sets of features that were necessary [10] or sufficient [11] for the observed classification.

There already exists a proposal for a formal characterization of explanation by Tiddi and colleagues [12]. The authors present an ontology design pattern and show how it is instantiated by various explanation concepts in different research fields such as psychology, philosophy, and computer science. Their ontological formalization is in large part similar to the one I propose. A difference is that my formalization stresses the contrastive nature of explanation [13,14]. Based on contrastivity, I define the distinction between explanations based on counterfactuality and those based on regularity. Moreover, by connecting explanation to causality, the distinction between generative and instrumental explanations will be defined.

3. Formalization

The following axioms are meant to characterize the concept of explanation and to define four sub-concepts: counterfactual explanations, regularity explanations, generative explanations, and instrumental explanations. These distinctions result from an attempt to classify current work on explainable AI. Determining the empirical significance of these distinctions for human-robot interaction remains an open question.

3.1. Explanandum and Explanans

An explanation has an explanandum (Exum) and an explanans (Exans) (Axiom 1). The explanandum is the entity to be explained, and the explanans is the entity that is meant to explain the explanandum. The entities that play the roles of explanandum and explanans do so in the context of an explanation (Axioms 2 and 3).

∀x(Ex(x) → ∃y, z(Exum(x, y) ∧ Exans(x, z)))   (1)
∀x, y(Exum(x, y) → Ex(x))   (2)
∀x, y(Exans(x, y) → Ex(x))   (3)

The axioms leave open which kinds of entities play the roles of explanandum and explanans.
In fact, the fillers may be as diverse as events, facts, situations, theories, phenomena, or objects.

3.2. Origin and Contrast

The explanans is part of a situation, which is here called the origin of the explanation (Axioms 4 and 5). For instance, if the robot explains "I am swerving left because people are standing ahead of me", the origin is the situation made up of everything that is currently perceived by the robot, including the people standing ahead of it. When explaining the output of a neural network classifier, the origin is the whole input vector fed into the neural network. The origin gives rise to (Grt) the explanandum (Axiom 6). The gives-rise-to relation is three-placed, relating the explanation, the origin, and the explanandum. The reason is that, in principle, the origin can give rise to many different things. Consider the case where the origin is the mereological sum of predicates describing a person. In the context of a loan classifier, this description may give rise to the decision that the loan is rejected. In the context of a personality classifier, the same origin may give rise to the decision that the person is agreeable. In a given explanation context, only a fraction of these relations is usually relevant (in most cases only one).

∀x, y(Exans(x, y) → ∃z(P(y, z) ∧ Origin(x, z)))   (4)
∀x, y(Origin(x, y) → Ex(x))   (5)
∀x, y, z((Origin(x, y) ∧ Exum(x, z)) → Grt(x, y, z))   (6)

In line with Lipton [13] and Miller [14], explanations are taken to be contrastive. If the autonomous car explains "I turn left because this is the fastest route", then this implies the existence of a contrast situation. The contrast may either be a situation in which turning left is not optimal ("If it were not optimal, I would not turn left.") or another situation in which turning left is also optimal ("Whenever turning left is optimal, I turn left."); see Section 3.3 for more on this distinction. The contrast relation is also three-placed. This is necessary because the contrast must not only be a plausible contrast to the origin, it must also respect the context of the explanation. Consider how to respond to the question "Why does the robot go into the kitchen?" The contrast could either be a situation which gives rise to "Lisa goes into the kitchen" or another situation which gives rise to "the robot goes into the dining room." Which one of these options is a valid contrast depends on the nature of the explanation, viz., whether it explains why the robot rather than Lisa goes into the kitchen, or why the robot goes into the kitchen rather than into the dining room. Axiom 7 asserts the existence of a contrast, and Axiom 8 ensures that contrasts are relative to explanations and relate two different entities.

∀x, y(Origin(x, y) → ∃z(Contrast(x, y, z)))   (7)
∀x, y, z(Contrast(x, y, z) → (Ex(x) ∧ y ≠ z))   (8)
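To make the structure of Axioms 1–8 concrete, the following sketch shows one way the concepts could be encoded in a robot's software. It is a minimal illustration in Python, assuming that situations are represented as sets of atomic facts; the names Situation, Explanation, gives_rise_to, and the fact labels are my own illustrative choices, and the axioms do not prescribe any particular data representation.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

# Illustrative assumption: a situation is a mereological sum of atomic facts,
# encoded as a frozen set of fact labels. Parthood P then becomes the subset relation.
Situation = FrozenSet[str]

@dataclass
class Explanation:
    """One explanation instance in the sense of Axioms 1-8."""
    explanandum: str                 # Exum: the entity to be explained
    explanans: Situation             # Exans: the part of the origin doing the explaining
    origin: Situation                # the situation the explanans is part of (Axioms 4-5)
    contrast: Situation              # a contrast situation (Axioms 7-8)
    # Grt is three-placed; keeping the mapping local to each explanation makes it
    # relative to the explanation context, as required above.
    gives_rise_to: Dict[Situation, str] = field(default_factory=dict)

    def satisfies_basic_axioms(self) -> bool:
        """Check the structural constraints of Axioms 4, 6, and 8."""
        explanans_in_origin = self.explanans <= self.origin                               # Axiom 4
        origin_grt_explanandum = self.gives_rise_to.get(self.origin) == self.explanandum  # Axiom 6
        contrast_differs = self.contrast != self.origin                                   # Axiom 8
        return explanans_in_origin and origin_grt_explanandum and contrast_differs

# Example: "I swerve left because there are people ahead."
origin = frozenset({"people_ahead", "plant_on_left"})
contrast = frozenset({"plant_on_left"})
swerve = Explanation(
    explanandum="swerve_left",
    explanans=frozenset({"people_ahead"}),
    origin=origin,
    contrast=contrast,
    gives_rise_to={origin: "swerve_left", contrast: "move_straight"},
)
assert swerve.satisfies_basic_axioms()
```

Because the gives_rise_to mapping belongs to a single Explanation object, the same origin can give rise to a loan decision in one explanation context and to a personality judgment in another, mirroring the three-placed Grt relation.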
3.3. Regularity and Counterfactuality

Hume [15] gives two definitions of causality when he writes:

[...] we may define a cause to be an object, followed by another, and where all the objects similar to the first are followed by objects similar to the second. Or in other words where, if the first object had not been, the second never had existed.

The first part of the quote requires a relation of regularity between the cause and the effect, while the second part requires a relation of counterfactuality. Both views of the relationship between cause and effect can also be observed to apply to explanation. One example from the literature on black-box explanations in AI is the difference between explanations based on anchors [11] and those based on counterfactuals [10]. Anchors are sufficient sets of predicates that give rise to the explanandum, while counterfactual explanations highlight necessary features. As stated above, in the research area of ontology debugging, there is the distinction between justifications and repairs [9]. Justifications are minimal subsets of the ontology that are individually sufficient to entail the explanandum, while repairs are minimal subsets of the ontology that are necessary for the entailment of the explanandum.

As for explanations based on necessity, Axiom 9 defines counterfactual explanations. An explanation counts as a counterfactual explanation if and only if the origin and the contrast give rise to different outcomes. Consider again the case of the robot swerving left because it has perceived people ahead. A counterfactual explanation could be framed like this: "I swerve left because there are people ahead. If there were no people ahead, I would move straight." The origin of this explanation is the situation where people are ahead, and this situation gives rise to swerving left. The contrast is a situation where the people are not present (or are standing somewhere else), and this situation gives rise to moving straight. The explanans is thus a necessary condition for the explanandum.

Next, one can ask what the difference between the origin and the contrast is. To denote this difference, a function Minus is used that computes the mereological difference (cf. [16]) between the origin and the contrast. For counterfactual explanations, the explanans must be part of this difference, as expressed by Axiom 10. In the robot example, the origin may be the mereological sum of the facts that people are standing ahead of the robot and that a plant is on the left. The contrast may be just the fact that a plant is on the left. The mereological difference between origin and contrast is the fact that people are ahead, and this is already the explanans. Usually, one will require that contrasts are minimal perturbations of origins. However, this minimality constraint is not formalized here.

∀x(CEx(x) ↔ ∃y, z, y′, z′(Origin(x, y) ∧ Contrast(x, y, z) ∧ Grt(x, y, y′) ∧ Grt(x, z, z′) ∧ y′ ≠ z′))   (9)

∀x, x′, y, y′, z, z′((Origin(x, y) ∧ Contrast(x, y, z) ∧ Exans(x, x′) ∧ Grt(x, y, y′) ∧ Grt(x, z, z′) ∧ y′ ≠ z′) → P(x′, Minus(y, z)))   (10)

If one wants to explain why one does or believes something in some situation, one can also refer to another situation which is stable with respect to the explanans and state that one does or believes the same. Instead of giving a counterfactual explanation for swerving left, the robot from the swerving example could explain its behavior by citing a regularity: "I swerve left because I have perceived people ahead. In other situations in which people are ahead, I also swerve left." This time, the origin and the contrast give rise to the same behavior, which is taken as defining for a regularity explanation, as formalized by Axiom 11. In the case of regularity explanations, the explanans is part of the mereological product [16] of the origin and the contrast, see Axiom 12. In the robot example, the origin again is the situation in which there are people ahead and there is a plant to the left. The contrast situation might be the sum of the facts that there are people ahead, a plant is to the left, and the sun is shining.
The product then is the sum of the facts that there are people ahead and a plant to the left, and clearly, the explanans is part of this product.

∀x(REx(x) ↔ ∃y, z, y′, z′(Origin(x, y) ∧ Contrast(x, y, z) ∧ Grt(x, y, y′) ∧ Grt(x, z, z′) ∧ y′ = z′))   (11)

∀x, x′, y, y′, z, z′((Origin(x, y) ∧ Contrast(x, y, z) ∧ Exans(x, x′) ∧ Grt(x, y, y′) ∧ Grt(x, z, z′) ∧ y′ = z′) → P(x′, Product(y, z)))   (12)

Nothing prevents an explanation from being counterfactual and regular at the same time. Indeed, very good candidates for explanans are often both necessary and sufficient [8]. In this case, an origin has two contrasts, one of which gives rise to something different and the other of which gives rise to the same outcome.

3.4. Generativity, Instrumentality, and Causality

Besides the distinction between counterfactual and regularity explanations, there is another distinction to observe in the explainable AI literature: I call it the distinction between generative explanations and instrumental explanations. Again consider the robot swerving left. The generative type of explanation may read "I swerve left because I perceive people ahead. If there were no people ahead, I would not swerve left." In this case, the explanans "I perceive people ahead" causes the explanandum "I swerve left". Contrast this with the explanation "I swerve left because this way I avoid bumping into the people. If I did not swerve left, then I would bump into the group of people." This time the explanandum "I swerve left" causes the explanans "I avoid bumping into the people". Axioms 13 and 14 take the direction of causality to be the defining feature of the distinction between generative explanations (GEx) and instrumental explanations (IEx). One could also say that in the case of generative explanations, the explanans grounds the explanandum, i.e., the explanandum holds in virtue of the explanans. Conversely, in the case of instrumental explanations, the explanandum holds for the purpose of the explanans. Moreover, in the case of instrumental explanations, it is required via Axiom 15 that the origin gives rise to something which is preferred over what is given rise to by the contrast. In the robot navigation example, avoiding the people is preferred over bumping into the people.

∀x(GEx(x) ↔ ∃y, z(Exum(x, y) ∧ Exans(x, z) ∧ Causes(z, y)))   (13)

∀x(IEx(x) ↔ ∃y, z(Exum(x, y) ∧ Exans(x, z) ∧ Causes(y, z)))   (14)

∀x(IEx(x) → ∃y, y′, z, z′(Origin(x, y) ∧ Contrast(x, y, z) ∧ Grt(x, y, y′) ∧ Grt(x, z, z′) ∧ y′ > z′))   (15)

As the robot swerving example shows, both types of explanations are possible when a robot seeks to explain its actions. Both types are also possible when a robot seeks to explain its beliefs: One can slightly change the above why-question: Instead of asking why the robot swerves left, one can ask why the robot believes that it must swerve left. The above explanations still work. However, this might be the case just because this belief is essentially about an action. Consider a robot believing in the afterlife. A generative explanation of this belief may be given by other beliefs, e.g., the robot may say it believes everything written in the Bible. An instrumental explanation, on the other hand, would cite some value this belief has for the robot. For instance, it could say that believing in the afterlife makes it more optimistic overall, or that this belief helps the robot console those who have lost loved ones.

Even if one hesitates to ascribe beliefs of this kind to robots, a robot should still have the capability to represent and reason about such beliefs, because the humans the robot will interact with are likely to give explanations of these kinds. Moreover, in application areas where a robot simulates human reasoning to adapt its behavior, it can be of value to simulate such beliefs.
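Building on the illustrative Explanation sketch given at the end of Section 3.2, the four sub-concepts can be read as executable checks. Interpreting the mereological difference Minus as set difference and the product Product as set intersection fits only the simple fact-set encoding used there, and the causes and prefers parameters stand in for a causal model and a preference model that a real system would have to supply; these names are illustrative choices rather than part of the formalization.

```python
from typing import Callable

# Assumed to be supplied by the robot's causal and preference models.
CausesRelation = Callable[[str, str], bool]
PrefersRelation = Callable[[str, str], bool]

def is_counterfactual(e: Explanation) -> bool:
    """Axioms 9-10: origin and contrast give rise to different outcomes,
    and the explanans is part of Minus(origin, contrast)."""
    y_out = e.gives_rise_to.get(e.origin)
    z_out = e.gives_rise_to.get(e.contrast)
    minus = e.origin - e.contrast            # mereological difference as set difference
    return y_out != z_out and e.explanans <= minus

def is_regularity(e: Explanation) -> bool:
    """Axioms 11-12: origin and contrast give rise to the same outcome,
    and the explanans is part of Product(origin, contrast)."""
    y_out = e.gives_rise_to.get(e.origin)
    z_out = e.gives_rise_to.get(e.contrast)
    product = e.origin & e.contrast          # mereological product as set intersection
    return y_out == z_out and e.explanans <= product

def is_generative(e: Explanation, causes: CausesRelation) -> bool:
    """Axiom 13: the explanans causes the explanandum."""
    return all(causes(fact, e.explanandum) for fact in e.explanans)

def is_instrumental(e: Explanation, causes: CausesRelation, prefers: PrefersRelation) -> bool:
    """Axioms 14-15: the explanandum causes the explanans, and what the origin
    gives rise to is preferred over what the contrast gives rise to."""
    caused = all(causes(e.explanandum, fact) for fact in e.explanans)
    y_out = e.gives_rise_to.get(e.origin)
    z_out = e.gives_rise_to.get(e.contrast)
    return caused and prefers(y_out, z_out)

# The swerving example defined earlier is counterfactual but not a regularity
# explanation, since its origin and contrast give rise to different behaviors.
assert is_counterfactual(swerve) and not is_regularity(swerve)
```

Because an explanation can have several contrasts, a robot that stores one Explanation object per contrast can report an explanans as both necessary and sufficient whenever one of its explanations passes the counterfactual check and another passes the regularity check.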
4. Conclusions

There has already been quite some work on generating explanations in AI. Much less work has been done on representing and communicating explanations, whether between systems and humans or between different components within a system. The preliminary formalization presented in this paper is meant as a first step towards filling this gap. A conceptual distinction between four types of explanations, viz., counterfactual, regularity, generative, and instrumental explanations, has been defined. Future work will aim at deepening the analysis of explanation and linking it to formalizations of causality and preferences. Having a richer formal theory of explanation will also guide the development of algorithmic solutions to explanation generation, explanation fusion, explanation verbalization, and explanation understanding. These problems are considered subjects of future work with practical significance to human-robot interaction, viz., robots representing and exchanging explanations for actions and beliefs with humans, and robots understanding and reasoning about humans' explanations.

References

[1] Tolmeijer S, Weiss A, Hanheide M, Lindner F, Powers TM, Dixon C, Tielman ML. Taxonomy of trust-relevant failures and mitigation strategies. In Proceedings of HRI'20, 2020.
[2] de Graaf M, Malle B. How people explain action (and autonomous intelligent systems should too). AAAI Fall Symposium Series, 2017.
[3] Kunze L, Roehm T, Beetz M. Towards semantic robot description languages. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, 2011, pp. 5589–5595.
[4] Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2921–2929.
[5] Fox M, Long D, Magazzeni D. Explainable planning. In IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), 2017.
[6] Krarup B, Krivic S, Lindner F, Long D. Towards contrastive explanations for comparing the ethics of plans. In ICRA 2020 Workshop Against Robot Dystopias: Thinking through the ethical, legal and societal issues of robotics and automation (AGAINST-20), 2020.
[7] Broekens J, Harbers M, Hindriks K, Van Den Bosch K, Jonker C, Meyer JJ. Do you get it? User-evaluated explainable BDI agents. In German Conference on Multiagent System Technologies, 2010, pp. 28–39.
[8] Lindner F, Möllney K. Extracting reasons for moral judgments under various ethical principles. In KI 2019: Advances in Artificial Intelligence, 2019, pp. 216–229.
[9] Horridge M. Justification based explanation in ontologies. Ph.D. thesis, University of Manchester, UK, 2011.
[10] Wachter S, Mittelstadt B, Russell C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology 31(2), 2018.
[11] Ribeiro MT, Singh S, Guestrin C. Anchors: High-precision model-agnostic explanations. In AAAI Conference on Artificial Intelligence, 2018.
[12] Tiddi I, D'Aquin M, Motta E. An ontology design pattern to define explanations. In K-CAP 2015: Proceedings of the 8th International Conference on Knowledge Capture, 2015, pp. 1–8.
[13] Lipton P. Inference to the Best Explanation. London and New York: Routledge, 1991.
[14] Miller T. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267, 2019.
[15] Hume D. An Enquiry Concerning Human Understanding, 1748.
[16] Varzi A. Mereology. Stanford Encyclopedia of Philosophy, 2016.