The Effect of Explanation Styles on User’s Trust Retno Larasati Anna De Liddo Enrico Motta retno.larasati@open.ac.uk anna.deliddo@open.ac.uk enrico.motta@open.ac.uk Knowledge Media Institute Knowledge Media Institute Knowledge Media Institute The Open University The Open University The Open University UK UK UK ABSTRACT that a key role in calibrating trust can be played by the way in which This paper investigates the effects that different styles of textual explanation is expressed and presented to the users. Explanation explanation have on explainee’s trust in an AI medical support style and modalities affect users’ trust toward algorithmic systems scenario. From the literature, we focused on four different styles sometime improving sometime reducing trust [12][25]. This paper of explanation: contrastive, general, truthful, and thorough. We aims to investigate the relation between explanation and trust by conducted a user study in which we presented explanations of a exploring different explanation styles. We first conducted a litera- fictional mammography diagnosis application system to 48 non- ture review in psychology, philosophy, and information systems, expert users. We carried out a between-subject comparison between to understand what are the characteristics of meaningful expla- four groups of 11-13 people each looking at a different explanation nations. We then designed several styles of explanation based on style. Our findings suggest that contrastive and thorough explana- these characteristics. Since we are interested in assessing the effects tions produce higher personal attachment trust scores compared to of explanation styles on users’ trust, we also defined a variety of general explanation style, while truthful explanation shows no dif- trust components to measure users’ trust levels. Our proposed trust ference compared to the rest of explanations. This means that users measurement was gathered from the literature in human factors who received contrastive and thorough explanation types found and HCI research. Finally we carried out a user study to see if the explanation given significantly more agreeable and suiting their any specific explanation style differently affects users’ trust. Our personal taste. These findings, even though not conclusive, confirm contribution is twofold: the impact of explanation style on users trust towards AI systems (1) we provide evidence which confirms the effect of explanation and may inform future explanation design and evaluation studies. styles on different trust factors; (2) we propose a reliable human-AI trust measurement (Cron- CCS CONCEPTS bach’s α=0.88) to investigate explanation and trust in health- • Human-centered computing → Human Computer Interaction. care. Thie rest of the paper is organized as follows: Section 2 introduces KEYWORDS the context of this research and summarises the relevant literature. Explanation, Trust, Explainable Artificial Intelligence Section 3 describes the methodology of this study. Section 4 and 5 ACM Reference Format: presents and analyse the results from the study. Finally, Section 6 Retno Larasati, Anna De Liddo, and Enrico Motta. 2020. The Effect of Expla- discusses the limitation of this work and outlines the next steps of nation Styles on User’s Trust. In Proceedings of IUI workshop on Explainable the research. Smart Systems and Algorithmic Transparency in Emerging Technologies (ExSS- ATEC’20). , 6 pages. 2 BACKGROUND AND RELATED WORK 1 INTRODUCTION 2.1 Explanation Explanation can be seen as an act or a product and can be cate- One of the main arguments motivating Explainable Artificial Intel- gorised as good or bad. A good explanation is an explanation that ligence research is that the explicability of AI systems can improve feels right because offers a phenomenologically familiar sense of people’s trust and adoption of AI solutions [8][27]. Still, the rela- understanding [1]. In this paper, we focus on meaningful explana- tionships between trust and explanation is complex, and it is not tion, to stress our interest and focus on the explanation’s capability always the case that explicability improves users’ trust. Trust in to improve understanding and sense-making of AI and algorithmic AI systems is claimed to be enhanced by transparency [11] and results. As such good explanations are not explanations that neces- understandability [17]. In order to gain understandability, an AI sarily improve trust, and can affect user’s trust both ways, by either system should provide explanations that are meaningful to the ex- improving or moderating trust. plainee(someone who received explanation). Providing meaningful We might ask what is meaningful explanation? There is no single explanations could then support users to appropriately calibrate definition of meaningful explanation. Guidotti et al. defined mean- trust, by improving trust (when they tend to down-trust the system) ingful explanation as explanation that is faithful and interpretable and mitigating over-trust issues [31]. Previous research has shown [7]. Thirumuruganathan et al. defined meaningful explanation as Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons explanation that is personalised based on users’ demographic [29]. License Attribution 4.0 International (CC BY 4.0). Regulators have also mentioned meaningful explanation. GDPR ExSS-ATEC’20, March 2020, Cagliari, Italy, © 2020 Articles 13–15 state that users have the right to receive ‘meaningful information about the logic involved’ in automated decisions, but ExSS-ATEC’20, March 2020, Cagliari, Italy, Larasati, et al. it fails to provide any specific definition of what is to be considered Table 1: Characteristics of Meaningful Explanation ’meaningful information’. In this paper, we will refer to meaningful explanation as explanation that is understandable. Explanation Description In cognitive psychology, explanation can be classified into dif- contrastive the cause of something relative to some ferent types: i. Causal explanation, which tells you what causes other thing in contrast [16][10][22] what, ii. Mechanical explanation, which tells you how a certain domain/role dependent pragmatic and relative to the back- phenomenon comes about, and iii. Personal explanation which tells ground context [10][20] you what causes what in the context of personal reasons or beliefs general simpler and broad explanation is prefer- [32]. Approaching these definitions from an explainable AI and AI able [26][18] reasoning angle, we could say that causal and mechanical expla- social/interactive people explain to transfer knowledge, nation could be the same, because the causal explanation of an AI thus can be a social exchange [10][22] system is mechanical by definition. For instance, if we ask why the truthful how truthful each elements in an expla- AI system gives us a certain prediction, the answer will consist of nation is with respect to the underlying an illustration of the AI’s mechanical process, which produced that system [14] prediction result. Personal explanation might also not be relevant, thorough describes all of the underlying system since all AI "personal" explanations are defined in terms of what [14] causes what in the context of a specific AI reasoning mechanism. Therefore, in what follows we will focus on causal explanation. Hilton proposed that causal explanation proceeds through the describes all of the underlying system. Completeness is argued operation of counterfactual and contrastive criteria [10]. Lipton to positively affect user understandability [13]. Even though both suggested that "to explain why P rather than Q, we must cite a causal of Kulesza’s studies used explanation in the case of a music rec- difference between P and not-Q, consisting of a cause of P and the ommender system, we think that being truthful (soundness) and absence of a corresponding event in the history of not-Q” [16]. Miller thorough (completeness) are key characteristics of explanations to quoted Lipton and argued that everyday explanations, or human be further explored. Building on the literature reviewed above, we explanations, are “sought in response to particular counterfactual therefore distilled 6 key characteristics of meaningful explanation, cases. [...]people do not ask why event P happened, but rather why that are defined in Table 1. event P happened instead of some event Q” [22]. Causal explanation happens through several processes [10]. First, 2.2 Explanation and User’s Trust there is information collection: a person gathers the information There is arguably a relation between explanation and users’ trust. available. Second, a causal diagnosis takes place: a person tries According to the Defense Advanced Research Projects Agency to identify a connection between two events/instances based on (DARPA), Explainable AI is essential to enable human users to un- the information. Third, there is causal selection, a person dignifies derstand and appropriately trust a machine learning system [8]. Pre- a set of conditions as "the explanation". This selection process is vious studies proposing different types of explanation [27][2][9][14] influenced by the information gathered and the domain knowledge further cemented the claim that explanations improves user trust of a person [20]. This means that what people consider acceptable [30][24][5]. and understandable is selected from the information provided and However, users’ trust could be misplaced and lead to over-reliance depends on people’s own domain knowledge or role. According to or over-trust. In a healthcare scenario, a doctor could unknowingly Lambrozo, explanations that are simpler are judged more likely to be trust a technologically complex laboratory diagnostic test that incor- believed and more valuable [18] and another study also highlighted rectly calibrated and misdiagnosed patients [4]. Previous research that users prefer a combination of simple and broad explanations suggests that giving explanation could help users to moderate their [26]. trust level [31], either by providing explanation as system’s accu- As mentioned previously, explanation can be seen as an act or racy [33][23] or as system’s confidence level [15]. On one hand, can be seen as a product. Explanation as an act involves the interac- these findings are not applied to healthcare. Hence, while system’s tion between one or more explainer and explainee [22]. According accuracy and system’s confidence level might be highly affecting to Hilton, explanation is understandable only when it involves ex- users’ trust in dating app [33], or context aware app [15], it is un- plainer and explainee engaging in information exchange through clear if that would be the case in a healthcare scenario. On the dialogue, visual representation, or other communication modalities other hand, in the healthcare/medical domain, Bussone et al. found [10]. This statement implies that static explanations could be harder that a high system’s confidence level had only a slight effect on to understand because they could be less engaging and would not over-reliance [3]. involve a dynamic interchange between explainer and explainee. To There are a number of ways to present an explanation. For exam- achieve meaningful explanation, a social (interactive) characteristic ple, a study mentioned above, used accuracy level as explanation. It of explanation needs to be taken into account. is important to know, what kind of style we are going to present our Previous research also showed that participants place the highest explanation. Research found that explanation style and modalities trust in explanations that are sound and complete [14]. Soundness affect users trust toward algorithmic systems, with the result that here means nothing but the truth, how truthful each element in an this can either improve or decrease [12][25]. explanation is with respect to the underlying system. Completeness In addition, in each of the reviewed studies trust was measured here means the whole truth, the extent to which an explanation differently, hence the results are hard to compare and do not provide ExSS-ATEC’20, March 2020, Cagliari, Italy, The Effect of Explanation Styles on User’s Trust a clear picture of the extent to which different styles of explana- Table 2: Human-AI Trust Measurement tion affect different types of trust. To better understand users’ trust towards an AI medical system, a more comprehensive trust mea- Trust Factors Description surement instrument is needed and will be explored in the next perceived technical ability system is perceived to perform the tasks section. accurately and correctly based on the information that is input. perceived reliability system is perceived to be, in the usual 2.3 Trust Measurement sense of repeated, consistent function- In general, there is quite a large literature presenting scales for ing. measuring trust. This paper will focus on identifying an appropriate perceived understandability user can form a mental model and pre- scale for the assessment of human trust in a machine prediction dict future system behaviour. system, which can be contextualised to a healthcare scenario. personal attachment user finds using the system agreeable, Some of the trust measurements reviewed from the automation preferable, suits their personal taste. literature are highly specific to particular application contexts. For faith user has faith in the future ability of the example, the scale developed by Schaefer [28] refers specifically to system to perform even in situations in the context of human reliance on a robot. The questions that are which it is untried. asked to users to measure trust are, for example: "Does it act as part perceived helpfulness user beliefs that the technology pro- of a team?" and "Is it friendly?". Another example of specific trust vides adequate, effective, and respon- measurement is the scale developed by Dzindolet, et al. [6]. It was sive help. created in the context of aerial terrain photography, showing images to detect camouflaged soldiers. The questions asked to measure trust in this case are for example: "How many errors do you think you will make during the 200 trials?". As these questions are very with Cronbach’s alpha > 0.89. In the proposed scale, trust with a specific to the task and the technical knowledge of the users in the specific technology was analyzed into three factors: perceived func- specific application context, it would be hard to translate them to a tionality, perceived helpfulness, and perceived reliability. Perceived healthcare scenario. functionality is users’ perceived capability of the system to prop- Madsen and Gregor [19] developed and tested a more generic erly accomplish its main function. Perceived helpfulness is users’ human-computer trust measurement instrument, with the focus on perception of the technology providing adequate, effective, and trust in an intelligent decision aid. A validity analysis conducted responsive help. Finally, perceived reliability means that the system of this instrument showed high Cronbach’s alpha results, which is perceived to operate continually or responding predictably to makes this scale promising to be tested in a different application inputs. field. Trust factors here are divided in two groups, cognitive based In our study we adopt a merged and modified version of the trust and affect based trust. Madsen and Gregor [19] conceptualise 9 trust items proposed by Madsen and Gregor and by McKnight. trust as consisting of five main factors: perceived reliability, per- From the total 9 trust items, that have been described above, we ceived technical competence, perceived understandability, faith, merged items that overlapped in meaning and modified some of and personal attachment. Perceived Technical competence means their descriptions into the final 6 trust metrics: perceived under- that the system is perceived to perform the tasks accurately and cor- standability, perceived reliability, perceived technical competence, rectly, based on the input information. Perceived Understandability faith, personal attachment, and helpfulness (See Table 2). means that the user can form a mental model and predict future system behaviours. Perceived Reliability means that the system is 3 METHODOLOGY perceived to be consistently functioning. Faith means that the user We aimed to test to what extent different types of textual explana- is confident in the future ability of the system to perform, even in tions affect different factors of users’ trust. In section 2.1 we have situations in which has never used the system before. Finally, per- identified 6 characteristics of meaningful explanation: contrastive, sonal attachment means that users find using the system agreeable, truthful, general, thorough, social/interactive, and role/domain- preferable, and that suits their personal taste. dependent explanations (see Table 1). We used these characteristics Some of these factors overlap with the trust factors identified to design distinctive textual explanations, and then presented them by McKnight [21]. McKnight provides an understanding of trust to users. Since we focus on a healthcare scenario, we used a drama- in technology in a wider societal context. McKnight [21] defines tising vignette to probe participants responses. We asked them to trust as consisting of three main components: propensity to trust read the explanation after reading the vignette and then run an on- general technology, institution-based trust in technology, and trust line survey asking them to rate different explanation types. To elicit in specific technology. In the context of this paper we only focus feedback on the explanation types we used the trust measurement on trust in a specific technology. McKnight [21] defines trust in mentioned above. a specific technology as a person’s relationship with a particular We designed a between-subjects study, in which different groups technology. Even if the study does not specifically target decision of users were each presented with a different explanation type. systems, the paper goes into a large literature and looks at different When designing the explanations, we focused on 4 out of the 6 object of trust, trust attributes, and their empirical relationships, explanation characteristics: contrastive, general, truthful, and thor- thus proposing a scale of trust which demonstrated good reliability ough. Social/interactive and role/domain-dependent characteristics ExSS-ATEC’20, March 2020, Cagliari, Italy, Larasati, et al. Table 3: Explanation Styles Characteristic Presented Explanation contrastive "From the screen image, Malignant lesions are present. Benign cases and fluid cyst looks hol- low and have a round shape. Your spots are not hollow and and have irregular shapes. There- fore, your spots are detected as Malignant." general "Based on your screen image, your spots are detected as Malignant. 19 in 20 similar images are in Malignant class." truthful "Using 5,600 of ultrasound images in our data- base, your image have 95% similarities with Ma- lignant cases." thorough "Malignant lesions are present at 2 sites, 30mm and 5mm. Non homogeneous. Non parallel. Not circumscribed. Your risk of breast cancer as; 30- 50 years old, cyst history, woman is increased 20%" were ignored at this stage for simplicity. In fact, these explana- tion styles could not be expressed with a textual description, and needed work on the UX design of the explanation type in order to be realized. Therefore the assessment of the effects of these two characteristics was left for future study. The AI system’s diagnosis tool described in the dramatising vignette was a fictional AI system Figure 1: Thorough Explanation for mammography diagnosis, used in a self managed health sce- nario. With the system users could upload images of self-scanned mammograms and then received a diagnosis result with an attached Participants were randomly assigned to 1 of the 4 conditions, with textual explanation. each condition being a different explanation type. The number of participants for each condition are not identical, with n 1 = 12, 3.1 Explanation Design n 2 = 12, n 3 = 11, and n 4 = 13. In order to design the explanation, we first tried to look at breast We asked participants to rate the AI system after having read cancer diagnosis report and several screening reports including the dramatizing vignette and to reflect on the 6 trust’s components ultrasound. Next, we designed the possible textual explanations while rating the explanation using a 7-points Likert scale. Following based on each characteristic definition in a small-scale informal a between-subject comparison of the results we were able to identify design phase. We then consulted the designed explanations with which explanation (if any) affects which of the 6 components of researcher outside this study and medical professional. The expla- trust, and to what extent. The overall aim of the study was to give nations were identical from a UI perspective, with one graphic and us insights on how different styles of linguistics explanations affect followed by the diagnosis and the explanation text. The explanation specific aspects of users’ trust. We also asked participants if they texts were designed to stress the four explanation characteristics: would have liked the presented explanation to be included in the contrastive, truthful, general, thorough. We also tried to present a AI system and explain why. balanced level of system’s capability, for example in general style: To analyse the data, we used ANOVA tests, followed by Tukey’s "19 in 20 similar images" and in truthful style: "95% similarities". The posthoc paired tests, to see the relative effects of different expla- explanation text presented to the participants can be seen in Table nation types. The ANOVA test tells us whether there is an overall 3 and how we presented it can be seen in Figure 1. difference between the groups, but it does not indicate which spe- cific groups differed. The Tukey’s post-hoc tests can confirm where 3.2 Data Collection and Analysis the difference occurred between specific groups. In addition, we The participants were recruited on Mechanical Turk, with a survey evaluated the trust measurement instrument, by using Cronbach’s set up using Google Form. Our target was initially 80 participants, Alpha. with 40 participants from the general public and 40 participants from worker in the healthcare field. We choose the option of "master 4 RESULTS worker" and added one check-in question in the survey, to maximise From the online survey data, we ran two ANOVA tests, to check participation quality and check if the participant read the vignette the explanation styles and the trust factors. In the first ANOVA test, carefully. The Mechanical Turk hits were up for a week, and in the we compared the 4 explanations types in relation to an average end, we got 48 participants (only 8 with some medical expertise). trust factor (calculated as median value between the 6 trust scores). ExSS-ATEC’20, March 2020, Cagliari, Italy, The Effect of Explanation Styles on User’s Trust We found that different styles of explanation significantly affect average trust values (pvalue=0.0033, α=0.05). We then ran a Tukey’s posthoc test, and found that general explanation show significantly lower trust scores compared to the rest of the explanation styles; contrastive, truthful, and thorough (α=0.05). The Tukey’s posthoc test analysis can be seen in Fig 2. Figure 4: Median of participant’s rating towards their expla- nation preference Figure 2: Tukey’s post hoc test in explanation styles presence of a clear rationale, and the use of lay terms, as the two In the second ANOVA test, we compared the four explanation distinctive factors motivating the high trust rating. In turn, the need styles for each trust factor, we therefore ran 6 comparisons and of a rationale for the AI result was also explicitly mentioned as a found that Personal Attachment was the only trust factor show- way to improve general explanation (by 4 out of 11 people in the ing significant difference (pvalue=0.02158, α=0.05). We then ran a general explanation style group mentioned rationale as a need). Tukey’s posthoc test for Personal Attachment, to identify where The trust measurement was tested using the overall data from the specific difference occurred, and found that contrastive and 48 participants. The reliability of the overall measurement was thorough explanation styles shows significant difference compared determined by Cronbach’s Aplha. We found that the alpha is quite to general explanation style (α=0.05). The Tukey’s posthoc test high, α=0.88. This is an encouraging result which may inform analysis can be seen in Fig 3. further use, testing and validation of the proposed human-AI trust measure in other healthcare applications. 5 DISCUSSION Our study confirms previous research indicating that different styles of explanation significantly affect specific trust factors. In particular we found that Personal Attachment (pvalue=0.02158) was signif- icantly affected by different textual explanation styles, and was highly rated by the groups that were presented with thorough and contrastive explanation styles. This means that among the par- ticipant, thorough and contrastive styles suited their taste more, compared to the general explanation style. This finding was corroborated by the additional comparison Figure 3: Tukey’s post hoc test in Personal Attachment of the 4 explanations by average trust ratings, which showed that general style explanation was significantly rated lower than the rest As mentioned above, other than trust scaling, we also asked of the explanation styles. Overall preferability scores also confirmed participants if they would like the explanation style presented to that general style explanation was rated the lowest. them to be included in the app for self managed health. We can see Participants seemed to prefer thorough and contrastive styles in Fig 4, contrastive, truthful, and thorough explanation styles are explanation because of the rationale provided, and because of the rated quite high (6 = very), while the general explanation style is layperson language used to provide the explanation. The need of ra- rated lower (5 = moderately). This assessment is consistent with the tionale was also suggested as a way to improve general explanation explanation style-trust analysis we did. In the analysis, it shows that style. general explanation is the least performing explanation in affecting However, further investigations about the extent to which expla- personal attachment. nation affects trust judgement need to be conducted. The current We also asked why participants preferred or not to receive the results are not conclusive and sufficient to develop an explanation explanation given to them. By qualitative analysing the 25 answers style and trust relation model. Additional studies to explore the from thorough and contrasting style groups, users reported the explanation mediums and interaction types are also necessary. ExSS-ATEC’20, March 2020, Cagliari, Italy, Larasati, et al. 6 LIMITATIONS AND FUTURE WORK [19] Maria Madsen and Shirley Gregor. 2000. Measuring human-computer trust. In 11th australasian conference on information systems, Vol. 53. Citeseer, 6–8. This preliminary study has several limitations that should be noted. [20] Bertram F Malle. 2006. How the mind explains behavior: Folk explanations, meaning, This is an exploratory study of quite a broad topic and we only and social interaction. Mit Press. [21] D Harrison Mcknight, Michelle Carter, Jason Bennett Thatcher, and Paul F Clay. conducted one online survey with low number of participants. The 2011. Trust in a specific technology: An investigation of its components and fact that some explanation styles did not show significantly different measures. ACM Transactions on Management Information Systems (TMIS) 2, 2 effects on users trust judgements could be caused by the small (2011), 12. [22] Tim Miller. 2018. Explanation in artificial intelligence: Insights from the social sample size. Future studies with a bigger sample size and a baseline sciences. Artificial Intelligence (2018). group are needed to determine the extent of which explanation [23] Andrea Papenmeier, Gwenn Englebienne, and Christin Seifert. 2019. How affects trust. model accuracy and explanation fidelity influence user trust. arXiv preprint arXiv:1907.12652 (2019). We also acknowledge that trust is difficult to measure. Even [24] Alun Preece. 2018. Asking ‘Why’in AI: Explainability of intelligent systems– though our trust measurement has shown high internal consistency, perspectives and challenges. Intelligent Systems in Accounting, Finance and Management 25, 2 (2018), 63–72. we have not fully investigated the validity of the measurement in [25] Pearl Pu and Li Chen. 2006. Trust building with explanation interfaces. In other cases/fields. Moreover, in this experiments, we only measured Proceedings of the 11th international conference on Intelligent user interfaces. ACM, user’s trust as a self reported measure. Our experimental design, 93–100. [26] Stephen J Read and Amy Marcus-Newhall. 1993. Explanatory coherence in social and the use of a probing method, may have also possibly influ- explanations: A parallel distributed processing account. Journal of Personality enced participants’ reflection and self reporting. Further research and Social Psychology 65, 3 (1993), 429. is needed to carefully determine whether this was the case. [27] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386 (2016). [28] Kristin Schaefer. 2013. The perception and measurement of human-robot trust. (2013). REFERENCES [29] Saravanan Thirumuruganathan, Mahashweta Das, Shrikant Desai, Sihem Amer- [1] Peter Achinstein. 1983. The nature of explanation. Oxford University Press on Yahia, Gautam Das, and Cong Yu. 2012. Maprat: Meaningful explanation, interac- Demand. tive exploration and geo-visualization of collaborative ratings. Proceedings of the [2] Stavros Antifakos, Nicky Kern, Bernt Schiele, and Adrian Schwaninger. 2005. VLDB Endowment 5, 12 (2012), 1986–1989. Towards improving trust in context-aware systems by displaying system con- [30] Eric S Vorm. 2018. Assessing Demand for Transparency in Intelligent Systems Us- fidence. In Proceedings of the 7th international conference on Human computer ing Machine Learning. In 2018 Innovations in Intelligent Systems and Applications interaction with mobile devices & services. ACM, 9–14. (INISTA). IEEE, 1–7. [3] Adrian Bussone, Simone Stumpf, and Dympna O’Sullivan. 2015. The role of [31] Danding Wang, Qian Yang, Ashraf Abdul, and Brian Y Lim. 2019. Designing explanations on trust and reliance in clinical decision support systems. In 2015 Theory-Driven User-Centric Explainable AI. In Proceedings of the 2019 CHI Con- International Conference on Healthcare Informatics. IEEE, 160–169. ference on Human Factors in Computing Systems. ACM, 601. [4] Pat Croskerry. 2009. Clinical cognition and diagnostic error: applications of a [32] Sam Wilkinson. 2014. Levels and kinds of explanation: lessons from neuropsy- dual process model of reasoning. Advances in health sciences education 14, 1 chiatry. Frontiers in psychology 5 (2014), 373. (2009), 27–35. [33] Ming Yin, Jennifer Wortman Vaughan, and Hanna Wallach. 2019. Understanding [5] Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, the Effect of Accuracy on Trust in Machine Learning Models. In Proceedings of David O’Brien, Stuart Schieber, James Waldo, David Weinberger, and Alexandra the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 279. Wood. 2017. Accountability of AI under the law: The role of explanation. arXiv preprint arXiv:1711.01134 (2017). [6] Mary T Dzindolet, Scott A Peterson, Regina A Pomranky, Linda G Pierce, and Hall P Beck. 2003. The role of trust in automation reliance. International journal of human-computer studies 58, 6 (2003), 697–718. [7] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Dino Pedreschi, Franco Turini, and Fosca Giannotti. 2018. Local rule-based explanations of black box decision systems. arXiv preprint arXiv:1805.10820 (2018). [8] David Gunning. 2017. Explainable artificial intelligence (xai). (2017). [9] Jonathan L Herlocker, Joseph A Konstan, and John Riedl. 2000. Explaining col- laborative filtering recommendations. In Proceedings of the 2000 ACM conference on Computer supported cooperative work. ACM, 241–250. [10] Denis J Hilton. 1990. Conversational processes and causal explanation. Psycho- logical Bulletin 107, 1 (1990), 65. [11] Andreas Holzinger, Chris Biemann, Constantinos S Pattichis, and Douglas B Kell. 2017. What do we need to build explainable AI systems for the medical domain? arXiv preprint arXiv:1712.09923 (2017). [12] René F Kizilcec. 2016. How much information?: Effects of transparency on trust in an algorithmic interface. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 2390–2395. [13] Todd Kulesza, Simone Stumpf, Margaret Burnett, and Irwin Kwan. 2012. Tell me more?: the effects of mental model soundness on personalizing an intelligent agent. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1–10. [14] Todd Kulesza, Simone Stumpf, Margaret Burnett, Sherry Yang, Irwin Kwan, and Weng-Keen Wong. 2013. Too much, too little, or just right? Ways explanations impact end users’ mental models. In 2013 IEEE Symposium on Visual Languages and Human Centric Computing. IEEE, 3–10. [15] Brian Y Lim and Anind K Dey. 2011. Design of an intelligible mobile context- aware application. In Proceedings of the 13th international conference on human computer interaction with mobile devices and services. ACM, 157–166. [16] Peter Lipton. 1990. Contrastive explanation. Royal Institute of Philosophy Supple- ments 27 (1990), 247–266. [17] Zachary C Lipton. 2017. The Doctor Just Won’t Accept That! arXiv preprint arXiv:1711.08037 (2017). [18] Tania Lombrozo. 2006. The structure and function of explanations. Trends in cognitive sciences 10, 10 (2006), 464–470.