AI Healthcare System Interface: Explanation Design for Non-Expert User Trust

Retno Larasati, Anna De Liddo and Enrico Motta
Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, United Kingdom
retno.larasati@open.ac.uk (R. Larasati); anna.deliddo@open.ac.uk (A. De Liddo); enrico.motta@open.ac.uk (E. Motta)
Joint Proceedings of the ACM IUI 2021 Workshops, April 13-17, 2021, College Station, USA

Abstract
Research indicates that non-expert users tend to either over-trust or distrust AI systems. This raises concerns when AI is applied to healthcare, where a patient trusting the advice of an unreliable system, or completely distrusting a reliable one, can lead to fatal incidents or missed healthcare opportunities. Previous research indicated that explanations can help users to make appropriate trust judgements about AI systems, but how to design AI explanation interfaces for non-expert users in medical support scenarios is still an open research challenge. This paper explores a stage-based participatory design process to develop a trustworthy explanation interface for non-experts in an AI medical support scenario. A trustworthy explanation is an explanation that helps users to make considered judgements on trusting (or not) an AI system for their healthcare. The objective of this paper is to identify the explanation components that can effectively inform the design of a trustworthy explanation interface. To achieve that, we undertook three data collections, examining experts' and non-experts' perceptions of AI medical support systems' explanations. We then developed a User Mental Model, an Expert Mental Model, and a Target Mental Model of explanation, describing how non-experts and experts understand explanations, how their understandings differ, and how they can be combined. Based on the Target Mental Model, we propose a set of 14 explanation design guidelines for trustworthy AI healthcare system explanations that take into account non-expert users' needs, medical experts' practice, and AI experts' understanding.

Keywords
Explanation, Trust, Explainable Artificial Intelligence, AI Healthcare, Design Guidelines, Participatory Design

1. Introduction

Trustworthiness, the capability to independently establish the right level of trust in an AI system, is progressively becoming an ethical and societal need. Trust is humans' primary reason for acceptance [1], without which the fair and accountable adoption of AI in healthcare may never actualise. The UK government issued a policy paper that declared its vision for AI to "transform the prevention, early diagnosis and treatment of chronic diseases by 2030" [2], and this might not be achieved if there is an impediment to AI adoption and AI usage by the general public (non-expert healthcare customers).

Developing trust is particularly crucial in healthcare because it involves uncertainty and risks for vulnerable patients [3]. However, the lack of explainability, transparency, and human understanding of how AI works are key reasons why people have little trust in AI healthcare applications, and research indicates that transparency [4] and understandability [5] can be effectively used as means to enhance trust in AI systems. Explainable AI is argued to be essential "to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners" [6]. Nevertheless, the lack of trust is not the only problem. Previous research indicates that non-expert users tend to over-trust and continue to rely on a system even when it malfunctions in some circumstances [7]. To help non-expert healthcare customers trust AI systems appropriately, neither over-trusting nor distrusting them, the system should be able to give an appropriate, understandable explanation for that specific target audience. This paper aims at identifying the explanation components of AI healthcare system interfaces that allow non-expert users to appropriately inform their trust in the AI system.
We carried out a user study to determine these explanation components and then used them to inform a set of design guidelines for trustworthy AI healthcare system explanation interfaces.

We chose a stage-based participatory method, adapted from Eiband et al. [8], which has previously been applied successfully to design explanations of recommender systems in fitness applications [8]. This method particularly fits our case since it enables an individual investigation of expert and non-expert views on the problem and then provides a framework to combine expert and non-expert knowledge to inform design requirements. The stage-based participatory process consists of two phases. The first phase focuses on "what" to explain through the construction of an Expert Mental Model (what "can be explained") and a User Mental Model (what "needs" to be explained). The second phase focuses on synthesising the two models into a Target Mental Model, which describes "how to convey the explanation" by designing and developing a prototype technology.

To build the Expert Mental Model, depicting the key components of explanation that need to be communicated to patients, we carried out a series of interviews with medical professionals. Second, we conducted semi-structured interviews with non-experts to identify the User Mental Model, which captures users' needs and expectations in terms of AI explanation. Finally, we conducted a third set of semi-structured interviews with both AI experts and non-experts to determine how explanation content could be communicated to non-expert users to respond to the identified users' needs (Target Mental Model).
From the Target Mental Model, we then derived a list of design guidelines, which we used to develop a prototype explanation interface for an AI breast cancer risk assessment support system. In particular, we focused on a self-managed breast cancer risk scenario, in which the results of mammography scans are automatically analysed by an AI system and need to be communicated to the prospective patients. We chose a self-managed health scenario because it represents the extreme case, in which non-expert users are presented with AI results without any support from medical or AI experts, and therefore the explanation is the only mediating interface between patients and the AI system.

2. Background

In recent years, several studies have explored different approaches to designing explanations of the outputs of intelligent systems [9][10][11]. Some of this research focused on explanation designs for AI healthcare systems [12][13]. Despite the fact that many approaches have been proposed, explanation design for AI healthcare systems mostly targets expert users [14][15]. Explanation design specifically targeted at non-expert users has received scarce attention, despite the recognised importance of improving non-expert users' understanding of the AI system to positively affect users' trust in the system [16] and trust in the system's recommendations [17].

To improve users' understanding of the AI system with explanation, we first need to determine how they make sense of an explanation (what does the users' mental model of an explanation look like). Unlike previous studies [8][18], we did not have an available working system with which to elicit the users' mental models. This difference affected how we elicited users' feedback. We conceptualised and used a hypothetical AI diagnosis system (inspired by similar commercial systems) to interrogate both experts and non-experts, and elicited their mental models from reflections on the system and previous experience with healthcare explanation. Our hypothetical system was a Breast Cancer Self-Assessment system, a medical system to assess breast cancer risk tailored for non-experts.

As mentioned above, following Eiband et al. [8], we carried out a stage-based participatory process consisting of two phases and five stages. The first phase focused on "what" to explain and consisted of two stages: the Expert Mental Model definition and the User Mental Model definition. The second phase focused on "how to convey the explanation" and consisted of three stages: the Target Mental Model construction, the Prototype development stage (to implement the Target Mental Model in a realistic application case), and the Evaluation stage, to further test the prototype technology. In this paper, we conducted four of the five stages in Figure 1, leaving further testing and evaluation of the prototype for future research. Each stage is described in detail in the next section.

Figure 1: The stage-based participatory process for our case. Inside the box: guideline question and data collection method.

3. Study Design and Methodology: Stage-Based Participatory Process for Explanation Design

3.1. Experts Mental Model

The Expert Mental Model definition stage aimed at capturing experts' understanding and vision of what an appropriate explanation of AI medical support system results to non-experts should look like. The experts involved in its development were both machine learning technologists and medical professionals. This research stage aimed at defining what can or should be explained to the wider public from an expert perspective, by distilling a series of explanation components, which represent the Expert Mental Model.

Six participants were recruited by email from the authors' personal research and social networks: three were AI/machine learning developers/researchers, and the other three were doctors/physicians (general practitioners and oncology specialists). The main guiding questions that drove this stage were: what can be explained?; and what does an expert explanation for non-experts look like? We asked the questions according to participants' respective expertise (medical professionals and AI experts). We also showed participants two examples of breast cancer-related systems currently in commerce, to understand how experts make sense of AI systems' outputs and how they would explain the results to non-experts.

Result and Analysis

We analysed the interview data using Grounded Theory [19]. Three sets of explanation components emerged. The first set entailed the Content of the explanation, and described what information needs to be included in the explanation. The second set entailed the required Customisation of the explanation: what needs to be considered when explaining, and changed accordingly on a case by case basis. The third set entailed the explanation Interaction: the interactivity opportunities that need to be available to users during an explanation.

In terms of the content of the explanation itself, AI experts' answers were quite straightforward: users need to know about the input, the system process, and the output. "We have the inputs and intermediate results. The inputs are different variables, as a driver for the predictor and explanatory variables, [...]. They can interact with the app and see a simulation." - A2

The system process answers varied considerably, and spanned from providing information such as features' importance, to providing the name of the algorithm or who made it.
"It's like, for example, if they're trying to recognise cancer in a certain image, so this is the feature that helped me the most having this conclusion" - A1. "You can try to show the formulation of the calculation. But some algorithms do provide explanation on how it works." - A3. This means that even though this explanation component was deemed important, AI experts were not clear on what to present to non-expert users and how.

From the medical experts' perspective, on the other hand, the explanations they usually gave to patients consist of disease information, possible treatments to choose from, and the next step for the patient to take. They mentioned that explaining a diagnosis works differently if the diagnosis result is bad. "When we deliver the diagnosis to a patient, we consider the situation as well. [...] For breaking bad news, we usually deliver the news layer by layer. So not directly go to the diagnosis, we have some introduction first." - M1. If the result is bad, reassuring words are needed to help patients feel less stressed and worried. If the diagnosis result is good, or if there is no sign of distress from the patient, there is less need for reassuring words. "I think one of the important things if it's about serious conditions, we need to put more empathy." - M3. This is in line with previous research on medical explanations (How to Break Bad News: A Guide for Health Care Professionals [20]), and similar explanation protocols have been proposed and tested in the literature [21][22].

The medical experts mentioned that explanation was not given by default but based on patients' requests and customised to patients' needs. "It depends on how curious they are. If the patient just wants to know the diagnosis, then I may just tell them about it." - M3. AI experts also mentioned that explanation should probably only be provided on request.
According to the AI experts, they rarely explain how the system works to non-expert users in real-life situations unless the user asks for it. "If the app is working properly you don't need to explain. But if there is a problem, you need to explain what is going wrong." - A3. One AI expert even argued that non-experts were not interested in knowing the logic behind the system process. "I have never met a common user that is interested in artificial intelligence or the machine learning of it. Even the expert from the Ministry (people they work for), they were not really curious." - A2.

The medical experts also reported that they assess what the patient knows and the patient's perception. One medical expert mentioned that people who live in a rural area might have different knowledge than people who live in a big city, meaning the explanation is customised to the patient's knowledge. "People in the rural area don't get the privilege to get a proper education, so it's challenging for them to absorb the explanation." - M2

The explanation components related to explanation interaction reflect the modalities in which experts communicate the explanation. Medical experts mentioned how they usually ask for confirmation about the patient's symptoms and worries before making a diagnosis (input check). The second component related to the capability for non-experts to raise open questions. After giving patients their results, medical experts would always ask if there were any more questions. This interaction usually involves a back and forth exchange, until the patient has no further questions. "...Then we will explain what's the next step. And we will ask if they have any questions or not. Including the diagnosis and the plan." - M3. "Whenever patients ask, we then answer the questions directly." - M1.

One AI expert mentioned that showing how the output changes could help non-experts to understand the system better (input manipulation and visualisation). AI experts also mentioned how it could be overwhelming for the user to read the whole explanation, and suggested it would be better to give users the option to request details if they need them (details request). "We need the user to see the general output, but they can expand on some detail. Making it simple, just a few statements, and the general result, and if the user is curious, they can dig into it." - A3. The Expert Mental Model outcome from the analysis can be seen in Fig. 2.

Figure 2: Expert Mental Model analysis result: explanation components.

3.2. User Mental Model

In the User Mental Model research stage, we captured users' understanding and their perspective on how explanation should be presented in an AI medical support system. The purpose of this stage was to acquire knowledge about how users currently make sense of explanations. This acquired knowledge was then structured into several key components of explanation, which constitute the User Mental Model.

Szalma & Taylor (2011) showed that trust propensity is one of the human-related factors that could affect the response to an intelligent system [23]. To account for trust propensity, we sampled the participants based on their dispositional trust towards an AI medical support application and made sure there was a nearly equal number of people in each trust group (the AI sceptic, the open-minded, the AI enthusiast). We recruited four participants for each of the three groups representing three levels of dispositional trust, with 12 participants in total. To identify the level of trust, we asked the prospective participants to answer the following question: "if there was a cancer risk assessment/self-detection application available on the market, how likely would you be to use it? Please rate the likelihood from 1-7". This question was sent in advance of the interview invitation. The participants were then grouped into three groups: the sceptic (1-3 likelihood responses), the open-minded (4-5 likelihood responses), and the enthusiast (6-7 likelihood responses). We sought to balance out the age range (twenties to forties) because research suggests that age could affect users' trust towards a system, where older adults are more likely to trust the system than younger adults in a medical management system (decision aid) [24]. We also balanced the male and female participants by recruiting one male in each group because we recognised that, although male breast cancer accounts for less than 1% of all breast cancer diagnoses worldwide [25], men are sometimes included in the decision making about the usage of a particular system for affected women close or related to them.
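As an illustration of this screening step, the sketch below shows how the 1-7 likelihood responses could be mapped to the three dispositional-trust groups using the thresholds reported above; the function name and the example responses are ours and purely illustrative, not part of the study materials.

```python
def trust_group(likelihood: int) -> str:
    """Map a 1-7 likelihood-of-use rating to a dispositional trust group,
    using the thresholds reported above (1-3 sceptic, 4-5 open-minded, 6-7 enthusiast)."""
    if not 1 <= likelihood <= 7:
        raise ValueError("likelihood rating must be between 1 and 7")
    if likelihood <= 3:
        return "sceptic"
    if likelihood <= 5:
        return "open-minded"
    return "enthusiast"


# Hypothetical screening responses (participant id -> rating), for illustration only.
responses = {"P01": 2, "P02": 5, "P03": 7, "P04": 4}
print({pid: trust_group(rating) for pid, rating in responses.items()})
# {'P01': 'sceptic', 'P02': 'open-minded', 'P03': 'enthusiast', 'P04': 'open-minded'}
```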
We followed the same interview structure as in the experts' interviews. The main guiding questions we asked the non-expert users were: how do users currently understand AI explanations?; and what does a user explanation look like? We then showed the participants two examples of breast cancer-related systems to probe non-expert users' reflections and feedback on the AI system's results and explanations.

Result and Analysis

We carried out a Thematic Analysis [26] to analyse the interview data. The same three sets of explanation components could be identified: Content, Customisation, and Interaction. In receiving a diagnosis, participants explained that they would like to know about the disease information. They mentioned the disease name, the disease symptoms, and the severity of the disease as key information they would like to receive. Participants also reported that they would like to know about the next step or action that they could or should take, for example, information about the disease treatment that they should undergo after diagnosis, or whether they have to make an appointment with their doctor or physician. "You got cancer, and your options would be these, these, and these, and this is how I want to proceed. These are your options." - E2. "Do I need to contact my physician directly or is there a next step that is also provided by the application itself?" - OM1

These diagnosis-related explanations, both disease information and next step/action, could be considered more local/disease-specific explanations. However, participants also wanted more general explanations about the AI system, system information, which was not related to either the inputs or the results. One of the participants asked for information about the system process/algorithm.
"I would want to know, what are they doing actually in the background to do this?" - E1. However, not all participants expressed an interest in knowing the system process; some were not keen to know this information. They argued that in a stressful situation, such as a positively assessed cancer, their focus would not be on the system information but on their well-being. "Says that I have cancer, then I am not going to be interested in the system process" - S1.

This argument matches the AI experts' opinion mentioned above, recognising that non-experts may usually not be interested in knowing the technical side of a system. Non-expert users' reluctance to know the technical information was a matter of timing and their emotional state after receiving a diagnosis. However, it also reflects a reluctance caused by the possibility of not understanding the technical terms used to explain the process. "I mean, the very hard, fine grain details? It will be incomprehensible for me because I am not familiar with the technology and everything." - E2. This confirms previous research arguing that what people consider an acceptable and understandable explanation depends on their domain knowledge or role [27][28].

Another emerging explanation component was system data. Non-expert users mentioned they would like to have information about the volume of the database used to train the algorithm, or the data features used for the prediction. "So at least I have to know how big is their database." - OM2. "Explain to you the quality of features and characteristics; it is because this thing has this colour shape" - E2. However, participants also talked about the data they provide, their personal input data; they expressed concerns about data privacy and demanded specific information about it. "And how am I sure that my breast picture will not be leaked to be utilised for other intentions and such." - OM1. "Where are these data going?" - OM4.

Participants also talked about the system's accuracy and credibility. "However, for my health, I think it will be quite beneficial if I know how accurate it can be" - OM1. The credibility they mentioned was related to the institute/company that developed the system. Credibility could also mean whether the system has been tested and approved by the appropriate health institution.

Besides the information that should be included in the explanation, participants also talked about how the explanation should be delivered. Participants demanded that the AI results be presented with care and empathy, especially if the result was not in their favour. Empathy is the ability to understand and share the feelings of others, and an empathetic statement should include phrases that help to establish a connection with the user. Participants mentioned empathy or reassuring words only in the case of "bad news", i.e. when the result is not good; therefore, we put it under customisation in the User Mental Model.

"If I want to use text explanation, I think you should be, in terms of style of shaping the statement that you present to the user, I think you should always follow, sort of defensive language. So again, it might be quite direct and aggressive to say to the user, you have cancer, exclamation mark. [...] Be a little bit more reserved, rather than explicit, in your statements because it's quite a sensitive matter." - E2. Participants who expected care and empathy were more concerned with the choice of words and how "delicately" the AI system delivers the diagnosis results.

Other than text and words, participants mentioned the use of graphics and images to communicate the explanation, for example, by showing comparison images of a normal condition vs an abnormal condition.
The graphic/image to show comparison we put under explanation content in the User Mental Model, because regardless of the result (good/bad), the user wants to see the opposite case and decide for themselves whether the result makes sense to them. "Perhaps have some examples of how an affected breast looks like, how an unaffected breast looks like. So you can compare yourself with what is being put in your input." - E2. "And then the image comparing, you know, both, my results and the healthy ones." - S1. Participants' request to show the opposite case is in line with the literature in cognitive psychology, which states that human explanations are sought in response to particular counterfactual cases [29][30]. Our finding confirms that counterfactual/contrastive explanation is argued to be an explanation that is understandable for users [31][32][28].

For interaction with the AI system, participants expressed the need for a course of action, which is an additional feature of a doctor appointment included in the explanation interface. Regarding how they would interact with the explanation, participants wanted to be able to request detailed information rather than be presented with the full, long explanation in one go. They mentioned that the explanation detail could be presented as a link to an outside source or as a piece of expandable information. The User Mental Model outcome from the analysis can be seen in Fig. 3.

Figure 3: User Mental Model analysis result: explanation components.

3.3. Target Mental Model

In the Target Mental Model research stage, we identified which key components of an explanation (from the expert perspective - the Expert Mental Model) the users might want to be included in an AI explanation User Interface (UI). The Expert Mental Model's explanation components were combined with the explanation components from the User Mental Model to form the Target Mental Model. We conducted semi-structured interviews with the same group of non-expert participants involved in the User Mental Model definition.

During the interviews, the main guiding question was: which explanation components do users want to be realised in a UI to explain AI results? We asked participants to reflect on the explanation components from the Expert Mental Model and discuss which ones they considered most important and valuable. Participants were asked to explicitly reflect on each explanation component by giving a rating of importance (from 0-10) and expressing their opinion on each of them. Based on the critical analysis of the User and Expert Mental Models, combined with the analysis of users' feedback on the expert mental model views, we distilled the Target Mental Model.

Result and Analysis

The median values of the explanation components' ratings given by the users are reported in Table 1.

Table 1: Median values of Expert Mental Model explanation components' ratings, used to inform the Target Mental Model.
Content: disease info 10; treatment 10; next plan 10; input 10; process 7; output 10.
Customisation: explanation request 8; user education 4; empathy 10.
Interaction: input check 10; open question 7; input manipulation 10; detail request 10.

Under the Content set of explanation components in the Expert Mental Model, the system process was not seen as crucial, since not all participants were interested in knowing the technicalities of how the AI system made a decision/prediction. Under the Customisation set of explanation components, empathy/reassuring words was rated high by the participants. Explanation request was also rated relatively high, because some participants argued that an explanation should always be available whether a user requests it or not. The lowest-rated component under the Customisation group was user education, which was deemed unnecessary since explanation should be understandable for lay users regardless of their educational background. Under the Interaction group, all components were rated as important except for open question. Some participants were sceptical about openly asking questions to the AI system and preferred to wait and ask questions to a doctor.
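As a note on how Table 1 can be reproduced from the raw ratings, the sketch below aggregates per-component 0-10 importance ratings into medians; the rating values shown are invented placeholders rather than the study data, and only the aggregation step reflects the procedure described above.

```python
from statistics import median

# Hypothetical 0-10 importance ratings from 12 participants, keyed by
# explanation component; placeholder numbers, not the actual study data.
ratings = {
    "disease info": [10, 9, 10, 10, 8, 10, 10, 9, 10, 10, 10, 10],
    "system process": [7, 5, 8, 7, 6, 9, 7, 7, 8, 6, 7, 7],
    "empathy": [10, 10, 9, 10, 10, 8, 10, 10, 10, 9, 10, 10],
}

# One median per component, as reported in Table 1.
medians = {component: median(scores) for component, scores in ratings.items()}
print(medians)  # {'disease info': 10.0, 'system process': 7.0, 'empathy': 10.0}
```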
The final Target Mental Model is shown in Figure 4. The explanation components included were obtained from the combination of explanation components from the Expert Mental Model and the User Mental Model, then revised according to users' perceptions of, and preferences on, the experts' views. The explanation components with lower rating scores are indicated with lighter text in the figure. As an additional step, we went back to the experts and asked the medical experts a follow-up question about the explanation components that appeared in the User Mental Model but not in the Expert Mental Model, such as system accountability and data, and doctor appointment. According to them, the system's certification and accountability were not essential to include in the explanation: if the application is recommended by the healthcare authority (e.g., for the UK, the NHS), that would be considered enough. The doctor appointment component did not come up in the earlier interviews because they expected it as a given feature.

Figure 4: Target Mental Model analysis result: explanation components.

3.4. Design Guidelines and Prototype

By reflecting on the findings of the Target Mental Model, we propose 14 explanation components/design guidelines for trustworthy AI medical support system interfaces (see Table 2).

Table 2: Our 14 explanation design guidelines, categorised by information included, information delivery, and interaction included.

Information Included (EDG1-EDG7):
- Disease Information: general disease information, e.g. name, symptoms, causes (requisite: Yes)
- Disease Treatment: treatment options and information (requisite: Yes)
- Next Plan/Step: next step the user could take following the result (requisite: Yes)
- System Information: general system information, e.g. data used, system certification (requisite: Yes)
- System Input: data the user inputted (requisite: Yes)
- System Process: system algorithm or the technical process used to get its results (requisite: Optional)
- System Output: system result, e.g. pre-diagnosis, recommendation (requisite: Yes)

Information Delivery (EDG8-EDG9):
- Empathy (Reassuring Words): delicately deliver the results with carefully selected words (requisite: Yes)
- Simple and General: uncomplicated wording that is acceptable for lay users from various education backgrounds and levels (requisite: Optional)

Interaction Included (EDG10-EDG14):
- Input Check: for the user to check the input (is it correct or not) (requisite: Yes)
- Doctor Appointment: for the user to make a doctor appointment (requisite: Yes)
- Open Question: for the user to ask open questions (requisite: Optional)
- Input Comparison (Visualisation): for the user to compare the result with other data (requisite: Yes)
- Detail Request: for the user to request detailed information (requisite: Yes)
Those guidelines were grouped into three categories that mirror the Target Mental Model's sets of explanation components: Explanation Content/Information to be Included, Explanation Customisation/Information Delivery, and Explanation Interaction/Interaction to be Afforded. Each guideline references the explanation contents from the Target Mental Model, except for the explanation request component.

We decided not to include explanation request in the guidelines in consideration of several regulations, such as the European Union's General Data Protection Regulation (GDPR) and the European Commission's Assessment List for Trustworthy Artificial Intelligence (ALTAI)¹. According to these regulations, an explanation should always be provided, by law, to any user when AI is involved. AI in healthcare was classified as high-risk AI according to the White Paper on Artificial Intelligence by the European Commission², which makes explanation availability even more essential in a healthcare scenario. We therefore removed the "explanation request" option from the design guidelines since, even if desirable from a non-expert user's perspective, it would be an unethical and unlawful design choice.

¹ https://ec.europa.eu/futurium/en/ai-alliance-consultation/guidelines
² https://ec.europa.eu/info/publications/white-paper-artificial-intelligence-european-approach-excellence-and-trust

We then designed a user interface prototype based on the guidelines in Table 2. We explored each guideline's presentation possibilities and the specific functionalities of the system that could realise them. We decided on a website where the user could carry out a breast self-assessment based on screening images from their portable medical scan device. The final prototype was developed after several cycles of feedback between designers, and was then uploaded as a website at https://retnolaras.github.io/care/.
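To make the guidelines easier to apply in practice, the sketch below encodes Table 2 as a simple machine-readable checklist that a candidate interface (such as our prototype) could be reviewed against. The `Guideline` structure, the EDG-to-row numbering, and the `missing_requisites` helper are our own illustrative choices, not artefacts produced in the study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Guideline:
    edg: str         # guideline identifier, EDG1-EDG14
    category: str    # "information included", "information delivery", or "interaction included"
    name: str
    requisite: bool  # True = "Yes" in Table 2, False = "Optional"

GUIDELINES = [
    Guideline("EDG1", "information included", "Disease Information", True),
    Guideline("EDG2", "information included", "Disease Treatment", True),
    Guideline("EDG3", "information included", "Next Plan/Step", True),
    Guideline("EDG4", "information included", "System Information", True),
    Guideline("EDG5", "information included", "System Input", True),
    Guideline("EDG6", "information included", "System Process", False),
    Guideline("EDG7", "information included", "System Output", True),
    Guideline("EDG8", "information delivery", "Empathy (Reassuring Words)", True),
    Guideline("EDG9", "information delivery", "Simple and General", False),
    Guideline("EDG10", "interaction included", "Input Check", True),
    Guideline("EDG11", "interaction included", "Doctor Appointment", True),
    Guideline("EDG12", "interaction included", "Open Question", False),
    Guideline("EDG13", "interaction included", "Input Comparison (Visualisation)", True),
    Guideline("EDG14", "interaction included", "Detail Request", True),
]

def missing_requisites(implemented: set) -> list:
    """Return the requisite guidelines a candidate interface does not yet cover."""
    return [g.edg for g in GUIDELINES if g.requisite and g.edg not in implemented]

# Example review of a hypothetical interface that covers only part of the table.
print(missing_requisites({"EDG1", "EDG2", "EDG3", "EDG7", "EDG8"}))
# ['EDG4', 'EDG5', 'EDG10', 'EDG11', 'EDG13', 'EDG14']
```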
4. Discussion

Previous research has used the development of mental models to explore users' understanding and to help the design of transparent AI systems in various contexts [18][33][34], including the research from which we adapted this stage-based participatory process [8]. As mentioned in the Background section, the difference between our mental models and those of previous research lies in what the mental model is about and in its richness. Our mental models can be considered limited, in that they only draw on how users perceive a prospective AI system, not on how it works in detail in a real-life context. Therefore, unlike previous research, our study cannot provide a detailed understanding of how and why the AI system works in practice [34]. Nonetheless, we successfully distilled different stakeholders' insights on the explanation of an AI medical support system, and formed them into very detailed mental models. We critically discussed the differences in understanding and perception of AI explanation needs from an expert and a non-expert perspective, we discussed issues of explanation modality and interaction, and we combined expert and non-expert views in a Target Mental Model. The resulting design guidelines were also contextualised to current practice and health regulations.

The explanation design guidelines we developed were based on critical reflections on the Target Mental Model results. There is definitely room for improvement: we could incorporate other AI design guidelines or explanation recommendations to elaborate on our current guidelines. For example, from Amershi et al.'s guidelines [35], AI should show contextually relevant information (G4) and mitigate social biases (G6); we could add those to our guideline Information Delivery: Simple and General (EDG9). Another example, from [32], suggesting that explanation should be contrastive, could contribute to our guideline for Interaction: Input Comparison (Visualisation). A follow-on critical literature review would also help to verify and validate our proposed design guidelines.

5. Limitation and Future Works

There are several limitations of our study that should be addressed in future works. The stage-based participatory process is not complete. The final stage, which evaluates the developed prototype's effectiveness, has not been carried out yet. We need to test whether the prototype has reached the design goals and wholly followed the design guidelines. To test whether our prototype has reached the design goal, which is to design an explanation that can help users to make considered trust judgements, we need to assess whether there is any change in users' perception and their trust level. To measure the change in users' trust, we plan to use a quantitative measurement instrument [28] in a controlled experiment setting, quantitatively measuring the extent to which each of the guidelines realised in the prototype contributes to enabling considered trust judgements by non-expert users. In addition, we will conduct a lab study and interviews to get qualitative insight on both the prototype and the design guidelines.

We also acknowledge limitations within the research stages we conducted. The participants involved were recruited from our personal networks, which might limit the variation in opinions. Moreover, the explanation design guidelines proposed by this paper have not yet been evaluated, either in terms of their applicability across the variety of AI medical support systems or in terms of their clarity. Finally, the prototype we developed only delved into one type of modality, a graphical user interface. How the design guidelines could be implemented in an audio user interface or a conversational user interface also needs further exploration.

6. Conclusion

In this paper, we successfully applied a stage-based participatory design process to define design guidelines for trustworthy AI healthcare system explanation interfaces for non-expert users. We developed an Expert Mental Model, a User Mental Model, and a Target Mental Model of AI medical support system explanation. These mental models captured the needs and visions of the different stakeholders involved in a human-AI explanation process in a healthcare scenario. We used the developed Target Mental Model to inform a set of 14 explanation design guidelines for the development of trustworthy AI healthcare system explanation interfaces, which specifically cater for non-expert users, while still taking into account medical experts' practice and AI experts' understanding. These guidelines emerged as an outcome of several stages of interviews, feedback from different types of stakeholders, thorough analysis of the current literature, and critical reflections on the insights obtained through the participatory process.

References

[1] D. Gefen, E. Karahanna, D. W. Straub, Trust and TAM in online shopping: an integrated model, MIS Quarterly 27 (2003) 51–90.
[2] GOV.UK, The future of healthcare: our vision for digital, data and technology in health and care, 2018. (Accessed on 02/10/2019).
[3] A. Alaszewski, Risk, trust and health, 2003.
[4] A. Holzinger, C. Biemann, C. S. Pattichis, D. B. Kell, What do we need to build explainable AI systems for the medical domain?, arXiv preprint arXiv:1712.09923 (2017).
[5] Z. C. Lipton, The doctor just won't accept that!, arXiv preprint arXiv:1711.08037 (2017).
[6] D. Gunning, Explainable artificial intelligence (XAI) (2017).
[7] M. R. Cohen, J. L. Smetzer, ISMP medication error report analysis: Understanding human over-reliance on technology; it's Exelan, not Exelon; crash cart drug mix-up; risk with entering a "test order", Hospital Pharmacy 52 (2017) 7.
[8] M. Eiband, H. Schneider, M. Bilandzic, J. Fazekas-Con, M. Haug, H. Hussmann, Bringing transparency design into practice, in: 23rd International Conference on Intelligent User Interfaces, 2018, pp. 211–223.
[9] B. Y. Lim, A. K. Dey, Design of an intelligible mobile context-aware application, in: Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, 2011, pp. 157–166.
[10] P. Pu, L. Chen, Trust building with explanation interfaces, in: Proceedings of the 11th International Conference on Intelligent User Interfaces, ACM, 2006, pp. 93–100.
[11] B. Y. Lim, A. K. Dey, Evaluating intelligibility usage and usefulness in a context-aware application, in: International Conference on Human-Computer Interaction, Springer, 2013, pp. 92–101.
[12] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, W. Stewart, RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism, in: Advances in Neural Information Processing Systems, 2016, pp. 3504–3512.
[13] F. Jiang, Y. Jiang, H. Zhi, Y. Dong, H. Li, S. Ma, Y. Wang, Q. Dong, H. Shen, Y. Wang, Artificial intelligence in healthcare: past, present and future, Stroke and Vascular Neurology 2 (2017) 230–243.
[14] A. Bussone, S. Stumpf, D. O'Sullivan, The role of explanations on trust and reliance in clinical decision support systems, in: 2015 International Conference on Healthcare Informatics, IEEE, 2015, pp. 160–169.
[15] Z. Che, S. Purushotham, R. Khemani, Y. Liu, Distilling knowledge from deep networks with applications to healthcare domain, arXiv preprint arXiv:1512.03542 (2015).
[16] J. B. Lyons, G. G. Sadler, K. Koltai, H. Battiste, N. T. Ho, L. C. Hoffmann, D. Smith, W. Johnson, R. Shively, Shaping trust through transparent design: theoretical and experimental guidelines, in: Advances in Human Factors in Robots and Unmanned Systems, Springer, 2017, pp. 127–136.
[17] H. Cramer, V. Evers, S. Ramlal, M. Van Someren, L. Rutledge, N. Stash, L. Aroyo, B. Wielinga, The effects of transparency on trust in and acceptance of a content-based art recommender, User Modeling and User-Adapted Interaction 18 (2008) 455.
[18] C.-H. Tsai, P. Brusilovsky, Designing explanation interfaces for transparency and beyond, in: IUI Workshops, 2019.
[19] B. G. Glaser, A. L. Strauss, Discovery of grounded theory: Strategies for qualitative research, Routledge, 1967.
[20] R. Buckman, How to break bad news: a guide for health care professionals, JHU Press, 1992.
[21] M. W. Rabow, S. J. McPhee, Beyond breaking bad news: how to help patients who suffer, Western Journal of Medicine 171 (1999) 260.
[22] W. F. Baile, R. Buckman, R. Lenzi, G. Glober, E. A. Beale, A. P. Kudelka, SPIKES—a six-step protocol for delivering bad news: application to the patient with cancer, The Oncologist 5 (2000) 302–311.
[23] J. L. Szalma, G. S. Taylor, Individual differences in response to automation: The five factor model of personality, Journal of Experimental Psychology: Applied 17 (2011) 71.
[24] G. Ho, D. Wheatley, C. T. Scialfa, Age differences in trust and reliance of a medication management system, Interacting with Computers 17 (2005) 690–710.
[25] M. Yalaza, A. İnan, M. Bozer, Male breast cancer, The Journal of Breast Health (2016).
[26] V. Braun, V. Clarke, Using thematic analysis in psychology, Qualitative Research in Psychology 3 (2006) 77–101.
[27] B. F. Malle, How the mind explains behavior: Folk explanations, meaning, and social interaction, MIT Press, 2006.
[28] R. Larasati, A. De Liddo, E. Motta, The effect of explanation styles on user's trust, in: Proceedings of the Workshop on Explainable Smart Systems for Algorithmic Transparency in Emerging Technologies, co-located with IUI 2020, 2020.
[29] P. Lipton, Contrastive explanation, Royal Institute of Philosophy Supplements 27 (1990) 247–266.
[30] D. J. Hilton, Conversational processes and causal explanation, Psychological Bulletin 107 (1990) 65.
[31] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the black box: Automated decisions and the GDPR, Harvard Journal of Law & Technology 31 (2017) 841.
[32] T. Miller, Explanation in artificial intelligence: Insights from the social sciences, Artificial Intelligence (2018).
[33] T. Kulesza, S. Stumpf, M. Burnett, S. Yang, I. Kwan, W.-K. Wong, Too much, too little, or just right? Ways explanations impact end users' mental models, in: 2013 IEEE Symposium on Visual Languages and Human Centric Computing, IEEE, 2013, pp. 3–10.
[34] T. Kulesza, S. Stumpf, M. Burnett, I. Kwan, Tell me more? The effects of mental model soundness on personalizing an intelligent agent, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2012, pp. 1–10.
[35] S. Amershi, D. Weld, M. Vorvoreanu, A. Fourney, B. Nushi, P. Collisson, J. Suh, S. Iqbal, P. N. Bennett, K. Inkpen, et al., Guidelines for human-AI interaction, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, 2019, p. 3.