AI Healthcare System Interface: Explanation Design for Non-Expert User Trust

Retno Larasati, Anna De Liddo and Enrico Motta
Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, United Kingdom
retno.larasati@open.ac.uk (R. Larasati); anna.deliddo@open.ac.uk (A. De Liddo); enrico.motta@open.ac.uk (E. Motta)
Joint Proceedings of the ACM IUI 2021 Workshops, April 13-17, 2021, College Station, USA

Abstract
Research indicates that non-expert users tend to either over-trust or distrust AI systems. This raises concerns when AI is applied to healthcare, where a patient trusting the advice of an unreliable system, or completely distrusting a reliable one, can lead to fatal incidents or missed healthcare opportunities. Previous research indicated that explanations can help users to make appropriate trust judgements about AI systems, but how to design AI explanation interfaces for non-expert users in medical support scenarios is still an open research challenge. This paper explores a stage-based participatory design process to develop a trustworthy explanation interface for non-experts in an AI medical support scenario. A trustworthy explanation is an explanation that helps users to make considered judgements on trusting (or not) an AI system for their healthcare. The objective of this paper is to identify the explanation components that can effectively inform the design of a trustworthy explanation interface. To achieve that, we undertook three data collections, examining experts' and non-experts' perceptions of AI medical support systems' explanations. We then developed a User Mental Model, an Expert Mental Model, and a Target Mental Model of explanation, describing how non-experts and experts understand explanations, how their understandings differ, and how they can be combined. Based on the Target Mental Model, we propose a set of 14 explanation design guidelines for trustworthy AI healthcare system explanations that take into account non-expert users' needs, medical experts' practice, and AI experts' understanding.

Keywords
Explanation, Trust, Explainable Artificial Intelligence, AI Healthcare, Design Guidelines, Participatory Design

1. Introduction

Trustworthiness, the capability to independently establish the right level of trust in an AI system, is progressively becoming an ethical and societal need. Trust is humans' primary reason for acceptance [1], without which the fair and accountable adoption of AI in healthcare may never actualise. The UK government issued a policy paper that declared its vision for AI to "transform the prevention, early diagnosis and treatment of chronic diseases by 2030" [2], and this might not be achieved if there is an impediment to AI adoption and AI usage by the general public (non-expert healthcare customers).

Developing trust is particularly crucial in healthcare because it involves uncertainty and risks for vulnerable patients [3]. However, the lack of explainability, transparency, and human understanding of how AI works are key reasons why people have little trust in AI healthcare applications, and research indicates that transparency [4] and understandability [5] can be effectively used as means to enhance trust in AI systems. Explainable AI is argued to be essential "to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners" [6]. Nevertheless, the lack of trust is not the only problem. Previous research indicates that non-expert users tend to over-trust and continue to rely on a system even when it malfunctions in some circumstances [7]. To help non-expert healthcare customers trust AI systems appropriately, neither over-trusting nor distrusting them, the system should be able to give an appropriate, understandable explanation for that specific target audience. This paper aims at identifying the explanation components of AI healthcare system interfaces that allow non-expert users to appropriately inform their trust in the AI system.
We carried out a user study to determine these explanation components and then used them to inform a set of design guidelines for trustworthy AI healthcare system explanation interfaces.

We chose a stage-based participatory method, adapted from Eiband et al. [8], which has previously been applied successfully to design explanations of recommender systems in fitness applications [8]. This method particularly fits our case since it enables an individual investigation of expert and non-expert views on the problem and then provides a framework to combine expert and non-expert knowledge to inform design requirements. The stage-based participatory process consists of two phases. The first phase focuses on "what" to explain through the construction of an Expert Mental Model (what "can be explained") and a User Mental Model (what "needs" to be explained). The second phase focuses on synthesising the two models into a Target Mental Model, which describes "how to convey the explanation" by designing and developing a prototype technology.

To build the Expert Mental Model, depicting the key components of explanation that need to be communicated to patients, we carried out a series of interviews with medical professionals. Second, we conducted semi-structured interviews with non-experts to identify the User Mental Model, which captures users' needs and expectations in terms of AI explanation. Finally, we conducted a third set of semi-structured interviews with both AI experts and non-experts to determine how explanation content could be communicated to non-expert users to respond to the identified users' needs (Target Mental Model).
From the Target Mental Model, we then derived a list of design guidelines, which we used to develop a prototype explanation interface for an AI breast cancer risk assessment support system. In particular, we focused on a self-managed breast cancer risk scenario, in which the results of mammography scans are automatically analysed by an AI system and need to be communicated to the prospective patients. We chose a self-managed health scenario because it represents the extreme case, in which non-expert users are presented with AI results without any support from medical or AI experts, and therefore the explanation is the only mediating interface between patients and the AI system.

2. Background

In recent years, several studies have explored different approaches to designing explanations of the outputs of intelligent systems [9][10][11]. Some of this research focused on explanation designs for AI healthcare systems [12][13]. Despite the fact that many approaches have been proposed, explanation design for AI healthcare systems mostly targets expert users [14][15]. Explanation design specifically targeted at non-expert users has received scarce attention, despite the recognised importance of improving non-expert users' understanding of the AI system to positively affect users' trust in the system [16] and trust in the system's recommendations [17].

To improve users' understanding of the AI system with explanation, we first need to determine how they make sense of an explanation (what does the users' mental model of an explanation look like). Unlike previous studies [8][18], we did not have an available working system with which to elicit the users' mental models. This difference affected how we elicited users' feedback. We conceptualised and used a hypothetical AI diagnosis system (inspired by similar commercial systems) to interrogate both experts and non-experts, and elicited their mental models from reflections on the system and previous experience with healthcare explanation. Our hypothetical system was a Breast Cancer Self-Assessment system, a medical system to assess breast cancer risk tailored for non-experts.

As mentioned above, following Eiband et al. [8], we carried out a stage-based participatory process consisting of two phases and five stages. The first phase focused on "what" to explain and consisted of two stages: the Expert Mental Model definition and the User Mental Model definition. The second phase focused on "how to convey the explanation" and consisted of three stages: the Target Mental Model construction, the Prototype development stage (to implement the Target Mental Model in a realistic application case), and the Evaluation stage, to further test the prototype technology. In this paper, we conducted four of the five stages in Figure 1, leaving further testing and evaluation of the prototype for future research. Each stage is described in detail in the next section.

Figure 1: The stage-based participatory process for our case. Inside the box: guideline question and data collection method.

3. Study Design and Methodology: Stage-Based Participatory Process for Explanation Design

3.1. Experts Mental Model

The Expert Mental Model definition stage aimed at capturing experts' understanding and vision of what an appropriate explanation of AI medical support system results to non-experts should look like. The experts involved in its development were both machine learning technologists and medical professionals. This research stage aimed at defining what can or should be explained to the wider public from an expert perspective, by distilling a series of explanation components, which represent the Expert Mental Model.

Six participants were recruited by email from the authors' personal research and social networks: three were AI/machine learning developers/researchers, and the other three were doctors/physicians (general practitioners and oncology specialists). The main guiding questions that drove this stage were: what can be explained?; and what does an expert explanation for non-experts look like? We asked the questions according to participants' respective expertise (medical professionals and AI experts). We also showed participants two examples of breast cancer-related systems currently in commerce, to understand how experts make sense of AI systems' outputs and how they would explain the results to non-experts.

Result and Analysis

We analysed the interview data using Grounded Theory [19]. Three sets of explanation components emerged. The first set entailed the Content of the explanation, and described what information needs to be included in the explanation. The second set entailed the required Customisation of the explanation: what needs to be considered when explaining, and changed accordingly on a case by case basis. The third set entailed the explanation Interaction: the interactivity opportunities that need to be available to users during an explanation.

In terms of the content of the explanation itself, AI experts' answers were quite straightforward: users need to know about the input, the system process, and the output. "We have the inputs and intermediate results. The inputs are different variables, as a driver for the predictor and explanatory variables, [...]. They can interact with the app and see a simulation." - A2

The system process answers varied considerably, and spanned from providing information such as features' importance, to providing the name of the algorithm or who made it.
"It's like, for example, if they're trying to recognise cancer in a certain image, so this is the feature that helped me the most having this conclusion" - A1. "You can try to show the formulation of the calculation. But some algorithms do provide explanation on how it works." - A3. This means that even though this explanation component was deemed important, AI experts were not clear on what to present to non-expert users and how.

From the medical experts' perspective, on the other hand, the explanations they usually gave to patients consist of disease information, possible treatments to choose from, and the next step for the patient to take. They mentioned that explaining a diagnosis works differently if the diagnosis result is bad. "When we deliver the diagnosis to a patient, we consider the situation as well. [...] For breaking bad news, we usually deliver the news layer by layer. So not directly go to the diagnosis, we have some introduction first." - M1. If the result is bad, reassuring words are needed to help patients feel less stressed and worried. If the diagnosis result is good, or if there is no sign of distress from the patient, there is less need for reassuring words. "I think one of the important things if it's about serious conditions, we need to put more empathy." - M3. This is in line with previous research on medical explanations (How to Break Bad News: A Guide for Health Care Professionals [20]), and similar explanation protocols have been proposed and tested in the literature [21][22].

The medical experts mentioned that explanation was not given by default but based on patients' requests and customised to patients' needs. "It depends on how curious they are. If the patient just wants to know the diagnosis, then I may just tell them about it." - M3. AI experts also mentioned that explanation should probably only be provided on request.
According to the AI experts, they rarely explain how the system works to non-expert users in real-life situations unless the user asks for it. "If the app is working properly you don't need to explain. But if there is a problem, you need to explain what is going wrong." - A3. One AI expert even argued that non-experts were not interested in knowing the logic behind the system process. "I have never met a common user that is interested in artificial intelligence or the machine learning of it. Even the expert from the Ministry (people they work for), they were not really curious." - A2.

The medical experts also reported that they assess what the patient knows and the patient's perception. One medical expert mentioned that people who live in a rural area might have different knowledge than people who live in a big city, meaning the explanation is customised to the patient's knowledge. "People in the rural area don't get the privilege to get a proper education, so it's challenging for them to absorb the explanation." - M2

The explanation components related to explanation interaction reflect the modalities in which experts communicate the explanation. Medical experts mentioned how they usually ask for confirmation about the patient's symptoms and worries before making a diagnosis (input check). The second component related to the capability for non-experts to raise open questions. After giving patients their results, medical experts would always ask if there were any more questions. This interaction usually involves a back and forth exchange, until the patient has no further questions. "...Then we will explain what's the next step. And we will ask if they have any questions or not. Including the diagnosis and the plan." - M3. "Whenever patients ask, we then answer the questions directly." - M1.

One AI expert mentioned that showing how the output changes could help non-experts to understand the system better (input manipulation and visualisation). AI experts also mentioned how it could be overwhelming for the user to read the whole explanation, and suggested it would be better to give users the option to request details if they need them (details request). "We need the user to see the general output, but they can expand on some detail. Making it simple, just a few statements, and the general result, and if the user is curious, they can dig into it." - A3. The Expert Mental Model outcome from the analysis can be seen in Fig. 2.

Figure 2: Expert Mental Model analysis result: explanation components.

3.2. User Mental Model

In the User Mental Model research stage, we captured users' understanding and their perspective on how explanation should be presented in an AI medical support system. The purpose of this stage was to acquire knowledge about how users currently make sense of explanations. This acquired knowledge was then structured into several key components of explanation, which constitute the User Mental Model.

Szalma & Taylor (2011) showed that trust propensity is one of the human-related factors that could affect the response to an intelligent system [23]. To account for trust propensity, we sampled the participants based on their dispositional trust towards an AI medical support application and made sure there was a nearly equal number of people in each trust group (the AI sceptic, the open-minded, the AI enthusiast). We recruited four participants for each of the three groups representing three levels of dispositional trust, with 12 participants in total. To identify the level of trust, we asked the prospective participants to answer the following question: "if there was a cancer risk assessment/self-detection application available on the market, how likely would you be to use it? Please rate the likelihood from 1-7". This question was sent in advance of the interview invitation. The participants were then grouped into three groups: the sceptic (1-3 likelihood responses), the open-minded (4-5 likelihood responses), and the enthusiast (6-7 likelihood responses). We sought to balance out the age range (twenties to forties) because research suggests that age could affect users' trust towards a system, where older adults are more likely to trust the system than younger adults in a medical management system (decision aid) [24]. We also balanced the male and female participants by recruiting one male in each group because we recognised that, although male breast cancer accounts for less than 1% of all breast cancer diagnoses worldwide [25], men are sometimes included in the decision making about the usage of a particular system for affected women close or related to them.
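As an illustration of this screening step, the sketch below shows how the 1-7 likelihood responses could be mapped to the three dispositional-trust groups using the thresholds reported above; the function name and the example responses are ours and purely illustrative, not part of the study materials.

```python
def trust_group(likelihood: int) -> str:
    """Map a 1-7 likelihood-of-use rating to a dispositional trust group,
    using the thresholds reported above (1-3 sceptic, 4-5 open-minded, 6-7 enthusiast)."""
    if not 1 <= likelihood <= 7:
        raise ValueError("likelihood rating must be between 1 and 7")
    if likelihood <= 3:
        return "sceptic"
    if likelihood <= 5:
        return "open-minded"
    return "enthusiast"


# Hypothetical screening responses (participant id -> rating), for illustration only.
responses = {"P01": 2, "P02": 5, "P03": 7, "P04": 4}
print({pid: trust_group(rating) for pid, rating in responses.items()})
# {'P01': 'sceptic', 'P02': 'open-minded', 'P03': 'enthusiast', 'P04': 'open-minded'}
```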
We followed the same interview structure as in the experts' interviews. The main guiding questions we asked the non-expert users were: how do users currently understand AI explanations?; and what does a user explanation look like? We then showed the participants two examples of breast cancer-related systems to probe non-expert users' reflections and feedback on the AI system's results and explanations.

Result and Analysis

We carried out a Thematic Analysis [26] to analyse the interview data. The same three sets of explanation components could be identified: Content, Customisation, and Interaction. In receiving a diagnosis, participants explained that they would like to know about the disease information. They mentioned the disease name, the disease symptoms, and the severity of the disease as key information they would like to receive. Participants also reported that they would like to know about the next step or action that they could or should take, for example, information about the disease treatment that they should undergo after diagnosis, or whether they have to make an appointment with their doctor or physician. "You got cancer, and your options would be these, these, and these, and this is how I want to proceed. These are your options." - E2. "Do I need to contact my physician directly or is there a next step that is also provided by the application itself?" - OM1

These diagnosis-related explanations, both disease information and next step/action, could be considered more local/disease-specific explanations. However, participants also wanted more general explanations about the AI system, system information, which was not related to either the inputs or the results. One of the participants asked for information about the system process/algorithm.
"I would want to know, what are they doing actually in the background to do this?" - E1. However, not all participants expressed an interest in knowing the system process; some were not keen to know this information. They argued that in a stressful situation, such as a positively assessed cancer, their focus would not be on the system information but on their well-being. "Says that I have cancer, then I am not going to be interested in the system process" - S1.

This argument matches the AI experts' opinion mentioned above, recognising that non-experts may usually not be interested in knowing the technical side of a system. Non-expert users' reluctance to know the technical information was a matter of timing and their emotional state after receiving a diagnosis. However, it also reflects a reluctance caused by the possibility of not understanding the technical terms used to explain the process. "I mean, the very hard, fine grain details? It will be incomprehensible for me because I am not familiar with the technology and everything." - E2. This confirms previous research arguing that what people consider an acceptable and understandable explanation depends on their domain knowledge or role [27][28].

Another emerging explanation component was system data. Non-expert users mentioned they would like to have information about the volume of the database used to train the algorithm, or the data features used for the prediction. "So at least I have to know how big is their database." - OM2. "Explain to you the quality of features and characteristics; it is because this thing has this colour shape" - E2. However, participants also talked about the data they provide, their personal input data; they expressed concerns about data privacy and demanded specific information about it. "And how am I sure that my breast picture will not be leaked to be utilised for other intentions and such." - OM1. "Where are these data going?" - OM4.

Participants also talked about the system's accuracy and credibility. "However, for my health, I think it will be quite beneficial if I know how accurate it can be" - OM1. The credibility they mentioned was related to the institute/company that developed the system. Credibility could also mean whether the system has been tested and approved by the appropriate health institution.

Besides the information that should be included in the explanation, participants also talked about how the explanation should be delivered. Participants demanded that the AI results be presented with care and empathy, especially if the result was not in their favour. Empathy is the ability to understand and share the feelings of others, and an empathetic statement should include phrases that help to establish a connection with the user. Participants mentioned empathy or reassuring words only in the case of "bad news", i.e. when the result is not good; therefore, we put it under customisation in the User Mental Model.

"If I want to use text explanation, I think you should be, in terms of style of shaping the statement that you present to the user, I think you should always follow, sort of defensive language. So again, it might be quite direct and aggressive to say to the user, you have cancer, exclamation mark. [...] Be a little bit more reserved, rather than explicit, in your statements because it's quite a sensitive matter." - E2. Participants who expected care and empathy were more concerned with the choice of words and how "delicately" the AI system delivers the diagnosis results.

Other than text and words, participants mentioned the use of graphics and images to communicate the explanation, for example, by showing comparison images of a normal condition vs an abnormal condition.
The graphic/image to show comparison we put under explanation content in the User Mental Model, because regardless of the result (good/bad), the user wants to see the opposite case and decide for themselves whether the result makes sense to them. "Perhaps have some examples of how an affected breast looks like, how an unaffected breast looks like. So you can compare yourself with what is being put in your input." - E2. "And then the image comparing, you know, both, my results and the healthy ones." - S1. Participants' request to show the opposite case is in line with the literature in cognitive psychology, which states that human explanations are sought in response to particular counterfactual cases [29][30]. Our finding confirms that counterfactual/contrastive explanation is argued to be an explanation that is understandable for users [31][32][28].

For interaction with the AI system, participants expressed the need for a course of action, which is an additional feature of a doctor appointment included in the explanation interface. Regarding how they would interact with the explanation, participants wanted to be able to request detailed information rather than be presented with the full, long explanation in one go. They mentioned that the explanation detail could be presented as a link to an outside source or as a piece of expandable information. The User Mental Model outcome from the analysis can be seen in Fig. 3.

Figure 3: User Mental Model analysis result: explanation components.

3.3. Target Mental Model

In the Target Mental Model research stage, we identified which key components of an explanation (from the expert perspective - the Expert Mental Model) the users might want to be included in an AI explanation User Interface (UI). The Expert Mental Model's explanation components were combined with the explanation components from the User Mental Model to form the Target Mental Model. We conducted semi-structured interviews with the same group of non-expert participants involved in the User Mental Model definition.

During the interviews, the main guiding question was: which explanation components do users want to be realised in a UI to explain AI results? We asked participants to reflect on the explanation components from the Expert Mental Model and discuss which ones they considered most important and valuable. Participants were asked to explicitly reflect on each explanation component by giving a rating of importance (from 0-10) and expressing their opinion on each of them. Based on the critical analysis of the User and Expert Mental Models, combined with the analysis of users' feedback on the expert mental model views, we distilled the Target Mental Model.

Result and Analysis

The median values of the explanation components' ratings given by the users are reported in Table 1.

Table 1: Median values of Expert Mental Model explanation components' ratings, used to inform the Target Mental Model.
Content: disease info 10; treatment 10; next plan 10; input 10; process 7; output 10.
Customisation: explanation request 8; user education 4; empathy 10.
Interaction: input check 10; open question 7; input manipulation 10; detail request 10.

Under the Content set of explanation components in the Expert Mental Model, the system process was not seen as crucial, since not all participants were interested in knowing the technicalities of how the AI system made a decision/prediction. Under the Customisation set of explanation components, empathy/reassuring words was rated high by the participants. Explanation request was also rated relatively high, because some participants argued that an explanation should always be available whether a user requests it or not. The lowest-rated component under the Customisation group was user education, which was deemed unnecessary since explanation should be understandable for lay users regardless of their educational background. Under the Interaction group, all components were rated as important except for open question. Some participants were sceptical about openly asking questions to the AI system and preferred to wait and ask questions to a doctor.
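As a note on how Table 1 can be reproduced from the raw ratings, the sketch below aggregates per-component 0-10 importance ratings into medians; the rating values shown are invented placeholders rather than the study data, and only the aggregation step reflects the procedure described above.

```python
from statistics import median

# Hypothetical 0-10 importance ratings from 12 participants, keyed by
# explanation component; placeholder numbers, not the actual study data.
ratings = {
    "disease info": [10, 9, 10, 10, 8, 10, 10, 9, 10, 10, 10, 10],
    "system process": [7, 5, 8, 7, 6, 9, 7, 7, 8, 6, 7, 7],
    "empathy": [10, 10, 9, 10, 10, 8, 10, 10, 10, 9, 10, 10],
}

# One median per component, as reported in Table 1.
medians = {component: median(scores) for component, scores in ratings.items()}
print(medians)  # {'disease info': 10.0, 'system process': 7.0, 'empathy': 10.0}
```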
The final Target Mental Model is shown in Figure 4. The explanation components included were obtained from the combination of explanation components from the Expert Mental Model and the User Mental Model, then revised according to users' perceptions of, and preferences on, the experts' views. The explanation components with lower rating scores are indicated with lighter text in the figure. As an additional step, we went back to the experts and asked the medical experts a follow-up question about the explanation components that appeared in the User Mental Model but not in the Expert Mental Model, such as system accountability and data, and doctor appointment. According to them, the system's certification and accountability were not essential to include in the explanation: if the application is recommended by the healthcare authority (e.g., for the UK, the NHS), that would be considered enough. The doctor appointment component did not come up in the earlier interviews because they expected it as a given feature.

Figure 4: Target Mental Model analysis result: explanation components.

3.4. Design Guidelines and Prototype

By reflecting on the findings of the Target Mental Model, we propose 14 explanation components/design guidelines for trustworthy AI medical support system interfaces (see Table 2).

Table 2: Our 14 explanation design guidelines, categorised by information included, information delivery, and interaction included.

Information Included (EDG1-EDG7):
- Disease Information: general disease information, e.g. name, symptoms, causes (requisite: Yes)
- Disease Treatment: treatment options and information (requisite: Yes)
- Next Plan/Step: next step the user could take following the result (requisite: Yes)
- System Information: general system information, e.g. data used, system certification (requisite: Yes)
- System Input: data the user inputted (requisite: Yes)
- System Process: system algorithm or the technical process used to get its results (requisite: Optional)
- System Output: system result, e.g. pre-diagnosis, recommendation (requisite: Yes)

Information Delivery (EDG8-EDG9):
- Empathy (Reassuring Words): delicately deliver the results with carefully selected words (requisite: Yes)
- Simple and General: uncomplicated wording that is acceptable for lay users from various education backgrounds and levels (requisite: Optional)

Interaction Included (EDG10-EDG14):
- Input Check: for the user to check the input (is it correct or not) (requisite: Yes)
- Doctor Appointment: for the user to make a doctor appointment (requisite: Yes)
- Open Question: for the user to ask open questions (requisite: Optional)
- Input Comparison (Visualisation): for the user to compare the result with other data (requisite: Yes)
- Detail Request: for the user to request detailed information (requisite: Yes)
Those guidelines were grouped into three categories that mirror the Target Mental Model's sets of explanation components: Explanation Content/Information to be Included, Explanation Customisation/Information Delivery, and Explanation Interaction/Interaction to be Afforded. Each guideline references the explanation contents from the Target Mental Model, except for the explanation request component.

We decided not to include explanation request in the guidelines in consideration of several regulations, such as the European Union's General Data Protection Regulation (GDPR) and the European Commission's Assessment List for Trustworthy Artificial Intelligence (ALTAI)¹. According to these regulations, an explanation should always be provided, by law, to any user when AI is involved. AI in healthcare was classified as high-risk AI according to the White Paper on Artificial Intelligence by the European Commission², which makes explanation availability even more essential in a healthcare scenario. We therefore removed the "explanation request" option from the design guidelines since, even if desirable from a non-expert user's perspective, it would be an unethical and unlawful design choice.

¹ https://ec.europa.eu/futurium/en/ai-alliance-consultation/guidelines
² https://ec.europa.eu/info/publications/white-paper-artificial-intelligence-european-approach-excellence-and-trust

We then designed a user interface prototype based on the guidelines in Table 2. We explored each guideline's presentation possibilities and the specific functionalities of the system that could realise them. We decided on a website where the user could carry out a breast self-assessment based on screening images from their portable medical scan device. The final prototype was developed after several cycles of feedback between designers, and was then uploaded as a website at https://retnolaras.github.io/care/.
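To make the guidelines easier to apply in practice, the sketch below encodes Table 2 as a simple machine-readable checklist that a candidate interface (such as our prototype) could be reviewed against. The `Guideline` structure, the EDG-to-row numbering, and the `missing_requisites` helper are our own illustrative choices, not artefacts produced in the study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Guideline:
    edg: str         # guideline identifier, EDG1-EDG14
    category: str    # "information included", "information delivery", or "interaction included"
    name: str
    requisite: bool  # True = "Yes" in Table 2, False = "Optional"

GUIDELINES = [
    Guideline("EDG1", "information included", "Disease Information", True),
    Guideline("EDG2", "information included", "Disease Treatment", True),
    Guideline("EDG3", "information included", "Next Plan/Step", True),
    Guideline("EDG4", "information included", "System Information", True),
    Guideline("EDG5", "information included", "System Input", True),
    Guideline("EDG6", "information included", "System Process", False),
    Guideline("EDG7", "information included", "System Output", True),
    Guideline("EDG8", "information delivery", "Empathy (Reassuring Words)", True),
    Guideline("EDG9", "information delivery", "Simple and General", False),
    Guideline("EDG10", "interaction included", "Input Check", True),
    Guideline("EDG11", "interaction included", "Doctor Appointment", True),
    Guideline("EDG12", "interaction included", "Open Question", False),
    Guideline("EDG13", "interaction included", "Input Comparison (Visualisation)", True),
    Guideline("EDG14", "interaction included", "Detail Request", True),
]

def missing_requisites(implemented: set) -> list:
    """Return the requisite guidelines a candidate interface does not yet cover."""
    return [g.edg for g in GUIDELINES if g.requisite and g.edg not in implemented]

# Example review of a hypothetical interface that covers only part of the table.
print(missing_requisites({"EDG1", "EDG2", "EDG3", "EDG7", "EDG8"}))
# ['EDG4', 'EDG5', 'EDG10', 'EDG11', 'EDG13', 'EDG14']
```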
4. Discussion

Previous research has used the development of mental models to explore users' understanding and to help the design of transparent AI systems in various contexts [18][33][34], including the research from which we adapted this stage-based participatory process [8]. As mentioned in the Background section, the difference between our mental models and those of previous research lies in what the mental model is about and in its richness. Our mental models can be considered limited, in that they only draw on how users perceive a prospective AI system, not on how it works in detail in a real-life context. Therefore, unlike previous research, our study cannot provide a detailed understanding of how and why the AI system works in practice [34]. Nonetheless, we successfully distilled different stakeholders' insights on the explanation of an AI medical support system, and formed them into very detailed mental models. We critically discussed the differences in understanding and perception of AI explanation needs from an expert and a non-expert perspective, we discussed issues of explanation modality and interaction, and we combined expert and non-expert views in a Target Mental Model. The resulting design guidelines were also contextualised to current practice and health regulations.

The explanation design guidelines we developed were based on critical reflections on the Target Mental Model results. There is definitely room for improvement: we could incorporate other AI design guidelines or explanation recommendations to elaborate on our current guidelines. For example, from Amershi et al.'s guidelines [35], AI should show contextually relevant information (G4) and mitigate social biases (G6); we could add those to our guideline Information Delivery: Simple and General (EDG9). Another example, from [32], suggesting that explanation should be contrastive, could contribute to our guideline for Interaction: Input Comparison (Visualisation). A follow-on critical literature review would also help to verify and validate our proposed design guidelines.

5. Limitation and Future Works

There are several limitations of our study that should be addressed in future works. The stage-based participatory process is not complete. The final stage, which evaluates the developed prototype's effectiveness, has not been carried out yet. We need to test whether the prototype has reached the design goals and wholly followed the design guidelines. To test whether our prototype has reached the design goal, which is to design an explanation that can help users to make considered trust judgements, we need to assess whether there is any change in users' perception and their trust level. To measure the change in users' trust, we plan to use a quantitative measurement instrument [28] in a controlled experiment setting, quantitatively measuring the extent to which each of the guidelines realised in the prototype contributes to enabling considered trust judgements by non-expert users. In addition, we will conduct a lab study and interviews to get qualitative insight on both the prototype and the design guidelines.

We also acknowledge limitations within the research stages we conducted. The participants involved were recruited from our personal networks, which might limit the variation in opinions. Moreover, the explanation design guidelines proposed by this paper have not yet been evaluated, either in terms of their applicability across the variety of AI medical support systems or in terms of their clarity. Finally, the prototype we developed only delved into one type of modality, a graphical user interface. How the design guidelines could be implemented in an audio user interface or a conversational user interface also needs further exploration.

6. Conclusion

In this paper, we successfully applied a stage-based participatory design process to define design guidelines for trustworthy AI healthcare system explanation interfaces for non-expert users. We developed an Expert Mental Model, a User Mental Model, and a Target Mental Model of AI medical support system explanation. These mental models captured the needs and visions of the different stakeholders involved in a human-AI explanation process in a healthcare scenario. We used the developed Target Mental Model to inform a set of 14 explanation design guidelines for the development of trustworthy AI healthcare system explanation interfaces, which specifically cater for non-expert users, while still taking into account medical experts' practice and AI experts' understanding. These guidelines emerged as an outcome of several stages of interviews, feedback from different types of stakeholders, thorough analysis of the current literature, and critical reflections on the insights obtained through the participatory process.

References

[1] D. Gefen, E. Karahanna, D. W. Straub, Trust and TAM in online shopping: an integrated model, MIS Quarterly 27 (2003) 51–90.
[2] GOV.UK, The future of healthcare: our vision for digital, data and technology in health and care, 2018. (Accessed on 02/10/2019).
[3] A. Alaszewski, Risk, trust and health, 2003.
[4] A. Holzinger, C. Biemann, C. S. Pattichis, D. B. Kell, What do we need to build explainable AI systems for the medical domain?, arXiv preprint arXiv:1712.09923 (2017).
[5] Z. C. Lipton, The doctor just won't accept that!, arXiv preprint arXiv:1711.08037 (2017).
[6] D. Gunning, Explainable artificial intelligence (XAI) (2017).
[7] M. R. Cohen, J. L. Smetzer, ISMP medication error report analysis: Understanding human over-reliance on technology; it's Exelan, not Exelon; crash cart drug mix-up; risk with entering a "test order", Hospital Pharmacy 52 (2017) 7.
[8] M. Eiband, H. Schneider, M. Bilandzic, J. Fazekas-Con, M. Haug, H. Hussmann, Bringing transparency design into practice, in: 23rd International Conference on Intelligent User Interfaces, 2018, pp. 211–223.
[9] B. Y. Lim, A. K. Dey, Design of an intelligible mobile context-aware application, in: Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, 2011, pp. 157–166.
[10] P. Pu, L. Chen, Trust building with explanation interfaces, in: Proceedings of the 11th International Conference on Intelligent User Interfaces, ACM, 2006, pp. 93–100.
[11] B. Y. Lim, A. K. Dey, Evaluating intelligibility usage and usefulness in a context-aware application, in: International Conference on Human-Computer Interaction, Springer, 2013, pp. 92–101.
[12] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, W. Stewart, RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism, in: Advances in Neural Information Processing Systems, 2016, pp. 3504–3512.
[13] F. Jiang, Y. Jiang, H. Zhi, Y. Dong, H. Li, S. Ma, Y. Wang, Q. Dong, H. Shen, Y. Wang, Artificial intelligence in healthcare: past, present and future, Stroke and Vascular Neurology 2 (2017) 230–243.
[14] A. Bussone, S. Stumpf, D. O'Sullivan, The role of explanations on trust and reliance in clinical decision support systems, in: 2015 International Conference on Healthcare Informatics, IEEE, 2015, pp. 160–169.
[15] Z. Che, S. Purushotham, R. Khemani, Y. Liu, Distilling knowledge from deep networks with applications to healthcare domain, arXiv preprint arXiv:1512.03542 (2015).
[16] J. B. Lyons, G. G. Sadler, K. Koltai, H. Battiste, N. T. Ho, L. C. Hoffmann, D. Smith, W. Johnson, R. Shively, Shaping trust through transparent design: theoretical and experimental guidelines, in: Advances in Human Factors in Robots and Unmanned Systems, Springer, 2017, pp. 127–136.
[17] H. Cramer, V. Evers, S. Ramlal, M. Van Someren, L. Rutledge, N. Stash, L. Aroyo, B. Wielinga, The effects of transparency on trust in and acceptance of a content-based art recommender, User Modeling and User-Adapted Interaction 18 (2008) 455.
[18] C.-H. Tsai, P. Brusilovsky, Designing explanation interfaces for transparency and beyond, in: IUI Workshops, 2019.
[19] B. G. Glaser, A. L. Strauss, Discovery of grounded theory: Strategies for qualitative research, Routledge, 1967.
[20] R. Buckman, How to break bad news: a guide for health care professionals, JHU Press, 1992.
[21] M. W. Rabow, S. J. McPhee, Beyond breaking bad news: how to help patients who suffer, Western Journal of Medicine 171 (1999) 260.
[22] W. F. Baile, R. Buckman, R. Lenzi, G. Glober, E. A. Beale, A. P. Kudelka, SPIKES—a six-step protocol for delivering bad news: application to the patient with cancer, The Oncologist 5 (2000) 302–311.
[23] J. L. Szalma, G. S. Taylor, Individual differences in response to automation: The five factor model of personality, Journal of Experimental Psychology: Applied 17 (2011) 71.
[24] G. Ho, D. Wheatley, C. T. Scialfa, Age differences in trust and reliance of a medication management system, Interacting with Computers 17 (2005) 690–710.
[25] M. Yalaza, A. İnan, M. Bozer, Male breast cancer, The Journal of Breast Health (2016).
[26] V. Braun, V. Clarke, Using thematic analysis in psychology, Qualitative Research in Psychology 3 (2006) 77–101.
[27] B. F. Malle, How the mind explains behavior: Folk explanations, meaning, and social interaction, MIT Press, 2006.
[28] R. Larasati, A. De Liddo, E. Motta, The effect of explanation styles on user's trust, in: Proceedings of the Workshop on Explainable Smart Systems for Algorithmic Transparency in Emerging Technologies, co-located with IUI 2020, 2020.
[29] P. Lipton, Contrastive explanation, Royal Institute of Philosophy Supplements 27 (1990) 247–266.
[30] D. J. Hilton, Conversational processes and causal explanation, Psychological Bulletin 107 (1990) 65.
[31] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the black box: Automated decisions and the GDPR, Harvard Journal of Law & Technology 31 (2017) 841.
[32] T. Miller, Explanation in artificial intelligence: Insights from the social sciences, Artificial Intelligence (2018).
[33] T. Kulesza, S. Stumpf, M. Burnett, S. Yang, I. Kwan, W.-K. Wong, Too much, too little, or just right? Ways explanations impact end users' mental models, in: 2013 IEEE Symposium on Visual Languages and Human Centric Computing, IEEE, 2013, pp. 3–10.
[34] T. Kulesza, S. Stumpf, M. Burnett, I. Kwan, Tell me more? The effects of mental model soundness on personalizing an intelligent agent, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2012, pp. 1–10.
[35] S. Amershi, D. Weld, M. Vorvoreanu, A. Fourney, B. Nushi, P. Collisson, J. Suh, S. Iqbal, P. N. Bennett, K. Inkpen, et al., Guidelines for human-AI interaction, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, 2019, p. 3.