<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>AI Healthcare System Interface: Explanation Design for Non-Expert User Trust</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Retno Larasati</string-name>
          <email>retno.larasati@open.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna De Liddo</string-name>
          <email>anna.deliddo@open.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrico Motta</string-name>
          <email>enrico.motta@open.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Explanation, Trust, Explainable Artificial Intelligence, AI Healthcare, Design Guidelines, Participatory Design</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AI Healthcare Systems explanation interfaces</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Knowledge Media Institute, The Open University.</institution>
          <addr-line>Walton Hall, Milton Keynes</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>construction of an Expert Mental Model (what ”can be</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>explained”) and a User Mental Model (what ”needs” to</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Research indicates that non-expert users tend to either over-trust or distrust AI systems. This raises concerns when AI is applied to healthcare, where a patient trusting the advice of an unreliable system, or completely distrusting a reliable one, can lead to fatal incidents or missed healthcare opportunities. Previous research indicated that explanations can help users to make appropriate judgements on AI systems' trust, but how to design AI explanation interfaces for non-expert users in a medical support scenario is still an open research challenge. This paper explores a stage-based participatory design process to develop a trustworthy explanation interface for non-experts in an AI medical support scenario. A trustworthy explanation is an explanation that helps users to make considered judgements on trusting (or not) an AI system for their healthcare. The objective of this paper is to identify the explanation components that can effectively inform the design of a trustworthy explanation interface. To achieve that, we undertook three data collections, examining experts' and non-experts' perceptions of AI medical support systems' explanations. We then developed a User Mental Model, an Expert Mental Model, and a Target Mental Model of explanation, describing how non-experts and experts understand explanations, how their understandings differ, and how they can be combined. Based on the Target Mental Model, we then propose a set of 14 explanation design guidelines for trustworthy AI Healthcare System explanation, which take into account non-expert users' needs, medical experts' practice, and AI experts' understanding.</p>
      </abstract>
      <kwd-group>
        <kwd>Explanation</kwd>
        <kwd>Trust</kwd>
        <kwd>Explainable Artificial Intelligence</kwd>
        <kwd>AI Healthcare</kwd>
        <kwd>Design Guidelines</kwd>
        <kwd>Participatory Design</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Establishing the right level of trust in an AI system is progressively becoming an ethical and societal need. Trust is humans' primary reason for acceptance [1], without which the fair and accountable adoption of AI in healthcare may never actualise. The UK government issued a policy paper declaring its vision for AI to "transform the prevention, early diagnosis and treatment of chronic diseases by 2030" [2], and this might not be achieved if there is an impediment to AI adoption and usage by the general public (non-expert healthcare customers).</p>
      <p>Developing trust is particularly crucial in healthcare because it involves uncertainty and risks for vulnerable patients [3]. However, the lack of explainability, transparency, and human understanding of how AI works are key reasons why people have little trust in AI healthcare applications, and research indicates that transparency [4] and understandability [5] can be effectively used as means to enhance trust in AI systems. Explainable AI is argued to be essential "to understand, appropriately trust, and effectively manage an emerging generation of artificially intelligent partners" [6].</p>
      <p>This paper explores a stage-based participatory design process that moves from the construction of an Expert Mental Model (what "can be explained") and a User Mental Model (what "needs" to be explained) to the combination of the two models in a Target Mental Model, which describes "how to convey the explanation", by designing and developing a prototype technology.</p>
      <p>To build the Expert Mental Model, depicting the key components of explanation that need to be communicated to patients, we carried out a series of interviews with medical professionals. Second, we conducted semi-structured interviews with non-experts to identify the User Mental Model, which captures users' needs and expectations in terms of AI explanation. Finally, we conducted a third set of semi-structured interviews with both AI experts and non-experts to determine how explanation content could be communicated to non-expert users in response to the identified users' needs (Target Mental Model). From the Target Mental Model, we then derived a list of design guidelines, which we used to develop a prototype explanation interface for an AI breast cancer risk assessment support system. In particular, we focused on a self-managed breast cancer risk scenario, in which the results of mammography scans are automatically analysed by an AI system and need to be communicated to the prospective patients. We chose a self-managed health scenario because it represents the extreme case, in which non-expert users are presented with AI results without any support from medical or AI experts, and therefore the explanation is the only mediating interface between patients and the AI system.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>In recent years, several studies explored diferent ap</title>
        <p>proaches to design explanation of the outputs from
intelligent systems [9][10][11]. Some of the research focused
on explanation designs for AI healthcare systems [12][13].
Despite the fact that many approaches have been
proposed, the explanation design for AI healthcare system
mostly targets expert user [14][15]. Explanation design
specifically targeted to non-expert users has received
scarce attention, despite the recognised importance of
improving non-expert user’s understanding of the AI
system to positively afect users’ trust in the system[ 16],
and trust in the system recommendations [17].</p>
        <p>To improve users’ understanding of the AI system with
explanation, we first need to determine how they make</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. STUDY DESIGN AND</title>
    </sec>
    <sec id="sec-4">
      <title>METHODOLOGY:</title>
    </sec>
    <sec id="sec-5">
      <title>STAGE-BASED</title>
    </sec>
    <sec id="sec-6">
      <title>PARTICIPATORY PROCESS</title>
    </sec>
    <sec id="sec-7">
      <title>FOR EXPLANATION DESIGN</title>
      <sec id="sec-7-1">
        <title>3.1. Experts Mental Model</title>
        <sec id="sec-7-1-1">
          <title>The expert mental model definition stage aimed at captur</title>
          <p>ing experts’ understanding and vision of what an
appropriate explanation of AI medical support system results
to non-experts should look like. The experts involved
in its development were both machine learning
technologists and medical professionals. This research stage
aimed at defining what can or should be explained to the variables,[...]. They can interact with the app and see the
wider public from an expert perspective, by distilling a simulation.”- A2
series of explanation components, which represent the The system process answers varied considerably, and
Expert Mental Model. spanned from providing information such as features’</p>
          <p>Six participants were recruited by email, from the importance, to providing the name of the algorithm or
authors’ personal research and social networks, three who made it. ”It’s like, for example, if they’re trying to
were AI/machine learning developers/researchers, and recognise cancer in a certain image, so this is the feature
the other three were doctor/physicians (general practi- that helped me the most having this conclusion”- A1. ”You
tioners and oncology specialists). The main guiding ques- can try to show the formulation of the calculation. But some
tions that drove this stage were: what can be explained?; algorithms do provide explanation on how it works.”- A3
and what does an expert explanation for non-experts This means that even though this explanation component
looks like? We asked the questions based on participants was deemed important, AI experts were not clear on what
respective expertise (medical professionals and AI ex- and how to present it to non expert users.
perts). We also showed participants two examples of From the medical experts perspective on the other
breast cancer-related systems currently in commerce, hand, explanations they usually gave to patients consist
to understand how experts make sense of AI systems’ of disease information, possible treatments to choose, and
outputs and how they would explain the results to non- the next step for the patient to take. They mentioned that
experts. explaining diagnosis works diferently if the diagnosis
result is bad. ”When we deliver the diagnosis to a patient, we
Result and Analysis consider the situation as well. [...] For breaking bad news,
We analysed interviews’ data using Grounded Theory we usually deliver the news layer by layer. so not directly
[19]. Three sets of explanation components emerged. go to the diagnosis, we have some introduction first.” - M1.
The first set of explanation components entailed the Con- If the result is bad, reassuring words are needed to help
tent of the explanation, and described what information patients feel less stressed and worried. If the diagnosis
needs to be included in the explanation. The second set of result is good or if there is no sign of distress from the
explanation components entailed the required Customi- patient, there is less need for reassuring words. ”I think
sation of the explanation, what needs to be considered one of the important things if it’s about serious conditions,
when explaining, and changed accordingly on a case we need to put more empathy.”- M3 This is in line with
by case basis. The third set of explanation components previous research on medical explanations How to Break
entailed the explanation Interaction, the interactivity op- Bad News: A Guide for Health Care Professionals [20] and
portunities that need to be available to users during an similar explanation protocols have been proposed and
explanation. tested in the literature [21][22].</p>
          <p>In terms of content of the explanation itself, AI experts The medical experts mentioned that explanation was
answers were quite straight forward; users need to know not given by default but based on patients’ request and
about input, system process, and output. ”We have the customised to patients’ needs. ”It depends on how curious
inputs and intermediate results. The inputs are diferent they are. If the patient just wants to know the diagnosis,
variables, as a driver for the predictor and explanatory then I may just tell them about it.”- M3. AI experts also
mentioned explanation should probably only be provided several key components of explanation, which constitute
on request. According to the AI experts, they rarely the User Mental Model.
explain how the system works to non-expert user in a Szalma &amp; Taylor (2011) showed that trust propensity is
real-life situations unless the user asks for it. ”if the app one of the human-related factors that could afect the
is working properly you don’t need to explain. But if there response to an intelligence system [23]. To account for
is a problem, you need to explain what is going wrong.”- trust propensity, we sampled the participants based on
A3. One AI expert even argued that non-expert were not their dispositional trust towards an AI medical support
interested in knowing the logic behind/system process. ”I application and made sure there was a nearly equal
numhave never met a common user that is interested in artificial ber of people in each trust groups (the AI sceptic, the
intelligence or the machine learning of it. Even the expert open-minded, the AI enthusiast). We recruited four
parfrom the Ministry (people they work for), they were not ticipants for the three groups representing three levels
really curious.” - A2. of dispositional trust, with 12 participants in total. To</p>
          <p>The medical experts also reported that they assess identify the level of trust, we asked the perspective
parwhat the patient knows and the patient’s perception. One ticipants to answer the following question: ”if there was
medical expert mentioned that people who live in a rural a cancer risk assessment/self-detection application
availarea might have diferent knowledge than people who able on the market, how likely would you be to use it?
live in a big city, meaning the explanation is customised Please rate the likelihood from 1-7”. This question was
to the patients’ knowledge. ”People in the rural area, sent in advance of the interview invitation. The
particidon’t get the privilege to get a proper education, so it’s pants were then grouped into three groups, the sceptic
(1challenging for them to absorbs the explanation.”- M2 3 likelihood responses), the open-minded (4-5 likelihood</p>
          <p>The explanation components related to explanation in- responses), and the enthusiast (6-7 likelihood responses).
teraction, reflect on the modalities in which experts com- We sought to balance out the age range (twenties to
formunicate the explanation. Medical experts mentioned ties) because research suggests that age could afect users’
how they usually ask for confirmation about the patient trust towards a system, where older adults are more likely
symptoms and worries before making a diagnosis (input to trust the system than younger adults in a medical
mancheck). The second component related to the capability agement system (decision aid) [24]. We also balanced
for non-experts to raise open questions. After giving pa- out the male-female participants by recruited one male
tients their results, medical experts would always ask if in each group because we recognised despite male breast
there were any more questions. This interaction usually cancer is only accounting for less than 1% of all breast
involves a back and forth exchange, until the patients has cancer diagnoses worldwide [25], sometimes men are
no further questions. ”...Then we will explain what’s the included in the decision making towards the usage of
next step. And we will ask if they have any questions or a particular system for afected women close/related to
not. Including the diagnosis and the plan.”- M3. ”whenever them.
patients ask, we then answer the questions directly.”- M1. We followed the same interview structure as in the
ex</p>
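        <p>As a concrete illustration of this screening step, the following minimal sketch (Python; the function name and the example responses are hypothetical, not part of the study materials) maps the 1-7 likelihood ratings onto the three dispositional-trust groups described above:</p>
        <preformat># Hypothetical sketch of the dispositional-trust screening described above.
# Names and example responses are illustrative, not from the study.

def trust_group(likelihood: int) -> str:
    """Map a 1-7 likelihood-of-use rating to a dispositional-trust group."""
    if not 1 &lt;= likelihood &lt;= 7:
        raise ValueError("rating must be between 1 and 7")
    if likelihood &lt;= 3:
        return "sceptic"
    if likelihood &lt;= 5:
        return "open-minded"
    return "enthusiast"

# Example screening responses (made up for illustration):
responses = {"P01": 2, "P02": 5, "P03": 7, "P04": 4}
groups = {pid: trust_group(rating) for pid, rating in responses.items()}
# {'P01': 'sceptic', 'P02': 'open-minded', 'P03': 'enthusiast', 'P04': 'open-minded'}</preformat>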
        <p>We followed the same interview structure as in the experts' interviews. The main guiding questions we asked the non-expert users were: how do users currently understand AI explanations?; and what does a user explanation look like? We then showed the participants two examples of breast cancer-related systems to probe non-expert users' reflections and feedback on the AI system's results and explanations.</p>
        <p>Result and Analysis. We carried out a Thematic Analysis [26] to analyse the interviews' data. The same three sets of explanation components could be identified: Content, Customisation, and Interaction. In receiving a diagnosis, participants explained that they would like to know about the disease information. They mentioned the disease name, the disease symptoms, and the severity of the disease as key information they would like to receive. Participants also reported that they would like to know about the next step/action that they could or should take, for example, information about the disease treatment that they should undergo after diagnosis, or whether they have to make an appointment with their doctor or physician. "You got cancer, and your options would be these, these, and these, and this is how I want to proceed. These are your options." - E2. "Do I need to contact my physician directly or is there a next step that is also provided by the application itself?"- OM1.</p>
        <p>These diagnosis-related explanations, both disease information and next step/action, could be considered more local/disease-specific explanations. However, participants also wanted more general explanations about the AI system (system information), not related to either the inputs or the results. One of the participants asked for information about the system process/algorithm. "I would want to know, what are they doing actually in the background to do this?"- E1. However, not all participants expressed an interest in knowing the system process; some were not keen to know the information. They argued that in a stressful situation, such as a positively assessed cancer, their focus would not be on the system information but rather on their well-being. "Says that I have cancer, then I am not going to be interested in the system process"- S1.</p>
        <p>These arguments match the AI experts' opinions mentioned above, recognising that non-experts may usually not be interested in knowing the technical side of a system. Non-expert users' reluctance to know the technical information was a matter of timing and of their emotional state after receiving a diagnosis. However, it also reflects a reluctance caused by the possibility of not understanding the technical terms used to explain the process. "I mean, the very hard, fine grain details? It will be incomprehensible for me because I am not familiar with the technology and everything."- E2. This confirms previous research, arguing that what people consider an acceptable and understandable explanation depends on people's domain knowledge or role [27][28].</p>
        <p>Another emerging explanation component was system data. Non-expert users mentioned that they would like to have information about the volume of the database used to train the algorithm, or the data features used for the prediction. "So at least I have to know how big is their database."- OM2. "Explain to you the quality of features and characteristics; it is because this thing has this colour shape"- E2. However, when participants talked about the data they provide, their personal input data, they expressed concerns about data privacy and demanded specific information about it. "And how am I sure that my breast picture will not be leaked to be utilised for other intentions and such."- OM1. "Where are these data going?"- OM4. Participants also talked about the system accuracy and credibility. "However, for my health, I think it will be quite beneficial if I know how accurate it can be" - OM1. The credibility they mentioned was related to the institute/company that developed the system. Credibility could also mean whether the system has been tested and approved by the appropriate health institution.</p>
        <p>Besides the information that should be included in the explanation, participants also talked about how the explanation should be delivered. Participants demanded that the AI results be presented with care and empathy, especially if the result was not in their favour. Empathy is the ability to understand and share the feelings of others, and an empathetic statement should include phrases that help to establish a connection with the user. Participants mentioned empathy or reassuring words only in the case of "bad news", to be presented if the result is not good; therefore, we put this component under Customisation in the User Mental Model. "If I want to use text explanation, I think you should be, in terms of style of shaping the statement that you present to the user, I think you should always follow, sort of, defensive language. So again, it might be quite direct and aggressive to say to the user, you have cancer, exclamation mark. [...] Be a little bit more reserved, rather than explicit, in your statements because it's quite a sensitive matter."- E2. Participants who expected care and empathy were more concerned with the choice of words and how "delicately" the AI system delivers the diagnosis results.</p>
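        <p>To make this Customisation component concrete, here is a minimal sketch (Python; the wording and the condition are our illustrative assumptions, not the prototype's actual logic) of delivering the same result with or without reassuring, "defensive" language depending on whether the news is bad:</p>
        <preformat># Hypothetical sketch of the empathy/reassuring-words customisation:
# softer, "defensive" wording is used only when the result is unfavourable,
# mirroring the interview findings above. Wording is illustrative.

def frame_result(risk_detected: bool) -> str:
    if risk_detected:
        return ("Your scan shows some areas that we would like a specialist "
                "to look at. This is not a diagnosis, and many such findings "
                "turn out to be harmless. We recommend booking an appointment "
                "with your doctor to discuss this result.")
    return "Your scan shows no sign of abnormality."</preformat>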
        <p>Other than text and words, participants mentioned the use of graphics and images to communicate the explanation, for example, by showing comparison images of a normal condition vs an abnormal condition. The graphic/image to show comparison we put under explanation Content in the User Mental Model because, regardless of the result (good/bad), the user wants to see the opposite case and decide for themselves whether the result makes sense to them. "Perhaps have some examples of how affected breast looks like, how unaffected breast looks like. So you can compare yourself with what is being put in your input."- E2. "And then the image comparing, you know, both my results and the healthy ones."- S1. Participants' requests to show the opposite case were in line with the literature in cognitive psychology, which states that human explanations are sought in response to particular counterfactual cases [29][30]. Our finding confirms that a counterfactual case/contrastive explanation is argued to be an explanation that is understandable for users [31][32][28].</p>
        <p>For interaction with the AI system, participants expressed their need for a course of action, which is an additional doctor appointment feature included in the explanation interface. Regarding how they would interact with the explanation, participants wanted to be able to request detailed information rather than be presented with the full, long explanation in one go. They mentioned that the explanation detail could be presented as a link to an outside source or as a piece of expandable information. The User Mental Model outcome from the analysis can be seen in Fig. 3.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Target Mental Model</title>
        <p>In the Target Mental Model research stage, we identified which key components of an explanation (from the expert perspective - the Expert Mental Model) the users might want to be included in an AI explanation User Interface (UI). The Expert Mental Model's explanation components were combined with the explanation components from the User Mental Model to form the Target Mental Model. We conducted semi-structured interviews with the same group of non-expert participants involved in the User Mental Model definition.</p>
        <p>During the interviews, the main guiding question was: which explanation components do users want to be realised in a UI to explain AI results? We asked participants to reflect on the explanation components from the Expert Mental Model and discuss which ones they considered most important and valuable. Participants were asked to explicitly reflect on each explanation component by giving a rating of importance (from 0-10) and expressing their opinion on each of them. Based on the critical analysis of the User and Expert Mental Models, combined with the analysis of users' feedback on the Expert Mental Model views, we distilled the Target Mental Model.</p>
        <p>Result and Analysis. The median values of the explanation components' ratings given by the users are reported in Table 1. Under the Content explanation components set in the Expert Mental Model, the system process was not seen as crucial, since not all participants were interested in knowing the technicality of how the AI system made a decision/prediction. Under the Customisation set of explanation components, empathy/reassuring words was rated high by the participants. User request was also rated relatively high, because some of the participants argued that explanation should always be available whether a user requested it or not. The lowest-rated component under the Customisation group was user education, deemed unnecessary since an explanation should be understandable for lay users regardless of their educational background. Under the Interaction group, all components were rated as important except for open question: some participants were sceptical about openly asking questions to the AI system and preferred to wait to ask questions to a doctor.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption><p>Median values of Expert Mental Model explanation components rating to inform the Target Mental Model.</p></caption>
          <table>
            <thead>
              <tr><th>Set</th><th>Component</th><th>Median rating</th></tr>
            </thead>
            <tbody>
              <tr><td>Content</td><td>disease info</td><td>10</td></tr>
              <tr><td>Content</td><td>treatment/next plan</td><td>10</td></tr>
              <tr><td>Customisation</td><td>explanation request</td><td>8</td></tr>
              <tr><td>Customisation</td><td>user education</td><td>4</td></tr>
              <tr><td>Customisation</td><td>empathy</td><td>10</td></tr>
              <tr><td>Interaction</td><td>input check</td><td>10</td></tr>
              <tr><td>Interaction</td><td>open question</td><td>7</td></tr>
              <tr><td>Interaction</td><td>input manipulation</td><td>10</td></tr>
              <tr><td>Interaction</td><td>details request</td><td>10</td></tr>
            </tbody>
          </table>
        </table-wrap>
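        <p>The medians in Table 1 summarise one 0-10 rating per participant per component; the short sketch below (Python; the ratings shown are made up for illustration) reproduces the computation we describe:</p>
        <preformat># Hypothetical sketch: median importance rating per explanation component,
# as used to build Table 1. The ratings are fabricated for illustration;
# the study collected one 0-10 rating per participant per component.
from statistics import median

ratings = {
    "disease info":   [10, 9, 10, 10],
    "user education": [4, 3, 5, 4],
    "open question":  [7, 6, 8, 7],
}
medians = {component: median(values) for component, values in ratings.items()}
# {'disease info': 10.0, 'user education': 4.0, 'open question': 7.0}</preformat>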
        <p>The final Target Mental Model is shown in Figure 4. The explanation components included were obtained from the combination of the explanation components from the Expert Mental Model and the User Mental Model, then revised according to users' perceptions of and preferences on the experts' views. The explanation components with a lower rating score are indicated with lighter text in the figure. As an additional step, we went back to the experts and asked the medical experts follow-up questions about the explanation components that appeared in the User Mental Model but not in the Expert Mental Model, such as system accountability and data, and doctor appointment. According to them, the system's certification and accountability were not essential to include in the explanation: if the application is recommended by the healthcare authority (e.g., for the UK, the NHS), that would be considered enough. The doctor appointment component did not come up in the earlier interviews because they expected it as a given feature.</p>
      </sec>
      <sec id="sec-7-2">
        <title>3.4. Design Guidelines and Prototype</title>
        <p>By reflecting on the findings of the Target Mental Model, we propose 14 explanation components/design guidelines for trustworthy AI medical support system interfaces (see Table 2). The guidelines were grouped into three categories that mirror the Target Mental Model's explanation component sets: Explanation Content/Information to be Included, Explanation Customisation/Information Delivery, and Explanation Interaction/Interaction to be Afforded. Each guideline references the explanation contents from the Target Mental Model, except for the explanation request component.</p>
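        <p>To illustrate how the three guideline categories could translate into an interface data model, here is a minimal sketch (Python; the field names are our own illustrative choices, not taken from the prototype or from Table 2):</p>
        <preformat># Hypothetical sketch: a layered explanation record mirroring the three
# guideline categories (content to include, delivery/customisation,
# interaction to be afforded). Field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class ExplanationView:
    summary: str                                  # simple, general result shown first
    details: dict = field(default_factory=dict)   # expandable detail, shown on request
    empathetic_framing: bool = False              # reassuring wording for bad news
    comparison_images: list = field(default_factory=list)  # normal vs abnormal cases
    next_steps: list = field(default_factory=list)          # e.g. "book a doctor appointment"</preformat>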
        <p>We decided not to include explanation request in the guidelines in consideration of several regulations, such as the European Union's General Data Protection Regulation (GDPR) and the European Commission's Assessment List for Trustworthy Artificial Intelligence (ALTAI)<sup>1</sup>. According to these regulations, an explanation should always be provided, by law, to any user when AI is involved. AI in healthcare was classified as high-risk AI according to the White Paper on Artificial Intelligence by the European Commission<sup>2</sup>, which makes explanation availability even more essential in a healthcare scenario. We therefore removed the "explanation request" option from the design guidelines since, even if desirable from a non-expert user's perspective, it would be an unethical and unlawful design choice.</p>
        <sec id="sec-7-2-1">
          <title>1https://ec.europa.eu/info/publications/white-paper-artificial</title>
          <p>intelligence-european-approach-excellence-and-trust
2https://ec.europa.eu/futurium/en/ai-allianceconsultation/guidelines</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>5. Limitation and Future Works</title>
      <p>We then designed a user interface prototype based on
the guidelines at Table 2. We explored each guidelines’
presentation possibilities and the specific functionalities There are several limitations of our study that should be
of the system that could realise them. We decided on addressed in future works. The stage-based participatory
a website where the user could carry out breast self- process is not complete. The final stage, which
evalassessment based on screening images from their medical uates the developed prototype’s efectiveness, has not
scan portable device. The final prototype was developed carried out yet. We need to test whether the prototype
after several cycles of feedback between designers, and has reached the design goals and wholly followed the
dewas then uploaded as a website at (https://retnolaras. sign guidelines. To test if our prototype has reached the
github.io/care/). design goals, which is to design an explanation that can
help user to make a considered trust judgements, we need
to assess if there is any change in users’ perception and
4. Discussion their trust level. To measure the change in user’s trust,
we plan to use a quantitative measurement instrument
Previous research have used the development of mental [28] in a controlled experiment setting quantitatively
models to fully explore users’ understanding and help measuring the extent to which each of the guidelines
rethe design of transparent AI systems in various contexts alised in the prototype contributed to enable considered
[18][33][34], including the research which we adapted trust judgements by non-expert users. In addition, we
this stage-based participatory process from [8]. As men- will conduct a lab-study and interview to get qualitative
tioned in the Background section, the diference between insight on both the prototype and the design guidelines.
our mental models with previous research is on what the We also acknowledge limitations within the research
mental model is about and it’s richness. Our mental mod- stages we had conducted. The participants involved were
els can be considered limited, in that they only draw on recruited from our personal network, which might limit
how the users perceive a prospective AI system but not the views variation in difering opinions. Finally, the
exon how it works in details, in a real life context. There- planation design guidelines proposed by this paper have
fore unlike previous research our study cannot provide a not yet been evaluated, both in the guidelines’
applicadetailed understanding of how and why the AI system bility across AI medical support systems variety; and the
works in practice [34]. Nonetheless, we successfully dis- guidelines’ clarity. Finally, the prototype we developed
tilled diferent stakeholders insights on explanation of only delved into one type of modality, a graphic user
a AI medical support system, and formed them as very interface. How the design guidelines implemented to an
detailed mental models. We critically discussed the difer- audio user interface or a conversational user interface
ence in understanding and perceptions of AI explanation also needs further exploration.
needs, from an expert and non-expert perspective, we
discussed issues of explanation modality and interaction,
and combined expert and non-expert views in a target 6. Conclusion
mental model. The resulting design guidelines were also
contextualised to current practice and health regulations. In this paper, we successfully applied a stage-based
par</p>
      <p>The explanation design guidelines we developed were ticipatory design process to define future design
guidebased on critical reflections of Target Mental Model re- lines for trustworthy AI healthcare system explanation
sults. There is definitely room for improvements where interfaces for non-expert users. We developed an Expert
we can incorporate other AI design guidelines or explana- Mental Model, User Mental Model, and Target Mental
tion recommendation to elaborate on our current guide- Model of AI medical support system’s explanation. These
lines. For example, from the Amershi et al.’s guidelines mental models captured the needs and visions of the
dif[35]; AI should show contextually relevant information ferent stakeholders involved in a human-AI explanation
(G4) and mitigate social biases (G6), we could add those process in a healthcare scenario. We used the developed
guidelines to our guideline Information Delivery: Sim- Target Mental Model to inform a set of 14 explanation
ple and General (EDG9). Another example, from [32]; design guidelines for the development of trustworthy AI
suggesting that explanation should be contrastive, could Healthcare System Explanation Interfaces, which
specificontribute to our guideline for Interaction: Input Com- cally catered for non-expert users, while still taking into
parison (Visualisation). A follow on critical literature account medical experts’ practice and AI experts’
unreview would also help to verify and validate our pro- derstanding. These guidelines emerged as an outcome
posed design guidelines. of several stages of interviews, feedback from diferent
types of stakeholders, thorough analysis of the current
literature, and critical reflections on the insights obtained
through the participatory process.
31 (2017) 841.
[32] T. Miller, Explanation in artificial intelligence:
Insights from the social sciences, Artificial
Intelligence (2018).
[33] T. Kulesza, S. Stumpf, M. Burnett, S. Yang, I. Kwan,</p>
      <p>W.-K. Wong, Too much, too little, or just right?
ways explanations impact end users’ mental models,
in: 2013 IEEE Symposium on Visual Languages and</p>
      <p>Human Centric Computing, IEEE, 2013, pp. 3–10.
[34] T. Kulesza, S. Stumpf, M. Burnett, I. Kwan, Tell me
more?: the efects of mental model soundness on
personalizing an intelligent agent, in: Proceedings
of the SIGCHI Conference on Human Factors in</p>
      <p>Computing Systems, ACM, 2012, pp. 1–10.
[35] S. Amershi, D. Weld, M. Vorvoreanu, A. Fourney,</p>
      <p>B. Nushi, P. Collisson, J. Suh, S. Iqbal, P. N. Bennett,
K. Inkpen, et al., Guidelines for human-ai
interaction, in: Proceedings of the 2019 CHI Conference
on Human Factors in Computing Systems, ACM,
2019, p. 3.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref32"><mixed-citation>[32] T. Miller, Explanation in artificial intelligence: Insights from the social sciences, Artificial Intelligence (2018).</mixed-citation></ref>
      <ref id="ref33"><mixed-citation>[33] T. Kulesza, S. Stumpf, M. Burnett, S. Yang, I. Kwan, W.-K. Wong, Too much, too little, or just right? Ways explanations impact end users' mental models, in: 2013 IEEE Symposium on Visual Languages and Human Centric Computing, IEEE, 2013, pp. 3-10.</mixed-citation></ref>
      <ref id="ref34"><mixed-citation>[34] T. Kulesza, S. Stumpf, M. Burnett, I. Kwan, Tell me more?: The effects of mental model soundness on personalizing an intelligent agent, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2012, pp. 1-10.</mixed-citation></ref>
      <ref id="ref35"><mixed-citation>[35] S. Amershi, D. Weld, M. Vorvoreanu, A. Fourney, B. Nushi, P. Collisson, J. Suh, S. Iqbal, P. N. Bennett, K. Inkpen, et al., Guidelines for human-AI interaction, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, 2019, p. 3.</mixed-citation></ref>
    </ref-list>
  </back>
</article>