The Effect of Explanation Styles on User's Trust

Retno Larasati (retno.larasati@open.ac.uk), Anna De Liddo (anna.deliddo@open.ac.uk), and Enrico Motta (enrico.motta@open.ac.uk)
Knowledge Media Institute, The Open University, UK
ABSTRACT
This paper investigates the effects that different styles of textual explanation have on the explainee's trust in an AI medical support scenario. From the literature, we focused on four different styles of explanation: contrastive, general, truthful, and thorough. We conducted a user study in which we presented explanations of a fictional mammography diagnosis application system to 48 non-expert users. We carried out a between-subject comparison across four groups of 11-13 people, each looking at a different explanation style. Our findings suggest that contrastive and thorough explanations produce higher personal attachment trust scores compared to the general explanation style, while the truthful explanation shows no difference compared to the rest of the explanations. This means that users who received the contrastive and thorough explanation types found the explanation given significantly more agreeable and suited to their personal taste. These findings, even though not conclusive, confirm the impact of explanation style on users' trust towards AI systems and may inform future explanation design and evaluation studies.

CCS CONCEPTS
• Human-centered computing → Human Computer Interaction.

KEYWORDS
Explanation, Trust, Explainable Artificial Intelligence

ACM Reference Format:
Retno Larasati, Anna De Liddo, and Enrico Motta. 2020. The Effect of Explanation Styles on User's Trust. In Proceedings of the IUI Workshop on Explainable Smart Systems and Algorithmic Transparency in Emerging Technologies (ExSS-ATEC'20), March 2020, Cagliari, Italy. 6 pages.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
One of the main arguments motivating Explainable Artificial Intelligence research is that the explicability of AI systems can improve people's trust and adoption of AI solutions [8][27]. Still, the relationship between trust and explanation is complex, and it is not always the case that explicability improves users' trust. Trust in AI systems is claimed to be enhanced by transparency [11] and understandability [17]. In order to gain understandability, an AI system should provide explanations that are meaningful to the explainee (the person who receives the explanation). Providing meaningful explanations could then support users in appropriately calibrating trust, by improving trust (when they tend to under-trust the system) and mitigating over-trust issues [31]. Previous research has shown that a key role in calibrating trust can be played by the way in which the explanation is expressed and presented to users. Explanation style and modalities affect users' trust towards algorithmic systems, sometimes improving and sometimes reducing trust [12][25]. This paper aims to investigate the relation between explanation and trust by exploring different explanation styles. We first conducted a literature review in psychology, philosophy, and information systems to understand the characteristics of meaningful explanations. We then designed several styles of explanation based on these characteristics. Since we are interested in assessing the effects of explanation styles on users' trust, we also defined a variety of trust components to measure users' trust levels. Our proposed trust measurement was gathered from the literature in human factors and HCI research. Finally, we carried out a user study to see whether any specific explanation style differently affects users' trust. Our contribution is twofold:

   (1) we provide evidence which confirms the effect of explanation styles on different trust factors;
   (2) we propose a reliable human-AI trust measurement (Cronbach's α=0.88) to investigate explanation and trust in healthcare.

The rest of the paper is organized as follows: Section 2 introduces the context of this research and summarises the relevant literature. Section 3 describes the methodology of this study. Sections 4 and 5 present and analyse the results from the study. Finally, Section 6 discusses the limitations of this work and outlines the next steps of the research.

2 BACKGROUND AND RELATED WORK

2.1 Explanation
Explanation can be seen as an act or a product and can be categorised as good or bad. A good explanation is an explanation that feels right because it offers a phenomenologically familiar sense of understanding [1]. In this paper, we focus on meaningful explanation, to stress our interest in the explanation's capability to improve understanding and sense-making of AI and algorithmic results. As such, good explanations are not necessarily explanations that improve trust; they can affect users' trust both ways, by either improving or moderating it.

We might ask: what is a meaningful explanation? There is no single definition. Guidotti et al. defined meaningful explanation as explanation that is faithful and interpretable [7]. Thirumuruganathan et al. defined meaningful explanation as explanation that is personalised based on users' demographics [29]. Regulators have also mentioned meaningful explanation. The GDPR (Articles 13–15) states that users have the right to receive 'meaningful information about the logic involved' in automated decisions, but
it fails to provide any specific definition of what is to be considered 'meaningful information'. In this paper, we will refer to meaningful explanation as explanation that is understandable.

In cognitive psychology, explanation can be classified into different types: (i) causal explanation, which tells you what causes what; (ii) mechanical explanation, which tells you how a certain phenomenon comes about; and (iii) personal explanation, which tells you what causes what in the context of personal reasons or beliefs [32]. Approaching these definitions from an explainable AI and AI reasoning angle, we could say that causal and mechanical explanation could be the same, because the causal explanation of an AI system is mechanical by definition. For instance, if we ask why the AI system gives us a certain prediction, the answer will consist of an illustration of the AI's mechanical process, which produced that prediction result. Personal explanation might also not be relevant, since all AI "personal" explanations are defined in terms of what causes what in the context of a specific AI reasoning mechanism. Therefore, in what follows we will focus on causal explanation.

Hilton proposed that causal explanation proceeds through the operation of counterfactual and contrastive criteria [10]. Lipton suggested that "to explain why P rather than Q, we must cite a causal difference between P and not-Q, consisting of a cause of P and the absence of a corresponding event in the history of not-Q" [16]. Miller quoted Lipton and argued that everyday explanations, or human explanations, are "sought in response to particular counterfactual cases. [...] people do not ask why event P happened, but rather why event P happened instead of some event Q" [22].

Causal explanation happens through several processes [10]. First, there is information collection: a person gathers the information available. Second, a causal diagnosis takes place: a person tries to identify a connection between two events or instances based on the information. Third, there is causal selection: a person dignifies a set of conditions as "the explanation". This selection process is influenced by the information gathered and the domain knowledge of a person [20]. This means that what people consider acceptable and understandable is selected from the information provided and depends on people's own domain knowledge or role. According to Lombrozo, explanations that are simpler are judged as more believable and more valuable [18], and another study also highlighted that users prefer a combination of simple and broad explanations [26].

As mentioned previously, explanation can be seen as an act or as a product. Explanation as an act involves the interaction between one or more explainers and explainees [22]. According to Hilton, explanation is understandable only when it involves explainer and explainee engaging in information exchange through dialogue, visual representation, or other communication modalities [10]. This statement implies that static explanations could be harder to understand, because they could be less engaging and would not involve a dynamic interchange between explainer and explainee. To achieve meaningful explanation, a social (interactive) characteristic of explanation needs to be taken into account.

Previous research also showed that participants place the highest trust in explanations that are sound and complete [14]. Soundness here means nothing but the truth: how truthful each element in an explanation is with respect to the underlying system. Completeness here means the whole truth: the extent to which an explanation describes all of the underlying system. Completeness is argued to positively affect user understandability [13]. Even though both of Kulesza's studies used explanation in the case of a music recommender system, we think that being truthful (soundness) and thorough (completeness) are key characteristics of explanations to be further explored. Building on the literature reviewed above, we therefore distilled 6 key characteristics of meaningful explanation, which are defined in Table 1.

Table 1: Characteristics of Meaningful Explanation

  contrastive: the cause of something relative to some other thing in contrast [16][10][22]
  domain/role dependent: pragmatic and relative to the background context [10][20]
  general: a simpler and broad explanation is preferable [26][18]
  social/interactive: people explain to transfer knowledge, thus explanation can be a social exchange [10][22]
  truthful: how truthful each element in an explanation is with respect to the underlying system [14]
  thorough: describes all of the underlying system [14]

2.2 Explanation and User's Trust
There is arguably a relation between explanation and users' trust. According to the Defense Advanced Research Projects Agency (DARPA), Explainable AI is essential to enable human users to understand and appropriately trust a machine learning system [8]. Previous studies proposing different types of explanation [27][2][9][14] further cemented the claim that explanations improve user trust [30][24][5].

However, users' trust could be misplaced and lead to over-reliance or over-trust. In a healthcare scenario, a doctor could unknowingly trust a technologically complex laboratory diagnostic test that was incorrectly calibrated and misdiagnosed patients [4]. Previous research suggests that giving explanations could help users to moderate their trust level [31], either by providing the system's accuracy as the explanation [33][23] or by providing the system's confidence level [15]. On the one hand, these findings do not come from healthcare: while the system's accuracy and confidence level might strongly affect users' trust in a dating app [33] or a context-aware app [15], it is unclear whether that would be the case in a healthcare scenario. On the other hand, in the healthcare/medical domain, Bussone et al. found that a high system confidence level had only a slight effect on over-reliance [3].

There are a number of ways to present an explanation. For example, one of the studies mentioned above used the accuracy level as the explanation. It is therefore important to know what style we are going to use to present our explanation. Research found that explanation style and modalities affect users' trust towards algorithmic systems, and can either improve or reduce it [12][25].

In addition, in each of the reviewed studies trust was measured differently, hence the results are hard to compare and do not provide
a clear picture of the extent to which different styles of explanation affect different types of trust. To better understand users' trust towards an AI medical system, a more comprehensive trust measurement instrument is needed and will be explored in the next section.

2.3 Trust Measurement
In general, there is quite a large literature presenting scales for measuring trust. This paper focuses on identifying an appropriate scale for the assessment of human trust in a machine prediction system, which can be contextualised to a healthcare scenario.

Some of the trust measurements reviewed from the automation literature are highly specific to particular application contexts. For example, the scale developed by Schaefer [28] refers specifically to the context of human reliance on a robot. The questions asked to users to measure trust include, for example: "Does it act as part of a team?" and "Is it friendly?". Another example of a specific trust measurement is the scale developed by Dzindolet et al. [6]. It was created in the context of aerial terrain photography, showing images to detect camouflaged soldiers. The questions asked to measure trust in this case include, for example: "How many errors do you think you will make during the 200 trials?". As these questions are very specific to the task and to the technical knowledge of the users in the specific application context, it would be hard to translate them to a healthcare scenario.

Madsen and Gregor [19] developed and tested a more generic human-computer trust measurement instrument, with a focus on trust in an intelligent decision aid. A validity analysis of this instrument showed high Cronbach's alpha results, which makes the scale promising for testing in a different application field. Trust factors here are divided into two groups, cognition-based trust and affect-based trust. Madsen and Gregor [19] conceptualise trust as consisting of five main factors: perceived reliability, perceived technical competence, perceived understandability, faith, and personal attachment. Perceived technical competence means that the system is perceived to perform the tasks accurately and correctly, based on the input information. Perceived understandability means that the user can form a mental model and predict future system behaviour. Perceived reliability means that the system is perceived to function consistently. Faith means that the user is confident in the future ability of the system to perform, even in situations in which they have never used the system before. Finally, personal attachment means that the user finds using the system agreeable and preferable, and that it suits their personal taste.

Some of these factors overlap with the trust factors identified by McKnight [21], who provides an understanding of trust in technology in a wider societal context. McKnight [21] defines trust as consisting of three main components: propensity to trust general technology, institution-based trust in technology, and trust in a specific technology. In the context of this paper we only focus on trust in a specific technology. McKnight [21] defines trust in a specific technology as a person's relationship with a particular technology. Even though the study does not specifically target decision systems, the paper draws on a large literature and looks at different objects of trust, trust attributes, and their empirical relationships, thus proposing a scale of trust which demonstrated good reliability with Cronbach's alpha > 0.89. In the proposed scale, trust in a specific technology was analysed into three factors: perceived functionality, perceived helpfulness, and perceived reliability. Perceived functionality is the users' perceived capability of the system to properly accomplish its main function. Perceived helpfulness is the users' perception that the technology provides adequate, effective, and responsive help. Finally, perceived reliability means that the system is perceived to operate continually or to respond predictably to inputs.

In our study we adopt a merged and modified version of the 9 trust items proposed by Madsen and Gregor and by McKnight. From the total of 9 trust items described above, we merged items that overlapped in meaning and modified some of their descriptions into the final 6 trust metrics: perceived understandability, perceived reliability, perceived technical competence, faith, personal attachment, and helpfulness (see Table 2).

Table 2: Human-AI Trust Measurement

  perceived technical ability: the system is perceived to perform the tasks accurately and correctly based on the information that is input
  perceived reliability: the system is perceived to function in a repeated, consistent manner
  perceived understandability: the user can form a mental model and predict future system behaviour
  personal attachment: the user finds using the system agreeable and preferable, and it suits their personal taste
  faith: the user has faith in the future ability of the system to perform even in situations in which it is untried
  perceived helpfulness: the user believes that the technology provides adequate, effective, and responsive help

3 METHODOLOGY
We aimed to test to what extent different types of textual explanations affect different factors of users' trust. In Section 2.1 we identified 6 characteristics of meaningful explanation: contrastive, truthful, general, thorough, social/interactive, and role/domain-dependent explanations (see Table 1). We used these characteristics to design distinctive textual explanations, and then presented them to users. Since we focus on a healthcare scenario, we used a dramatising vignette to probe participants' responses. We asked them to read the explanation after reading the vignette, and then ran an online survey asking them to rate different explanation types. To elicit feedback on the explanation types we used the trust measurement described above.

We designed a between-subjects study, in which different groups of users were each presented with a different explanation type. When designing the explanations, we focused on 4 out of the 6 explanation characteristics: contrastive, general, truthful, and thorough. The social/interactive and role/domain-dependent characteristics
were ignored at this stage for simplicity. In fact, these explanation styles could not be expressed with a textual description alone, and would have required work on the UX design of the explanation in order to be realised. Therefore, the assessment of the effects of these two characteristics was left for a future study. The AI diagnosis tool described in the dramatising vignette was a fictional AI system for mammography diagnosis, used in a self-managed health scenario. With the system, users could upload images of self-scanned mammograms and then receive a diagnosis result with an attached textual explanation.

3.1 Explanation Design
In order to design the explanations, we first looked at breast cancer diagnosis reports and several screening reports, including ultrasound. Next, we designed the possible textual explanations based on each characteristic's definition in a small-scale informal design phase. We then discussed the designed explanations with a researcher outside this study and with a medical professional. The explanations were identical from a UI perspective, with one graphic followed by the diagnosis and the explanation text. The explanation texts were designed to stress the four explanation characteristics: contrastive, truthful, general, and thorough. We also tried to present a balanced level of system capability, for example in the general style: "19 in 20 similar images" and in the truthful style: "95% similarities". The explanation texts presented to the participants can be seen in Table 3, and how we presented them can be seen in Figure 1.

Table 3: Explanation Styles

  contrastive: "From the screen image, Malignant lesions are present. Benign cases and fluid cyst looks hollow and have a round shape. Your spots are not hollow and have irregular shapes. Therefore, your spots are detected as Malignant."
  general: "Based on your screen image, your spots are detected as Malignant. 19 in 20 similar images are in Malignant class."
  truthful: "Using 5,600 of ultrasound images in our database, your image have 95% similarities with Malignant cases."
  thorough: "Malignant lesions are present at 2 sites, 30mm and 5mm. Non homogeneous. Non parallel. Not circumscribed. Your risk of breast cancer as; 30-50 years old, cyst history, woman is increased 20%"

Figure 1: Thorough Explanation

3.2 Data Collection and Analysis
The participants were recruited on Mechanical Turk, with a survey set up using Google Forms. Our target was initially 80 participants: 40 from the general public and 40 working in the healthcare field. We chose the "master worker" option and added one check-in question to the survey, to maximise participation quality and to check whether participants had read the vignette carefully. The Mechanical Turk HITs were up for a week, and in the end we got 48 participants (only 8 with some medical expertise). Participants were randomly assigned to 1 of the 4 conditions, with each condition being a different explanation type. The numbers of participants per condition were not identical, with n1 = 12, n2 = 12, n3 = 11, and n4 = 13.

We asked participants to rate the AI system after having read the dramatising vignette, and to reflect on the 6 trust components while rating the explanation using a 7-point Likert scale. Following a between-subject comparison of the results, we were able to identify which explanation (if any) affects which of the 6 components of trust, and to what extent. The overall aim of the study was to give us insights into how different styles of linguistic explanation affect specific aspects of users' trust. We also asked participants whether they would have liked the presented explanation to be included in the AI system, and to explain why.

To analyse the data, we used ANOVA tests, followed by Tukey's post-hoc paired tests, to see the relative effects of the different explanation types. The ANOVA test tells us whether there is an overall difference between the groups, but it does not indicate which specific groups differed. Tukey's post-hoc tests can confirm where the difference occurred between specific groups. In addition, we evaluated the trust measurement instrument using Cronbach's Alpha.
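The analysis pipeline above is described only in prose; as a rough illustration, the following Python sketch shows how a one-way ANOVA with Tukey HSD post-hoc comparisons could be run over such survey data using scipy and statsmodels. The file name, column names, and data layout are hypothetical assumptions, not the authors' actual materials.

```python
# Minimal sketch of the analysis described above (one-way ANOVA + Tukey HSD).
# Assumes a hypothetical CSV with one row per participant: a 'style' column
# (contrastive / general / truthful / thorough) and one column per trust
# factor rated on the 7-point Likert scale.
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

TRUST_FACTORS = ["understandability", "reliability", "technical_competence",
                 "faith", "personal_attachment", "helpfulness"]

df = pd.read_csv("survey_responses.csv")            # hypothetical file name
df["avg_trust"] = df[TRUST_FACTORS].median(axis=1)  # median of the 6 trust scores

# First test: does explanation style affect the aggregate trust score?
groups = [g["avg_trust"].to_numpy() for _, g in df.groupby("style")]
f_stat, p_value = f_oneway(*groups)
print(f"ANOVA on aggregate trust: F={f_stat:.3f}, p={p_value:.4f}")
print(pairwise_tukeyhsd(df["avg_trust"], df["style"], alpha=0.05).summary())

# Second test: one ANOVA per trust factor, with Tukey HSD where significant.
for factor in TRUST_FACTORS:
    per_style = [g[factor].to_numpy() for _, g in df.groupby("style")]
    f_stat, p_value = f_oneway(*per_style)
    print(f"{factor}: F={f_stat:.3f}, p={p_value:.4f}")
    if p_value < 0.05:
        print(pairwise_tukeyhsd(df[factor], df["style"], alpha=0.05).summary())
```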
4 RESULTS
From the online survey data, we ran two ANOVA tests, to check the explanation styles and the trust factors. In the first ANOVA test, we compared the 4 explanation types in relation to an average trust factor (calculated as the median of the 6 trust scores).
We found that different styles of explanation significantly affect the average trust values (p-value=0.0033, α=0.05). We then ran a Tukey's post-hoc test and found that the general explanation shows significantly lower trust scores compared to the rest of the explanation styles: contrastive, truthful, and thorough (α=0.05). The Tukey's post-hoc test analysis can be seen in Fig. 2.

Figure 2: Tukey's post hoc test in explanation styles

In the second ANOVA test, we compared the four explanation styles for each trust factor; we therefore ran 6 comparisons and found that Personal Attachment was the only trust factor showing a significant difference (p-value=0.02158, α=0.05). We then ran a Tukey's post-hoc test for Personal Attachment, to identify where the specific difference occurred, and found that the contrastive and thorough explanation styles show a significant difference compared to the general explanation style (α=0.05). The Tukey's post-hoc test analysis can be seen in Fig. 3.

Figure 3: Tukey's post hoc test in Personal Attachment

As mentioned above, other than the trust scale, we also asked participants whether they would like the explanation style presented to them to be included in the app for self-managed health. As we can see in Fig. 4, the contrastive, truthful, and thorough explanation styles are rated quite high (6 = very), while the general explanation style is rated lower (5 = moderately). This assessment is consistent with our explanation style-trust analysis, in which general explanation was the least performing explanation in affecting personal attachment.

Figure 4: Median of participants' rating towards their explanation preference

We also asked participants why they would or would not prefer to receive the explanation given to them. From a qualitative analysis of the 25 answers from the thorough and contrastive style groups, users reported the presence of a clear rationale and the use of lay terms as the two distinctive factors motivating the high trust rating. In turn, the need for a rationale for the AI result was also explicitly mentioned as a way to improve the general explanation (4 out of 11 people in the general explanation style group mentioned a rationale as a need).

The trust measurement was tested using the overall data from the 48 participants. The reliability of the overall measurement was determined by Cronbach's Alpha. We found that the alpha is quite high, α=0.88. This is an encouraging result which may inform further use, testing, and validation of the proposed human-AI trust measure in other healthcare applications.
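As an illustration of the reliability check reported above, the sketch below computes Cronbach's alpha directly from its standard formula over the six trust items; the DataFrame layout and column names are the same hypothetical placeholders used in the earlier analysis sketch.

```python
# Minimal Cronbach's alpha computation over the 6 trust items
# (hypothetical data layout: one row per participant, one column per item).
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example with the hypothetical trust-factor columns from the earlier sketch:
# alpha = cronbach_alpha(df[TRUST_FACTORS])
# print(f"Cronbach's alpha = {alpha:.2f}")
```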


5 DISCUSSION
Our study confirms previous research indicating that different styles of explanation significantly affect specific trust factors. In particular, we found that Personal Attachment (p-value=0.02158) was significantly affected by different textual explanation styles, and was highly rated by the groups that were presented with the thorough and contrastive explanation styles. This means that, among our participants, the thorough and contrastive styles suited their taste more, compared to the general explanation style.

This finding was corroborated by the additional comparison of the 4 explanations by average trust ratings, which showed that the general style explanation was rated significantly lower than the rest of the explanation styles. Overall preferability scores also confirmed that the general style explanation was rated the lowest.

Participants seemed to prefer the thorough and contrastive style explanations because of the rationale provided, and because of the layperson language used to provide the explanation. The need for a rationale was also suggested as a way to improve the general explanation style.

However, further investigation of the extent to which explanation affects trust judgement needs to be conducted. The current results are not conclusive or sufficient to develop an explanation style and trust relation model. Additional studies to explore explanation mediums and interaction types are also necessary.
6 LIMITATIONS AND FUTURE WORK
This preliminary study has several limitations that should be noted. It is an exploratory study of quite a broad topic, and we only conducted one online survey with a low number of participants. The fact that some explanation styles did not show significantly different effects on users' trust judgements could be caused by the small sample size. Future studies with a bigger sample size and a baseline group are needed to determine the extent to which explanation affects trust.

We also acknowledge that trust is difficult to measure. Even though our trust measurement has shown high internal consistency, we have not fully investigated the validity of the measurement in other cases or fields. Moreover, in this experiment, we only measured users' trust as a self-reported measure. Our experimental design, and the use of a probing method, may also have influenced participants' reflection and self-reporting. Further research is needed to carefully determine whether this was the case.

REFERENCES
[1] Peter Achinstein. 1983. The nature of explanation. Oxford University Press on Demand.
[2] Stavros Antifakos, Nicky Kern, Bernt Schiele, and Adrian Schwaninger. 2005. Towards improving trust in context-aware systems by displaying system confidence. In Proceedings of the 7th international conference on Human computer interaction with mobile devices & services. ACM, 9–14.
[3] Adrian Bussone, Simone Stumpf, and Dympna O'Sullivan. 2015. The role of explanations on trust and reliance in clinical decision support systems. In 2015 International Conference on Healthcare Informatics. IEEE, 160–169.
[4] Pat Croskerry. 2009. Clinical cognition and diagnostic error: applications of a dual process model of reasoning. Advances in health sciences education 14, 1 (2009), 27–35.
[5] Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O'Brien, Stuart Schieber, James Waldo, David Weinberger, and Alexandra Wood. 2017. Accountability of AI under the law: The role of explanation. arXiv preprint arXiv:1711.01134 (2017).
[6] Mary T Dzindolet, Scott A Peterson, Regina A Pomranky, Linda G Pierce, and Hall P Beck. 2003. The role of trust in automation reliance. International journal of human-computer studies 58, 6 (2003), 697–718.
[7] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Dino Pedreschi, Franco Turini, and Fosca Giannotti. 2018. Local rule-based explanations of black box decision systems. arXiv preprint arXiv:1805.10820 (2018).
[8] David Gunning. 2017. Explainable artificial intelligence (XAI). (2017).
[9] Jonathan L Herlocker, Joseph A Konstan, and John Riedl. 2000. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM conference on Computer supported cooperative work. ACM, 241–250.
[10] Denis J Hilton. 1990. Conversational processes and causal explanation. Psychological Bulletin 107, 1 (1990), 65.
[11] Andreas Holzinger, Chris Biemann, Constantinos S Pattichis, and Douglas B Kell. 2017. What do we need to build explainable AI systems for the medical domain? arXiv preprint arXiv:1712.09923 (2017).
[12] René F Kizilcec. 2016. How much information?: Effects of transparency on trust in an algorithmic interface. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 2390–2395.
[13] Todd Kulesza, Simone Stumpf, Margaret Burnett, and Irwin Kwan. 2012. Tell me more?: the effects of mental model soundness on personalizing an intelligent agent. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1–10.
[14] Todd Kulesza, Simone Stumpf, Margaret Burnett, Sherry Yang, Irwin Kwan, and Weng-Keen Wong. 2013. Too much, too little, or just right? Ways explanations impact end users' mental models. In 2013 IEEE Symposium on Visual Languages and Human Centric Computing. IEEE, 3–10.
[15] Brian Y Lim and Anind K Dey. 2011. Design of an intelligible mobile context-aware application. In Proceedings of the 13th international conference on human computer interaction with mobile devices and services. ACM, 157–166.
[16] Peter Lipton. 1990. Contrastive explanation. Royal Institute of Philosophy Supplements 27 (1990), 247–266.
[17] Zachary C Lipton. 2017. The Doctor Just Won't Accept That! arXiv preprint arXiv:1711.08037 (2017).
[18] Tania Lombrozo. 2006. The structure and function of explanations. Trends in cognitive sciences 10, 10 (2006), 464–470.
[19] Maria Madsen and Shirley Gregor. 2000. Measuring human-computer trust. In 11th Australasian conference on information systems, Vol. 53. Citeseer, 6–8.
[20] Bertram F Malle. 2006. How the mind explains behavior: Folk explanations, meaning, and social interaction. MIT Press.
[21] D Harrison McKnight, Michelle Carter, Jason Bennett Thatcher, and Paul F Clay. 2011. Trust in a specific technology: An investigation of its components and measures. ACM Transactions on Management Information Systems (TMIS) 2, 2 (2011), 12.
[22] Tim Miller. 2018. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence (2018).
[23] Andrea Papenmeier, Gwenn Englebienne, and Christin Seifert. 2019. How model accuracy and explanation fidelity influence user trust. arXiv preprint arXiv:1907.12652 (2019).
[24] Alun Preece. 2018. Asking 'Why' in AI: Explainability of intelligent systems–perspectives and challenges. Intelligent Systems in Accounting, Finance and Management 25, 2 (2018), 63–72.
[25] Pearl Pu and Li Chen. 2006. Trust building with explanation interfaces. In Proceedings of the 11th international conference on Intelligent user interfaces. ACM, 93–100.
[26] Stephen J Read and Amy Marcus-Newhall. 1993. Explanatory coherence in social explanations: A parallel distributed processing account. Journal of Personality and Social Psychology 65, 3 (1993), 429.
[27] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386 (2016).
[28] Kristin Schaefer. 2013. The perception and measurement of human-robot trust. (2013).
[29] Saravanan Thirumuruganathan, Mahashweta Das, Shrikant Desai, Sihem Amer-Yahia, Gautam Das, and Cong Yu. 2012. MapRat: Meaningful explanation, interactive exploration and geo-visualization of collaborative ratings. Proceedings of the VLDB Endowment 5, 12 (2012), 1986–1989.
[30] Eric S Vorm. 2018. Assessing Demand for Transparency in Intelligent Systems Using Machine Learning. In 2018 Innovations in Intelligent Systems and Applications (INISTA). IEEE, 1–7.
[31] Danding Wang, Qian Yang, Ashraf Abdul, and Brian Y Lim. 2019. Designing Theory-Driven User-Centric Explainable AI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 601.
[32] Sam Wilkinson. 2014. Levels and kinds of explanation: lessons from neuropsychiatry. Frontiers in psychology 5 (2014), 373.
[33] Ming Yin, Jennifer Wortman Vaughan, and Hanna Wallach. 2019. Understanding the Effect of Accuracy on Trust in Machine Learning Models. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 279.