<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Contextualising local explanations for non-expert users: an XAI pricing interface for insurance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Clara Bove</string-name>
          <email>clara.bove@axa.com</email>
          <email>clara.bove@lip6.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonathan Aigrain</string-name>
          <email>jonathan.aigrain@axa.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marie-Jeanne Lesot</string-name>
          <email>marie-jeanne.lesot@lip6.fr</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Charles Tijus</string-name>
          <email>tijus@lutin-userlab.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcin Detyniecki</string-name>
          <email>marcin.detyniecki@axa.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AXA</institution>
          ,
          <addr-line>Paris</addr-line>
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Laboratoire CHArt-Lutin, University Paris 08</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Polish Academy of Science</institution>
          ,
          <addr-line>Warsaw</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Sorbonne Université</institution>
          ,
          <addr-line>CNRS, LIP6, Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Machine Learning has provided new business opportunities in the insurance industry, but its adoption is for now limited by the difficulty to explain the rationale behind the predictions provided. In this work, we explore how we can enhance local feature importance explanations for non-expert users. We propose design principles to contextualise these explanations with additional information about the Machine Learning system, the domain and external factors that may influence the prediction. These principles are applied to a car insurance smart pricing interface. We present preliminary observations collected during a pilot study using an online A/B test to measure objective understanding, perceived understanding and perceived usefulness of explanations. The preliminary results are encouraging as they hint that providing contextualisation elements can improve the understanding of ML predictions.</p>
      </abstract>
      <kwd-group>
        <kwd>Interpretability</kwd>
        <kwd>Explainability</kwd>
        <kwd>Interface Principle</kwd>
        <kwd>Interaction Human-ML</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Contextualisation</kwd>
        <kwd>Explanations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>The rise of Machine Learning (ML) has pro</title>
        <p>
          vided new business opportunities in the
insurance industry. ML can for instance help
improve pricing strategies, fraud detection,
claim management or the overall customer
experience. Yet, its adoption is for now
limited by the dificulty for ML to explain the
rationale behind predictions to end-users [
          <xref ref-type="bibr" rid="ref18">1, 2</xref>
          ].
This currently very active topic has led to the
development of the so-called eXplainable
Artificial Intelligence (XAI) domain. This need
for explanation is indeed an important issue,
as not being able to understand why a
prediction is provided can decrease the trust of
a potential customer for the ofered product.
From a legal point of view, Article 22 of the
recent General Data Protection Regulation
(GDPR) states that, for any decision made by
an algorithm, customers have a "Right to
Explanation" to allow them to make informed
decisions [3]. For these reasons,
explainability is a crucial issue for the insurance
industry as well.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. State of the Art</title>
      <p>This section briefly reviews the complex notion of explanation, first considering the cognitive perspective. It then provides a short overview of the different local explanations that can be extracted from Machine Learning models. Finally, it presents how these explanations have been exploited for interfaces in the XAI literature.</p>
      <sec id="sec-2-1">
        <title>2.1. What is an Explanation?</title>
        <p>The notion of explanation has been widely studied; the objective of this section is not to provide a complete review of works on this topic, but only to point to some major elements, exploited later on in the paper. First, offering an explanation requires to identify the underlying, most often implicit, question it should answer. It has been shown that an explanation can be defined as an answer to a why-question [4, 7] and that it should provide a reason that justifies what happens [8, 9]. Besides, it is dependent on the context, as it must be adapted to the specific user need [10]. The explanation is also the social process of someone explaining something to someone else [<xref ref-type="bibr" rid="ref11">11</xref>]. This process is also shown to be multidimensional: in order to be well constructed and well perceived, an explanation answering a why-question needs to be completed with external required information regarding the context, as well as transparency over its pragmatic goals [5, 12].</p>
        <p>The specific question of explanation in the context of interaction with a ML system has also been considered, in particular regarding the underlying question a user may be asking when requesting an explanation. Based on a rich user study, a question bank for explainability tools composed of 50 questions has been established [6], beyond the previously mentioned why-question defined from a cognitive point of view. It proposes to distinguish between questions related to the process understanding, the global understanding of the model, the local understanding of a prediction and the exploration for local contrasting explanations. Moreover, it has been observed in this user study that explanations are most frequently sought to gain further insights or evidence on why a prediction has been made for this instance and not another prediction, bridging the gap to the cognitive results about the notion of explanation.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. What Local Explanations can be Extracted from Machine Learning Models?</title>
        <p>As recent ML models have become increasingly accurate and complex, numerous interpretability methods have been developed to provide local explanations [13]. Based on the provided explanation type, one can for instance distinguish between counterfactual examples [14, 3], local rules [15] or local feature importance [16, 17]. The former highlight the minimum changes one needs to apply to a specific example to change its prediction. Local rules provide a combination of simple IF-THEN rules to approximate the decision boundary locally. Local feature importance approaches provide a weight for each feature describing its contribution to the final decision for a specific instance.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Which Explanations are Presented to Which Users?</title>
        <p>In the XAI research community, several interfaces have recently been proposed for explaining to a user a specific prediction of a ML model [18, 19, 20, 21, 2, 22, 23, 24]. Many are aimed at users with an expertise in ML [21, 22, 23, 24] and propose visualization and interactive tools to help data scientists better understand ML models. Others are dedicated to users with advanced knowledge in the application domain, for instance in the medical domain [2].</p>
        <p>On the other hand, several works tackle the challenge of presenting local explanations to non-expert users [18, 19, 20, 25] and provide information about the most appropriate type of explanations. In the case of image data, it has been shown that normative explanations lead to better ratings of the underlying ML models than comparative explanations [18] and that example-based explanations have a positive effect on users' trust in ML, regardless of their familiarity with it [25]. In the case of structured data, using counterfactual examples as explanations has been explored in the ViCE tool [20], but this proposition has not been evaluated experimentally. Local feature importance has also been considered [19] and it has been shown that combining these explanations with the possibility to interact with the ML model to explore its behaviour improves both subjective and objective understanding of non-expert users.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Interface Principles</title>
      <p>This section describes the general principles of our propositions: after stating the purpose of the interface as compared to existing ones, it describes the guidelines we propose and gives an overview of the interface before presenting the three types of added contextual information that can be considered as missing in current systems: general information on ML, domain information and external information. It also describes the interface principles we propose to include each type of contextualised information.</p>
      <sec id="sec-3-1">
        <title>3.1. Purpose of the Interface</title>
        <p>Based on the state of the art study presented in Section 2, we consider the definition of an explanation as an answer to a why-question. We focus on users having expertise neither about machine learning, nor about the application domain. In the context of car insurance pricing, our goal is to help a non-expert user answer the following question: "Why did I get this price?". We consider the user filled a form asking him/her for some personal information and is faced with a price proposition computed based on this information. We aim at providing the necessary contextual information to allow the user to make an informed decision about this price.</p>
        <p>We believe that local feature importance is the most relevant type of available local explanations for such a why-question about a prediction made on tabular data. Indeed, we argue that counterfactual examples are more relevant to answer "Why-not" questions, i.e. to explain why another price has not been obtained and to provide indication about how to change the predicted price. As for local rules, they have mostly been applied to classification tasks, whereas the pricing scenario we consider constitutes a regression task. In addition, it has been shown that local feature importance is helpful to non-expert users [19], who are the target users we consider.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Interface Overview</title>
        <p>A global view of the interface principles we propose is illustrated in Figure 1; it is commented in detail in this section and the following ones. Figure 2 illustrates its implementation in the case of smart car insurance, as discussed in Section 4.</p>
        <p>First, the proposed interface applies a card-based design: it contains an individual card for each of the fields the user is required to fill in when requesting a price prediction. Indeed, the rationale of this design choice is that it allows the user to get an overview of all information he/she entered. More importantly, the second motivation for this design choice is that it is straightforward when considering the feature importance approach, which considers features individually.</p>
        <p>Each card is made of four parts containing different pieces of information related to the feature. The top part shows the name of the associated field, as present in the user-filled form, as a reminder of the latter. An icon in the middle of the card provides a more user-friendly visual representation of the feature. In the illustration of the general principles given in Fig. 1, these pictures are geometric shapes; see Fig. 2 for some examples for real features. Below the name, the card shows the effect of the feature, i.e. its individual contribution to the prediction, as derived from the feature importance method. Moreover, an intuitive color code helps the user get an immediate understanding of the feature effect, displaying in green the effects that help reduce the predicted price and in red the ones that increase it. Finally, the bottom part of the card provides contextual information at the level of the application domain, as discussed in Section 3.4.</p>
        <p>The order in which the cards are displayed follows a double principle: first they are grouped to define categories and are then ordered within these categories, as discussed in Section 3.3. This makes it possible to provide contextual information at the model level, while taking into account the user's low expertise.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Contextualising with ML Information</title>
        <p>Machine Learning tools are used at several levels of the user interaction with the system: to predict the proposed price and to explain the role of the attributes. Non-expert users do not know how the model has been trained and what basis it uses to make a prediction. They may also confuse the displayed local explanations with global ones, and thus erroneously think that the importance attached to a field value is the same across the whole value domain. Consequently, it is important to give transparency on the model's purpose and basic operations. To that aim, the guidelines we propose for an XAI interface include the notion of ML transparency, providing guidance regarding how to interpret the following explanations.</p>
        <p>First, we propose to show, in the top part of the interface (region A in Fig. 1), contextual preliminary information that can help users build a mental model of how the ML system works and better interpret explanations. This additional explanation about the model should be visible at first, so as to act as an on-boarding guide to read adequately the local feature importance explanations provided below.</p>
        <p>Second, as mentioned in Section 3.2, the card ordering follows a double principle. A natural choice would be to sort the cards in decreasing order of the absolute feature importance values, providing an obvious representation of the ML model behaviour. Note that the absolute value must be considered so that the attributes with major negative influence are not postponed to the end of the list, but shown at the top, together with the attributes with major positive influence. We propose to achieve a compromise between this approach and a categorical sorting more in concordance with a non-expert user. Indeed, in a classic user journey when interacting with the system, there is no transition between the input stage when he/she fills the application form and the output stage when the prediction is displayed. As a consequence, a non-expert user might have trouble finding a logical path between the information he/she gave and the provided explanation, if the order is completely different.</p>
        <p>In order to facilitate this transition, we propose to contextualise the local feature importance sorting to match the input stage and exploit the field categories users encounter in the form. The local feature scores are aggregated at a category level, summing the values associated to all features in each category. Categories are then displayed showing the most influential ones first (in absolute values). Then, within each category, features are sorted by decreasing importance. This principle is illustrated in Figure 1 by the region denoted with letter B.</p>
        <p>Figure 1: Contextualized Local Feature Importance explanations in XAI interface for non-expert users. (A) Contextual information on the ML system, (B) categorical sorting, (C) contextual information on the domain, (D) contextual external information.</p>
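        <p>For illustration purposes, a minimal sketch of this double ordering principle is given below. The feature names, categories and importance scores are hypothetical, and the snippet is not the implementation used in the interface; it only makes the aggregation and sorting steps described above concrete.</p>
        <preformat>
# Sketch of the double ordering principle (illustrative only).
# Hypothetical local feature importance scores (e.g. SHAP values, in euros).
importances = {
    "vehicle_model": 0.40, "gearbox": -0.10,   # car-related fields
    "driver_age": -0.30, "children": 0.05,     # driver-related fields
    "residence_area": 0.20,                    # residence-related fields
}
categories = {
    "car": ["vehicle_model", "gearbox"],
    "driver": ["driver_age", "children"],
    "residence": ["residence_area"],
}

# 1. Rank categories by the absolute value of the summed contributions of their features.
category_weight = {cat: abs(sum(importances[f] for f in feats))
                   for cat, feats in categories.items()}
ordered_categories = sorted(categories, key=category_weight.get, reverse=True)

# 2. Within each category, sort features by decreasing absolute importance.
for cat in ordered_categories:
    for feat in sorted(categories[cat], key=lambda f: abs(importances[f]), reverse=True):
        direction = "increases" if importances[feat] > 0 else "reduces"
        print(f"[{cat}] {feat}: {importances[feat]:+.2f} ({direction} the price)")
        </preformat>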
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Contextualising with Domain Information</title>
        <p>As discussed in Section 2.1, from a cognitive point of view, an explanation should provide a rationale about why a prediction is made, which means it should be related with the notion of cause. Now the automatic extraction of causality relations by ML models is a very challenging task [26]; it is not achieved by the local feature importance approach chosen for the proposed interface. Thus, a non-expert user might have difficulties understanding why his/her specific input influences the output.</p>
        <p>To compensate for this lack of rationale, we propose to associate local feature importance explanations with information provided by a domain expert, e.g. an actuary for an insurance pricing platform. This added information acts as a generic transparency over the domain, called domain transparency, providing some brief justification about how this feature might impact the outcome. In other words, instead of trying to extract causality relations automatically, which remains a very difficult task, we require an expert to provide this piece of information. This domain information is generic, i.e. applicable to all instances; it is displayed on each feature card (see region C in Fig. 1).</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Contextualising with External Information</title>
        <p>Whereas the ML interpretability approaches aim at providing explanations about a given prediction model, the conception of the model itself, prior to its training phase, also affects the outcome the user gets. For instance, some fields the user is requested to fill can be excluded from the ML model by design. The form may include a field about gender so that the system knows how it should address the user, although it is not taken into account by the prediction system so as to avoid any gender bias. We call external information this type of knowledge, which is not domain specific and differs from the information considered in the previous section. It is important to provide the users with this global information which may impact the outcome they get, even though this information is external to the model. We believe users can benefit from added transparency over the real-life context (e.g. external events such as the COVID crisis that indirectly influences the prediction through the dataset) and algorithmic processes (e.g. data that are collected and used or not). This external transparency can both improve understanding and the level of trust in the prediction and its explanations. For the case of attributes requested in the form but excluded from the prediction model, we propose to highlight their specific role by a different type of feature-associated card, as illustrated by the card denoted D in Fig. 1.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Considered Application</title>
      <p>This section describes the implementation of the principles described in the previous section to the case of a smart pricing insurance application.</p>
      <sec id="sec-4-1">
        <title>4.1. Usage Scenario</title>
        <p>We apply the proposed interface principles for contextualising local feature importance explanations for a fictitious car insurance pricing platform. In this scenario, a user first provides several kinds of information, regarding his/her insurance background, the car to insure, its usage and parking as well as housing and personal information. The interface (see Fig. 2) then displays the price computed by the ML model on the left, and the proposed explanation interface on the right. The influence of each feature is computed using SHAP [16], a local feature importance method that provides the contribution of each feature value to the prediction as compared to the average prediction.</p>
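        <p>To make the role of SHAP concrete, the following minimal sketch shows how such per-feature contributions can be obtained for a single instance with the shap library. The data, model and feature names are toy assumptions introduced for illustration only; they are not the pricing model or the features of the platform.</p>
        <preformat>
# Illustrative sketch: SHAP-style local feature importance for one prediction.
# Toy data and model only; feature names are hypothetical.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "driver_age": rng.integers(18, 80, 500),
    "vehicle_power": rng.integers(50, 300, 500),
    "urban_area": rng.integers(0, 2, 500),
})
# Toy price: a base premium plus simple effects of the three features, plus noise.
y = (15 + 0.03 * X["vehicle_power"] - 0.05 * X["driver_age"]
     + 2 * X["urban_area"] + rng.normal(0, 1, 500))

model = GradientBoostingRegressor().fit(X, y)

explainer = shap.TreeExplainer(model)
instance = X.iloc[[0]]                          # the user whose price is explained
contributions = explainer.shap_values(instance)[0]

# SHAP values are additive: average prediction + sum of contributions = prediction.
print("average predicted price:", explainer.expected_value)
for name, value in zip(X.columns, contributions):
    print(f"{name}: {value:+.2f}")              # shown in green/red on the cards
print("predicted price:", model.predict(instance)[0])
        </preformat>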
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Implementing ML Transparency</title>
        <p>As discussed in Section 3.3, we integrate the proposed model transparency principle through an on-boarding text. This text first states that the impact of each feature is expressed relatively to the average price predicted by the model, and it makes explicit the difference between the predicted and the average prices. Second, it explains that the price has been personalized based on the user's information. Finally, it introduces the feature-associated cards.</p>
        <p>Regarding categorical sorting, we split features in three categories, distinguishing between features related to the driver, those related to the car and those related to the residence.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Implementing Domain Transparency</title>
        <p>As discussed in Section 3.4, the role of domain transparency is to provide users with a rationale about why the collected features are useful for the ML model. For the car insurance pricing platform, we implement this principle by complementing each feature-associated card with the main kinds of risk that can be impacted by the feature (e.g. accident, theft or natural catastrophes to name a few). This pairs the local feature weight explanation with global domain information. In addition, each type of risk is highlighted with a different color to improve visualization (see region G in Fig. 2).</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Implementing External Transparency</title>
        <p>We implement external transparency by providing information on the gender feature, which is not included in the ML model but is likely to be considered important by users. Indeed, it may be the case that users are suspicious about how their gender can be used to affect their prediction. Therefore, we display that the gender information is not used by the model in a feature-associated card. This card is presented in a different color from other feature-associated cards to highlight the difference of purpose.</p>
        <p>It is noteworthy that only removing the gender feature from the training data does not necessarily make a ML model fair [27]. Explaining to non-expert users more sophisticated fairness protocols is an important topic for the XAI community, but it is outside the scope of this paper.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <p>This section presents the experimental evaluation framework we propose to assess the propositions described in the previous sections, describing in turn the considered evaluation metrics, the experimental design and the results of the conducted pilot study.</p>
      <sec id="sec-5-1">
        <title>5.1. Evaluation Metrics</title>
        <p>Evaluating the effectiveness of explanations is a challenging task [28]; various methods and quality criteria to measure understandability and usefulness have been proposed. Two categories can be distinguished: objective understanding can for instance be evaluated through task completion [25] or quiz questions [19], measuring the answer correctness as well as the answering duration. Subjective criteria, on the other hand, measure the perceived usefulness and understanding of explanations through self-reports [10]. We take into account these two types of criteria, as detailed below.</p>
        <p>Regarding objective understanding, using a similar quiz approach as Cheng et al. [19], we first propose four types of questions to check the user's objective understanding. The details of the questionnaire are provided in Appendix A. (i) Feature Importance Questions measure the extent to which the user understands the relative influence of the attributes on the prediction, e.g. "Does feature X impact the prediction more than feature Y?". (ii) ML Information Questions measure the user's effective understanding of how the considered SHAP method generates the influence of each attribute over an average price, e.g. "Are the explanations provided based on the average prediction?". (iii) Local Explanation Questions measure the user's understanding of the difference between the influence of his/her attributes and global explanations, e.g. "Will the prediction remain for sure the same even if feature X is different?". (iv) Interpretation Questions measure the extent to which the user processes the explanations provided to understand the price rather than relying on potential cognitive biases, e.g. "Does this information/event influence the prediction?".</p>
        <p>We design two quiz questions for each of the four types. For statement questions, three answer options are provided: "true", "false" and "I don't know"; for one-choice questions, lists of possible answers are offered as well as an "I don't know" option. We measure the answer correctness and the time to answer each question.</p>
        <p>Regarding subjective understanding, we adapt two self-reporting questions from the Explanation Satisfaction Scale [10], to assess the perceived understanding and usefulness of explanations to make an informed decision. Participants are required to answer on a 6-point Likert scale, from "Strongly disagree" (0) to "Strongly agree" (5), as it has been shown that 6-point response scales are a reasonable format for psychological studies [29].</p>
        <p>In addition, the questionnaire includes two questions regarding the participant's literacy in artificial intelligence/machine learning and in insurance, again using 6-point Likert scales, from "Not familiar at all" to "Strongly familiar". We also ask for basic demographic information such as age and education level. Finally, participants can share their insights and comments on the study in an open response question.</p>
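        <p>As an illustration of how these metrics can be aggregated, the short sketch below scores a quiz and normalizes Likert ratings to the [0, 1] interval used in Section 5.3. The answer key and participant responses are hypothetical, and dividing the 0-5 ratings by 5 is our reading of the normalization, not code from the study.</p>
        <preformat>
# Illustrative sketch of the evaluation metrics (hypothetical data, not the study scripts).

def objective_score(answers, expected):
    """Fraction of correct quiz answers; 'I don't know' counts as incorrect."""
    return sum(a == e for a, e in zip(answers, expected)) / len(expected)

def normalize_likert(rating, scale_max=5):
    """Map a 6-point Likert rating in {0, ..., 5} to the [0, 1] interval."""
    return rating / scale_max

# One hypothetical participant: answer key and given answers for the 8 quiz questions.
quiz_expected = ["True", "False", "True", "False", "C", "B", "True", "False"]
quiz_answers  = ["True", "False", "True", "True",  "C", "B", "True", "I don't know"]

print("objective understanding:", objective_score(quiz_answers, quiz_expected))  # 0.75
print("perceived understanding:", normalize_likert(4))                           # 0.8
print("perceived usefulness:", normalize_likert(5))                              # 1.0
        </preformat>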
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Experimental Design</title>
        <p>We conduct an A/B test. Interface A, displayed in Figure 3, presents local feature importance explanations as extracted from SHAP, following the same card-based design described in Section 3.2, but it does not include the different elements of contextualisation. Interface B includes all of our propositions; it is displayed partly in Figure 2 and fully in Figure 4. Participants are randomly assigned to one version of the interface.</p>
        <p>For the pilot experiment, we simulate a pricing ML model and the use of SHAP to extract local feature importance. Each participant acts as a female persona with a given set of 16 feature values related to the driver, the vehicle and the residence of the driver. Prior to the evaluation, participants are introduced to the persona and her need to understand the price she gets. We also explain that the platform uses an algorithm to determine a personalized price based on her personal information. The evaluation starts with the objective understanding question quiz, which is displayed next to the interface to allow participants to look for the answers. Then, the subjective understanding questions and demographics information questions are asked.</p>
        <p>Because of the COVID-19 situation, we were unable to conduct the pilot study in a lab. Thus, we conducted the pilot on Useberry. 20 participants were recruited from university and professional social networks.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Results</title>
        <p>The obtained results are displayed in Table 1. For the objective questions, the results are defined as the percentage of correct answers; for the subjective questions, the results are the average scores on the Likert scale, normalized to the [0, 1] interval.</p>
        <p>Table 1: Scores for Interface A and Interface B: objective understanding, perceived understanding and perceived usefulness.</p>
        <p>The data of 6 participants were not exploitable as they dropped out of the survey at the start. We also excluded the data of one more participant, who completed the test in an abnormally short time and who appeared not to scroll through the explanations to look for the answers. Out of the 13 remaining participants, 7 were assigned to interface A and 6 to interface B. Participants assigned to interface A (resp. interface B) are 29.6 years old on average (resp. 29.8) and reported an average artificial intelligence literacy score of 0.71 (resp. 0.37) and an average insurance literacy score of 0.47 (resp. 0.60).</p>
        <p>Participants assigned to interface B obtain overall higher scores for the objective understanding questions (0.88) as compared to the ones using interface A (0.73). When considering the different types of questions, it appears that interfaces A and B lead to comparable results for feature importance and local explanation questions. This could mean that providing local feature importance is enough for a non-expert user to correctly answer these questions, even without any contextualisation. The results hint that there is an improvement for participants assigned to interface B for interpretation and ML information questions, which suggests that contextual information, especially external and ML transparency, may help non-expert users to answer this kind of questions. It is however noteworthy that the difference in ML Information question scores can be partly due to the fact that the answer was easier to retrieve in interface B thanks to the added ML transparency.</p>
        <p>In this preliminary study, participants who used interface B report higher subjective understanding (0.87) compared to version A (0.71) and also rate the usefulness of the explanations higher (0.91 for interface B and 0.63 for interface A).</p>
        <p>Overall, we observe that the contextualisation elements of interface B provide an improvement for all considered evaluation metrics: +0.14 for objective understanding, +0.15 for self-reported understanding and +0.29 for self-reported usefulness.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Discussion</title>
        <p>These preliminary results indicate that interface B seems to improve the explanation understanding thanks to the three levels of added contextual information. More specifically, this improvement is especially important for the perceived understanding and usefulness of explanations. Unfortunately, we cannot analyze whether this added information leads to participants spending more time on the interface, since the reported time data are too noisy, probably due to participants taking breaks during the test. We hope to mitigate this issue by performing this experiment in a lab setting.</p>
        <p>Looking at the feedback collected through the open question, it appears that participants using interface A, without contextual information, report more uncertainty regarding their answers and their understanding, as two of them explicitly state. On the other hand, one participant using interface B reports that the explanations are "pleasantly surprising and help choosing among different insurance plans", while another participant states that the explanations are clear.</p>
        <p>To conclude, although the sample size is too small to provide strong and reliable insights backed up with statistical tests, the experimental results and the qualitative feedback lead us to believe that contextualisation can be an interesting solution to explore in order to enhance local explanations.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this work, we study whether contextualisation can help non-expert users understand local explanations. We investigate three kinds of information for contextualisation, respectively regarding ML, the application domain and external factors. In the context of a smart pricing platform for car insurance, we conducted a pilot study using an online A/B experiment to measure objective understanding, perceived understanding and perceived usefulness of explanations. The preliminary results are encouraging as they hint that providing contextualisation elements can improve the understanding of ML predictions.</p>
      <p>Future work will include a larger user study in a more controlled environment, to draw stronger conclusions regarding the effectiveness of our propositions. We also plan to run experiments using a ML model to extract actual local explanations instead of simulated ones, as it may influence our findings.</p>
    </sec>
    <sec id="sec-appA">
      <title>A. Detailed Questionnaire for Objective Understanding</title>
      <p>This section gives the 8 questions asked to assess the participants' objective understanding.</p>
      <p>Question 1: The model of your vehicle influences more your price than the number of children you have at charge. (True / False / I don't know)</p>
      <p>Question 2: What is the influence of the gearbox of your car on your price? (It increases my price / It doesn't change my price / It decreases my price / I don't know)</p>
      <p>Question 3: Even if you were older, you would get the same price for sure. (True / False / I don't know)</p>
      <p>Question 4: If you were living in another city, you would probably get a different price. (True / False / I don't know)</p>
      <p>Question 5: Which one of your information doesn't influence your price? (My age / My vehicle's power supply / My job occupation / My residence area / I don't know)</p>
      <p>Question 6: Again, which one of your information doesn't influence your price? (The model of my vehicle / The number of children at my charge / My gender / My job occupation / I don't know)</p>
      <p>Question 7: Your price is calculated based on an average price of 15.5€. (True / False / I don't know)</p>
      <p>Question 8: Your information increases your price by 1.15€. (True / False / I don't know)</p>
    </sec>
    <sec id="sec-appB">
      <title>B. A/B Testing: Interfaces A and B</title>
      <p>Figure 4: Interface B with contextualization principles.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          preprint arXiv:
          <year>1812</year>
          .
          <volume>04608</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Hilton</surname>
          </string-name>
          , Conversational processes [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stumpe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Terry</surname>
          </string-name>
          , E. Reif, and causal explanation., Psychological
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>N.</given-names>
            <surname>Hegde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Smilkov</surname>
          </string-name>
          , Bulletin
          <volume>107</volume>
          (
          <year>1990</year>
          )
          <fpage>65</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Wattenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Viegas</surname>
          </string-name>
          , G. Corrado, [12]
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Malle</surname>
          </string-name>
          , Attribution theories: How
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>imperfect algorithms during medical ries in social psychology 23 (</article-title>
          <year>2011</year>
          )
          <fpage>72</fpage>
          -
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>decision-making</article-title>
          ,
          <source>in: Proc. of the Int. 95.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          Conf. on Human Factors in Computing [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Adadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berrada</surname>
          </string-name>
          , Peeking inside
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Systems</surname>
          </string-name>
          , CHI'
          <volume>19</volume>
          ,
          <year>2019</year>
          .
          <article-title>the black-box: A survey on</article-title>
          explainable [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Y.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <source>artificial intelligence (xai)</source>
          , IEEE Access
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Designing</given-names>
            <surname>Theory-Driven</surname>
          </string-name>
          User-Centric 6
          <article-title>(</article-title>
          <year>2018</year>
          )
          <fpage>52138</fpage>
          -
          <lpage>52160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Explainable</surname>
            <given-names>AI</given-names>
          </string-name>
          ,
          <source>in: Proc. of the Int</source>
          . [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Laugel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Lesot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Marsala</surname>
          </string-name>
          , X. Re-
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Systems</surname>
          </string-name>
          , CHI'
          <volume>19</volume>
          ,
          <year>2019</year>
          .
          <article-title>of post-hoc interpretability:</article-title>
          <source>Unjustified</source>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wachter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mittelstadt</surname>
          </string-name>
          , L. Floridi, counterfactual explanations,
          <source>in: Proc.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <article-title>mated decision-making does not exist ligence</article-title>
          ,
          <source>IJCAI'19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2801</fpage>
          -
          <lpage>2807</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <article-title>in the general data protection regula-</article-title>
          [15]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          , An-
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <article-title>tion, International Data Privacy Law 7 chors: High-precision model-agnostic</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          (
          <year>2017</year>
          )
          <fpage>76</fpage>
          -
          <lpage>99</lpage>
          . explanations,
          <source>in: Proc. of the 32nd</source>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <source>Explanation in artificial in- AAAI Conference on Artificial Intelli-</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <article-title>telligence: Insights from the social sci- gence</article-title>
          , AAAI'
          <fpage>18</fpage>
          ,
          <year>2018</year>
          , pp.
          <fpage>1527</fpage>
          -
          <lpage>1535</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>ences</surname>
          </string-name>
          ,
          <source>Artificial Intelligence</source>
          <volume>267</volume>
          (
          <year>2019</year>
          ) [16]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          , A unified
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          1-
          <fpage>38</fpage>
          . approach to interpreting model predic[5]
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Malle</surname>
          </string-name>
          ,
          <article-title>How the mind explains tions</article-title>
          ,
          <source>in: Proc. of the Int. Conf of</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>and social interaction</article-title>
          , MIT Press,
          <year>2006</year>
          . cessing Systems,
          <source>NeurIPS'17</source>
          ,
          <year>2017</year>
          , pp. [6]
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gruen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miller</surname>
          </string-name>
          , Question-
          <volume>4765</volume>
          -4774.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <article-title>ing the ai: Informing design practices</article-title>
          [17]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>Factors in Computing Systems, CHI'20, the ACM Int. Conf. on Knowledge Dis-</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . covery and Data Mining, SIGKDD'
          <volume>16</volume>
          , [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mackenzie</surname>
          </string-name>
          ,
          <source>The Book of 2016</source>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <article-title>Why: The New Science of Cause</article-title>
          and Ef- [18]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jongejan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Holbrook</surname>
          </string-name>
          , The
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>fect</surname>
          </string-name>
          , Basic Books, Inc.,
          <year>2018</year>
          .
          <article-title>efects of example-based explanations</article-title>
          [8]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Dennett</surname>
          </string-name>
          ,
          <article-title>The Intentional Stance, in a machine learning interface</article-title>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          MIT press,
          <year>1989</year>
          .
          <source>Proc. of the 24th Int. Conf. on Intelligent</source>
          [9]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Dennett</surname>
          </string-name>
          ,
          <article-title>From bacteria to Bach User Interfaces</article-title>
          ,
          <source>IUI'19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>258</fpage>
          -
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <article-title>and back: The evolution of minds</article-title>
          ,
          <source>WW</source>
          <volume>262</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Norton</surname>
          </string-name>
          &amp; Company,
          <year>2017</year>
          . [19]
          <string-name>
            <given-names>H. F.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , [10]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Hofman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. O</given-names>
            <surname>'Connell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Harper</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>