=Paper=
{{Paper
|id=Vol-3124/paper1
|storemode=property
|title=Explaining Health Recommendations to Lay Users: The Dos and Dont's
|pdfUrl=https://ceur-ws.org/Vol-3124/paper1.pdf
|volume=Vol-3124
|authors=Maxwell Szymanski,Vero Vanden Abeele,Katrien Verbert
|dblpUrl=https://dblp.org/rec/conf/iui/SzymanskiAV22
}}
==Explaining Health Recommendations to Lay Users: The Dos and Don'ts==
Maxwell Szymanski, Vero Vanden Abeele and Katrien Verbert
Department of Computer Science, KU Leuven, Leuven, Belgium
Abstract
In recent years, mobile health recommendations have been adopted in an increasing number of applications. Researchers have highlighted
the importance of explaining these recommendations to lay users, citing benefits such as increased trust and a higher tendency
to follow up on the recommendations. However, the choice of explanation modality can affect how users perceive a
recommendation, either positively or negatively. This paper explores and evaluates six explanation designs
through a qualitative user study, and presents general design guidelines and considerations for explaining pain-related
health recommendations to lay users.
Keywords
explainable AI, explainable recommender systems, explanation interpretation, lay users, health recommendations, HRS
1. Introduction & Related Work

Recommender systems are becoming more prevalent in health-related domains. However, several key aspects have to be taken into account when designing recommender systems, such as transparency through explanations and end user expertise.

1.1. RecSys in Health

Recommender systems (RS) have become prominent in health applications, where they help retrieve relevant information or recommend possible next actions tailored to the needs of the end user. These health recommender systems (HRS) are used both in clinical settings and in personal contexts where health applications aid users in their daily lives. A recent systematic review [1] of HRS for lay users shows that the majority of HRS with a graphical user interface focus on mobile applications. These mobile HRS span several fields, such as sports, mental health and nutrition, and include applications that e.g. suggest the appropriate action to take for users with diabetes [2], recommend activities to promote healthier lifestyles [3] or help with anxiety by recommending external apps that suit the user's needs [4]. These recommender systems all share the main goal of potentially steering the user towards a better and healthier lifestyle.

However, the increased use of HRS is also paralleled by certain barriers. One such issue is a mismatch of recommendations with the user's expectations. Such a mismatch can lead not only to a decrease in system effectiveness [5], but also to a decrease in trust towards the system, potentially steering the user away from future use of such HRS. Early research mainly focused on increasing the accuracy of RS in order to mitigate this issue. However, Valdez et al. [6] explain that recent research has shifted its focus from improving accuracy to exploring the effects of human factors. This broader approach to reasoning about RS should allow researchers to improve RS effectiveness beyond quantitative algorithmic capability. The new approach includes research on, and the addition of, explanations to increase transparency, human-in-the-loop feedback to correct misunderstandings, and conversational RS to increase familiarity with the system's interface.

In this paper, we focus on the explanation aspect, more specifically on designing and assessing different explanation types for a mobile health recommender system. The research is conducted in the context of a personal coaching app that guides users with chronic musculoskeletal pain through various informative and interactive topics, such as activity- and stress-management, pain education, etc. Additionally, the app includes a pain logbook that can be used for logging pain flare-ups. Using this logged information, which consists of the context in which the pain occurred as well as the thoughts and reactions users had, the app is able to give personalised recommendations to better cope with pain flare-ups in the future. In this study, we look into several designs that are deemed fit for explaining these pain-related recommendations to end users. There remain, however, several research challenges that need to be addressed, such as explanation interpretability and end user expertise, which are discussed in the next related work section.

Joint Proceedings of the ACM IUI Workshops 2022, March 2022, Helsinki, Finland
maxwell.szymanski@kuleuven.be (M. Szymanski); vero.vandenabeele@kuleuven.be (V. V. Abeele); katrien.verbert@kuleuven.be (K. Verbert)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
Maxwell Szymanski et al. CEUR Workshop Proceedings 1–10
1.2. Explaining health recommendations

As highlighted earlier, adding explanations to recommendations can improve the overall effectiveness. Explanations make the system interpretable, which in turn can improve trust towards the system [7]. There exist HRS that explain their rationale to the end user, such as the food recommender system of Wayman et al. that explains why certain recipes are recommended based on the user's nutritional intake [8], or a visualisation for medical experts that is able to explain breast cancer similarities [9]. However, the systematic review of De Croon et al. states that only 10% of HRS that focus on lay users make use of explanations. This makes HRS explanations for lay users a novel, but under-explored topic. Additionally, a study of Bussone et al. points out that providing overly detailed explanations for health recommenders can create unforeseen effects, such as over-reliance on explanations [10], which indicates that health recommender explanations should be designed with sufficient care. This makes designing explanations with non-expert users in mind, and evaluating them with end users, paramount.

1.3. End user expertise

An increasing amount of research has pointed out that the expertise of end users should be taken into account when designing explanations. Ribera et al. [11] have proposed three main categories of end users: non-experts (lay users), domain experts (in our context medical professionals or health coaches) and software- and AI-experts. Each category of users comes with its own needs, goals and limitations. AI expert users, for example, use XAI to verify or improve the underlying AI system, whereas domain experts can leverage explanations to gain additional insights and learn from the system. Lay users have their own set of goals, but more interestingly their own array of limitations as well. Wang et al. have pointed out several shortcomings in non-expert users related to cognitive biases, such as confirmation and anchoring bias, due to a backward-oriented, hypothesis-driven reasoning process [12]. Tsai et al. also noticed a reinforcing effect, where users avoid interacting with content they are not familiar with [13]. Szymanski et al. additionally pointed out that non-expert users, despite having these biases and incorrectly interpreting certain complex explanations, can still prefer them over other, simpler explanation modalities [14].

Thus we see that interpretability through explanations has multiple benefits and can result in increased trust towards the system. However, as previously mentioned, the adoption of explanations in HRS is still low. Furthermore, most health-related AI explanations are researched with AI and domain expert users in mind [15], which leaves a big gap for explanations aimed at lay users. Keeping in mind the aforementioned biases that lay users are prone to, it is therefore paramount to assess whether explanations are indeed interpretable, to make sure no misalignment in trust is created.

With these considerations in mind, we investigate the following research questions:

RQ1 What explanation design do lay users prefer when explaining health recommendations, and why?

RQ2 What design considerations are substantial when explaining health recommendations to lay users?

2. Explanation designs

As mentioned in section 1.1, we focus on designing different explanations of why users are receiving specific recommendations for their pain flare-ups. Keeping the context and type of end users in mind, the following design guidelines have to be respected for all variants of explanations:

• Mobile-friendly: as the explanations will be offered within the context of the mobile health app, they have to be well-suited for display on a small mobile screen.
• Summative: the explanations should be able to summarise categorical data, as the input consists of (semi-)unstructured user input.
• Suited for non-experts: as the end users are non-experts, the explanations should not use any advanced statistical concepts to explain why the recommendation is suggested.

Keeping these criteria in mind, we came up with the following designs (Figure 1), based on well-known and widely used explanation types:

• Text-based: briefly explains why the recommendation is related to the most prevalent input. The wording is based on the "communicating health-related news to patients" guidelines described by [16], and these explanations were collaboratively designed for the purpose of this study by six ergo- and physiotherapists.
• Text-based + inline reply: an addition to the textual explanation, where the inline reply shows which specific user message most contributed to the recommendation.
• Tags: tags are a common method of communicating all topics that are relevant to a recommendation (e.g. Bidargaddi et al. [17]).
• Word clouds: in addition to showing all relevant topics, word clouds can additionally communicate the relative importance/relevance of these topics (e.g. [18, 19]).
Figure 1: Explanation designs for pain-related health recommendations used throughout the user study. Panels: (a) Purely textual, (b) Inline reply, (c) Tags, (d) Word cloud, (e) Feature importance, (f) Feature importance + %.
• Feature importances (FI): feature importance bars communicate contributing themes of the user input, as well as their input relevance, albeit in a more specific way compared to word clouds.
• Feature importances (FI) + percentages: adds percentages to the FI bars to communicate exact topic importances.

These explanation designs are sorted from least to most by the amount of information they convey regarding the inputs relevant to the recommendation. The textual explanation only focuses on one input, with the inline reply also being able to show which specific input triggered the recommendation, whereas the tags are able to display all relevant input categories that are related to the recommendation. The word cloud further builds on this by also displaying the relative importance of each input related to the recommendation, and the FI shows the exact sorting of inputs according to importance. The added percentages give the most transparency regarding the inputs, by also displaying the exact values used by the underlying RS.

2.1. Participants

For the user study, we recruited 11 participants out of a pool of 286 people who were already using the mobile health coaching application without the pain logbook and its recommender system, as mentioned in section 1.1, and who thus knew and had interacted with the content and the different modules. The group consisted of nine women and two men, of whom four finished graduate school, six college, and one high school. Age-wise, 2 participants were between 21-30, 5 between 31-40, 3 between 41-50 and 1 between 51-60. All 11 users reported using the internet on a regular basis, with 6 participants stating they were average computer and IT users, and 5 participants stating they were advanced computer and IT users.
2.2. Protocol of the evaluation study

At the start of the study, users were briefed on the purpose and context of the think-aloud study and gave their consent to having the audio recorded, after which they filled in the ResQue demographics questionnaire [20]. Afterwards, they were guided through the pain logbook, which they had to fill in with a recent pain episode they experienced in mind. Having done so, they received some information regarding the recommendations that were going to be given, along with the explanations. We briefly went over the six explanation designs in a fixed order, after which we asked the participants to "explain what they like or dislike about the explanation" separately for each design once they had seen them all. To conclude this preference elicitation, the users had to sort the explanations by preference, with 1 being their most preferred one, and 6 their least preferred. They also had to give (or repeat) a key reason as to why they were giving each explanation a certain ranking. The audio recordings of both the preference elicitation and the ranking were used afterwards for a thematic analysis.

2.3. Data analysis

The thematic analysis was done in two phases, with the first phase consisting of deriving granular themes with two researchers, and the second phase focusing on merging them into higher-level themes with a third researcher. The resulting higher-level themes are displayed in Figure 3, along with the frequencies in which they occur per explanation design. The agreement percentage of the first-phase two-coder thematic analysis is 88.1%, with Cohen's kappa being κ = 0.66, resulting in a substantial inter-coder agreement [21].

3. Results

Taking the average ranking scores of all explanation designs, we are now able to rank the 6 explanation modalities from best to worst, along with the results from the thematic analysis to explain why each explanation type scored poorly or adequately. Figure 2 shows the frequencies of the rankings given to each explanation design.

3.1. Feature importance + percentage

Rank: 1 (best) · This explanation type was favored by most users, mainly due to the fact that it provided the most insight and transparency (n = 10). Only three out of 11 people found the addition of percentages to the feature importance bars to be inefficacious.

Insights through XAI (+)

Six users liked the fact that they were able to gain more insight through this explanation modality. Four users also stated that the percentages were a "nice-to-know", making the explanation more useful and informative.

Negative sentiment towards XAI (-)

On the flip side, two users disliked the addition of displaying percentages, stating that when it comes to emotions and feelings, certain aspects are not quantifiable. U4 stated: "Personally I think feelings are not quantifiable. The bars are good, but don't put an exact number on it. It's okay if you're communicating frequencies, like how often an emotion occurred for example."

Visual/information overload (-)

Two users also stated that the addition of percentages is unnecessary, mentioning that only using bars to communicate importances is sufficient.

3.2. Feature importance

Rank: 2 · The feature importance explanation was among the most preferred explanations, liked for the fact that it was able to give a summary of the user input (n = 11), as well as for giving additional insights (n = 2).

Provides summary (+)

Six users found the feature importance bars to be a clear way of communicating input topics and their importance. Four users stated that it gives them a nice overview of their input.

Insights through XAI (+)

Two users specifically liked the additional insights that they were able to get from the feature importances. U4 mentioned: "There are of course no numbers given, but I can assume that I am really frustrated, and a bit less angry. I find it interesting to reflect on results that come out of a questionnaire."

Negative sentiment towards XAI (-)

Three users were unsure about the ranking of some topics, stating that they agreed with the general content, but not with why one topic was deemed more important than others. This caused these users to slightly dislike and distrust the system, and give it a lower ranking.
Figure 2: Frequencies of rankings per explanation type
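The ordering used throughout this section follows from averaging each design's rank across participants (1 = most preferred, 6 = least preferred). A minimal sketch of that computation, using invented rank data rather than the study's actual responses:

```python
from statistics import mean

# Invented example ranks (1 = most preferred, 6 = least preferred),
# one entry per participant; NOT the study's actual data.
ranks = {
    "FI + %":         [1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1],
    "FI":             [2, 1, 3, 1, 2, 3, 2, 1, 2, 3, 2],
    "Tags":           [3, 3, 2, 2, 4, 1, 3, 2, 3, 4, 3],
    "Purely textual": [4, 5, 4, 6, 3, 4, 5, 3, 4, 1, 4],
    "Inline reply":   [5, 4, 6, 4, 5, 6, 4, 6, 5, 5, 5],
    "Word cloud":     [6, 6, 5, 5, 6, 5, 6, 5, 6, 6, 6],
}

# Sort designs from best (lowest average rank) to worst.
ordering = sorted(ranks, key=lambda design: mean(ranks[design]))
print([(design, round(mean(ranks[design]), 2)) for design in ordering])
```

With the invented data above, the resulting order mirrors the one reported in this section (FI + %, FI, Tags, Purely textual, Inline reply, Word cloud).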
Figure 3: TA themes per explanation design and their frequencies
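The inter-coder agreement reported in section 2.3 (88.1% raw agreement, Cohen's κ = 0.66) follows the standard two-coder formulation: observed agreement corrected for chance agreement. A minimal sketch, with invented placeholder theme codes rather than the study's actual coding:

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Fraction of items both coders labelled identically."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same label at
    # random, given each coder's own label distribution.
    p_e = sum(counts_a[lab] * counts_b[lab]
              for lab in counts_a.keys() | counts_b.keys()) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented two-coder labels for eight quotes (placeholder theme codes).
a = ["summary", "insight", "summary", "overload",
     "insight", "summary", "overload", "insight"]
b = ["summary", "insight", "insight", "overload",
     "insight", "summary", "overload", "summary"]
print(percent_agreement(a, b), cohens_kappa(a, b))
```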
Visual/information overload (-)

Two users found the bars to be unnecessary, giving them information as to what contributed towards the recommendation, but not why, like the textual explanation did. U6 stated: "There is not a lot of background given. It shows that these inputs contributed to my recommendation, but not why."

3.3. Tags

Rank: 3 · Tags scored relatively better than the three explanations that follow in terms of average ranking, and were liked for their summative ability (n = 8). Only people who disliked having a lot of information were less in favor of the tag explanation (n = 2).

Provides summary (+)

Four users found tags to be a nice way of providing a summary of their input. Four users also stated that doing so is a clear and concise method of explaining why the recommendation is given.

Insights through explanation (+)

Three users were fond of the additional insights they got from the tags and the general themes that were present in their input. U3 stated: "When inputting my feelings I did not necessarily perceive them as negative or angry. But based on these tags, I'm able to see: okay, this is how the app interprets my feelings."

Visual/information overload (-)

Only two users stated that tags were unnecessary or provided too much information. U6 stated: "Yes it's clear, but less practical. I tend to focus on one thing at a time."

3.4. Purely textual

Rank: 4 · Purely textual explanations received mixed reactions during the think-aloud study. When users liked or agreed with the recommendation, the textual explanation was a welcome addition helping them understand the recommendation process and the recommendation itself, and gave users a nice summary of why the recommendation matched their inputs (n = 8). However, when the recommendation wasn't in line with the user's expectations, the textual explanation highlighted the mismatch
even more and caused a poor reception of the recommender system in general (n = 5). Here is an overview of these topics:

Provides summary (+)

Six users found that the textual explanation was able to summarize their input quite well, albeit only focusing on one topic (the most relevant one) surrounding the recommendation.

Positive sentiment towards explanation (+)

Two users stated that the written explanation was confirming and comforting. One user also stated that the wording of the textual explanation felt less confronting regarding their negative input.

Negative sentiment towards explanation (-)

On the other hand, three users mentioned that they could not relate to the recommendation, and that the textual explanation highlighted this fact. U4 also found the explanation to be provoking, stating the following: "I know that I'm frustrated and that it does not help. However, explaining that acts like waving a red flag in front of a bull."

3.5. Inline reply

Rank: 5 · During the think-aloud study, the inline reply received relatively positive feedback and comments regarding the succinct summary it gave of the user's input (n = 7), with only some minor remarks regarding the presentation of the explanation (n = 3). However, it scored quite low during the preference ranking itself, due to other explanation modalities simply being preferred over the inline reply.

Provides summary (+)

Six users found the explanation modality to be clear and more concrete, and one user additionally stated that showing which message triggered the recommendation requires less analysis from the user.

Insights through explanation (+)

Three users liked the fact that the inline reply raises awareness that the recommendation is related to one of their own inputs. U3 stated: "I find it better than the textual explanation. There, they state 'You seem to be frustrated', and here you really are made aware of the fact that it's your own input."

Problem with representation (-)

Only some minor and infrequent negative remarks were given surrounding inline replies. Three users disliked the fact that by highlighting or repeating their negative input, they are confronted with it more. One user additionally mentioned that this explanation feels like the recommendation is only tuned to one input instead of multiple user inputs, making it feel too specific.

3.6. Word cloud

Rank: 6 (last) · The word cloud received the lowest average score. In general, users liked the addition of displaying keyword or topic importance; however, using a word cloud to do so proved to be an inferior solution. The thematic analysis points out two main negative themes as to why this explanation is disliked, problems with representation and content (n = 9) and visual/information overload (n = 4), and one positive theme, insights through explanation (n = 4).

Problems with representation (-)

Three users pointed out that having keyword size communicate importance was unclear, and would rather have something concrete like bars indicating exact relevance. Three users also pointed out that the inconsistent sizes inherent to the design of word clouds were visually displeasing. Two users additionally stated that highlighting important keywords might be too confronting with respect to their own input; e.g. if a user inputs that they are feeling sad, having it displayed as a large word might confront the user too much with their state of mind.

Visual/information overload (-)

Three users found the addition of displaying relevance in such a way unnecessary, one of whom additionally stated that adding the information in such a way is too distracting.

Insights through explanation (+)

Four users stated, however, that adding this information of keyword relevance gives more insight, due to not only showing the relevant topics but their importance as well.

4. Discussion

We will now discuss some of the most prevalent observations that were present across several explanation designs, as well as suggest guidelines on how to design health explanations for lay users experiencing (chronic) pain.
4.1. Beware of confronting people with negative sentiments

People experiencing (chronic) pain or illness can feel distress when receiving negative information surrounding their state. In our study, we noticed that highlighting keywords that are potentially negative (e.g. negative emotions, reactions, etc.) can cause distress with users and therefore make them dislike the explanation. This was apparent with the inline reply and word cloud explanations, where visually highlighting negative sentiments that relate to the recommendation caused users to dislike the explanation.

4.2. Use tags or feature importance when control is needed

Due to the fact that tags and FI/FI+% are able to display multiple input categories, users positively expressed that this would provide them more control over the recommendation process, if the design or implementation allows for it. One user suggested that tapping certain topics could be useful to request recommendations in a more user-controlled way. Other users additionally suggested, U9: "It's nice if you can individually remove certain topics", and U7: "... especially if you notice something that wasn't interpreted the way you intended it".

4.3. Design FI through a lay user's perspective

The FI and FI+% designs were favored by most users, giving them the insight and summary they needed. However, as mentioned in section 3.2, U4 interpreted the FI bars as "... I can assume that I am really frustrated, and a bit less angry", indicating that they saw them as an overview of their input, and not as how strongly their input relates to the recommendation. In total, 10 out of 11 lay users interpreted FI differently than intended. Only U4 was able to correctly interpret the bars (after reading the text above the FI bars, "This is how your inputs relate to the recommendation"), saying "The frustrated bar is the biggest, okay, so that contributes most to my recommendation". Having a wrong interpretation could lead to confusion towards the system when, for example, a next recommendation is shown and the input keywords and their relevance change with respect to this new recommendation. However, overcoming biases and changing mental models of lay users often proves to be difficult. A possible design adaptation to the FI and FI+% designs may show a general overview/summary of the user input, in line with what users were interpreting, and then highlight the keywords that are relevant to the recommendation being shown. This can be seen in Figure 4. Keeping the control aspect from the previous section in mind, users are also able to tap on different topics to request recommendations regarding said topic.

Figure 4: Adapted feature importance explanation design

4.4. Insight vs. information overload

Users generally liked the holistic approach of the feature importances, and were more inclined to look into the recommendation itself. When asked why they liked the recommendations more when explained using FI compared to the purely textual explanation, they stated that the FI were able to show them a general overview of themselves as a person.

On the other hand, there were also some users who disagreed with the ordering of keyword importances that the feature importance bars displayed, causing a slight increase in distrust towards the recommender system and ranking the explanation lower. This is to be expected, as increasing the transparency of explanations can cause a higher drop in trust towards the system if the content of the explanation or recommendation does not align with the user's expectations. However, the effect of a misaligned textual explanation is still stronger, as users who did not agree with either the recommendation or the explanation expressed a more negative sentiment towards the recommendation, and gave the textual recommendation a lower ranking. This is in line with similar research by Balog et al. [5], in which they state that misaligned recommendations that focus on a single topic or item are more susceptible to a lower perceived quality of explanation compared to multi-item recommendations.
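To make the adaptation proposed in section 4.3 concrete, the idea behind Figure 4 (an overview of all logged topics, with the recommendation-relevant ones highlighted) can be sketched as below. The topic names, scores, and text rendering are invented for illustration and are not the app's actual implementation:

```python
def adapted_fi_view(topic_scores, relevant_topics, width=20):
    """Text sketch of the adapted FI design: show an overview of ALL logged
    topics (matching how lay users read the bars), and mark only the topics
    the current recommendation is based on."""
    top = max(topic_scores.values())
    lines = []
    for topic, score in sorted(topic_scores.items(), key=lambda kv: -kv[1]):
        bar = "#" * max(1, round(width * score / top))
        note = "  <- used for this recommendation" if topic in relevant_topics else ""
        lines.append(f"{topic:<12}{bar}{note}")
    return "\n".join(lines)

# Invented logbook summary: three topics, two of them marked as relevant.
print(adapted_fi_view(
    {"frustrated": 0.8, "angry": 0.5, "tired": 0.2},
    relevant_topics={"frustrated", "angry"},
))
```

Tapping a topic to request recommendations about it, as users suggested in section 4.2, would hang naturally off the same per-topic structure.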
5. Conclusion

This paper introduced several explanation designs for mobile pain-related health recommendations, and compared them among lay users. Most users preferred the added transparency provided by the tags and FI/FI+% designs, stating that these gave them a brief and clear overview of their input, which helped them understand why they received certain recommendations. Another interesting aspect is that designs should be careful with visually highlighting negative sentiments of users. Designs that did so, i.e. the inline reply and word cloud, were received poorly by users. Lastly, we confirmed that lay users might interpret certain visual explanations differently than intended, yet still prefer them over others. Given their feedback, we presented an adapted design of the favoured FI/FI+% explanation to be in line with what lay users expect.

6. Limitations & Future work

The qualitative aspect of this study was already able to point out several key aspects related to designing health explanations for patients experiencing chronic pain. However, a larger-scale quantitative user study is needed to further investigate these results. One such aspect is the fact that some users preferred textual explanations over explanations that offered more information. Investigating whether this correlates with the user's need for cognition (NFC), and what its implications are, can prove to be an interesting research direction, similar to the research of Millecamp et al. [22]. Another aspect is the fact that while most users disliked being confronted with their negative input, some did not mind. This could be related to the "warriors vs. worriers" research, in which some users experiencing chronic pain actually prefer being exposed to negative feedback so they can address it, and could prove useful for further research [23]. Future research should also consider other designs to explain health recommendations and elaborate design guidelines that can be used by researchers and practitioners in this exciting domain. In addition, an interesting further line of research is to personalise these explanations on-the-fly, based on interaction data of end users. As in the work of [24], clicks and hover interactions as well as eye gaze data can be considered for such personalisation.

Acknowledgments

This work is part of the research projects Personal Health Empowerment (PHE) with project number HBC.2018.2012, financed by Flanders Innovation & Entrepreneurship, and IMPERIUM with project number G0A3319N, financed by Research Foundation Flanders (FWO).

References

[1] R. De Croon, L. Van Houdt, N. N. Htun, G. Štiglic, V. Vanden Abeele, K. Verbert, Health recommender systems: Systematic review, J Med Internet Res 23 (2021) e18035. URL: https://www.jmir.org/2021/6/e18035. doi:10.2196/18035.
[2] F. Torrent-Fontbona, B. Lopez, Personalized adaptive cbr bolus recommender system for type 1 diabetes, IEEE Journal of Biomedical and Health Informatics 23 (2019) 387–394. doi:10.1109/JBHI.2018.2813424.
[3] R. Gouveia, E. Karapanos, M. Hassenzahl, How do we engage with activity trackers? a longitudinal study of habito, UbiComp 2015 - Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (2015) 1305–1316. doi:10.1145/2750858.2804290.
[4] K. Cheung, W. Ling, C. J. Karr, K. Weingardt, S. M. Schueller, D. C. Mohr, Evaluation of a recommender app for apps for the treatment of depression and anxiety: An analysis of longitudinal user engagement, Journal of the American Medical Informatics Association 25 (2018) 955–962. doi:10.1093/jamia/ocy023.
[5] K. Balog, F. Radlinski, Measuring Recommendation Explanation Quality: The Conflicting Goals of Explanations, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 329–338. URL: https://doi.org/10.1145/3397271.3401032. doi:10.1145/3397271.3401032.
[6] A. Calero Valdez, M. Ziefle, K. Verbert, Hci for recommender systems: The past, the present and the future, in: Proceedings of the 10th ACM Conference on Recommender Systems, RecSys '16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 123–126. URL: https://doi.org/10.1145/2959100.2959158. doi:10.1145/2959100.2959158.
[7] D. V. Carvalho, E. M. Pereira, J. S. Cardoso, Machine learning interpretability: A survey on methods and metrics, Electronics 8 (2019). URL: https://www.mdpi.com/2079-9292/8/8/832. doi:10.3390/electronics8080832.
[8] E. Wayman, S. Madhvanath, Nudging Grocery Shoppers to Make Healthier Choices, in: Proceedings of the Ninth Conference on Recommender
Systems, ACM, 2015, pp. 289–292. doi:10.1145/ ommendation service for a curated list of read-
2792838.2799669. ily available mental health and well-being mobile
[9] J.-B. Lamy, B. Sekar, G. Guezennec, J. Bouaud, apps for young people: Randomized controlled
B. Séroussi, Explainable artificial intelligence trial, Journal of Medical Internet Research 19 (2017).
for breast cancer: A visual case-based reasoning doi:10.2196/jmir.6775, robin’s Paper: [55].
approach, Artificial Intelligence in Medicine 94 [18] Y. Wu, M. Ester, Flame: A probabilistic model
(2019) 42–53. URL: https://www.sciencedirect. combining aspect based opinion mining and col-
com/science/article/pii/S0933365718304846. laborative filtering, in: Proceedings of the Eighth
doi:https://doi.org/10.1016/j.artmed. ACM International Conference on Web Search
2019.01.001. and Data Mining, WSDM ’15, Association for
[10] A. Bussone, S. Stumpf, D. M. O’Sullivan, The role Computing Machinery, New York, NY, USA, 2015,
of explanations on trust and reliance in clinical de- p. 199–208. URL: https://doi.org/10.1145/2684822.
cision support systems, 2015 International Confer- 2685291. doi:10.1145/2684822.2685291.
ence on Healthcare Informatics (2015) 160–169. [19] C.-H. Tsai, P. Brusilovsky, Evaluating Visual Ex-
[11] M. Ribera, A. Lapedriza, Can we do better explana- planations for Similarity-Based Recommendations:
tions? a proposal of user-centered explainable ai, User Perception and Performance, in: Proceed-
CEUR Workshop Proceedings 2327 (2019). ings of the 27th ACM Conference on User Mod-
[12] D. Wang, Q. Yang, A. Abdul, B. Y. Lim, Designing eling, Adaptation and Personalization, UMAP ’19,
Theory-Driven User-Centric Explainable AI, Asso- Association for Computing Machinery, New York,
ciation for Computing Machinery, New York, NY, NY, USA, 2019, p. 22–30. URL: https://doi.org/
USA, 2019, p. 1–15. URL: https://doi.org/10.1145/ 10.1145/3320435.3320465. doi:10.1145/3320435.
3290605.3300831. 3320465.
[13] C.-H. Tsai, P. Brusilovsky, Beyond the ranked list: [20] P. Pu, L. Chen, R. Hu, A user-centric evaluation
User-driven exploration and diversification of so- framework for recommender systems, in: Pro-
cial recommendation, in: 23rd International Con- ceedings of the Fifth ACM Conference on Rec-
ference on Intelligent User Interfaces, IUI ’18, As- ommender Systems, RecSys ’11, Association for
sociation for Computing Machinery, New York, Computing Machinery, New York, NY, USA, 2011,
NY, USA, 2018, p. 239–250. URL: https://doi.org/ p. 157–164. URL: https://doi.org/10.1145/2043932.
10.1145/3172944.3172959. doi:10.1145/3172944. 2043962. doi:10.1145/2043932.2043962.
3172959. [21] N. J.-M. Blackman, J. J. Koval, Interval es-
[14] M. Szymanski, M. Millecamp, K. Verbert, Visual, timation for cohen’s kappa as a measure of
textual or hybrid: The effect of user expertise on agreement, Statistics in Medicine 19 (2000)
different explanations, in: 26th International Con- 723–741. doi:https://doi.org/10.1002/
ference on Intelligent User Interfaces, IUI ’21, As- (SICI)1097-0258(20000315)19:5<723::
sociation for Computing Machinery, New York, AID-SIM379>3.0.CO;2-A.
NY, USA, 2021, p. 109–119. URL: https://doi.org/ [22] M. Millecamp, N. N. Htun, C. Conati, K. Verbert, To
10.1145/3397481.3450662. doi:10.1145/3397481. explain or not to explain: The effects of personal
3450662. characteristics when explaining music recommen-
[15] J. Ooge, G. Stiglic, K. Verbert, Explaining arti- dations, in: Proceedings of the 24th International
ficial intelligence with visual analytics in health- Conference on Intelligent User Interfaces, IUI ’19,
care, WIREs Data Mining and Knowledge Dis- Association for Computing Machinery, New York,
covery 12 (2021). URL: https://wires.onlinelibrary. NY, USA, 2019, p. 397–407. URL: https://doi.org/
wiley.com/doi/abs/10.1002/widm.1427. doi:https: 10.1145/3301275.3302313. doi:10.1145/3301275.
//doi.org/10.1002/widm.1427. 3302313.
[16] M. Schmid Mast, A. Kindlimann, W. Lange- [23] J. Geuens, T. Swinnen, L. Geurts, R. Westhovens,
witz, Recipients’ perspective on breaking bad R. De Croon, V. Vanden Abeele, Worriers versus
news: How you put it really makes a difference, warriors: Tailoring mhealth to address differences
Patient Education and Counseling 58 (2005) in patients with chronic arthritis, in: 2020 IEEE In-
244–251. URL: https://www.sciencedirect.com/ ternational Conference on Healthcare Informatics
science/article/pii/S0738399105001473. doi:https: (ICHI), 2020, pp. 1–12. doi:10.1109/ICHI48887.
//doi.org/10.1016/j.pec.2005.05.005, 2020.9374322.
medical Education and Training in Communication. [24] M. Millecamp, T. Willemot, K. Verbert, Your eyes ex-
[17] N. Bidargaddi, P. Musiat, M. Winsall, G. Vogl, plain everything: exploring the use of eye tracking
V. Blake, S. Quinn, S. Orlowski, G. Antezana, to provide explanations on-the-fly, in: Proceedings
G. Schrader, Efficacy of a web-based guided rec- of the 8th Joint Workshop on Interfaces and Hu-
9
Maxwell Szymanski et al. CEUR Workshop Proceedings 1–10
man Decision Making for Recommender Systems
co-located with 15th ACM Conference on Recom-
mender Systems (RecSys 2021), volume 2948, CEUR
Workshop Proceedings, 2021, pp. 89–100.
10