=Paper= {{Paper |id=None |storemode=property |title=An Empirical Study on the Persuasiveness of Fact-based Explanations for Recommender Systems |pdfUrl=https://ceur-ws.org/Vol-1253/paper6.pdf |volume=Vol-1253 }} ==An Empirical Study on the Persuasiveness of Fact-based Explanations for Recommender Systems== https://ceur-ws.org/Vol-1253/paper6.pdf
    An empirical study on the persuasiveness of fact-based
           explanations for recommender systems

Markus Zanker
Alpen-Adria-Universitaet Klagenfurt
9020 Klagenfurt, Austria
mzanker@acm.org

Martin Schoberegger
Alpen-Adria-Universitaet Klagenfurt
9020 Klagenfurt, Austria
m3schobe@edu.uni-klu.ac.at


ABSTRACT
Recommender Systems (RS) help users to orient themselves in large product assortments and provide decision support. Explanations help recommender systems to enhance their impact on users by, for instance, justifying the recommendations made. Arguments provide reasons in a more structured way, by denoting a conclusion that follows from one or more premises. While explanations in expert systems have a long tradition of using argumentative patterns, argumentative explanations for recommendations have not yet been systematically researched. This paper therefore compares the persuasion potential of different explanation styles (sentences, facts or argument style) by comparing the robustness of subjects' preferences when employing an additive utility model from conjoint analysis.

Keywords
Recommender Systems, Explanation styles, Persuasion potential of explanations

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IntRS 2014, October 6, 2014, Silicon Valley, CA, USA. Copyright 2014 by the author(s).
IntRS Workshop at RecSys '14, Foster City, CA, USA.

1. INTRODUCTION
   Recommender Systems (RS) support online customers in their decision making and should help them to avoid poor decisions [4]. Persuasive systems [9] focus on changing a user's beliefs or actions in an intended way. In this context recommender systems also need to be seen as persuasive systems, as their purpose lies in pointing users towards unknown items that presumably match their interests, i.e. making serendipitous propositions. This clearly differentiates a recommender system (RS) from an information retrieval (IR) system, which assumes an objective information need of a user that can be satisfied. In general, explanations can be seen as an attempt to fit a particular phenomenon into a general pattern in order to increase understanding and remove bewilderment or surprise [5]. In product recommendation scenarios, explanations can be seen as additional information about recommendations [2] that serves the purpose of justifying why a specific item is part of a recommendation list; explanations also promote objectives such as users' trust in the system and confidence in decision making. In the domain of expert systems, explanations already have a long tradition, where formal argumentation traces can serve as explanations that justify the output of a system [8]. According to [5], an argument is (a) a series of sentences, statements, or propositions (b) where some are premises (c) and one is the conclusion (d) where the premises are intended to give a reason for the conclusion.
   As we believe that research on explanations in general, and comparative studies on competing explanation styles in particular, are rare (see [7, 6, 3] for a few more recent exceptions), we conducted a supervised lab study to investigate the impact of different styles of knowledgeable explanations [11]. In particular, we are interested in effects on the robustness of users' preferences when they are confronted with additional explanations, i.e. in exploring the persuasion potential of explanations. More concretely, we compared fact-based explanations, which presented keywords such as A, B, C to users, with a basic argument style with A and B as premises and C as a consequent, i.e. A, B therefore C. Furthermore, we compared these fact-based explanations to sentence-based explanations, which require more cognitive effort to understand. We selected three different item domains that typically trigger high involvement of users, i.e. hiking routes from the tourism and leisure domain, energy plans and mobile phone plans, and controlled for user preferences, item portfolio and the semantics of the explanations themselves. We would like to note that the study was conducted in the scope of the O-STAR project, which researches techniques for personalized route planning for hikers in alpine regions. Next we provide details on our study design and finally discuss results and conclusions.

2. STUDY DESIGN
   We investigated whether the introduction of an argument-based writing style, i.e. the use of the keyword therefore to denote the conclusion of the preceding premises, has an impact on the robustness of users' preferences in the face of additional explanations. As already mentioned, we asked users to disclose their preferences for three different item domains (hiking routes, mobile phone plans and energy plans) in a supervised offline questionnaire. Figures 1 and 2 depict two exemplary items from the hiking domain. Subjects were invited to a seminar room, where they had to answer a paper-and-pencil survey consisting of two parts. The first part included exactly 6 items for each of the three domains, each described by either 4 or 5 characteristics.
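The manipulated variable can be made concrete with a minimal Python sketch that renders one argument (premises plus a conclusion) in the three styles compared in this study: solely facts, argumentative facts (A, B therefore C) and argumentative sentences. The class and function names are hypothetical illustrations, not part of the study materials; in the study the explanation texts were fixed questionnaire content (see Table 2), and the sentence template below is an assumed simplification that only reproduces the hiking example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Argument:
    """One argument: its premises plus a single conclusion (hypothetical type)."""
    premises: List[str]
    conclusion: str


def solely_facts(arg: Argument) -> str:
    """Fact-based style: keywords A, B, C; the conclusion is not marked."""
    return "\n".join(arg.premises + [arg.conclusion])


def argumentative_facts(arg: Argument) -> str:
    """Argument style: A, B therefore C; the keyword marks the conclusion."""
    return "\n".join(arg.premises + ["therefore " + arg.conclusion])


def argumentative_sentence(subject: str, arg: Argument) -> str:
    """Sentence style. The 'is of ...' template is an assumption that only
    fits descriptions like the hiking example, not every domain."""
    return f"{subject} is of {' and '.join(arg.premises)}, therefore it is {arg.conclusion}."


if __name__ == "__main__":
    hiking = Argument(["low altitude", "easy distance"], "very family-friendly")
    for text in (solely_facts(hiking), argumentative_facts(hiking),
                 argumentative_sentence("This route", hiking)):
        print(text, end="\n\n")
```

The only difference between the first two renderings is the keyword therefore in front of the conclusion, which is exactly the contrast the study isolates; the sentence style carries the same content but embeds the keyword in running text.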
Domain / attribute                Values
Hiking routes
  Distance                        in km
  Altitude                        in m
  Level of difficulty             easy or demanding
  Physical fitness                (not) required
  Possibility for meal on route   yes/no
Energy plans
  Renewable energy                100%/no
  Pricing                         dynamic vs. fixed
  Fixed contract duration         in months
  Guaranteed price                yes/no
Mobile phone plans
  Basic fee                       in EUR
  Type of phone                   Smartphone vs. simple phone
  Anytime minutes                 amount
  Fixed contract duration         in months

    Table 1: Attributes describing item domains

   Figure 1: Excerpt from questionnaire - part 1

   Figure 2: Excerpt from questionnaire - part 2

   Figure 3: Big picture of research design

   Table 1 depicts the three item domains and the artificial design space of the item portfolios. To avoid confusion, the semantics of the domain attributes were defined in a sidebar (e.g. Smartphone: denotes a device in the range of the HTC Desire X or Nokia Lumia 625). Participants had to rank the 6 options according to their general preference with respect to the particular item domain. After disclosing their preferences in the first part of the questionnaire (see Figure 1 for a translated excerpt of the questionnaire), users had to solve a picture puzzle in which 10 different errors were hidden. The purpose of this task is twofold: first, it distracts users from their thoughts on the ranking tasks and, second, the number of correctly marked errors could be used to assess how attentively participants followed the questionnaire. Once participants had finished the first part, they handed it in and received the second part of the survey. This prevented participants from looking at their first-round rankings when answering the second part. In the second part, participants again had to rank sets of five items from the three item domains. However, in addition to the item characteristics already used in the first round, additional explanations were given for each item. The explanation style acts as the manipulated variable (solely fact-based, argumentative facts and argumentative sentences). Explanation style is permuted within subjects, i.e. participants are confronted with all three explanation styles, each for a different item domain and in different orders, while the combination of item domain and explanation style is varied between subjects. For each item exactly two arguments, each with two premises and one conclusion, are added as additional information (see examples in Table 2). See Figure 2 for a depiction of two exemplary items from the hiking domain with explanations following the style of argumentative facts.
   Finally, the questionnaire controlled for demographic characteristics and checked whether participants noticed the intervention: a multiple-choice question asked what had been relevant for ranking the items. For the analysis we selected only participants who considered the additional explanations provided in the second part in their ranking decision.
   Figure 3 sketches the big picture of the study design. Participants rank sets of items from three different domains twice, where the item sets in the first and second part of the questionnaire do not overlap. Because user preferences are measured twice for each domain (without and with the intervention of a specific explanation style), we can control for the participants' preferences on item sets and their presentation. We employ an additive model from conjoint analysis that allows us to estimate the utilities of each item characteristic [1], i.e. the overall utility of an item $y_i$ is computed as the sum $\mu + \sum_{Z} \beta_Z$ over the characteristics $Z$ of $y_i$, where $\mu$ is a basic utility and
$\beta_Z$ denotes the positive or negative utility contributed by a specific item characteristic $Z$ (for instance, the possibility to have a meal on the route in the hiking domain). Having estimated the individual utilities of each item characteristic, we computed an a priori ranking for the unseen item sets in the survey's second part, which is then compared with the observed ranks of each user.

Hiking routes
Solely facts                    low altitude
                                easy distance
                                very family-friendly
Argumentative facts             low altitude
                                easy distance
                                therefore very family-friendly
Argumentative sentences         This route is of low altitude
                                and easy distance, therefore
                                it is very family-friendly.
Energy plans
Solely facts                    100% renewable energy
                                low environmental impact
                                high sustainability
Argumentative facts             100% renewable energy
                                low environmental impact
                                therefore high sustainability
Argumentative sentences         This energy plan offers 100%
                                renewable energy with a low
                                environmental impact, therefore
                                its sustainability is high.
Mobile phone plans
Solely facts                    low basic fee
                                many anytime minutes
                                ideal for heavy use
Argumentative facts             low monthly basic fee
                                many anytime minutes
                                therefore ideal for heavy use
Argumentative sentences         This mobile phone plan
                                offers a low monthly basic fee
                                with many anytime minutes,
                                therefore it is ideal for
                                heavy use.

Table 2: Example explanations/arguments for each of the three item domains

3. RESULTS AND DISCUSSION
   In total 136 subjects, mostly students from Alpen-Adria-Universität Klagenfurt, participated in our survey. From each participant we received three rankings in the second part of the survey (one for each domain), i.e. a total of 408 computed rank correlations before cleaning. More than 80% of all participants were young people aged between 18 and 25. Two thirds of our participants were female. All respondents had a high-school degree and a few of them already had a university degree. Before the analysis we rigorously excluded participants whose answers might be unreliable, according to the following criteria:

  1. Only respondents who demonstrated a thorough attitude by identifying at least 50% of all hidden errors in the picture puzzle were retained.

  2. We asked participants what they considered to be relevant for making their decisions on the rankings. Based on the answers to this multiple-choice question we included only respondents who had noticed the additional information (explanations) and excluded all respondents who answered that they relied on their gut feelings.

  3. We also asked participants how they experienced this survey, with the answering options interesting, challenging, boring, unclear and useless. For further consideration we only kept respondents who answered challenging and were thus captivated by the ranking tasks. We assumed that the option interesting is a polite way of saying boring or useless.

  4. Finally, we removed records from the dataset where the estimation of individual utilities for product characteristics was not reliable, i.e. the rank correlation between the a priori rankings based on estimated utility weights and the actual a priori ranking of participants had to be above 0.7.

After applying this extremely restrictive selection procedure we arrived at the dataset sizes shown in Table 3.

                             Hiking   Energy   Mobile
Solely facts                     10       12        7
Argumentative facts               6       12       13
Argumentative sentences          10        5        8

   Table 3: Respondents per domain and explanation style

In order to check the robustness of preferences after introducing argument-based explanations, we compute Spearman's rank correlation coefficient between the a priori rankings based on estimated utility weights and the empirical rankings by participants. Table 4 reports the averaged Spearman's ρ for each explanation style, aggregated over domains.

Explanation style              Rank correlation
Solely facts                               0.43
Argumentative facts                        0.36
Argumentative sentences                    0.67

   Table 4: Robustness of preferences in the face of different explanation styles

As can be seen from Table 4, argumentation-styled facts that included the keyword therefore to denote a conclusion reduced the robustness of participants' preferences more than the pure fact-based explanations (0.36 vs. 0.43), thus supporting our hypothesis that an argumentative explanation style would influence users more. Argumentative sentences preserved user preferences more than the fact-based explanation styles. Sentences need more cognitive effort from users to be understood, and the effect of the keyword therefore was seemingly lost in the sentence structure. The differences in Spearman's ρ across all three categories are statistically significant according to a Kruskal-Wallis test (p = 0.037).
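The analysis chain of Sections 2 and 3 (additive utilities, a priori ranking, Spearman's ρ against the observed ranks, Kruskal-Wallis across explanation styles) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the part-worths $\mu$ and $\beta_Z$ are taken as given (the paper estimates them with conjoint analysis [1] but does not name the estimator), the item encoding, all helper names and all numbers are hypothetical toy values rather than study data, and the closed-form p-value $e^{-H/2}$ is valid only for exactly three groups (chi-square with two degrees of freedom), which matches the three styles compared here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: an item is the list of characteristics Z it has.


def utility(item: Sequence[str], mu: float, beta: Dict[str, float]) -> float:
    """Additive model: mu plus the part-worth beta_Z of each characteristic Z."""
    return mu + sum(beta.get(z, 0.0) for z in item)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def a_priori_ranking(items: Sequence[Sequence[str]], mu: float,
                     beta: Dict[str, float]) -> List[float]:
    """Predicted ranking of unseen items; rank 1 = highest utility."""
    return average_ranks([-utility(item, mu, beta) for item in items])


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def kruskal_wallis(*groups: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value.

    Restricted to three groups: with k - 1 = 2 degrees of freedom the
    chi-square survival function has the closed form exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("closed-form p-value assumes exactly three groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied")
    h /= correction
    return h, math.exp(-h / 2)


if __name__ == "__main__":
    items = [["meal", "easy"], ["short"], ["easy"], ["meal", "short"], []]
    beta = {"meal": 1.0, "easy": 0.5, "short": 0.8}  # hypothetical part-worths
    predicted = a_priori_ranking(items, 0.0, beta)   # [2.0, 3.0, 4.0, 1.0, 5.0]
    observed = [1, 2, 3, 4, 5]                       # hypothetical second-round ranks
    print(spearman_rho(predicted, observed))         # approximately 0.4
    # Hypothetical per-respondent rho values grouped by explanation style.
    print(kruskal_wallis([0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]))
```

Negating the utilities before ranking makes rank 1 the item with the highest predicted utility, so the predicted and observed rankings share the same orientation. The paper does not state whether the Kruskal-Wallis test was run on individual respondents' ρ values; the grouping in the usage example is an assumption.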
  In addition, we checked for interaction effects between explanation style and product domain. As can be seen from Table 5, fact-based explanation styles lead to less robust preferences than sentence-based explanation styles. Furthermore, argumentative facts seem to reduce the robustness of participants' preferences even more than a purely fact-based explanation style. The only exception is the hiking domain, where the order between facts and argumentative facts is inverted. However, preference robustness is generally lower in this product domain, and it might have been harder for respondents to determine their own preferences in the hiking domain than in the other two domains.

                             Hiking   Energy   Mobile
Solely facts                   0.27     0.48     0.58
Argumentative facts            0.38     0.34     0.38
Argumentative sentences        0.58     0.78     0.71

   Table 5: Spearman's ρ per domain and explanation style

   This study therefore showed that fact-based explanations and an argumentative explanation style impacted participants' preferences more strongly than full-sentence explanations. Possible objections to these conclusions are the lack of a control group and the paper-and-pencil design without a real recommendation situation. A control group would allow us to estimate the natural stability of preferences between both rounds without any intervention. However, in this study we were not interested in absolute rank correlation measures, but only in comparing the robustness of respondents' preferences between different conditions, and we assumed that some natural instability would affect all explanation styles the same way. In order to assess the impact of an argumentative explanation style, we wanted to control for other effects and biases as well as possible. The supervised paper-and-pencil approach allowed us to control for user preferences, the item portfolio and the persuasiveness of the explanation content itself, and to ensure a high reliability of the measurements by excluding participants who made arbitrary rankings or did not notice the additional explanations. In a previous study [10] we already compared the sentence-based explanations with a no-explanations control group and observed their positive impact on the perception of the recommender system as a whole. However, that design could not isolate the impact on the robustness of preferences, because it did not control for the different recommendation lists, the different explanation content applying to different recommendations, or participants' differing appreciation of the recommendation results themselves.

4. CONCLUSIONS
   This short paper presented an innovative study design for measuring the impact of different explanation styles on the robustness of participants' preferences in the face of additional explanations. The results indicate that fact-based explanations have a stronger impact on participants' preference stability than sentence-based explanations. Furthermore, the use of the keyword therefore to indicate a conclusion drawn from premises, and thus an argumentative explanation style, already had a measurable impact on participants. Thus arguments and fact-based explanations make users change their minds about the item portfolio and can therefore be valuable features of recommender systems. Limitations and possible lines of future research include varying the complexity of arguments (i.e. the number of premises) or their number, as well as additional item domains.

Acknowledgements
The authors acknowledge the financial support from the European Union (EU), the European Regional Development Fund (ERDF), the Austrian Federal Government and the State of Carinthia in the Interreg IV Italien-Österreich programme (project acronym O-STAR).

5. REFERENCES
 [1] Klaus Backhaus, Bernd Erichson, Wulff Plinke, and Rolf Weiber. Multivariate Analysemethoden: Eine anwendungsorientierte Einführung. Springer, Berlin, 12th, fully revised edition, 2008.
 [2] Gerhard Friedrich and Markus Zanker. A taxonomy for generating explanations in recommender systems. AI Magazine, 32(3):90–98, 2011.
 [3] Fatih Gedikli, Dietmar Jannach, and Mouzhi Ge. How should I explain? A comparison of different explanation types for recommender systems. Int. J. Hum.-Comput. Stud., 72(4):367–382, 2014.
 [4] Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
 [5] W. Sinnott-Armstrong and R. Fogelin. Cengage Advantage Books: Understanding Arguments. Wadsworth, 2014.
 [6] Nava Tintarev and Judith Masthoff. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction, 22(4-5):399–439, 2012.
 [7] Jesse Vig, Shilad Sen, and John Riedl. Tagsplanations: Explaining recommendations using tags. In Proceedings of the 14th International Conference on Intelligent User Interfaces, IUI '09, pages 47–56, New York, NY, USA, 2009. ACM.
 [8] L. R. Ye and P. E. Johnson. The impact of explanation facilities on user acceptance of expert system advice. MIS Quarterly, 19(2):157–172, 1995.
 [9] Kyung Hyan Yoo, Ulrike Gretzel, and Markus Zanker. Persuasive Recommender Systems: Conceptual Background and Implications. Springer Briefs in Electrical and Computer Engineering. Springer, 2013.
[10] Markus Zanker. The influence of knowledgeable explanations on users' perception of a recommender system. In Padraig Cunningham, Neil J. Hurley, Ido Guy, and Sarabjot Singh Anand, editors, RecSys, pages 269–272. ACM, 2012.
[11] Markus Zanker and Daniel Ninaus. Knowledgeable explanations for recommender systems. In Jimmy Xiangji Huang, Irwin King, Vijay V. Raghavan, and Stefan Rueger, editors, Web Intelligence, pages 657–660. IEEE, 2010.