=Paper=
{{Paper
|id=None
|storemode=property
|title=An Empirical Study on the Persuasiveness of Fact-based Explanations for Recommender Systems
|pdfUrl=https://ceur-ws.org/Vol-1253/paper6.pdf
|volume=Vol-1253
}}
==An Empirical Study on the Persuasiveness of Fact-based Explanations for Recommender Systems==
Markus Zanker (mzanker@acm.org) and Martin Schoberegger (m3schobe@edu.uni-klu.ac.at), Alpen-Adria-Universitaet Klagenfurt, 9020 Klagenfurt, Austria

IntRS 2014, October 6, 2014, Silicon Valley, CA, USA. Workshop IntRS at RecSys '14, Foster City, CA, USA. Copyright 2014 by the author(s).

===Abstract===
Recommender Systems (RS) help users to orientate themselves in large product assortments and provide decision support. Explanations help recommender systems to enhance their impact on users by, for instance, justifying the recommendations made. Arguments provide reasons in a more structured way, by denoting a conclusion that follows from one or more premises. While explanations in expert systems have a long tradition of using argumentative patterns, argumentative explanations for recommendations have not yet been systematically researched. This paper therefore compares the persuasion potential of different explanation styles (sentences, facts or argument style) by comparing the robustness of subjects' preferences when employing an additive utility model from conjoint analysis.

===Keywords===
Recommender Systems, Explanation styles, Persuasion potential of explanations

===1. Introduction===
Recommender Systems (RS) support online customers in their decision making and should help them to avoid poor decisions [4]. Persuasive systems [9] focus on changing a user's beliefs or actions in an intended way. In this context recommender systems also need to be seen as persuasive systems, as their purpose lies in pointing users towards unknown items that presumably match their interests, i.e. making serendipitous propositions. This clearly differentiates a recommender system (RS) from an information retrieval (IR) system, which assumes an objective information need of a user that can be satisfied. In general, explanations can be seen as an attempt to fit a particular phenomenon into a general pattern in order to increase understanding and remove bewilderment or surprise [5]. In the context of product recommendation scenarios, explanations can be seen as additional information about recommendations [2] that serves the purpose of justifying why a specific item is part of a recommendation list and promotes objectives such as users' trust in the system and confidence in decision making. In the domain of expert systems explanations already have a long tradition, where formal argumentation traces can serve as explanations that justify the output of a system [8]. According to [5] an argument is (a) a series of sentences, statements, or propositions (b) where some are premises (c) and one is the conclusion (d) where the premises are intended to give a reason for the conclusion.

As we believe that research on explanations in general and comparative studies on competing explanation styles are rare (a few pointers to more recent exceptions are [7, 6, 3]), we conducted a supervised lab study to research the impact of different explanation styles of knowledgeable explanations [11]. In particular we are interested in effects on the robustness of users' preferences when confronted with additional explanations, i.e. exploring the persuasion potential of explanations. More concretely, we compared fact-based explanations that presented keywords as explanations to users, such as ''A, B, C'', with a basic argument style with A and B as premises and C as a consequent, i.e. ''A, B therefore C''. Furthermore, we compared these fact-based explanations to sentence-based explanations, which require more cognitive effort to understand. We selected three different item domains that typically trigger high involvement of users, i.e. hiking routes (from the tourism and leisure domain), energy plans and mobile phone plans, and controlled for user preferences, item portfolio and the semantics of the explanations themselves. We would like to note that the study was conducted in the scope of the O-STAR project, which researches techniques for personalized route planning for hikers in alpine regions. Next we provide details on our study design and finally discuss results and conclusions.

===2. Study design===
We researched the question of whether the introduction of an argument-based writing style, i.e. use of the keyword ''therefore'' to denote the conclusion of the preceding premises, has an impact on the robustness of users' preferences in face of additional explanations. As already mentioned, we asked users to disclose their preferences for three different item domains (hiking routes, mobile phone plans and energy plans) in a supervised offline questionnaire. Figures 1 and 2 depict two exemplary items from the hiking domain. Subjects were invited to participate in a seminar room, where they had to answer a paper & pencil survey with two parts. The first part included for each of the three domains exactly 6 items that are described by either 4 or 5 characteristics.

{| class="wikitable"
|+ Table 1: Attributes describing item domains
! Item domain !! Attribute !! Values or unit
|-
| Hiking routes || Distance || in km
|-
| Hiking routes || Altitude || in m
|-
| Hiking routes || Level of difficulty || easy or demanding
|-
| Hiking routes || Physical fitness || (not) required
|-
| Hiking routes || Possibility for meal on route || yes/no
|-
| Energy plans || Renewable energy || 100%/no
|-
| Energy plans || Pricing || dynamic vs. fixed
|-
| Energy plans || Fixed contract duration || in months
|-
| Energy plans || Guaranteed price || yes/no
|-
| Mobile phone plans || Basic fee || in EUR
|-
| Mobile phone plans || Type of phone || Smartphone vs. simple phone
|-
| Mobile phone plans || Anytime minutes || amount
|-
| Mobile phone plans || Fixed contract duration || in months
|}

Figure 1: Excerpt from questionnaire - part 1

Figure 2: Excerpt from questionnaire - part 2

Figure 3: Big picture of research design

Table 1 depicts the three item domains and the artificial design space of the item portfolios. To avoid confusion, the semantics of the domain attributes were defined in a sidebar (e.g. Smartphone: denotes a device in the range of HTC Desire X or Nokia Lumia 625). Participants had to rank the 6 options according to their general preference with respect to the particular item domain. After disclosing their preferences in the first part of the questionnaire (see Figure 1 for a translated excerpt of the questionnaire), users had to solve a picture puzzle in which 10 different errors were hidden. The purpose of this task is twofold: first, it distracts users from their thoughts on the ranking tasks and, second, we could use the number of correctly marked errors to assess how attentively participants followed the questionnaire. Once participants had finished the first part they handed it in and received the second part of the survey. This way we prevented participants from looking at their first-round ranking when answering the second part.

In the second part participants again had to rank sets of five items from the three item domains. However, in addition to the item characteristics already used in the first round, additional explanations were given for each item. The explanation style acts as the manipulated variable (solely fact-based, argumentative facts and argumentative sentences). Explanation style is permuted within subjects, i.e. participants are confronted with all three explanation styles, each for a different item domain and in different orders, while the combination of item domain and explanation style is varied between subjects. For each item exactly two arguments, each with two premises and one conclusion, are added as additional information (see examples in Table 2). See Figure 2 for a depiction of two exemplary items from the hiking domain with explanations following the style of argumentative facts.

{| class="wikitable"
|+ Table 2: Example explanations/arguments for each of the three item domains
! Item domain !! Explanation style !! Example
|-
| Hiking routes || Solely facts || low altitude; easy distance; very family-friendly
|-
| Hiking routes || Argumentative facts || low altitude; easy distance; therefore very family-friendly
|-
| Hiking routes || Argumentative sentences || This route is of low altitude and easy distance, therefore it is very family-friendly.
|-
| Energy plans || Solely facts || 100% renewable energy; low environmental impact; high sustainability
|-
| Energy plans || Argumentative facts || 100% renewable energy; low environmental impact; therefore high sustainability
|-
| Energy plans || Argumentative sentences || This energy plan offers 100% renewable energy with a low environmental impact, therefore its sustainability is high.
|-
| Mobile phone plans || Solely facts || low basic fee; many anytime minutes; ideal for heavy use
|-
| Mobile phone plans || Argumentative facts || low monthly basic fee; many anytime minutes; therefore ideal for heavy use
|-
| Mobile phone plans || Argumentative sentences || This mobile phone plan offers a low monthly basic fee with many anytime minutes, therefore it is ideal for heavy use.
|}

Finally, the questionnaire controlled for demographic characteristics and checked whether participants noticed the intervention, i.e. one question with multiple answering options asked what was relevant for ranking the items. For analysis we selected only participants who considered the additional explanations provided in the second part in their ranking decision.

In Figure 3 we sketch the big picture of the study design. Participants rank sets of items from three different domains twice, where item sets in the first and second part of the questionnaire do not overlap. Because user preferences are measured twice for each domain (without and with intervention of a specific explanation style), we can control for the participants' preferences on item sets and their presentation. We employ an additive model from conjoint analysis that allows us to estimate the utilities for each item characteristic [1], i.e. the overall utility of an item is computed as the sum

<math>y_i = \mu + \sum_{Z} \beta_Z</math>

where <math>\mu</math> is a basic utility and <math>\beta_Z</math> denotes the positive or negative utility contributed by a specific item characteristic <math>Z</math> (for instance, the possibility to have your meal on route in the hiking domain). Having estimated the individual utilities of each item characteristic, we computed an a priori ranking for the unseen item sets in the survey's second part, which is then compared with the observed ranks for each user.

===3. Results and discussion===
In total 136 subjects, mostly students from Alpen-Adria-Universität Klagenfurt, participated in our survey. From each participant we received three rankings in the second part of the survey (one for each domain), i.e. a total of 408 computed rank correlations before cleaning. More than 80% of all participants were young people aged between 18 and 25. Two thirds of our participants were female. All respondents had a high-school degree and a few of them already had a university degree. Before analysis we rigorously excluded participants whose answers might be unreliable, based on the following criteria:

# We kept only respondents who demonstrated a thorough attitude by identifying at least 50% of all hidden errors in the picture puzzle.
# We asked participants what they considered to be relevant for making their decisions on the rankings. Based on the answers to this multiple choice question we included only respondents who had noticed the additional information (explanations) and excluded all respondents who answered that they relied on their gut feelings.
# We also asked participants how they experienced this survey, with the answering options interesting, challenging, boring, unclear and useless. For further consideration we only kept respondents who answered challenging and were thus captivated by the ranking tasks. We assumed that the option interesting is a polite way of saying boring or useless.
# Finally we removed records from the dataset where the estimation of individual utilities for product characteristics was not reliable, i.e. the rank correlation between the a priori rankings based on estimated utility weights and the actual a priori ranking of participants had to be above 0.7.

After applying this extremely restrictive selection procedure we arrived at the following size of the dataset (see Table 3).

{| class="wikitable"
|+ Table 3: Respondents per domain and explanation style
! Explanation style !! Hiking !! Energy !! Mobile
|-
| Solely facts || 10 || 12 || 7
|-
| Argumentative facts || 6 || 12 || 13
|-
| Argumentative sentences || 10 || 5 || 8
|}

In order to check for the robustness of preferences after introducing argument-based explanations, we compute the Spearman rank correlation coefficient between the a priori rankings based on estimated utility weights and the empirical rankings by participants. Table 4 reports the averaged Spearman's ρ for each explanation style, aggregated over domains.

{| class="wikitable"
|+ Table 4: Robustness of preferences in face of different explanation styles
! Explanation style !! Rank correlation
|-
| Solely facts || 0.43
|-
| Argumentative facts || 0.36
|-
| Argumentative sentences || 0.67
|}

As can be seen from Table 4, the argumentation-styled facts that included the keyword ''therefore'' to denote a conclusion reduced the robustness of participants' preferences more than the pure fact-based explanations, supporting our hypothesis that an argumentative explanation style would influence users more. Argumentative sentences preserved user preferences more than the fact-based explanation styles. Sentences presumably need more cognitive effort from users to be understood, and the effect of the keyword ''therefore'' was seemingly lost in the sentence structure. The difference between Spearman's ρ in all three categories is statistically significant according to a Kruskal-Wallis test (p = 0.037).

In addition we checked for interaction effects between explanation style and product domain. As can be seen from Table 5, fact-based explanation styles lead to less robust preferences than sentence-based explanation styles. Furthermore, argumentative facts seem to reduce the robustness of participants' preferences even more than a purely fact-based explanation style. The only exception is the hiking domain, where the order between facts and argumentative facts is inverted. However, in this product domain preference robustness is generally lower, and it might have been harder for respondents to determine their own preferences in the hiking domain than in the other two domains.

{| class="wikitable"
|+ Table 5: Spearman's ρ per domain and explanation style
! Explanation style !! Hiking !! Energy !! Mobile
|-
| Solely facts || 0.27 || 0.48 || 0.58
|-
| Argumentative facts || 0.38 || 0.34 || 0.38
|-
| Argumentative sentences || 0.58 || 0.78 || 0.71
|}

This study therefore showed that fact-based explanations and an argumentative explanation style impacted participants' preferences more strongly than full-sentence explanations. Objections against these conclusions might be the lack of a control group and the paper & pencil design without a real recommendation situation. A control group would allow us to estimate the natural stability of preferences between both rounds without any intervention. However, in this study we were not interested in absolute rank correlation measures, but only in the comparison of the robustness of respondents' preferences between different conditions, and we assumed that some natural instability would affect all explanation styles in the same way. In order to assess the impact of an argumentative explanation style we wanted to control for other effects and biases as well as possible. The supervised paper & pencil approach allowed us to control for user preferences, the item portfolio and the persuasiveness of the explanation content itself, as well as to insist on a high reliability of the measurements by excluding participants who made arbitrary rankings or did not notice the additional explanations. In a previous study [10] we already compared the sentence-based explanations with a no-explanations control group and observed their positive impact on the perception of the recommender system as a whole. However, in that study one could not isolate the impact on the robustness of preferences by controlling for the different recommendation lists, the different explanation content that would apply to different recommendations or the differing appreciation of the recommendation results themselves by participants.

===4. Conclusions===
This short paper presented an innovative study design for measuring the impact of different explanation styles on the robustness of participants' preferences in face of additional explanations. The results indicate that fact-based explanations have a stronger impact on participants' preference stability than sentence-based explanations. Furthermore, the use of the keyword ''therefore'', indicating a conclusion drawn from premises and an argumentative explanation style, already had a measurable impact on participants. Thus arguments and fact-based explanations make users change their minds about the item portfolio and can therefore be valuable features of recommender systems. Limitations and possible lines of future research include varying the complexity of arguments (i.e. the number of premises) or the number of arguments, as well as additional item domains.

===Acknowledgements===
The authors acknowledge the financial support from the European Union (EU), the European Regional Development Fund (ERDF), the Austrian Federal Government and the State of Carinthia in the Interreg IV Italien-Österreich programme (project acronym O-STAR).

===5. References===
# Klaus Backhaus, Bernd Erichson, Wulff Plinke, and Rolf Weiber. Multivariate Analysemethoden: Eine anwendungsorientierte Einführung. Springer, Berlin, 12., vollständig überarbeitete Auflage, 2008.
# Gerhard Friedrich and Markus Zanker. A taxonomy for generating explanations in recommender systems. AI Magazine, 32(3):90–98, 2011.
# Fatih Gedikli, Dietmar Jannach, and Mouzhi Ge. How should I explain? A comparison of different explanation types for recommender systems. Int. J. Hum.-Comput. Stud., 72(4):367–382, 2014.
# Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. Recommender Systems: An Introduction. Cambridge Univ Pr, 2010.
# W. Sinnott-Armstrong and R. Fogelin. Cengage Advantage Books: Understanding Arguments. Wadsworth, 2014.
# Nava Tintarev and Judith Masthoff. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction, 22(4-5):399–439, 2012.
# Jesse Vig, Shilad Sen, and John Riedl. Tagsplanations: Explaining recommendations using tags. In Proceedings of the 14th International Conference on Intelligent User Interfaces, IUI '09, pages 47–56, New York, NY, USA, 2009. ACM.
# L. R. Ye and P. E. Johnson. The impact of explanation facilities in user acceptance of expert system advice. MIS Quarterly, 19(2):157–172, 1995.
# Kyung Hyan Yoo, Ulrike Gretzel, and Markus Zanker. Persuasive Recommender Systems - Conceptual Background and Implications. Springer Briefs in Electrical and Computer Engineering. Springer, 2013.
# Markus Zanker. The influence of knowledgeable explanations on users' perception of a recommender system. In Padraig Cunningham, Neil J. Hurley, Ido Guy, and Sarabjot Singh Anand, editors, RecSys, pages 269–272. ACM, 2012.
# Markus Zanker and Daniel Ninaus. Knowledgeable explanations for recommender systems. In Jimmy Xiangji Huang, Irwin King, Vijay V. Raghavan, and Stefan Rueger, editors, Web Intelligence, pages 657–660. IEEE, 2010.
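
As an illustration of the analysis pipeline used in this study (additive utilities, an a priori ranking derived from them, and Spearman's ρ against the observed ranking), the following is a minimal sketch and not the authors' implementation. All function names, part-worth values, item profiles and the observed ranking are hypothetical; the estimation of part-worths from first-round rankings is not shown, and the closed-form ρ used here assumes rankings without ties.

```python
from typing import Dict, List


def item_utility(mu: float, part_worths: Dict[str, float], item: List[str]) -> float:
    """Additive conjoint model: basic utility mu plus the part-worth of each characteristic."""
    return mu + sum(part_worths[z] for z in item)


def rank_items(utilities: List[float]) -> List[int]:
    """Return rank positions (1 = most preferred); assumes all utilities are distinct."""
    order = sorted(range(len(utilities)), key=lambda i: -utilities[i])
    ranks = [0] * len(utilities)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def spearman_rho(rank_a: List[int], rank_b: List[int]) -> float:
    """Spearman's rho for two tie-free rankings of the same n items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical part-worth utilities of one participant in the hiking domain.
MU = 1.0
PART_WORTHS = {
    "short": 0.8, "long": -0.8,
    "easy": 0.5, "demanding": -0.5,
    "meal": 0.2, "no_meal": -0.2,
}

# Hypothetical unseen items of the second questionnaire part.
ITEMS = [
    ["short", "easy", "meal"],
    ["short", "demanding", "no_meal"],
    ["long", "easy", "meal"],
    ["long", "demanding", "meal"],
    ["long", "demanding", "no_meal"],
]

utilities = [item_utility(MU, PART_WORTHS, item) for item in ITEMS]
a_priori = rank_items(utilities)   # ranking predicted from the estimated utilities
observed = [2, 1, 3, 4, 5]         # hypothetical ranking given after seeing explanations
rho = spearman_rho(a_priori, observed)
print(a_priori, round(rho, 2))
```

A lower ρ means the ranking given with explanations deviates more from the ranking predicted by the participant's own utilities, which the study interprets as less robust preferences. With real data that may contain ties, a library routine for Spearman's ρ would be a safer choice than the closed form above.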