If you liked Herlocker et al.'s explanations paper, then you might like this paper too

Derek Bridge
Insight Centre for Data Analytics
University College Cork, Ireland
derek.bridge@insight-centre.org

Kevin Dunleavy
School of Computer Science and IT
University College Cork, Ireland
kevdunleavy@gmail.com

ABSTRACT
We present explanation rules, which provide explanations of user-based collaborative recommendations but in a form that is familiar from item-based collaborative recommendations; for example, "People who liked Toy Story also like Finding Nemo". We present an algorithm for computing explanation rules. We report the results of a web-based user trial that gives a preliminary evaluation of the perceived effectiveness of explanation rules. In particular, we find that nearly 50% of participants found this style of explanation to be helpful, and nearly 80% of participants who expressed a preference found explanation rules to be more helpful than similar rules that were closely-related but partly-random.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Information Filtering

General Terms
Algorithms, Experimentation, Human Factors

Keywords
Recommender Systems, Explanations

1. INTRODUCTION
An explanation of a recommendation is any content, additional to the recommendation itself, that is presented to the user with one or more of the following goals: to reveal how the system works (transparency), to reveal the data it has used (scrutability), to increase confidence in the system (trust), to convince the user to accept the recommendation (persuasion), to help the user make a good decision (effectiveness), to help the user make a decision more quickly (efficiency), or to increase enjoyment in use of the system (satisfaction) [11, 14]. The focus in this paper is effectiveness: explanations that help users to decide which item to consume.

The problem that we examine in this paper is how to produce effective explanations of user-based collaborative recommendations. It is relatively easy to explain the recommendations of content-based recommenders, e.g. by displaying meta-descriptions (such as features or tags) that the active user's profile and the recommended item have in common [10, 13]. Item-based collaborative recommendations are also amenable to explanation, e.g. by displaying items in the user's profile that are similar to the recommended item [8, 6]. User-based collaborative recommendations, on the other hand, are harder to explain. Displaying the identities of the active user's neighbours is unlikely to be effective, since the user will in general not know the neighbours; displaying their profiles is unlikely to be effective, since even the parts of their profiles they have in common with the active user will be too large to be readily comprehended.

It is possible to explain a recommendation using data other than that which the recommender used to generate the recommendation [2]. For example, a system could explain a user-based collaborative recommendation using the kind of data that a content-based recommender uses (features and tags), e.g. [9]. In our work, however, we try to preserve a greater degree of fidelity between the explanation and the operation of the recommender. Specifically, we generate the explanation from co-rated items on which the active user and her nearest neighbour agree.

We propose an algorithm for making item-based explanations, also referred to as influence-style explanations [1]; for example, "People who liked Toy Story also like Finding Nemo". This style of explanation is familiar to users of amazon.com [6], for example. These are the kind of explanation most commonly produced by item-based collaborative recommenders. But we will show how to produce them in the case of user-based collaborative recommenders. The algorithm is adapted from one recently proposed to explain case-based classifiers [7]. It produces explanations in the form of explanation rules. The antecedent of an explanation rule characterizes a subset of the active user's tastes that are predictive of the recommended item, which appears in the consequent of the rule; see the example in Figure 1.

Figure 1: An explanation rule

2. EXPLANATION ALGORITHM
We use a conventional user-based collaborative recommender of the kind described in [4]. Like theirs, our recommender finds the active user's 50 nearest neighbours using significance-weighted Pearson correlation; for each item that the neighbours have rated but the active user has not, it predicts a rating as the similarity-weighted average of deviations of neighbours' ratings from their means; it recommends the items with the highest predicted ratings.
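As an illustration only, the prediction step just described can be sketched in Python as follows. This is not the authors' code: the function and variable names (predict_rating, ratings, similarity) are ours, significance weighting and neighbour selection are elided, and we assume, as in the standard formulation of [4], that the weighted average of deviations is added to the active user's own mean rating.

    # Minimal sketch of deviation-from-mean prediction, in the spirit of [4].
    # Names and the ratings representation are illustrative, not from the paper.
    from statistics import mean

    def predict_rating(active, item, neighbours, ratings, similarity):
        """ratings[u] maps item -> rating; neighbours is a list of user ids;
        similarity(u, v) returns e.g. significance-weighted Pearson correlation."""
        num, den = 0.0, 0.0
        for v in neighbours:
            if item in ratings[v]:
                sim = similarity(active, v)
                mean_v = mean(ratings[v].values())
                num += sim * (ratings[v][item] - mean_v)   # deviation from v's mean
                den += abs(sim)
        if den == 0.0:
            return None                                    # no neighbour rated the item
        mean_u = mean(ratings[active].values())
        return mean_u + num / den                          # similarity-weighted average of deviations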
Before presenting the explanation algorithm, we define some terms:

Explanation partner: The explanation partner is the member of the set of nearest neighbours who is most similar to the active user and who likes the recommended item. Often this will be the user who is most similar to the active user, but not always. In some cases, the most similar user may not have liked the recommended item: the recommendation may be due to the votes of other neighbours. In these cases, one of these other neighbours will be the explanation partner.

It may appear that recommendations exploit the opinions of a set of neighbours (for accuracy), but explanations exploit the opinions of just one of these neighbours, the explanation partner. But this is not completely true. As we will explain below, the items included in the explanation are always members of the explanation partner's profile, but they are also validated by looking at the opinions of all other users (see the notions of coverage and accuracy below).

Candidate explanation conditions: Let u be the active user and v be the explanation partner; let j be a co-rated item; and let r_uj and r_vj be their ratings for j. We define candidate explanation conditions as co-rated items j on which the two users agree.

In the case of numeric ratings, we do not insist on rating equality for there to be agreement. Rather, we define agreement in terms of liking, indifference and disliking. For a 5-point rating scale, the candidate explanation conditions would be defined as follows:

    candidates(u, v) = {likes(j) : r_uj > 3 ∧ r_vj > 3}
                     ∪ {indiff(j) : r_uj = 3 ∧ r_vj = 3}
                     ∪ {dislikes(j) : r_uj < 3 ∧ r_vj < 3}

         Alien  Brazil  Crash  Dumbo  E.T.  Fargo
    Ann    2      4       1      2            4
    Bob    5      4              1            5

Table 1: A ratings matrix

For example, the candidate explanation conditions for users Ann and Bob in Table 1 are

    {likes(Brazil), dislikes(Dumbo), likes(Fargo)}

Alien does not appear in a candidate condition because Ann's and Bob's ratings for it disagree; Crash and E.T. do not appear in candidate conditions because neither of them is co-rated by Ann and Bob.
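To make the definition concrete, here is a minimal sketch of candidates(u, v) for a 5-point scale, using the dictionary-of-dictionaries ratings representation assumed in the earlier sketch; the names are ours, and the ratings literal simply reproduces the Table 1 example.

    # Minimal sketch of candidate explanation conditions on a 5-point scale.
    def candidates(u, v, ratings):
        """Return the candidate explanation conditions for active user u and
        explanation partner v: co-rated items on which the two users agree."""
        conds = set()
        for j in set(ratings[u]) & set(ratings[v]):       # co-rated items only
            ruj, rvj = ratings[u][j], ratings[v][j]
            if ruj > 3 and rvj > 3:
                conds.add(("likes", j))
            elif ruj == 3 and rvj == 3:
                conds.add(("indiff", j))
            elif ruj < 3 and rvj < 3:
                conds.add(("dislikes", j))
        return conds

    ratings = {
        "Ann": {"Alien": 2, "Brazil": 4, "Crash": 1, "Dumbo": 2, "Fargo": 4},
        "Bob": {"Alien": 5, "Brazil": 4, "Dumbo": 1, "Fargo": 5},
    }
    print(candidates("Ann", "Bob", ratings))
    # {('likes', 'Brazil'), ('dislikes', 'Dumbo'), ('likes', 'Fargo')}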
Rule coverage: A rule covers a user if and only if the rule antecedent is satisfied by the user's profile. For example, the rule in Figure 1 covers any user u whose profile contains ratings r_u,The Shining > 3 and r_u,Frequency > 3, irrespective of what else it contains. Rule coverage is then the percentage of users that the rule covers.

Rule accuracy: A rule is accurate for a user if and only if the rule covers the user and the rule consequent is also satisfied by the user's profile. For example, the rule in Figure 1 is accurate for any user u whose profile additionally contains r_u,The Silence of the Lambs > 3. Rule accuracy is then the percentage of covered users, other than the active user, for whom the rule is accurate.

The algorithm for building an explanation rule works incrementally and in a greedy fashion; see Algorithm 1 for pseudocode. Initially, the rule has an empty antecedent and a consequent that contains the recommended item i, written as 'if then i' in Algorithm 1. On each iteration, the antecedent is refined by conjoining one of the candidate explanation conditions, specifically the one that leads to the most accurate new rule, resolving ties in favour of coverage. This continues until either the rule is 100% accurate or no candidate explanation conditions remain.

    Input: user profiles U, recommended item i, active user u, explanation partner v
    Output: an explanation rule for i
    R ← if then i;
    Cs ← candidates(u, v);
    while accuracy(R) < 100 ∧ Cs ≠ {} do
        Rs ← the set of all new rules formed by adding singly each candidate
             condition in Cs to the antecedent of R;
        R* ← the most accurate rule in Rs, using rule coverage to break ties
             between equally accurate rules;
        if accuracy(R*) ≤ accuracy(R) then
            return R;
        R ← R*;
        Remove from Cs the candidate condition that was used to create R;
    return R;

Algorithm 1: Creating an explanation rule
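For readers who prefer code to pseudocode, the following is a minimal Python sketch of Algorithm 1 together with the coverage and accuracy measures, built on the candidates function sketched above. The rule representation and helper names are ours; we assume the consequent is satisfied when the user likes the recommended item (a rating above 3), and we ignore efficiency concerns such as caching coverage counts.

    # A rule is (antecedent, consequent): the antecedent is a frozenset of
    # conditions like ("likes", "Brazil"); the consequent is ("likes", item).
    def satisfies(profile, cond):
        kind, item = cond
        if item not in profile:
            return False
        r = profile[item]
        return (kind == "likes" and r > 3) or (kind == "indiff" and r == 3) \
            or (kind == "dislikes" and r < 3)

    def coverage(rule, ratings):
        ante, _ = rule
        return [u for u, p in ratings.items() if all(satisfies(p, c) for c in ante)]

    def accuracy(rule, ratings, active):
        _, conseq = rule
        covered = [u for u in coverage(rule, ratings) if u != active]
        if not covered:
            return 0.0
        accurate = [u for u in covered if satisfies(ratings[u], conseq)]
        return 100.0 * len(accurate) / len(covered)

    def explanation_rule(ratings, active, partner, rec_item):
        rule = (frozenset(), ("likes", rec_item))          # 'if then i'
        cs = candidates(active, partner, ratings)
        while accuracy(rule, ratings, active) < 100 and cs:
            # add singly each remaining candidate; keep the most accurate rule,
            # breaking ties in favour of coverage
            best = max(((rule[0] | {c}, rule[1]) for c in cs),
                       key=lambda r: (accuracy(r, ratings, active),
                                      len(coverage(r, ratings))))
            if accuracy(best, ratings, active) <= accuracy(rule, ratings, active):
                return rule
            added = next(iter(best[0] - rule[0]))
            cs.discard(added)
            rule = best
        return rule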
3. EXPERIMENTS
We tested three hypotheses, the first using an offline experiment, the other two using a web-based user trial.

3.1 Practicability of explanation rules
The number of candidate explanation conditions can be quite large. If explanation rules are to be practicable, then the number of conditions that the algorithm includes in the antecedent of each explanation rule needs to be quite small.

Hypothesis 1: that explanation rules will be short enough to be practicable.

We ran the user-based collaborative recommender that we described at the start of the previous section on the MovieLens 100k dataset, and obtained its top recommendation for each user in the dataset. We then ran the explanation algorithm to produce an explanation rule that would explain the recommended item to that user. In Figure 2, we plot the number of candidate explanation conditions (vertical axis) against the number of these conditions that the algorithm includes in the rule (horizontal axis).

Figure 2: Rule length

From the Figure, we see that the longest rules contained only three items in their antecedents. Not only that, but actually only 4% of the rules had three items in their antecedents; the other 96% were split nearly evenly between those having one and those having two items. We also see that the more candidates there are, the shorter the explanation rule tends to be. We have not investigated the exact reasons for this.

We repeated this experiment using a dataset with unary ratings to see what difference this might make. We took a LastFM dataset that contains artist play counts for 360 thousand users and 190 thousand artists (mtg.upf.edu/node/1671). We converted play counts to unary ratings, i.e. recording 1 if and only if a user has played something by an artist. The results were very similar to those in Figure 2 (which is why we do not show them here), again with no rule having more than three items in its antecedent.

These are encouraging results for the practicability of explanation rules.
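As an illustration only, the offline protocol described above might be wired together as in the sketch below. It reuses the hypothetical predict_rating, candidates and explanation_rule helpers from the earlier sketches; neighbours_of and partner_of are placeholders for neighbour and explanation-partner selection, which the paper describes but whose implementation we do not reproduce here.

    # Illustrative offline protocol: top recommendation per user, then the
    # length of the explanation rule built for it.
    from collections import Counter

    def top_recommendation(u, ratings, neighbours, similarity):
        unseen = {j for v in neighbours for j in ratings[v]} - set(ratings[u])
        preds = {j: predict_rating(u, j, neighbours, ratings, similarity) for j in unseen}
        preds = {j: p for j, p in preds.items() if p is not None}
        return max(preds, key=preds.get) if preds else None

    def rule_length_distribution(ratings, neighbours_of, similarity, partner_of):
        lengths = Counter()
        for u in ratings:
            i = top_recommendation(u, ratings, neighbours_of[u], similarity)
            if i is None:
                continue
            v = partner_of(u, i)                  # explanation partner for u and i
            rule = explanation_rule(ratings, u, v, i)
            lengths[len(rule[0])] += 1            # number of antecedent conditions
        return lengths

    # For unary (LastFM-style) data, play counts would first be binarised, e.g.:
    # unary = {u: {a: 1 for a, plays in profile.items() if plays > 0}
    #          for u, profile in play_counts.items()}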
3.2 Effectiveness of this style of explanation
We designed a web-based user trial, partly inspired by the experiment reported in [5], drawing data from the MovieLens 1M dataset. Trial participants visited a web site where they progressed through a series of web pages, answering just three questions. An initial page established a context, essentially identical to the one in [5]:

    Imagine you want to go to the cinema but only if there is a movie worth seeing. You use an online movie recommender to help you decide. The movie recommender recommends one movie and provides an explanation.

First, we sought to elicit the perceived effectiveness of this style of explanation with the following hypothesis:

Hypothesis 2: that users would find explanation rules to be an effective style of explanation.

We showed participants an explanation rule for a recommendation and we asked them to rate its helpfulness on a 5-point scale. Specifically, we asked "Would this style of explanation help you make a decision?" with options Very unhelpful, Unhelpful, Neutral, Helpful, and Very helpful. Our wording differs from that used by [5]. They asked how likely the user would be to go and see the movie, with answers on a 7-point scale. Our wording focuses on explanation effectiveness (helpfulness in making a decision), whereas theirs focuses on persuasiveness. (This is an observation made by Joseph A. Konstan in lecture 4-4 of the Coursera course Introduction to Recommender Systems, www.coursera.org.)

To encourage participants to focus on explanation style, we followed [5] in redacting the identity of the recommended movie. A participant's feedback is then not a function of the quality of the recommendation itself. For the same reasons, we obscured the identities of the movies in the antecedent of the explanation rule; see the example in Figure 3.

Figure 3: A redacted explanation rule

To obtain a 'yardstick', we also showed participants another explanation and asked them whether it too was helpful. For this purpose, we used the most persuasive explanation style from [5]. This explanation takes the form of a histogram that summarizes the opinions of the nearest neighbours. Figure 4 contains an example of this style of explanation (again with the recommended item redacted).

Figure 4: A redacted explanation in the style of [5]

In the experiment, the software randomly decides the order in which it shows the two explanation styles. Approximately 50% of participants see and rate the explanation rule before seeing and rating the histogram, and the remainder see and rate them in the opposite order.

Prior to asking them to rate either style of explanation, users saw a web page that told them that we had obscured the movie titles, and we showed them an explicit example of a redacted movie title. We conducted a pilot run of the experiment with a handful of users before launching the real experiment. Participants in the pilot run did not report any difficulty in understanding the redacted movie titles or the redacted explanation rules.

We had 264 participants who completed all parts of the experiment. We did not collect demographic data about the participants but, since they were reached through our own contact lists, the majority will be undergraduate and postgraduate students in Irish universities.

Figure 5 shows how the participants rated explanation rules for helpfulness. Encouragingly, nearly 50% of participants found explanation rules to be a helpful or very helpful style of explanation (100 and 16 participants out of the 264, resp.); but about a quarter of participants found them neutral (69 participants), and a quarter found them unhelpful or very unhelpful (52 and 17, resp.). Figure 6 shows the same for the other style of explanation. Just over 70% of participants found this style of explanation to be helpful or very helpful (158 and 31 participants, resp.).

Figure 5: Helpfulness of redacted explanation rules

Figure 6: Helpfulness of redacted histograms

Note that we did not ask participants to compare the two styles of explanation. They are not in competition. It is conceivable that a real recommender would use both, either side-by-side or showing one of the two explanations by default and only showing the other to users who click through to a more detailed explanation page.

Furthermore, as the reader can judge by comparing Figures 3 and 4, any direct comparison of the results is unfair to the explanation rules since they have two levels of redaction (the recommended movie and the antecedents in the rules) whereas the histogram has just one (the recommended movie). As far as we can tell, there is no explanation style in [5] that would give comparable levels of redaction for a fair experiment.

For some readers, this may raise the question of why we showed participants the redacted histograms at all. The reason is to give a 'yardstick'. If we simply reported that nearly 50% of participants found explanation rules to be helpful or very helpful, readers would not know whether this was a good outcome or not.

From the results, we cannot confidently conclude that the hypothesis holds: the results are not in the same ball-park as the 'yardstick'. (For readers who insist on a comparison: using Very Unhelpful = 1, Unhelpful = 2, etc., the mean rating for the redacted explanation rules is 3.21 (st.dev. 1.03) and the mean rating for the redacted histograms is 3.66 (st.dev. 0.94); using Welch's t-test, we reject at the 0.01 level the null hypothesis that there is no difference in the means.) But we can conclude that explanation rules are a promising style of explanation: many users perceive them to be a helpful style of explanation, and they are therefore deserving of further study in a more realistic setting.

We note as a final comment in this subsection that the experiment reported in [1], which uses a very different methodology and no redaction of movie titles, found item-based explanations (there referred to as influence-style explanations) to be better than neighbourhood-style explanations.
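The Welch's t-test mentioned in the parenthetical comparison above can be reproduced from the reported summary statistics alone; the following is a minimal sketch using SciPy (our illustration, not the authors' analysis script).

    # Welch's t-test from the reported summary statistics (means, st.devs, n = 264 each).
    from scipy.stats import ttest_ind_from_stats

    t, p = ttest_ind_from_stats(mean1=3.21, std1=1.03, nobs1=264,
                                mean2=3.66, std2=0.94, nobs2=264,
                                equal_var=False)      # Welch's variant
    print(t, p)   # p falls well below 0.01, consistent with the text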
3.3 Effectiveness of the selection mechanism
Next, we sought to elicit the perceived effectiveness of our algorithm's way of building explanation rules:

Hypothesis 3: that users would find the algorithm's selection of conditions in the antecedents of the rules (based on accuracy and coverage) to be better than random.

In the same web-based user trial, we showed the participants two rules side-by-side (the ordering again being determined at random). One rule was constructed by Algorithm 1. The other rule was constructed so as to have the same number of conditions in its antecedent, but these were selected at random from among the candidate explanation conditions. Note that they are not wholly random: they are still candidate explanation conditions (hence they are co-rated items on which the user and explanation partner agree), but they are not selected using accuracy and coverage.

We asked participants to compare the two rules. They selected one of four options: the first rule was more helpful than the second; the second was more helpful than the first; the two rules were equally helpful; or they were unable to tell which was the more helpful ("don't know").

There was no redaction in this part of the experiment. It was important that participants judged whether the movie preferences described in the antecedents of the rules did support the recommended movie. Prior to asking users to rate the two explanation rules, users saw a web page that told them: that they would see a recommendation; that they should pretend that the recommended movie was one that they would like; that they would see two explanations; that movie titles would no longer be obscured; and that they should compare the two explanations for helpfulness. There are, of course, the risks that measuring effectiveness before consumption like this may result in judgements that overlap with persuasiveness, and that measuring perceived effectiveness is not as reliable as measuring something more objective [12].

Figure 7 shows the outcomes of this part of the experiment. We see that 32% found the explanation rule to be more helpful (85 participants) and only 10% (27 participants) found the partly-random rule to be more helpful. This means that, of those who expressed a preference (85 plus 27 participants), 76% preferred the explanation rules and only 24% preferred the partly-random rules. Furthermore, a two-tailed z-test shows the difference to be significant at the 0.01 level. This suggests that the algorithm does select candidate explanation conditions in a meaningful way.

Figure 7: Helpfulness of explanation rules compared with partly-random rules

However, 36% of participants found the rules to be equally helpful and 22% could not make a decision (95 and 57 participants, resp.). This means, for example, that (again using a two-tailed z-test) there is no significant difference between the proportion who found explanation rules to be more helpful and the proportion who found the two rules to be equally helpful.
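The paper does not give the exact formulation of its z-tests; as an illustration only, here is a minimal sketch of a standard two-tailed two-proportion z-test applied to the reported counts, treating the two proportions as if they came from independent samples (a simplification, since every participant contributes to both counts).

    # Illustrative two-tailed z-test for the difference between two proportions.
    from math import sqrt
    from scipy.stats import norm

    def two_prop_ztest(x1, n1, x2, n2):
        p1, p2 = x1 / n1, x2 / n2
        pooled = (x1 + x2) / (n1 + n2)
        se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        z = (p1 - p2) / se
        return z, 2 * norm.sf(abs(z))            # two-tailed p-value

    # 85 of 264 preferred the explanation rule vs. 27 of 264 the partly-random rule
    print(two_prop_ztest(85, 264, 27, 264))      # p well below 0.01
    # 85 of 264 preferred the explanation rule vs. 95 of 264 found them equally helpful
    print(two_prop_ztest(85, 264, 95, 264))      # difference not significant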
There are at least two reasons for these equivocal results. The first is that the participant is required to put herself 'in the shoes' of another user. The recommendation and the rules are computed for a user in the MovieLens dataset, not for the person who is completing the experiment, who must pretend that she likes the recommendation. The person who completes the experiment may not know much, if anything, about the movies mentioned in the rules. This may be why the "don't know" option was selected so often. (An on-screen note told the participant that she was able to click on any title to get some information about the movie. If she did, we fetched and displayed IMDb genres and a one-line synopsis for the movie. But we did not record how many users exploited this feature.) The alternative was to require participants in the experiment to register with the recommender and to rate enough movies that it would be able to make genuine recommendations and build realistic explanation rules. We felt that this placed too great a burden on the participants, and would likely result in an experiment skewed towards users with relatively few ratings.

The second reason is that the partly-random rules are still quite good rules: they are considerably more meaningful than wholly-random rules. As Table 2 shows, one of the partly-random rules used in the experiment is nearly as accurate as its corresponding explanation rule. The partly-random rules also have high coverage because randomly selected movies are often popular movies. In our pilot run of the experiment, we had tried wholly-random rules, but they were so egregiously worse than their corresponding explanation rules that we felt that using them would prejudice the results of the real experiment. Ironically, the partly-random rules that we use instead perhaps include too many movies that are reasonable substitutes for the ones in their corresponding explanation rules, thus giving us much more equivocal results.

    Explanation rule         Partly-random rule
    Accuracy   Coverage      Accuracy   Coverage
    91%        2%            56%        15%
    83%        1%            68%        4%
    76%        11%           42%        33%
    25%        3%            20%        13%

Table 2: Accuracy and coverage of pairs of rules

4. CONCLUSIONS
We have presented an algorithm for building explanation rules, which are item-based explanations for user-based collaborative recommendations. We ran an offline experiment and a web-based user trial to test three hypotheses. We conclude that explanation rules are a practicable form of explanation: on two datasets, no rule antecedent ever contained more than three conditions. We conclude that explanation rules offer a promising style of explanation: nearly 50% of participants found them to be helpful or very helpful, but the amount of redaction used in the experiment makes it hard to draw firm conclusions about their effectiveness. Finally, we conclude that users do find the algorithm's selection of conditions for the rule antecedent to be better than random: just under 80% of participants who expressed a preference preferred the explanation rule to a partly-random variant. But the results here are also partly confounded by the conditions of the experiment, where a participant has to put herself 'in the shoes' of another user.

Given the caveats about the limitations of the experiments, our main conclusion is that explanation rules are promising enough that we should evaluate them further, perhaps in a comparative experiment such as the one reported in [3] or in A/B experiments in a real recommender.

5. ACKNOWLEDGMENTS
This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289. We are grateful to Barry Smyth for discussions about the design of our experiment.

6. REFERENCES
[1] M. Bilgic and R. Mooney. Explaining recommendations: Satisfaction vs. promotion. In Procs. of Beyond Personalization 2005: A Workshop on the Next Stage of Recommender Systems Research at the 2005 International Conference on Intelligent User Interfaces, 2005.
[2] G. Friedrich and M. Zanker. A taxonomy for generating explanations in recommender systems. AI Magazine, 32(3):90–98, 2011.
[3] F. Gedikli, D. Jannach, and M. Ge. How should I explain? A comparison of different explanation types for recommender systems. Int. J. Hum.-Comput. Stud., 72(4):367–382, 2014.
[4] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In F. Gey et al., editors, Procs. of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 230–237. ACM Press, 1999.
[5] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In W. Kellogg and S. Whittaker, editors, Procs. of the ACM Conference on Computer Supported Cooperative Work, pages 241–250. ACM Press, 2000.
[6] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, 2003.
[7] D. McSherry. A lazy learning approach to explaining case-based reasoning solutions. In B. Díaz-Agudo and I. Watson, editors, Procs. of the 20th International Conference on Case-Based Reasoning, LNCS 7466, pages 241–254. Springer, 2012.
[8] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Procs. of the 10th International Conference on World Wide Web, pages 285–295, 2001.
[9] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos. MoviExplain: A recommender system with explanations. In Procs. of the Third ACM Conference on Recommender Systems, pages 317–320, 2009.
[10] N. Tintarev. Explanations of recommendations. In Procs. of the First ACM Conference on Recommender Systems, pages 203–206, 2007.
[11] N. Tintarev and J. Masthoff. Designing and evaluating explanations for recommender systems. In F. Ricci et al., editors, Recommender Systems Handbook, pages 479–510. Springer, 2011.
[12] N. Tintarev and J. Masthoff. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction, 22(4–5):399–439, 2012.
[13] J. Vig, S. Sen, and J. Riedl. Tagsplanations: Explaining recommendations using tags. In Procs. of the 14th International Conference on Intelligent User Interfaces, pages 47–56, 2009.
[14] M. Zanker. The influence of knowledgeable explanations on users' perception of a recommender system. In Procs. of the Sixth ACM Conference on Recommender Systems, pages 269–272, 2012.