<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Explanation Chains: Recommendation by Explanation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arpit Rana</string-name>
          <email>arpit.rana@insight-centre.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Derek Bridge</string-name>
          <email>derek.bridge@insight-centre.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Insight Centre for Data Analytics, University College Cork</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <conference>
        <conf-name>RecSys '17 Poster Proceedings</conf-name>
        <conf-loc>Como, Italy</conf-loc>
      </conference>
      <kwd-group>
        <kwd>Explanation</kwd>
        <kwd>Fidelity</kwd>
        <kwd>Interpretability</kwd>
      </kwd-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>Given a set of candidate items, Recommendation by Explanation constructs a justification for recommending each item, in the form of what we call an Explanation Chain, and then recommends those candidates that have the best explanations. By unifying recommendation and explanation, this approach enables us to find relevant recommendations with explanations that have a high degree of both fidelity and interpretability. Experimental results on a movie recommendation dataset show that our approach also provides sets of recommendations that have a high degree of serendipity, low popularity-bias and high diversity.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>
        Recommender systems provide explanations to help the end-user
understand the rationale for a recommendation and to help them
make a decision [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Conventionally, computing recommendations
and generating corresponding explanations are considered as two
separate, sequential processes. This affords the recommender the
freedom to include in the explanation information different from
that which it used to compute the recommendation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For
example, in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a recommendation generated by matrix factorization is
explained using topic models mined from textual data associated
with items. Such differences are one cause of low fidelity between
the recommender and its explanations.
      </p>
      <p>In this paper, we seek to achieve a higher degree of fidelity
between the explanations and the operation of the recommender
system, without compromising the interpretability of the
explanations and the quality of the recommendations. For this, we use
what we call explanation chains. Figure 1 shows an example of
an explanation chain in the movie domain. The last item in the
diagram (in this case, The Notebook), which we do not regard as
part of the chain, is the candidate for recommendation to the user,
and will typically not already be in the user’s profile. The other
items in the diagram (Big Fish, Pearl Harbor and The Illusionist)
form the chain. They are drawn from positively-rated items in the
user’s profile and are intended to support recommendation of the
candidate item. Each movie is represented as a set of keywords.
Pairs of successive items in a chain satisfy a local constraint in
the form of a similarity threshold; additionally, each item in the
chain satisfies a global constraint in the form of a threshold on the
level of coverage it contributes towards features of the candidate
item. For example, Big Fish has the keywords secret-mission and
parachute in common with Pearl Harbor, as well as the keyword
romantic-rivalry in common with The Notebook.</p>
      <p>[Figure 1: An explanation chain. Positively-rated items from the user's
past preferences (Big Fish, Pearl Harbor and The Illusionist), each
represented as a set of keywords, form a chain supporting the candidate
item, The Notebook.]</p>
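      <p>To make the two constraints concrete, here is a minimal sketch in Python, using keyword sets abridged from Figure 1. The 0.05 local (similarity) threshold is the value reported in Section 3; the 0.1 global (coverage) threshold is purely illustrative.</p>

```python
# Illustrative check of the local and global constraints on the chain of
# Figure 1. Keyword sets are abridged from the figure; the thresholds
# are a mix of reported (0.05) and illustrative (0.1) values.

def jaccard(a, b):
    """Jaccard similarity of two keyword sets."""
    return len(a.intersection(b)) / len(a.union(b))

# Positively-rated items from the user's profile (the chain) ...
big_fish = {"romantic-rivalry", "carnival", "secret-mission", "parachute"}
pearl_harbor = {"secret-mission", "parachute", "shooting", "volunteer", "u.s.-army"}
the_illusionist = {"shooting", "secret-love", "broken-engagement", "star-crossed-lovers"}
chain = [big_fish, pearl_harbor, the_illusionist]

# ... and the candidate item.
the_notebook = {"star-crossed-lovers", "secret-love", "broken-engagement",
                "volunteer", "u.s.-army", "romantic-rivalry", "self-discovery"}

# Local constraint: successive chain items must exceed a similarity threshold.
locally_ok = all(jaccard(chain[k], chain[k + 1]) >= 0.05
                 for k in range(len(chain) - 1))

# Global constraint: each item must contribute enough new coverage of the
# candidate's features.
seen = set()
gains = []
for item in chain:
    new = item.intersection(the_notebook) - seen
    gains.append(len(new) / len(the_notebook))
    seen = seen.union(new)
globally_ok = all(g >= 0.1 for g in gains)

print(locally_ok, globally_ok)  # → True True
```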
      <p>
        There is previous work in which there is a more intimate
connection between recommendation and explanation, e.g. [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], for
example, recommendations are re-ranked by the strength of their
explanations, so that items with more compelling explanations are
recommended first. However, these approaches still compute
recommendations and explanations separately, which is what makes
Recommendation by Explanation a unique development.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 APPROACH</title>
      <p>Recommendation by Explanation is a novel approach that unifies
recommendation and explanation: it computes recommendations by
generating and ranking corresponding personalized explanations
in the form of explanation chains. Recommendation is modelled
as a path-finding problem in the item-item similarity graph. Once
a chain has been constructed for each candidate item, the top-n
chains are selected iteratively based on their total coverage of the
candidate item’s features and their dissimilarity to other chains in
the top-n. We describe our approach in more detail in the following
subsections.
</p>
      <p>2.1 Generating Explanation Chains</p>
      <p>
Given a candidate item, Recommendation by Explanation works
backwards to construct a chain: starting with the candidate item,
it finds predecessors, greedily selects one, finds its predecessors,
selects one; and so on. The predecessors of an item are all its
neighbours in the item-item similarity graph that satisfy four conditions:
(a) they are positively-rated members of the user’s profile; (b) they
are not already in this chain; (c) their similarity to the item exceeds
a similarity threshold; and (d) their reward (see below) exceeds a
marginal gain threshold. When there are no predecessors, the chain
is complete.</p>
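      <p>A minimal sketch of this backward, greedy construction, under the simplifying assumption that a dictionary `profile` maps each positively-rated item in the user's profile to its keyword set (the function names and data layout are ours, not the paper's; the thresholds are the grid-searched values reported in Section 3):</p>

```python
# Illustrative sketch of the backward, greedy chain construction of
# Section 2.1. `profile` is assumed to contain only positively-rated
# profile items (condition (a)), each mapped to its set of keywords.

SIM_THRESHOLD = 0.05    # condition (c): minimum similarity to the current item
GAIN_THRESHOLD = 0.17   # condition (d): minimum marginal reward

def jaccard(a, b):
    """Jaccard similarity of two keyword sets."""
    return len(a.intersection(b)) / len(a.union(b))

def covered(candidate_feats, chain, profile):
    """Features of the candidate already covered by chain members."""
    out = set()
    for item in chain:
        out = out.union(profile[item].intersection(candidate_feats))
    return out

def rwd(j_feats, candidate_feats, chain, profile):
    """Reward of Equation (1): newly covered candidate features,
    normalized by the candidate's and the predecessor's feature counts."""
    new = (j_feats - covered(candidate_feats, chain, profile)).intersection(candidate_feats)
    return len(new) / len(candidate_feats) + len(new) / len(j_feats)

def build_chain(candidate_feats, profile):
    """Start from the candidate and greedily add predecessors until none qualify."""
    chain = []
    current = candidate_feats
    while True:
        preds = []
        for item, feats in profile.items():
            if item in chain:                              # condition (b)
                continue
            if jaccard(feats, current) >= SIM_THRESHOLD:   # condition (c)
                gain = rwd(feats, candidate_feats, chain, profile)
                if gain >= GAIN_THRESHOLD:                 # condition (d)
                    preds.append((gain, item))
        if not preds:
            return chain        # no predecessors left: the chain is complete
        gain, best = max(preds)
        chain.append(best)
        current = profile[best]
```

      <p>With keyword sets like those abridged from Figure 1, this backward pass adds The Illusionist first and Big Fish last, mirroring the chain shown in the figure read from right to left.</p>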
      <p>At each step in this process, the predecessor that gets selected is
the one with the highest reward. The reward rwd(j, i, C) of adding
predecessor j to partial chain C that explains candidate item i is
given by:</p>
      <p>rwd(j, i, C) = |(f_j \ covered(i, C)) ∩ f_i| / |f_i| + |(f_j \ covered(i, C)) ∩ f_i| / |f_j|   (1)</p>
      <p>Here f_i denotes the features of item i and covered(i, C) is the set
of features of candidate i that are already covered by members of
the chain C, i.e. covered(i, C) = ⋃_{j′ ∈ C} f_{j′} ∩ f_i. Then the first term
in the definition of rwd(j, i, C) measures j’s coverage (with respect
to the size of f_i) of features of i that are not yet covered by the
chain. The second term in the definition measures the same but
with respect to the size of f_j, and therefore assures j’s fitness to
explain the candidate by penalizing items that have high coverage
simply by virtue of having more features.</p>
      <p>2.2 Evaluating Explanation Chains</p>
      <p>After constructing a chain C for each candidate item i, we must
select the top-n chains so that we can recommend n items to the
user, along with their explanations. This is done iteratively, based
on a chain’s total coverage of the candidate item’s features and its
dissimilarity to other chains already included in the top-n.
Specifically, we score ⟨C, i⟩ relative to a list of all the items that appear in
already-selected chains, C∗, using the following:</p>
      <p>score(⟨C, i⟩, C∗) = (Σ_{j ∈ C} rwd(j, i, C)) / (|C| + 1) + |C \ ⋃_{j′ ∈ C∗} j′| / (|C| + 1)   (2)</p>
      <p>Here, the first term is the sum of the rewards of the items in the
chain with respect to its length including the candidate item i. The
second term penalizes a chain if its members are also members
of already-selected chains and hence encourages the final
recommendation list to cover as many positively-rated items in the user’s
profile as possible. In effect, the latter reduces popularity-bias in the
chains and diversifies the recommendation list. (Note that the
second term is about coverage of items that appear in already-selected
chains, not their features.)</p>
    </sec>
    <sec>
      <title>3 EXPERIMENTAL RESULTS</title>
      <p>We performed off-line experiments on the hetrec2011-movielens-2k
dataset augmented by movie keywords from IMDb. The dataset
comprises 2113 users, 5410 movies and over half a million keywords.
We represented each movie as a set of all of its keywords and
measured the similarity between movies using Jaccard similarity.
We compared our approach (r-by-e) with other recommenders that
make use of the same keyword data (CB-7, CB-|C|) and a random
recommender (RM). CB-7 is a classic content-based model with
the number of neighbours set to 7, and CB-|C| is a dynamic version of the
content-based system with the number of neighbours set to the length of
the corresponding explanation chain. In r-by-e, explanation chains
are generated with a similarity threshold of 0.05 and a marginal
gain threshold of 0.17, both set by grid-search.</p>
      <p>[Figure 2: Precision and coverage (%) of r-by-e, CB-7, CB-|C| and RM.]</p>
      <p>Experimental results are presented in Figure 2. It is clear that the
proposed approach outperforms the other methods for precision
while still achieving high levels of diversity, surprise and novelty.</p>
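      <p>Putting Sections 2.1 and 2.2 together, the iterative top-n selection of Equation (2) used to form recommendation lists can be sketched as follows; here `chains` and `chain_reward` (the per-chain sums of Equation (1) rewards) are assumed precomputed, and all names are illustrative:</p>

```python
# Illustrative sketch of the iterative top-n selection of Equation (2).
# `chains` maps each candidate item to its explanation chain (a list of
# profile items); `chain_reward` maps each candidate to the precomputed
# sum of Equation (1) rewards over its chain. Both are assumed given.

def score(candidate, chains, chain_reward, selected_items):
    """Equation (2): normalized chain reward plus a novelty term that
    penalizes overlap with items in already-selected chains."""
    chain = chains[candidate]
    n = len(chain) + 1                          # chain length incl. the candidate
    novelty = len(set(chain) - selected_items)  # members not in selected chains
    return chain_reward[candidate] / n + novelty / n

def top_n(chains, chain_reward, n):
    """Greedily select n chains, re-scoring after every pick so that the
    novelty penalty reflects the chains chosen so far."""
    selected = []
    selected_items = set()
    remaining = set(chains)
    for _ in range(n):
        if not remaining:
            break
        best = max(remaining,
                   key=lambda c: score(c, chains, chain_reward, selected_items))
        selected.append(best)
        selected_items = selected_items.union(chains[best])
        remaining.discard(best)
    return selected
```

      <p>Because the novelty term shrinks as profile items are reused, two candidates with near-identical chains will rarely both reach the top-n, which is the diversification effect noted in Section 2.2.</p>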
    </sec>
    <sec id="sec-3">
      <title>4 CONCLUSIONS</title>
      <p>Recommendation by Explanation unifies recommendation and
explanation, providing high quality recommendations with
corresponding explanations that have high fidelity and interpretability.
In future work, we will carry out experiments using keyword
weighting and filtering, and experiments in which we lower the
thresholds, expecting the resulting looser connections and longer
chains to produce even less obvious recommendations. We have
also built a web-based recommender with which to evaluate our
system with real users.</p>
    </sec>
    <sec id="sec-4">
      <title>ACKNOWLEDGMENTS</title>
      <p>This publication has emanated from research supported in part by
a research grant from Science Foundation Ireland (SFI) under Grant
Number SFI/12/RC/2289.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Behnoush</given-names>
            <surname>Abdollahi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Olfa</given-names>
            <surname>Nasraoui</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Explainable Matrix Factorization for Collaborative Filtering</article-title>
          .
          <source>In Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee</source>
          ,
          <fpage>5</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Mustafa</given-names>
            <surname>Bilgic</surname>
          </string-name>
          and
          <string-name>
            <given-names>Raymond J.</given-names>
            <surname>Mooney</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Explaining recommendations: Satisfaction vs. promotion</article-title>
          . In Beyond Personalization Workshop, IUI, Vol.
          <volume>5</volume>
          . 153.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Khalil</given-names>
            <surname>Muhammad</surname>
          </string-name>
          , Aonghus Lawlor, Rachael Rafter, and
          <string-name>
            <given-names>Barry</given-names>
            <surname>Smyth</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Great explanations: Opinionated explanations for recommendations</article-title>
          .
          <source>In International Conference on Case-Based Reasoning</source>
          . Springer,
          <fpage>244</fpage>
          -
          <lpage>258</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Marco</given-names>
            <surname>Rossetti</surname>
          </string-name>
          , Fabio Stella, and
          <string-name>
            <given-names>Markus</given-names>
            <surname>Zanker</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Towards explaining latent factors with topic models in collaborative recommender systems</article-title>
          .
          <source>In Database and Expert Systems Applications (DEXA)</source>
          ,
          <year>2013</year>
          24th International Workshop on. IEEE,
          <fpage>162</fpage>
          -
          <lpage>167</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Panagiotis</given-names>
            <surname>Symeonidis</surname>
          </string-name>
          , Alexandros Nanopoulos, and
          <string-name>
            <given-names>Yannis</given-names>
            <surname>Manolopoulos</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Providing justifications in recommender systems</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans</source>
          <volume>38</volume>
          ,
          <issue>6</issue>
          (
          <year>2008</year>
          ),
          <fpage>1262</fpage>
          -
          <lpage>1272</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>