Explaining Recommendations by Means of User Reviews

Tim Donkers, Benedikt Loepp, Jürgen Ziegler
University of Duisburg-Essen, Duisburg, Germany
tim.donkers@uni-due.de, benedikt.loepp@uni-due.de, juergen.ziegler@uni-due.de

ABSTRACT
The field of recommender systems has seen substantial progress in recent years in terms of algorithmic sophistication and quality of recommendations as measured by standard accuracy metrics. Yet, the systems mainly act as black boxes for the user and are limited in their capability to explain why certain items are recommended. This is particularly true when using abstract models which do not easily lend themselves to providing explanations. In many cases, however, recommendation methods are employed in scenarios where users not only rate items, but also provide feedback in the form of tags or written product reviews. Such user-generated content can serve as a useful source for deriving explanatory information that may increase the user's understanding of the underlying criteria and mechanisms that led to the results. In this paper, we describe a set of developments we undertook to couple such textual content with common recommender techniques. These developments have moved from integrating tags into collaborative filtering to employing topics and sentiments expressed in reviews to increase transparency and to give users more control over the recommendation process. Furthermore, we describe our current research goals and a first concept concerning the extraction of more complex argumentative explanations from textual reviews and their presentation to users.

ACM Classification Keywords
H.3.0. Information Storage and Retrieval: General

Author Keywords
Recommender Systems; Deep Learning; Explanations

© 2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. ExSS '18, March 11, Tokyo, Japan.

INTRODUCTION
Today's Recommender Systems (RS) have been shown to generate recommendations for items that match the user's interest profile quite accurately. The underlying methods are usually based either on purchase data or ratings of other users (collaborative filtering), or on structured data explicitly describing the items (content-based filtering). Yet, algorithmic maturity does not necessarily lead to a commensurate level of user satisfaction [11]. Aspects related to user experience, such as the amount of control users have over the recommendation process or the transparency of the system, may also contribute substantially to the user's acceptance and appraisal of the recommendations [11]. Still, many state-of-the-art methods appear to the user as black boxes and do not provide a rationale for the recommendations, which may negatively influence intelligibility, and thus the user's comprehension and trust [18].

When recommendation methods are applied in the real world, users can often provide textual feedback on items in the form of short tags or written reviews. Textual feedback from other customers is known to strongly influence the current user's decision-making [2]. However, perusing all reviews associated with the items in a recommendation set is time-consuming and mostly infeasible. The information available in review data is currently also hardly exploited for making the otherwise opaque recommendation process more transparent. While research has more recently begun to investigate the role of, for instance, product features mentioned, topics addressed, or general sentiments expressed for improving algorithmic precision [25, 5, 1], their potential for increasing the intelligibility of recommendations has not been fully exploited yet. The same applies to aspects such as the presence, polarity, and quality of arguments found in the reviews for or against a product. Thus, extracting and summarizing relevant arguments and presenting them as textual explanations seems to offer a promising avenue for better supporting users in their decision process.

RELATED AND PRIOR WORK
One popular way of increasing the transparency of RS is to use textual explanations [20]. When sufficient content information is available, item attributes may be aligned with user preferences to explain a recommendation [22]. Such data can also be used together with context information to point out arguments for recommended items, and may simultaneously serve as a means to critique recommendations [13]. For item-based collaborative filtering, the static variant used e.g. by Amazon ("Customers who bought this item also bought…") is quite popular. For model-based methods, it is still difficult to improve transparency through explanations. The approach proposed in [25] actually exploits review data, but is limited to identifying sentiments on a phrase level to highlight product features the user is particularly interested in. In other cases where reviews have been analyzed semantically, this has served primarily to improve model quality, e.g. by inferring hidden topics, extracting content aspects, or mining user opinions [5, 1]. More advanced approaches that, for instance, extract argumentative explanations from review texts to support the user's decision-making do not exist. One exception where argumentation techniques have been integrated into RS is presented in [4]. Yet, this approach depends on the availability of explicit item features and, since implemented in defeasible logic, requires defining postulates manually.

In prior work of ours [6, 7], we proposed a model-based recommendation method that exploits textual data: TagMF enhances a standard matrix factorization [12] algorithm with the tags users provided for the items.
By learning an integrated model of ratings and tags, the meaning of the learned latent factors becomes more transparent [7] and the user may interactively change their effect on the generated recommendations [6]. In contrast to other attempts that integrate additional data, this improves the comprehensibility of recommendations as well as user control over the system, which is otherwise typically limited to (re-)rating single items. Moreover, our user study showed that not only objective accuracy as measured in offline experiments, but also perceived recommendation quality benefits from complementing ratings with additional tag data. Apparently, tags can introduce semantics into the underlying abstract model that are natural to understand, so that users notice some kind of inner consistency in the recommendation set. Besides, the relations between users and tags introduced by our method allow us to explicitly describe user preferences in textual form: We can automatically derive which tags are considered important to an individual user, even in cases where the user has never tagged items him- or herself. These tags can then be presented as an explanation of his or her formerly latent preference profile [7]. Despite its advantages, our method requires the relationship between items and additional data to be quantified in numerical form. If this requirement is not met, it seems a natural extension to exploit other, more general forms of user-generated content such as product reviews.

In [9], we presented an interactive recommending approach relying on review data. We extended the concept of blended recommending [15] by automatically extracting keywords from product reviews and identifying their sentiment. Then, we used the extracted (positive or negative) item descriptions to offer filtering options usually not available in contemporary RS. In our system, we present them as facet values that can be selected and weighted by the user to influence the recommendations. We conducted a user study that provided evidence that users were able to find items matching interests that are difficult to take into account when only structured content data is exploited. Without requiring users to actually read the reviews, the method seems promising for improving the recommendation process in terms of user experience, especially when users have to choose from sets of "experience products". Although in principle any algorithm can be integrated into the system's hybrid configuration, we have not yet combined the advantages of model-based recommender methods with those of exploiting user-generated reviews.

A CONCEPT FOR EXPLAINING RECOMMENDATIONS
Building on our prior work, we present in the following a novel concept that relies on extracting and summarizing arguments about products from textual reviews in order to provide users with adapted model-based recommendations and, in particular, explanations that are personalized according to the current user's preferences and styles of decision-making.

While tagging helps to classify items by attributing specific properties, the descriptive nature of tags limits their applicability as an explanatory element. User reviews, on the other hand, constitute semantically rich information sources that incorporate sentiment and often an intended sequence of arguments, i.e. an argumentation flow. Reviews go beyond mere objective descriptions by (sometimes implicitly) invoking a stance towards a target. Users reading a review can get an idea of the motives and reasoning behind its author's words. This verbal provision of subjective information constitutes a comprehensible context in which arguments and their interrelations can be grounded. However, automatically identifying the presence, polarity, and quality of argumentation structures has not yet been considered in RS research, although it is well known that textual feedback of other users may strongly influence decision-making [2]. In particular, the potential of arguments has not been exploited for explaining recommendations. Due to their persuasive nature, arguments can be thought of as providing intelligible reasons that support recommendations, and may thus increase system transparency and trust in the results.

Since users differ with respect to dispositions and preferences, they may attach different levels of importance to arguments. For example, one user may insist on closeness to the beach when booking a hotel, while another lays more focus on cleanliness or the friendliness of the staff. Although reading the same review, these two users would most likely attend to completely different aspects. A sophisticated argument extractor should therefore mimic human scanning behavior by adaptively assigning individual attention weights to arguments.

[Figure 1: diagram of the proposed framework: a product review ("The hotel is in a central location, with lots of shopping and eateries nearby, yet only few meters from the beach.") is processed by linguistic argument mining and stance detection as well as by an attention-based mechanism within the recommender system, resulting in an explained item recommendation.]
Figure 1. In our framework, a review is analyzed linguistically and via an attention-based mechanism. This allows us to implement an argumentation flow based on information provided in the review while deeply integrating the user. Eventually, a personalized recommendation is presented together with individual arguments for or against this product.

We propose a conceptual framework that automates the process of extracting arguments from reviews (see Figure 1) to come up with personalized argumentative item-level explanations for recommendations: We suggest applying feature-based methods from computational linguistics in order to derive argumentative structures, including argument polarity in the form of stances. Beyond that, detecting personally important arguments requires a deep integration into the process of calculating recommendations. We aim at utilizing deep learning methods that, matching the attention analogy, enable the system to focus on important concepts subject to a user variable. Put together, this leads to the following challenges:

• Linguistically analyzing review texts via argument mining and stance detection.
• Identifying important concepts for a target user via an attention-based mechanism.
• Deriving an argumentation flow via multiple applications of the attention-based mechanism.
• Unifying the linguistic analyses and the attention-based mechanism.
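To make the interplay of these four challenges more concrete, they can be read as stages of a single pipeline. The following Python sketch is purely illustrative: the function names, signatures, and the 0.2 attention threshold are hypothetical placeholders, not an existing implementation of our concept.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: an argument is a text span plus a stance label.
Argument = Tuple[str, str]   # e.g. ("few meters from the beach", "favor")

def explain(review_tokens: List[str],
            mine_arguments: Callable[[List[str]], List[Argument]],
            attend: Callable[[List[str]], List[float]]) -> List[Argument]:
    """Sketch of the envisioned flow: mine arguments and stances (challenge 1),
    compute user-specific attention over the review (challenges 2-3), and keep
    the arguments that coincide with highly attended tokens (challenge 4)."""
    arguments = mine_arguments(review_tokens)            # challenge 1
    weights = attend(review_tokens)                      # challenges 2-3
    # Tokens whose attention weight exceeds an (arbitrary) threshold.
    important = {t for t, w in zip(review_tokens, weights) if w > 0.2}
    return [a for a in arguments                         # challenge 4
            if any(tok in important for tok in a[0].split())]
```

With a dummy miner and a dummy attention function, passing the review "The food is good, but beds are uncomfortable" and a weight profile that emphasizes "food" would keep only the food-related argument.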
Linguistic Analyses: Computational linguistic approaches can be distinguished based on the granularity of the analysis. Document-level analyses, for instance, aim at determining a sentiment or stance towards the subject of decision, e.g. a hotel. However, for our purpose such an approach is too shallow, as a review generally not only consists of utterances regarding the target item as a whole, but usually also includes remarks on sub-aspects. Sentiments or stances towards these sub-aspects may deviate from the review's overall polarity. For example, a guest may generally like a hotel albeit he or she found the bed uncomfortable. In addition, a review might address sub-aspects not important to the current user. Therefore, we propose a more fine-grained approach relying on aspect-level argument mining that is capable of assigning polarity to a variety of mentioned entities.

Automated argument mining refers to identifying linguistic structures consisting of at least one explicit claim and optional supporting structures such as premises [17]. Depending on which theory of argumentation one follows [21, 10], these structures are more or less specialized. However, analyzing user-generated reviews poses a difficult task for an argument mining tool, as they deviate considerably from the texts on which linguistic argument mining is typically performed, e.g. legal text or scientific writing: Reviews are usually shorter, noisier, less densely packed with arguments, and contain arguments in a way that is often not as sophisticated.

Another difficulty arises when one wants to decide whether an extracted argument is in favor of or against a particular product. However, a notion of polarity is necessary to be able to select adequate arguments supporting or opposing a recommendation. Therefore, we propose to relate each argument to a stance expressed by the review author. In this regard, a stance target is not limited to the subject of decision itself but may essentially address anything towards which one can have a stance. It follows that the linguistic analyses need to identify the target and establish a relation to the subject of decision. Please note that stance detection involves identifying a subjective disposition that might often be implicit [14]. Moreover, stances, as opposed to e.g. sentiments, do not necessarily integrate a polarity aspect (e.g. "the hotel is modern", "the food is local").

Attention-based Mechanism: Since we are interested in providing users with personalized explanations, solely performing linguistic analyses is not sufficient: Users differ not only in their interests, but also have unique styles of decision-making. As a consequence, the assessment of the importance of an argument found in a review, as well as its accordance with the stances expressed by the review author, has to be adapted towards the current user. In order to achieve this, we propose an attention-based approach that considers a vectorial user representation to identify personally important concepts.

Attention-based deep learning approaches [3] have proved to be very successful at identifying significant local features in tasks from several domains such as image processing [24] or machine translation [16]. An attention function can be described as a (soft-)search for a set of positions in a linguistic source where the most relevant information is concentrated. In the context of review-based RS, we propose to use attention as a means to identify important and distinguishing concepts for the subject of decision, i.e. a target item. Technically speaking, we suggest computing a weighted sum over the sequence of vectors derived from the words contained in a review. The assigned weights would indicate the relative importance of each vector and thus resemble the amount of attention a particular word is receiving.

Personalizing explanations requires attention to be distributed with respect to user preferences, i.e. users act as the context that moderates the (soft-)attention's output. Thus, it is necessary to calculate attention weights subject to a vectorial user representation. We propose to model users analogously to our previous work [8] by embedding one-hot user vectors, along with word vectors, into a densified joint information space. This would allow us to numerically estimate the degree to which the current user's preferences are in line with the concepts expressed in a review. Assume, for example, the following portion of a review: "The food is good, but beds are uncomfortable." If the user has shown interest in good food in the past, the attention mechanism should assign a large weight to food. Although the lack of bed comfort might be relevant to others, this argument should be neglected and receive a weight close to zero if the system has not detected a relationship between the current user and this particular aspect before.

Argumentative Flow: Argumentation can be considered more convincing if it consists of more than an aggregation of potentially important words or phrases. We assume arguments become more understandable and effective if they follow a coherent argumentative flow, i.e. exhibit dependencies between successively extracted arguments. The analogy in our proposed concept is the repeated application of the (soft-)attention mechanism while considering previous output as context. Informally, this mimics a conversational exchange between user and attention mechanism in which the latter continuously tries to identify concepts more and more tailored specifically towards the user's preferences. The result would be a succession of relevant word sets that exhibit sequential properties and are thus tied to each other. As a consequence, the output sequence would reflect the whole path of how the system came up with a particular recommendation and thus explain which concepts played an important role during the process (see Figure 1). It must be noted that such a procedure would be closely related to so-called memory networks [23, 19], which also work with multiple hop operations on an attention-based memory. Since our concept imposes an artificial argumentative structure on the raw review text, it will be interesting to compare the argumentative flow intended by the review's author with the one automatically derived by the system.
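The user-conditioned soft attention and its multi-hop extension described above can be sketched numerically. This is a toy illustration with random embeddings; the bilinear scoring of word vectors against the user vector and the additive query update are our own illustrative assumptions (one common design choice among several), not a fixed part of the concept.

```python
import numpy as np

def softmax(z):
    """Normalize raw scores into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_hop(query, words, W):
    """One (soft-)attention hop: score each word vector against the query
    (here: the user representation) and return the weighted sum."""
    scores = words @ (W @ query)      # relevance of each word to this user
    weights = softmax(scores)         # attention distribution over the review
    return weights, weights @ words   # weighted sum = attended summary vector

rng = np.random.default_rng(0)
words = rng.normal(size=(6, 8))       # embedded review words (6 words, dim 8)
user = rng.normal(size=8)             # embedded user vector (same joint space)
W = rng.normal(size=(8, 8))           # learned bilinear interaction matrix

# Multiple hops: fold the previous summary back into the query, so each hop
# can attend to concepts conditioned on what was extracted before
# (memory-network-style multi-hop operation).
query = user
for hop in range(2):
    weights, summary = attention_hop(query, words, W)
    query = query + summary
```

The sequence of per-hop attention distributions is exactly the "succession of relevant word sets" mentioned above: each hop's weights indicate which words the system focused on at that step.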
Unification of Linguistic Analyses and Attention: Up to this point, we have covered two independent approaches to processing review texts: (1) linguistic analyses aiming at mining arguments and detecting stances, and (2) attention-based extraction of words relevant to the current user. However, both on their own are limited in expressiveness: the linguistic analyses lack the personalization component, while the attention mechanism operates on word level and is thus incapable of extracting complete arguments. Consequently, following our superordinate research goal of presenting users with more informative, personalized explanations, we plan to exploit the benefits of both approaches by coupling them closely together. A first attempt, for instance, would be to check whether the surrounding context of a word that received a large amount of attention was identified as an argument by the linguistic processor. If this condition is met, the system can interpret the argument as a possible candidate for a personalized explanation. Assume the system detects e.g. the arguments "the food is good" and "beds are uncomfortable"; then identifying food as an important individual concept should lead to the first argument being chosen for explaining the recommendation.
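This overlap check between attended words and mined argument spans can be expressed compactly. A minimal sketch, assuming arguments are given as token spans and attention as per-token weights; the 0.2 threshold is an arbitrary illustrative value, not part of the concept itself.

```python
def candidate_explanations(arguments, attention, threshold=0.2):
    """Select mined arguments whose span contains a highly attended word.

    arguments: list of (start, end) token spans identified as arguments
    attention: per-token attention weights for the current user
    """
    selected = []
    for start, end in arguments:
        # Keep the argument if any word inside it drew enough attention.
        if max(attention[start:end]) >= threshold:
            selected.append((start, end))
    return selected

# Toy example: two arguments over an eight-token review,
# with most attention on the token "food".
attn = [0.05, 0.4, 0.05, 0.05, 0.05, 0.1, 0.1, 0.1]
args = [(0, 4), (5, 8)]   # e.g. "the food is good", "beds are uncomfortable"
print(candidate_explanations(args, attn))   # -> [(0, 4)]
```

Only the first span survives: the user-specific attention on "food" singles out "the food is good" as the candidate explanation, while the bed-comfort argument is dropped.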
CONCLUSIONS
In this paper, we have discussed the state of research regarding the usage of product reviews in RS, with a focus on explanations. We set our prior work into context, where we already used tags to increase a recommender's transparency and analyzed product reviews in order to provide extended interaction possibilities. Based on this, we pointed out a possible way to exploit this rich information source for presenting users with intelligible, personalized explanations by extracting more complex arguments. We outlined several challenges and proposed a concept to address them. For future work, our research goals are to implement this novel concept, use it to additionally personalize recommendations, and to evaluate it in several domains with a focus on the influence on user experience in comparison to other explanation methods.

REFERENCES
1. A. Almahairi, K. Kastner, K. Cho, and A. Courville. 2015. Learning Distributed Representations from Reviews for Collaborative Filtering. In RecSys '15. ACM, 147–154.
2. G. Askalidis and E. C. Malthouse. 2016. The Value of Online Customer Reviews. In RecSys '16. ACM, 155–158.
3. D. Bahdanau, K. Cho, and Y. Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR '15.
4. C. E. Briguez, M. C. D. Budán, C. A. D. Deagustini, A. G. Maguitman, M. Capobianco, and G. R. Simari. 2014. Argument-based Mixed Recommenders and Their Application to Movie Suggestion. Expert Syst. Appl. 41, 14 (2014), 6467–6482.
5. Q. Diao, M. Qiu, C.-Y. Wu, A. J. Smola, J. Jiang, and C. Wang. 2014. Jointly Modeling Aspects, Ratings and Sentiments for Movie Recommendation (JMARS). In KDD '14. ACM, 193–202.
6. T. Donkers, B. Loepp, and J. Ziegler. 2016a. Tag-Enhanced Collaborative Filtering for Increasing Transparency and Interactive Control. In UMAP '16. ACM, 169–173.
7. T. Donkers, B. Loepp, and J. Ziegler. 2016b. Towards Understanding Latent Factors and User Profiles by Enhancing Matrix Factorization with Tags. In RecSys '16.
8. T. Donkers, B. Loepp, and J. Ziegler. 2017. Sequential User-Based Recurrent Neural Network Recommendations. In RecSys '17. ACM, 152–160.
9. J. Feuerbach, B. Loepp, C.-M. Barbu, and J. Ziegler. 2017. Enhancing an Interactive Recommendation System with Review-based Information Filtering. In IntRS '17. 2–9.
10. J. B. Freeman. 1991. Dialectics and the Macrostructure of Arguments: A Theory of Argument Structure. De Gruyter.
11. J. A. Konstan and J. Riedl. 2012. Recommender Systems: From Algorithms to User Experience. User Mod. User-Adap. 22, 1-2 (2012), 101–123.
12. Y. Koren, R. M. Bell, and C. Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. IEEE Computer 42, 8 (2009), 30–37.
13. B. Lamche, U. Adıgüzel, and W. Wörndl. 2014. Interactive Explanations in Mobile Shopping Recommender Systems. In IntRS '14. 14–21.
14. W.-H. Lin, T. Wilson, J. Wiebe, and A. Hauptmann. 2006. Which Side are you on? Identifying Perspectives at the Document and Sentence Levels. In CoNLL-X '06. Association for Computational Linguistics, 109–116.
15. B. Loepp, K. Herrmanny, and J. Ziegler. 2015. Blended Recommending: Integrating Interactive Information Filtering and Algorithmic Recommender Techniques. In CHI '15. ACM, 975–984.
16. T. Luong, H. Pham, and C. D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In EMNLP '15. 1412–1421.
17. R. M. Palau and M.-F. Moens. 2009. Argumentation Mining: The Detection, Classification and Structure of Arguments in Text. In ICAIL '09. ACM, 98–107.
18. P. Pu, L. Chen, and R. Hu. 2012. Evaluating Recommender Systems from the User's Perspective: Survey of the State of the Art. User Mod. User-Adap. 22, 4-5 (2012), 317–355.
19. S. Sukhbaatar, A. Szlam, J. Weston, and R. Fergus. 2015. End-To-End Memory Networks. In NIPS '15. 2440–2448.
20. N. Tintarev and J. Masthoff. 2015. Explaining Recommendations: Design and Evaluation. In Recommender Systems Handbook. Springer US, 353–382.
21. S. E. Toulmin. 2003. The Uses of Argument. Cambridge University Press.
22. J. Vig, S. Sen, and J. Riedl. 2009. Tagsplanations: Explaining Recommendations Using Tags. In IUI '09. ACM, 47–56.
23. J. Weston, S. Chopra, and A. Bordes. 2014. Memory Networks. (2014).
24. K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In ICML '15. 2048–2057.
25. Y. Zhang, G. Lai, M. Zhang, Y. Zhang, Y. Liu, and S. Ma. 2014. Explicit Factor Models for Explainable Recommendation Based on Phrase-Level Sentiment Analysis. In SIGIR '14. ACM, 83–92.