Explaining Recommendations by Means of User Reviews

Tim Donkers, Benedikt Loepp, Jürgen Ziegler
University of Duisburg-Essen, Duisburg, Germany
tim.donkers@uni-due.de, benedikt.loepp@uni-due.de, juergen.ziegler@uni-due.de

ABSTRACT
The field of recommender systems has seen substantial progress in recent years in terms of algorithmic sophistication and quality of recommendations as measured by standard accuracy metrics. Yet, the systems mainly act as black boxes for the user and are limited in their capability to explain why certain items are recommended. This is particularly true when using abstract models which do not easily lend themselves to providing explanations. In many cases, however, recommendation methods are employed in scenarios where users not only rate items, but also provide feedback in the form of tags or written product reviews. Such user-generated content can serve as a useful source for deriving explanatory information that may increase the user's understanding of the underlying criteria and mechanisms that led to the results. In this paper, we describe a set of developments we undertook to couple such textual content with common recommender techniques. These developments have moved from integrating tags into collaborative filtering to employing topics and sentiments expressed in reviews to increase transparency and to give users more control over the recommendation process. Furthermore, we describe our current research goals and a first concept concerning the extraction of more complex argumentative explanations from textual reviews and their presentation to users.

ACM Classification Keywords
H.3.0. Information Storage and Retrieval: General

Author Keywords
Recommender Systems; Deep Learning; Explanations

© 2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. ExSS '18, March 11, Tokyo, Japan.

INTRODUCTION
Today's Recommender Systems (RS) have been shown to generate recommendations for items that match the user's interest profile quite accurately. The underlying methods are usually based either on purchase data or ratings of other users (collaborative filtering), or on structured data explicitly describing the items (content-based filtering). Yet, algorithmic maturity does not necessarily lead to a commensurate level of user satisfaction [11]. Aspects related to user experience, such as the amount of control users have over the recommendation process or the transparency of the system, may also contribute substantially to the user's acceptance and appraisal of the recommendations [11]. Still, many state-of-the-art methods appear to the user as black boxes and do not provide a rationale for the recommendations, which may negatively influence intelligibility, and thus the user's comprehension and trust [18].

When recommendation methods are applied in the real world, users can often provide textual feedback on items in the form of short tags or written reviews. Textual feedback from other customers is known to strongly influence the current user's decision-making [2]. However, perusing all reviews associated with the items in a recommendation set is time-consuming and mostly infeasible. The information available in review data is currently also hardly exploited for making the otherwise opaque recommendation process more transparent. While research has more recently begun to investigate the role of, for instance, product features mentioned, topics addressed, or general sentiments expressed for improving algorithmic precision [25, 5, 1], their potential for increasing the intelligibility of recommendations has not been fully exploited yet. The same applies to aspects such as the presence, polarity, and quality of arguments found in the reviews for or against a product. Thus, extracting and summarizing relevant arguments and presenting them as textual explanations seems to offer a promising avenue for better supporting users in their decision process.

RELATED AND PRIOR WORK
One popular way of increasing the transparency of RS is to use textual explanations [20]. When sufficient content information is available, item attributes may be aligned with user preferences to explain a recommendation [22]. Such data can also be used together with context information to point out arguments for recommended items, and may simultaneously serve as a means to critique recommendations [13]. For item-based collaborative filtering, the static variant used e.g. by Amazon ("Customers who bought this item also bought…") is quite popular. For model-based methods, it is still difficult to improve transparency through explanations. The approach proposed in [25] actually exploits review data, but is limited to identifying sentiments on a phrase level to highlight product features the user is particularly interested in. In other cases where reviews have been analyzed semantically, this has served primarily to improve model quality, e.g. by inferring hidden topics, extracting content aspects, or mining user opinions [5, 1]. More advanced approaches that, for instance, extract argumentative explanations from review texts to support the user's decision-making do not exist. One exception where argumentation techniques have been integrated into RS is presented in [4]. Yet, this approach depends on the availability of explicit item features and, since implemented in defeasible logic, requires defining postulates manually.

In prior work of ours [6, 7], we proposed a model-based recommendation method that exploits textual data: TagMF enhances a standard matrix factorization [12] algorithm with the tags users provided for the items.
By learning an integrated model of ratings and tags, the meaning of the learned latent factors becomes more transparent [7] and the user may interactively change their effect on the generated recommendations [6]. In contrast to other attempts that integrate additional data, this improves the comprehensibility of recommendations as well as user control over the system, which is otherwise typically limited to (re-)rating single items. Moreover, our user study showed that not only objective accuracy as measured in offline experiments, but also perceived recommendation quality benefits from complementing ratings with additional tag data. Apparently, tags can introduce semantics into the underlying abstract model that are natural to understand, so that users notice some kind of inner consistency in the recommendation set. Besides, the relations between users and tags introduced by our method allow us to explicitly describe user preferences in textual form: We can automatically derive which tags are considered important to an individual user, even in cases where the user has never tagged items him- or herself. These tags can then be presented as an explanation of his or her formerly latent preference profile [7]. Despite its advantages, our method requires the relationship between items and additional data to be quantified in numerical form. If this requirement is not met, it seems a natural extension to exploit other, more general forms of user-generated content such as product reviews.

In [9], we presented an interactive recommending approach relying on review data. We extended the concept of blended recommending [15] by automatically extracting keywords from product reviews and identifying their sentiment. Then, we used the extracted (positive or negative) item descriptions to offer filtering options usually not available in contemporary RS. In our system, we present them as facet values that can be selected and weighted by the user to influence the recommendations. We conducted a user study that provided evidence that users were able to find items matching interests that are difficult to take into account when only structured content data is exploited. Without requiring users to actually read the reviews, the method seems promising for improving the recommendation process in terms of user experience, especially when users have to choose from sets of "experience products". Although in principle any algorithm can be integrated into the system's hybrid configuration, we have not yet combined the advantages of model-based recommender methods with those of exploiting user-generated reviews.

A CONCEPT FOR EXPLAINING RECOMMENDATIONS
Building on our prior work, we present in the following a novel concept that relies on extracting and summarizing arguments about products from textual reviews in order to provide users with adapted model-based recommendations and, in particular, explanations that are personalized according to the current user's preferences and styles of decision-making.

While tagging helps to classify items by attributing specific properties, the descriptive nature of tags limits their applicability as an explanatory element. User reviews, on the other hand, constitute semantically rich information sources that incorporate sentiment and often an intended sequence of arguments, i.e. an argumentation flow. Reviews go beyond mere objective descriptions by (sometimes implicitly) invoking a stance towards a target. Users reading a review can get an idea of the motives and reasoning behind its author's words. This verbal provision of subjective information constitutes a comprehensible context in which arguments and their interrelations can be grounded. However, automatically identifying the presence, polarity, and quality of argumentation structures has not yet been considered in RS research, although it is well known that textual feedback of other users may strongly influence decision-making [2]. In particular, the potential of arguments has not been exploited for explaining recommendations. Due to their persuasive nature, arguments can be thought of as providing intelligible reasons that support recommendations, and may thus increase system transparency and trust in the results.

Since users differ with respect to dispositions and preferences, they may attach different levels of importance to arguments. For example, one user may insist on closeness to the beach when booking a hotel, while another lays more focus on cleanliness or the friendliness of the staff. Although reading the same review, these two users would most likely attend to completely different aspects. A sophisticated argument extractor should therefore mimic human scanning behavior by adaptively assigning individual attention weights to arguments.

[Figure 1: diagram of the proposed framework: a product review ("The hotel is in a central location, with lots of shopping and eateries nearby, yet only few meters from the beach.") is processed by linguistic argument mining and stance detection as well as by an attention-based mechanism within the recommender system, resulting in an explained item recommendation.]
Figure 1. In our framework, a review is analyzed linguistically and via an attention-based mechanism. This allows us to implement an argumentation flow based on information provided in the review while deeply integrating the user. Eventually, a personalized recommendation is presented together with individual arguments for or against this product.

We propose a conceptual framework that automates the process of extracting arguments from reviews (see Figure 1) to come up with personalized argumentative item-level explanations for recommendations: We suggest applying feature-based methods from computational linguistics in order to derive argumentative structures, including argument polarity in the form of stances. Beyond that, detecting personally important arguments requires a deep integration into the process of calculating recommendations. We aim at utilizing deep learning methods that, matching the attention analogy, enable the system to focus on important concepts subject to a user variable. Put together, this leads to the following challenges:

• Linguistically analyzing review texts via argument mining and stance detection.
• Identifying important concepts for a target user via an attention-based mechanism.
• Deriving an argumentation flow via multiple applications of the attention-based mechanism.
• Unifying the linguistic analyses and the attention-based mechanism.
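To make the interplay of these four challenges more concrete, they can be read as stages of a single pipeline. The following Python sketch is purely illustrative: the function names, signatures, and the 0.2 attention threshold are hypothetical placeholders, not an existing implementation of our concept.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: an argument is a text span plus a stance label.
Argument = Tuple[str, str]   # e.g. ("few meters from the beach", "favor")

def explain(review_tokens: List[str],
            mine_arguments: Callable[[List[str]], List[Argument]],
            attend: Callable[[List[str]], List[float]]) -> List[Argument]:
    """Sketch of the envisioned flow: mine arguments and stances (challenge 1),
    compute user-specific attention over the review (challenges 2-3), and keep
    the arguments that coincide with highly attended tokens (challenge 4)."""
    arguments = mine_arguments(review_tokens)            # challenge 1
    weights = attend(review_tokens)                      # challenges 2-3
    # Tokens whose attention weight exceeds an (arbitrary) threshold.
    important = {t for t, w in zip(review_tokens, weights) if w > 0.2}
    return [a for a in arguments                         # challenge 4
            if any(tok in important for tok in a[0].split())]
```

With a dummy miner and a dummy attention function, passing the review "The food is good, but beds are uncomfortable" and a weight profile that emphasizes "food" would keep only the food-related argument.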
Linguistic Analyses: Computational linguistic approaches can be distinguished based on the granularity of the analysis. Document-level analyses, for instance, aim at determining a sentiment or stance towards the subject of decision, e.g. a hotel. However, for our purpose such an approach is too shallow, as a review generally not only consists of utterances regarding the target item as a whole, but usually also includes remarks on sub-aspects. Sentiments or stances towards these sub-aspects may deviate from the review's overall polarity. For example, a guest may generally like a hotel albeit he or she found the bed uncomfortable. In addition, a review might address sub-aspects not important to the current user. Therefore, we propose a more fine-grained approach relying on aspect-level argument mining that is capable of assigning polarity to a variety of mentioned entities.

Automated argument mining refers to identifying linguistic structures consisting of at least one explicit claim and optional supporting structures such as premises [17]. Depending on which theory of argumentation one follows [21, 10], these structures are more or less specialized. However, analyzing user-generated reviews poses a difficult task for an argument mining tool, as they deviate considerably from the texts on which linguistic argument mining is typically performed, e.g. legal text or scientific writing: Reviews are usually shorter, noisier, less densely packed with arguments, and contain arguments in a way that is often not as sophisticated.

Another difficulty arises when one wants to decide whether an extracted argument is in favor of or against a particular product. However, a notion of polarity is necessary to be able to select adequate arguments supporting or opposing a recommendation. Therefore, we propose to relate each argument to a stance expressed by the review author. In this regard, a stance target is not limited to the subject of decision itself but may essentially address anything towards which one can have a stance. It follows that the linguistic analyses need to identify the target and establish a relation to the subject of decision. Please note that stance detection involves identifying a subjective disposition that might often be implicit [14]. Moreover, stances, as opposed to e.g. sentiments, do not necessarily integrate a polarity aspect (e.g. "the hotel is modern", "the food is local").

Attention-based Mechanism: Since we are interested in providing users with personalized explanations, solely performing linguistic analyses is not sufficient: Users differ not only in their interests, but also have unique styles of decision-making. As a consequence, the assessment of the importance of an argument found in a review, as well as its accordance with the stances expressed by the review author, has to be adapted towards the current user. In order to achieve this, we propose an attention-based approach that considers a vectorial user representation to identify personally important concepts.

Attention-based deep learning approaches [3] have proved to be very successful at identifying significant local features in tasks from several domains such as image processing [24] or machine translation [16]. An attention function can be described as a (soft-)search for a set of positions in a linguistic source where the most relevant information is concentrated. In the context of review-based RS, we propose to use attention as a means to identify important and distinguishing concepts for the subject of decision, i.e. a target item. Technically speaking, we suggest computing a weighted sum over the sequence of vectors derived from the words contained in a review. The assigned weights would indicate the relative importance of each vector and thus resemble the amount of attention a particular word is receiving.

Personalizing explanations requires attention to be distributed with respect to user preferences, i.e. users act as the context that moderates the (soft-)attention's output. Thus, it is necessary to calculate attention weights subject to a vectorial user representation. We propose to model users analogously to our previous work [8] by embedding one-hot user vectors, along with word vectors, into a densified joint information space. This would allow us to numerically estimate the degree to which the current user's preferences are in line with the concepts expressed in a review. Assume, for example, the following portion of a review: "The food is good, but beds are uncomfortable." If the user has shown interest in good food in the past, the attention mechanism should assign a large weight to food. Although the lack of bed comfort might be relevant to others, this argument should be neglected and receive a weight close to zero if the system has not detected a relationship between the current user and this particular aspect before.

Argumentative Flow: Argumentation can be considered more convincing if it consists of more than an aggregation of potentially important words or phrases. We assume arguments become more understandable and effective if they follow a coherent argumentative flow, i.e. exhibit dependencies between successively extracted arguments. The analogy in our proposed concept is the repeated application of the (soft-)attention mechanism while considering previous output as context. Informally, this mimics a conversational exchange between user and attention mechanism in which the latter continuously tries to identify concepts more and more tailored specifically towards the user's preferences. The result would be a succession of relevant word sets that exhibit sequential properties and are thus tied to each other. As a consequence, the output sequence would reflect the whole path of how the system came up with a particular recommendation and thus explain which concepts played an important role during the process (see Figure 1). It must be noted that such a procedure would be closely related to so-called memory networks [23, 19], which also work with multiple hop operations on an attention-based memory. Since our concept imposes an artificial argumentative structure on the raw review text, it will be interesting to compare the argumentative flow intended by the review's author with the one automatically derived by the system.
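The user-conditioned soft attention and its multi-hop extension described above can be sketched numerically. This is a toy illustration with random embeddings; the bilinear scoring of word vectors against the user vector and the additive query update are our own illustrative assumptions (one common design choice among several), not a fixed part of the concept.

```python
import numpy as np

def softmax(z):
    """Normalize raw scores into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_hop(query, words, W):
    """One (soft-)attention hop: score each word vector against the query
    (here: the user representation) and return the weighted sum."""
    scores = words @ (W @ query)      # relevance of each word to this user
    weights = softmax(scores)         # attention distribution over the review
    return weights, weights @ words   # weighted sum = attended summary vector

rng = np.random.default_rng(0)
words = rng.normal(size=(6, 8))       # embedded review words (6 words, dim 8)
user = rng.normal(size=8)             # embedded user vector (same joint space)
W = rng.normal(size=(8, 8))           # learned bilinear interaction matrix

# Multiple hops: fold the previous summary back into the query, so each hop
# can attend to concepts conditioned on what was extracted before
# (memory-network-style multi-hop operation).
query = user
for hop in range(2):
    weights, summary = attention_hop(query, words, W)
    query = query + summary
```

The sequence of per-hop attention distributions is exactly the "succession of relevant word sets" mentioned above: each hop's weights indicate which words the system focused on at that step.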
Unification of Linguistic Analyses and Attention: Up to this point, we have covered two independent approaches to processing review texts: (1) linguistic analyses aiming at mining arguments and detecting stances, and (2) attention-based extraction of words relevant to the current user. However, both on their own are limited in expressiveness: the linguistic analyses lack the personalization component, while the attention mechanism operates on word level and is thus incapable of extracting complete arguments. Consequently, following our superordinate research goal of presenting users with more informative, personalized explanations, we plan to exploit the benefits of both approaches by coupling them closely together. A first attempt, for instance, would be to check whether the surrounding context of a word that received a large amount of attention was identified as an argument by the linguistic processor. If this condition is met, the system can interpret the argument as a possible candidate for a personalized explanation. Assume the system detects e.g. the arguments "the food is good" and "beds are uncomfortable"; then identifying food as an important individual concept should lead to the first argument being chosen for explaining the recommendation.
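This overlap check between attended words and mined argument spans can be expressed compactly. A minimal sketch, assuming arguments are given as token spans and attention as per-token weights; the 0.2 threshold is an arbitrary illustrative value, not part of the concept itself.

```python
def candidate_explanations(arguments, attention, threshold=0.2):
    """Select mined arguments whose span contains a highly attended word.

    arguments: list of (start, end) token spans identified as arguments
    attention: per-token attention weights for the current user
    """
    selected = []
    for start, end in arguments:
        # Keep the argument if any word inside it drew enough attention.
        if max(attention[start:end]) >= threshold:
            selected.append((start, end))
    return selected

# Toy example: two arguments over an eight-token review,
# with most attention on the token "food".
attn = [0.05, 0.4, 0.05, 0.05, 0.05, 0.1, 0.1, 0.1]
args = [(0, 4), (5, 8)]   # e.g. "the food is good", "beds are uncomfortable"
print(candidate_explanations(args, attn))   # -> [(0, 4)]
```

Only the first span survives: the user-specific attention on "food" singles out "the food is good" as the candidate explanation, while the bed-comfort argument is dropped.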
CONCLUSIONS
In this paper, we have discussed the state of research regarding the usage of product reviews in RS, with a focus on explanations. We set our prior work into context, where we already used tags to increase a recommender's transparency and analyzed product reviews in order to provide extended interaction possibilities. Based on this, we pointed out a possible way to exploit this rich information source for presenting users with intelligible, personalized explanations by extracting more complex arguments. We outlined several challenges and proposed a concept to address them. For future work, our research goals are to implement this novel concept, use it to additionally personalize recommendations, and to evaluate it in several domains with a focus on the influence on user experience in comparison to other explanation methods.

REFERENCES
1. A. Almahairi, K. Kastner, K. Cho, and A. Courville. 2015. Learning Distributed Representations from Reviews for Collaborative Filtering. In RecSys '15. ACM, 147–154.
2. G. Askalidis and E. C. Malthouse. 2016. The Value of Online Customer Reviews. In RecSys '16. ACM, 155–158.
3. D. Bahdanau, K. Cho, and Y. Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR '15.
4. C. E. Briguez, M. C. D. Budán, C. A. D. Deagustini, A. G. Maguitman, M. Capobianco, and G. R. Simari. 2014. Argument-based Mixed Recommenders and Their Application to Movie Suggestion. Expert Syst. Appl. 41, 14 (2014), 6467–6482.
5. Q. Diao, M. Qiu, C.-Y. Wu, A. J. Smola, J. Jiang, and C. Wang. 2014. Jointly Modeling Aspects, Ratings and Sentiments for Movie Recommendation (JMARS). In KDD '14. ACM, 193–202.
6. T. Donkers, B. Loepp, and J. Ziegler. 2016a. Tag-Enhanced Collaborative Filtering for Increasing Transparency and Interactive Control. In UMAP '16. ACM, 169–173.
7. T. Donkers, B. Loepp, and J. Ziegler. 2016b. Towards Understanding Latent Factors and User Profiles by Enhancing Matrix Factorization with Tags. In RecSys '16.
8. T. Donkers, B. Loepp, and J. Ziegler. 2017. Sequential User-Based Recurrent Neural Network Recommendations. In RecSys '17. ACM, 152–160.
9. J. Feuerbach, B. Loepp, C.-M. Barbu, and J. Ziegler. 2017. Enhancing an Interactive Recommendation System with Review-based Information Filtering. In IntRS '17. 2–9.
10. J. B. Freeman. 1991. Dialectics and the Macrostructure of Arguments: A Theory of Argument Structure. De Gruyter.
11. J. A. Konstan and J. Riedl. 2012. Recommender Systems: From Algorithms to User Experience. User Mod. User-Adap. 22, 1-2 (2012), 101–123.
12. Y. Koren, R. M. Bell, and C. Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. IEEE Computer 42, 8 (2009), 30–37.
13. B. Lamche, U. Adıgüzel, and W. Wörndl. 2014. Interactive Explanations in Mobile Shopping Recommender Systems. In IntRS '14. 14–21.
14. W.-H. Lin, T. Wilson, J. Wiebe, and A. Hauptmann. 2006. Which Side are you on? Identifying Perspectives at the Document and Sentence Levels. In CoNLL-X '06. Association for Computational Linguistics, 109–116.
15. B. Loepp, K. Herrmanny, and J. Ziegler. 2015. Blended Recommending: Integrating Interactive Information Filtering and Algorithmic Recommender Techniques. In CHI '15. ACM, 975–984.
16. T. Luong, H. Pham, and C. D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In EMNLP '15. 1412–1421.
17. R. M. Palau and M.-F. Moens. 2009. Argumentation Mining: The Detection, Classification and Structure of Arguments in Text. In ICAIL '09. ACM, 98–107.
18. P. Pu, L. Chen, and R. Hu. 2012. Evaluating Recommender Systems from the User's Perspective: Survey of the State of the Art. User Mod. User-Adap. 22, 4-5 (2012), 317–355.
19. S. Sukhbaatar, A. Szlam, J. Weston, and R. Fergus. 2015. End-To-End Memory Networks. In NIPS '15. 2440–2448.
20. N. Tintarev and J. Masthoff. 2015. Explaining Recommendations: Design and Evaluation. In Recommender Systems Handbook. Springer US, 353–382.
21. S. E. Toulmin. 2003. The Uses of Argument. Cambridge University Press.
22. J. Vig, S. Sen, and J. Riedl. 2009. Tagsplanations: Explaining Recommendations Using Tags. In IUI '09. ACM, 47–56.
23. J. Weston, S. Chopra, and A. Bordes. 2014. Memory Networks. (2014).
24. K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In ICML '15. 2048–2057.
25. Y. Zhang, G. Lai, M. Zhang, Y. Zhang, Y. Liu, and S. Ma. 2014. Explicit Factor Models for Explainable Recommendation Based on Phrase-Level Sentiment Analysis. In SIGIR '14. ACM, 83–92.