<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ICB</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Post-hoc Explanations for Complex Model Recommendations using Simple Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dorin Shmaryahu</string-name>
          <email>dorins@post.bgu.ac.il</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guy Shani</string-name>
          <email>shanigu@bgu.ac.il</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bracha Shapira</string-name>
          <email>bshapira@bgu.ac.il</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ben-Gurion University of the Negev</institution>
          ,
          <country country="IL">Israel</country>
        </aff>
      </contrib-group>
      <volume>376</volume>
      <issue>4</issue>
      <abstract>
        <p>Many leading approaches for generating recommendations, such as matrix factorization and autoencoders, compute a complex model composed of latent variables. As such, explaining the recommendations generated by these models is a difficult task. In this paper, instead of attempting to explain the latent variables, we provide post-hoc explanations for why a recommended item may be appropriate for the user, by using a set of simple, easily explainable recommendation algorithms. When the output of the simple explainable recommender agrees with the complex model on a recommended item, we consider the explanation of the simple model to be applicable. We suggest both simple collaborative filtering and content based approaches for generating these explanations. We conduct a user study in the movie recommendation domain, showing that users accept our explanations, and react positively to simple and short explanations, even if they do not truly explain the mechanism leading to the generated recommendations.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Explainable Recommendation</kwd>
        <kwd>content-based explanations</kwd>
        <kwd>collaborative filtering explanations</kwd>
        <kwd>user study</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
Recommendation systems that suggest items to users can be
found in many modern applications, from online newspapers
and movie streaming applications, to e-commerce [
        <xref ref-type="bibr" rid="ref18 ref2 ref26">2, 26, 18</xref>
        ].
Research has shown that in many applications, users may be
interested in understanding why a particular recommended
item is appropriate for her [
        <xref ref-type="bibr" rid="ref11 ref27 ref31">27, 11, 31</xref>
        ]. Thus, it is beneficial to
be able to generate explanations for the recommended items.
Early simple recommendation algorithms often yield a natural
explanation for their recommendations. For example, the
recommendations of a neighborhood based collaborative filtering
approach [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] can be explained as: “users similar to you often
choose this item”. Item-item collaborative filtering algorithms
[
        <xref ref-type="bibr" rid="ref23 ref3">23, 3</xref>
        ] provide recommendations that can be explained as
“users who choose the item that you have chosen often also
choose the recommended item”. Content-based algorithms
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], that learn for each user a set of content features that the
user prefers, generate recommendations that can be explained
by “the recommended item has a content feature that you
prefer”.
      </p>
      <p>
        However, these simple algorithms often provide
recommendations of lower accuracy than modern approaches. In recent
years, two collaborative filtering approaches became popular
for generating good recommendations — the matrix
factorization (MF) approach [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
        ], and the artificial neural
network (ANN) approach [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. Algorithms of these families
have shown the capacity to generate accurate
recommendations for users.
      </p>
      <p>One of the downsides of both approaches is that they compute
the recommendations through a set of latent variables and
their possibly non-linear relations. For example, in the MF
approach one computes a vector of latent variables for each
user, and a vector of latent variables for each item, and then
computes a recommendation score using the inner product
between the vectors of a particular user and a particular item. The
values of the latent variables do not have an understandable
meaning to humans.</p>
      <p>
        Several researchers have attempted to provide explanations by
understanding the behavior of the latent variables [
        <xref ref-type="bibr" rid="ref32 ref6">32, 6</xref>
        ]. Such
efforts may be possible in some cases, but it is unlikely that all,
or even most, latent variables represent an easy to understand
structure. The problem becomes even more difficult with deep
ANNs, that may contain thousands of such variables with
complex connections between them.
      </p>
      <p>
        Alternatively, one can take a post-hoc approach to
explanations [
        <xref ref-type="bibr" rid="ref12 ref4">12, 4</xref>
        ], that takes the model recommendations as input,
and attempts to identify reasons as to why these recommended
items are appropriate to the user. For example, [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] used
association rule mining to identify explanations for the
recommendations directly from the data. These explanations cannot
be considered to be transparent [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], as they do not shed light
on the choices made within the model in recommending the
particular item, but may still provide value to the user. They
can be effective, helping the user in making decisions. They
may be persuasive, convincing the user to explore the
recommended item. They may also increase trust, by, e.g., providing
a reasonable explanation for a recommendation that the user
dislikes.
      </p>
      <p>In this paper we also take a post-hoc explanation generation
approach. Given the output of any black-box recommender,
we run a set of easy-to-explain recommendation algorithms,
such as the simple collaborative filtering and content based
methods suggested above. These algorithms provide a score
for the items recommended by the black box recommender.
When this score is sufficiently high, it means that the
explainable recommender agrees with the black box recommender.
In this case, we can present the explanation of the explaining
recommender to the user.</p>
      <p>Our approach is model agnostic — we can generate
explanations for any recommender. Our approach is also flexible,
in that the explanations can be generated post-hoc by any
easy-to-explain recommendation algorithm that outputs a
recommendation score for each item. Although in this paper we
study only the simple recommenders mentioned above, given
any other easy-to-explain recommender, one can use it to
generate new explanations, that would be candidate explanations
for the items recommended by the black box recommender.
We study the user perception of explanations generated by
simple easy-to-explain recommenders for the items recommended
by complex models. We evaluate the user’s response to
recommended items with and without explanations of different
types. We also measure the participants' preference over the
various types of explanations. To study these questions we
conduct a user study in the movie domain. We use two popular
recommendation models, an MF and an autoencoder, as black
boxes to generate recommendations. For each recommended
item we run a set of 6 easy-to-explain approaches to produce
explanations for the recommendation — item-item content
based, user-item content based, item-item collaborative
filtering, user-user collaborative filtering, movie overview textual
similarity, and a popularity recommender. We show only
explanations that are sufficiently relevant, that is, whose score
passes a method-dependent threshold.</p>
      <p>We first ask participants to rank the generated
recommendations without any explanation. Then, we ask their opinion
about recommended items with explanation, showing a single,
randomly chosen, explanation for every movie.</p>
      <p>In the next stage of the user study, the participants were shown
additional recommended movies. In this stage we presented
all explanations that passed a threshold to the participants,
and asked them to rate each explanation. The results in this
stage show that participants preferred content based
explanations to collaborative filtering explanations, and that popularity
explanations are rated the lowest.</p>
      <p>Finally, the participants completed an online survey, asking
their opinion about recommendation explanations in general.
Our results indicate that participants prefer short and easy to
understand explanations to transparent explanations that fully
disclose the mechanism behind the computed
recommendations.</p>
    </sec>
    <sec id="sec-1a">
      <title>BACKGROUND</title>
      <p>
Recommender systems actively suggest items to users, to
help them to rapidly discover relevant items, and to increase
item consumption [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Such systems can be found in many
applications, including TV streaming services [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], online
ecommerce [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], smart tutoring [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and many more [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. We
focus here on one important recommendation task [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] —
top-N recommendation, where the system computes a list of N
recommended items from which the user may choose.
      </p>
      <p>
        There are two dominant approaches for computing
recommendations for the active user — the user that is currently
interacting with the application and the recommender system.
First, the collaborative filtering approach [
        <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
        ] assumes that
users who agreed on preferred items in the past will tend to
agree in the future too. Many such methods rely on a matrix
R of user-item ratings to predict unknown matrix entries, and
thus to decide which items to recommend.
      </p>
      <p>
        A simple method in this family [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], commonly referred
to as user-user collaborative filtering, identifies a
neighborhood of users that are similar to the active user. A common
method for computing user similarity is the Jaccard
correlation Jaccard(u1, u2) = |I_u1 ∩ I_u2| / |I_u1 ∪ I_u2|, where I_u
is the set of items consumed by a user u. This set of neighbors is based on the
similarity of observed preferences between these users and the
active user. Then, items that were preferred by users in the
neighborhood are recommended to the active user. Another
approach [
        <xref ref-type="bibr" rid="ref23 ref3">23, 3</xref>
        ], known as item-item collaborative filtering,
relies on the sets of users that consumed two items i1 and i2.
One can compute, e.g., the Jaccard correlation between the
items: Jaccard(i1, i2) = |U_i1 ∩ U_i2| / |U_i1 ∪ U_i2|, where U_i is the set of users
who consumed item i. Then, the system can recommend to a
user u an item i2 that has high Jaccard similarity to an item i1
that u has previously consumed.
      </p>
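      <p>As an illustration, the two Jaccard-based recommenders above can be sketched in a few lines of Python. This is a minimal sketch with hypothetical names; the neighborhood size k and the count-based scoring rule are our simplifying assumptions, not details from the paper.</p>

```python
def jaccard(a, b):
    """Jaccard similarity between two sets: |a ∩ b| / |a ∪ b|."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def user_user_scores(active_items, items_by_user, k=2):
    """Score unseen items by how many of the k most similar users chose them."""
    neighbors = sorted(items_by_user,
                       key=lambda u: jaccard(active_items, items_by_user[u]),
                       reverse=True)[:k]
    scores = {}
    for u in neighbors:
        for item in items_by_user[u] - active_items:
            scores[item] = scores.get(item, 0) + 1
    return scores
```

      <p>The same jaccard function applies unchanged to the item-item case, with U_i (the set of users who consumed item i) in place of I_u.</p>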
      <p>
        A second popular approach is known as content-based
recommendation [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In this approach, the system has access to a set
of item features. The system then learns the user preferences
over features, and uses these computed preferences to
recommend new items with similar features. Such recommendations
are typically titled “similar items”.
      </p>
      <p>In content-based recommendations one can again take an
item-item approach, computing the similarity between items based
on shared feature values, such as the same leading actors, the same
director, or the same genre. Then, one can recommend an
item that has high similarity to an item that was previously
consumed by the user. One can also take a user-item approach,
by computing a user profile — the set of feature values that
often appear in items consumed by the user, such as actors
that repeatedly appear in movies that the user has consumed,
or genres that the user often watches. Then, one can compute
the similarity of an item to the user profile to decide whether
to recommend the item to the user.</p>
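      <p>The user-item approach above can be sketched as follows. The feature encoding and the normalised-overlap score are our own hypothetical simplifications, not the paper's exact formulation.</p>

```python
from collections import Counter

def build_profile(liked_movies):
    """User profile: how often each feature value (actor, director, genre)
    appears in the movies the user consumed."""
    profile = Counter()
    for features in liked_movies:
        profile.update(features)
    return profile

def profile_score(profile, item_features):
    """Simple weighted overlap between the profile and a candidate item:
    the profile mass on the item's features, normalised by total mass."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return sum(profile[f] for f in item_features) / total
```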
      <p>It is widely agreed in the recommendation system research
community that in many domains, collaborative filtering
approaches produce better recommendations than content based
methods.</p>
      <p>
        A collaborative filtering approach that has gained much
attention in the recommender system community is matrix
factorization [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
        ], where the system attempts to factor
the |U| × |I| rating matrix R into two matrices, P of size |U| × k and Q of size k × |I|,
for some small number k, such that R ≈ P · Q. One can consider
the matrix P as a set of latent user features, and Q as a set of
latent item features. An item i is considered to be
appropriate for a user u when the inner product p_u · q_i is high. The
resulting latent feature vectors p_u and q_i typically do not have
a meaning that can be translated into content features, such as
actors or genres, but are associated with the like-dislike
pattern of the user over items. As such, explaining to the user why a
particular item was recommended to her, beyond the vague statement
that the system predicts that the item is a good match for the
user, is difficult.
      </p>
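      <p>Once the factors P and Q are learned, scoring and ranking are straightforward, as the following sketch shows (hypothetical names; pure-Python inner product for clarity):</p>

```python
def mf_score(p_u, q_i):
    """Predicted affinity of user u for item i: inner product of the
    k-dimensional latent vectors p_u and q_i."""
    return sum(p * q for p, q in zip(p_u, q_i))

def rank_items(p_u, Q, n):
    """Return the indices of the n items with the highest predicted score."""
    scored = sorted(range(len(Q)), key=lambda i: mf_score(p_u, Q[i]),
                    reverse=True)
    return scored[:n]
```

      <p>Note that nothing in p_u or Q[i] carries a human-readable meaning — which is exactly the explainability problem described above.</p>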
      <p>Another state-of-the-art collaborative filtering approach is the
variational autoencoder (VAE). An autoencoder (AE) neural
network is an unsupervised learning algorithm, attempting to
produce target values equal to the input values, y(i) = x(i). The
autoencoder tries to learn a function h_{W,b}(x) ≈ x, where W and
b are the sets of weights and biases corresponding to the hidden
units in the deep network.</p>
      <p>While the input and output layers of the network are large,
there is an inner low dimensional layer within the network.
Thus, the network learns a lower dimension representation
of the input, the latent space. The autoencoder operates in
two phases, an encoder that reduces the input into a compact
representation in the low dimension layer, and a decoder,
responsible for reconstructing the encoded representation into
the original input.</p>
      <p>In the recommendation system task, the input is a user partial
item choice vector r(u), e.g., a vector of all movies in the
system, where only movies that the user has watched receive
a value of 1. The reconstruction of the input at the output
layer contains higher scores for items that the user is likely to
choose.</p>
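      <p>The input vector r(u) and the top-N selection from the reconstruction can be sketched as follows (a minimal sketch; the network itself is omitted, and the reconstruction is assumed given):</p>

```python
def build_input_vector(watched, n_items):
    """r(u): 1.0 for items the user has chosen, 0.0 otherwise."""
    return [1.0 if i in watched else 0.0 for i in range(n_items)]

def top_n_from_reconstruction(reconstruction, watched, n):
    """Recommend the n unwatched items with the highest reconstructed scores."""
    candidates = [i for i in range(len(reconstruction)) if i not in watched]
    return sorted(candidates, key=lambda i: reconstruction[i], reverse=True)[:n]
```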
    </sec>
    <sec id="sec-1b">
      <title>RELATED WORK</title>
      <p>
Explainable recommendations provided to users may help
them understand why certain items are appropriate for them.
By clarifying these reasons, explanations can improve the
transparency, persuasiveness, effectiveness, trustworthiness,
and user satisfaction from the recommender system [
        <xref ref-type="bibr" rid="ref11 ref27 ref31">27, 11,
31</xref>
        ]. While earlier recommenders were often naturally
explainable, modern models are more complex, and do not yield
natural explanations. Studies in explainable recommendations
hence address the challenge of providing human
understandable explanations for items recommended by complex models.
There are two main approaches to providing explainable
recommendations [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. The first approach attempts to create
interpretable recommendation models whose results can be
naturally explained. However, many modern models are often
not naturally explainable, and making them more explainable
often results in reduced recommendation accuracy. This line
of research therefore aims at mitigating the trade-off between
accuracy and explainability by including explainable
components, layers or external information into non-linear complex
and deep accurate models to make them explainable.
Examples of such solutions for MF-based recommendation models
include the work by [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], who applied sentiment analysis over
user reviews to learn users' preferences over item features,
which served as a basis for the latent factors.
Additional examples can be found for deep learning
recommendation models, such as the work by [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], that learned the
distribution of user attention over features of different items
that serve as explanations. These algorithms try to analyse the
meaning of each latent component in a neural network, and
how they interact with each other to generate the final results.
The second approach is post-hoc and model-agnostic [
        <xref ref-type="bibr" rid="ref12 ref4">12, 4</xref>
        ].
It treats the model as a black box and explains the
recommendation results in a rational way by identifying relations
between the data provided as input to the recommender
system and its recommended items. This analysis is decoupled
from the recommendation model, considering only the model
input and output. The post-hoc approach has the advantage of
enabling explanations in scenarios where the recommendation
model cannot be exposed. Although the post-hoc explanations
presented to a user are not transparent, i.e., they do not reflect
the computation used by the underlying model to provide
recommendations, they commonly present rational, plausible
information to the user.
      </p>
      <p>
        Some post-hoc explainable recommendation models use
statistical methods to analyze the influence of the input on the
output [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. These methods often require heavy computations
to provide explanations. Other studies apply various deep
learning reinforcement learning methods to build explanation
models using various types of networks. These studies [
        <xref ref-type="bibr" rid="ref19 ref30">30,
19</xref>
        ] are commonly based on static explanation templates, result
in complex models, and require parameter tuning.
Post-hoc methods are built on the assumption, which we
investigate in this paper, that an explanation that makes sense to the
user is acceptable even if it is not the exact reason that the
recommendation was issued, and may have a
beneficial effect for the recommendation system.
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] suggested that providing explanations to users alongside
a recommendation can help users to make more informed
decisions about consuming the item. They used 3 post-hoc
methods — keyword similarity, neighbors ratings, and what
they call influential item computation — to explain
recommendations generated by a hybrid content-based and
collaborative rating prediction system. They ran a small-scale user
study in a books domain, attempting to understand which
explanation provided the most information for the user to best
understand the quality of the recommended item for her. Our
paper can be seen as an extension of their preliminary work,
describing a general framework for post-hoc explanations
using simple methods, suggesting additional explanation types,
and conducting a thorough user study in the movies domain,
evaluating many more research questions.
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] also extended the work of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] by suggesting a different
post-hoc method, applying association rule mining on the
input data – the user-item rating table. The mining yields
association rules, sorted by their confidence and support, that
reflect links between items. Those links form the explanations
that are provided to users whose input data include antecedents
of the rule. The explanations, however, unlike our approach,
are limited to item-based collaboration-like statements (i.e.,
“item X is recommended because item Y was consumed”), and
require the application of some association mining algorithm
(e.g., the Apriori algorithm that the authors used [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). Rule
mining algorithms typically require heavier computations than
our simple similarity-based computations. They also defined
Model Fidelity, the portion of recommendations that can be
explained. Post-hoc explanations may not always apply to
all recommendations, and the goal is to provide high model
fidelity.
      </p>
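      <p>Model fidelity, as defined above, is simple to compute once each explaining method exposes a score and a threshold. A minimal sketch (the dictionary-of-explainers interface is our assumption):</p>

```python
def model_fidelity(recommended_items, explainers, thresholds):
    """Model fidelity: the portion of recommended items for which at least
    one explaining method's score passes its method-specific threshold."""
    if not recommended_items:
        return 0.0
    explained = sum(
        1 for item in recommended_items
        if any(score(item) >= thresholds[name]
               for name, score in explainers.items()))
    return explained / len(recommended_items)
```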
      <p>
        In a gaming application, Frogger, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] created a system that
generated simple rational explanations of the agent state and
actions rather than complex detailed explanations. They showed
good perception of the rationales by users, further
supporting our hypothesis that simple post-hoc explanations are well
received by users.
      </p>
      <p>The post-hoc explanation approach that we propose in this
paper emphasizes simplicity, flexibility, and the ease of its
application. Our method supports simple similarity based models,
collaborative and content-based, as well as other simple
post-hoc methods. This allows users to choose their preferred type
of explanation. The main tunable parameter in our approach is
the method-specific threshold for deciding which explanation
is sufficiently supported to be presented to the user.</p>
    </sec>
    <sec id="sec-1c">
      <title>GENERATING POST-HOC EXPLANATIONS USING SIMPLE METHODS</title>
      <p>We now present our framework for providing post-hoc
explanations for complex model recommendations. The framework
is presented in Figure 1.</p>
      <p>Our method for generating recommendations along with
plausible explanations operates in several stages. First, a black box
recommendation model receives as input the user-item rating
matrix and outputs a recommendation. Although in this paper
we focus on collaborative filtering methods, this approach can
be applied to other methods, such as content-based
recommenders, that employ data sources other than the user-item
matrix.</p>
      <p>In the second stage, the recommended item is given as input
to several explaining algorithms. In addition, each explanation
algorithm receives as input additional required data sources.
These explanation methods can access the data sources
available to the recommender, but also other data sources as needed.
For example, a possible explanation is the popularity of the
item. The algorithm which produces this explanation requires
data over item popularity. Another possible explanation
approach is a content-based item-item method, which requires
as input item content information.</p>
      <p>The explaining algorithm is also a recommendation method
that produces a recommendation score for items, or a ranking
of recommended items for the user. We use the explaining
algorithm to generate such a score for the recommended item.
The algorithm returns an explanation only if the
recommendation score is sufficiently high. We use a method-specific
threshold to decide whether the explanation is sufficiently
relevant.</p>
      <p>The explanations provided by all explaining algorithms are fed
into a filter. All plausible explanations received from the
explaining algorithms are filtered and one explanation is chosen
to be shown to the user. For example, such a filter can be based
on user preferences, or on the observed response of the user to
different types of explanations. Choosing the explanation with
the best score from the explanation algorithms is problematic,
because these scores are not calibrated, that is, each explaining
algorithm may use a different scale of scores.</p>
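      <p>The threshold-then-filter pipeline described above can be sketched as follows. The names and the fixed preference-order filter are our illustrative assumptions; the key point from the text — never comparing raw scores across methods, since they are not calibrated — is reflected in the selection step.</p>

```python
def choose_explanation(recommended_item, explainers, thresholds,
                       preference_order):
    """Run every explaining algorithm, keep only explanations whose score
    passes the method-specific threshold, then pick one by a fixed
    user-preference order. Raw scores are never compared across methods,
    since each algorithm scores on its own scale."""
    candidates = {}
    for name, explainer in explainers.items():
        score, text = explainer(recommended_item)
        if score >= thresholds[name]:
            candidates[name] = text
    for name in preference_order:
        if name in candidates:
            return candidates[name]
    return None
```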
    </sec>
    <sec id="sec-1d">
      <title>USER STUDY</title>
      <p>As we have explained above, we study the participant
perception of the provided explanations. We now describe a
user study applying our approach to a movie
recommendation application, in which participants evaluate recommended
movies, with and without explanations. The participants also
provide their preferences over possible explanations for a
recommended movie.</p>
    </sec>
    <sec id="sec-2">
      <p>More formally, we study two hypotheses: (1) users prefer short post-hoc explanations generated by simple methods over a complete explanation of the mechanism of complex models; (2) presenting a post-hoc explanation to the user influences the user's acceptance of a recommended movie.</p>
      <p>We now explain the structure and process of the user study —
the dataset and algorithms used to generate the
recommendations and the explanations, and the different parts of the study.
We then discuss the results that we observed.</p>
      <p>Dataset and Algorithms
Our study is implemented in the movie recommendation
domain using the Kaggle movies dataset
(https://www.kaggle.com/rounakbanik/the-movies-dataset), containing both
ratings from MovieLens, as well as movie content data from
TMDB (https://www.themoviedb.org/).
The dataset originally contains 45,000 movies. We filtered
the dataset for two reasons — first, as we are interested in
the participants' opinions of the presented movies, we prefer to
limit our attention to relatively popular movies, to increase the
likelihood that the participant is familiar with a recommended
movie. Moreover, we observed that the complex models that
we use provide less appropriate recommendations when the
input movies have a relatively low number of user opinions.
As we are not truly interested in evaluating the quality of the
complex models, but rather the participant perception of the
recommended movies, with and without explanations, we
prefer to limit the models to items that are easier to recommend.
We hence chose to use only movies with more than 500
ratings, resulting in 3,878 movies. We used all users who rated
at least one of these movies, resulting in 122,147 users, and
5.7 million user-movie ratings.</p>
      <p>
        For generating the recommendations, we use two complex
models, an MF recommender that we implemented locally,
and a variational autoencoder (VAE) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Explanation Algorithms</title>
      <p>For generating the explanations, we implemented six
simple-to-explain algorithms. Each algorithm receives as input a user
profile, and an item (ir) that was recommended by a complex
model (VAE or MF), and generates a recommendation score
for that item. In addition, different algorithms take as input
different data sources.</p>
      <p>Popularity (denoted POP in the tables below): we compute
the popularity of ir in our dataset. If the movie is
sufficiently popular, we can explain the recommendation by the
movie being popular. The resulting explanation reads “This
movie is popular. Many users have watched it.” We set the
threshold here to the 50 most popular movies.</p>
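      <p>The POP explainer is the simplest of the six, so we use it to illustrate the common shape of an explaining algorithm: score the item, apply the method-specific threshold, and emit the explanation text or nothing. A minimal sketch with hypothetical names:</p>

```python
from collections import Counter

def popularity_explainer(item, ratings, top_k=50):
    """POP: explain the recommendation if the item is among the top_k
    most rated movies; otherwise produce no explanation."""
    counts = Counter(movie for _, movie in ratings)
    top_items = {m for m, _ in counts.most_common(top_k)}
    if item in top_items:
        return "This movie is popular. Many users have watched it."
    return None
```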
      <p>Item-item content based (denoted I2ICB): for each movie
j that the user has rated, we compute a content similarity
score between j and ir, and take the item in the user profile
with the maximal score. The content similarity is computed
using the Jaccard score between the movies' cast (top 5
actors only), genres, and director. The resulting explanation
is based on the particular content features that the items
share. For example, “This movie was recommended to you
because you liked j in the past, and the actor c played in
both movies, and both movies are of genre g.”</p>
      <p>User-item content-based (denoted UICB): we generate a
user profile from the list of movies that the user has liked.
The profile contains a score for each actor, director, and
genre, based on the number of times that a content attribute
value, e.g., a specific actor, appeared in the movies that
the user has liked. We then compute a weighted Jaccard
score between the user profile and the content attributes of ir.
The resulting explanation is based on the specific content
attributes that the user profile and the item have in common.
For example, such an explanation may be “This movie was
recommended to you because it was directed by d, and you
have liked other movies that d directed.”</p>
      <p>Item-item overview (denoted I2ID): for each movie j that
the user has rated, we compute the description similarity
between j and ir, and take the item in the user profile with
the maximal score. The textual similarity between item
descriptions is computed using TF-IDF. The explanation in
this case is: “This movie was recommended to you because
you liked j in the past, and both movies have a similar
description.”</p>
      <p>Item-item collaborative filtering (denoted I2ICF): we
compute the item-item Jaccard score, that is, the number of
users who have watched both movies, divided by the number
of users who have watched at least one of
the movies. The explanation here reads “This movie was
recommended to you because you have watched movie m,
and many people who like m also like this movie.”</p>
      <p>User-user collaborative filtering (denoted U2UCF): we
compute the user neighborhood using the Jaccard similarity
between the sets of movies that each user has liked. Then, we
compute the portion of similar users who have watched the
recommended movie. This explanation reads “This movie
was recommended to you because x% of users who like the
same movies that you did, also like this movie.”</p>
      <p>Default explanation (denoted DEF): this is a strawman
explanation that provides no additional information to the
user, reading “Our system predicts that this movie is a good
match for you.”</p>
      <p>For each explanation algorithm we manually tune a threshold
specifying whether the explanation is sufficiently relevant to
be shown to the user. We leave a smarter tuning of these
thresholds to future research.</p>
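      <p>As a concrete example, the I2ICB algorithm above can be sketched as follows. The feature encoding and the default threshold value are our illustrative assumptions; only the content-set recipe (top-5 cast, genres, director) and the max-over-liked-movies rule come from the description above.</p>

```python
def content_features(movie):
    """Content set for a movie: top-5 cast members, genres, and director."""
    return set(movie["cast"][:5]) | set(movie["genres"]) | {movie["director"]}

def best_i2icb(user_movies, recommended, threshold=0.2):
    """I2ICB sketch: Jaccard between content sets, returning the liked
    movie with the maximal similarity to the recommended one, or None
    if no similarity passes the (hypothetical) threshold."""
    def sim(m):
        f1, f2 = content_features(m), content_features(recommended)
        union = f1 | f2
        return len(f1 & f2) / len(union) if union else 0.0
    best = max(user_movies, key=sim)
    return (best, sim(best)) if sim(best) >= threshold else None
```

      <p>The shared features of the returned movie pair then fill the explanation template (“…the actor c played in both movies…”).</p>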
      <p>Population
We recruited mostly engineering students from
different academic institutes to participate in the study. The subjects who completed
the study entered a raffle for a cash prize. Some subjects
were given, in addition, a credit in an academic course. We
recruited the subjects by sending an email to several mailing
lists, asking people to participate in a study over recommender
systems for movies.</p>
      <p>Overall, we recruited 207 participants, 131 males, and 73
females (3 preferred not to specify gender). 24% of the
participants were graduate students, 53% were undergrad students,
and 23% had high school education only. 55% of the
participants were 25 years old or younger, 35% were between 25-30,
and 10% were above 30 years old.</p>
      <p>Some of the participants have a background in recommender
systems or related fields. 103 have taken a course in
machine learning, 67 have taken a course in deep learning, 56
participants have taken a course in information retrieval, and
52 have taken a course in recommender systems. 40% of
the participants have not taken any course in these fields.
16% reported watching a few movies each week, 46% reported
watching a movie once a week, 32% once a month, and the
rest (6%) almost never watch a movie Netflix is the
leading movie watching channel (75%). 46% reported watching
movies at the theater, 40% watch downloaded movies, and
36% watch movies on broadcast channels. We did not detect
any significant difference between the various populations in
the participant behavior and answers below.</p>
      <p>We asked the participants how they decide which movie to
watch. 78% use recommendations from friends, 56% read
movie reviews online or in the newspaper, 25% report
using some automated system to recommend movies, and 20%
watch whatever is currently on. 81% are familiar with
personal movie recommendations in Netflix. When asked about
the quality of the Netflix recommendations, 62% reported
that they sometimes liked the recommendations, 18% almost
always like Netflix recommendations, 13% mostly do not like
the recommendations, and 7% reported never liking these
recommendations. Netflix presents some shows or movies
under the title “Because you watched X”. 58% of the
participants claimed that they are likely to explore recommended
movies under this title, 35% said that they may
explore these recommendations, and 7% will not explore such
recommended movies.</p>
      <p>Method
We now describe the process of the user study, explaining
the different tasks that the test subjects performed. As
explained above, the subjects were asked to participate in a
user study of movie recommendations. The invitation email,
as well as the instructions at the beginning of the study, did not
mention explanations. Specifically, the subjects were told that
they would be asked to evaluate the recommendations of a system.
Step 1: Creating a User Profile. After an instruction screen,
we asked each subject to choose 5 movies that she likes
(Figure 2). Using these movies, we created a CF user profile that
is used as input to the black box recommendation algorithms
— MF and VAE.</p>
      <p>Once the participant clicks on the “Let’s go” button, we
compute two lists of 3 recommendations. For each black box
algorithm, we compute two recommended movies using the
provided user profile. In addition, we add to each of the two
recommendation lists a randomly selected movie from the top
100 popular movies according to the IMDB popularity score.
These popular movies allow us to evaluate the participant’s
opinion of non-personalized recommendations.
Step 2: Rating Recommendations Without Explanations.</p>
      <p>During this step, the user was presented with the above two
sets of recommended movies, shown without any explanations,
and was requested to evaluate them. The opinions of
the participants over these sets serve as a baseline for the
performance of the recommendation algorithms, without the
influence of an explanation.</p>
      <p>We present the recommended movies to the participant in two
different screens, one containing 2 MF recommendations and
one popular movie, and the other containing 2 VAE
recommendations and one popular movie (Figure 3). The order of
the systems, as well as the ordering within the 3 movies, is
random.</p>
      <p>Throughout the study, we avoid presenting to the subject
recommended movies that were previously shown to her. If both
algorithms agree on a recommended movie, we take the next
movie on the recommendation list.</p>
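A sketch of this de-duplication rule, under the assumption that each recommender exposes a ranked list of movie ids (`ranked` and `shown` are hypothetical names):

```python
def next_unseen(ranked, shown):
    """Walk down the ranked recommendation list and return the first
    movie not yet shown to the subject, recording it as shown."""
    for movie in ranked:
        if movie not in shown:
            shown.add(movie)
            return movie
    return None  # ranked list exhausted
```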
      <p>The subject rates each recommended movie on a 1-5 scale.
Again, clicking on the movie poster allowed the subject to
explore movie data from IMDB.</p>
      <p>Step 3: Rating Recommendations with Explanations. We now
use the black box recommenders to produce two additional
recommended movies. We enrich the user profile by adding all
recommended movies that the subject rated 4 or 5 in the first
step, and avoid recommendations that were already presented
in the first step.</p>
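A minimal sketch of this profile-enrichment step, assuming the profile is a set of movie ids and `ratings` maps each recommended movie to the subject's 1-5 rating (the names are our assumptions):

```python
def enrich_profile(profile, ratings, threshold=4):
    """Return the profile extended with every recommended movie
    that the subject rated at `threshold` (here 4) or above."""
    return profile | {movie for movie, r in ratings.items() if r >= threshold}
```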
      <p>In addition, we apply all the explanation generation algorithms
above. We use only explanations whose scores are higher than
the method-specific threshold required to be considered
acceptable. From all acceptable explanations, we choose one
explanation randomly. In cases where none of the algorithms
returned a plausible explanation, we show the default
explanation.</p>
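The selection logic can be sketched as follows; the threshold values here are hypothetical placeholders for the manually tuned, method-specific thresholds described earlier.

```python
import random

# Hypothetical manually tuned, method-specific thresholds.
THRESHOLDS = {"I2ICF": 0.2, "U2UCF": 0.3, "UICB": 0.5, "POP": 0.8}

def pick_explanation(scores, rng=random):
    """scores: explanation method -> relevance score for the item.
    Keep methods whose score exceeds their threshold, pick one of
    them at random, and otherwise fall back to the default (DEF)."""
    acceptable = [m for m, s in scores.items() if s > THRESHOLDS.get(m, 1.0)]
    return rng.choice(acceptable) if acceptable else "DEF"
```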
      <p>In this step we used 3 different methods for showing the
explanations:</p>
      <p>Hidden: we place below the recommended movie a button
saying “Why is this movie appropriate for me?”.
Clicking on the button opened a popup window containing the
explanation.</p>
      <p>Teaser: we place below the recommended movie the
beginning of the explanation, followed by an ellipsis. Clicking
on the ellipsis opened a popup window containing the
explanation.</p>
      <p>[Figure: (a) Hidden explanation; (b) Explanation teaser; (c) Explicit explanation]</p>
      <p>Visible: we place the explanation below the recommended
movie.</p>
      <p>This allows us to check whether the participants are
interested in an explanation, and whether they actively seek an
explanation. We use a between-subjects setting here, that is,
each participant was allocated to one of the 3 groups, to avoid
over-emphasizing the explanations due to the variations in
presentation. We again ask each participant to rate 2 sets of
recommendations, each containing 3 recommended movies,
as in the previous step.</p>
      <p>Step 4: Rating Explanations. In the final step of the user study
we explicitly ask the subject to rate possible explanations. We
again add to the user profile the successful recommendations
from the previous steps, and ask for additional
recommendations from the two black box algorithms, MF and VAE.
In this step, unlike the previous steps, we present to the
participant a single recommended movie. In addition, we present
movie content information, such as the actors, the genres, and
the description, without requiring the participants to explicitly
request such details (Figure 5).</p>
      <p>We first ask the participant to state whether she likes the
recommended movie, and then present a set of explanations as to
why the movie was recommended. We show all explanations
that are deemed sufficiently appropriate, achieving a score
higher than the method-specific threshold. The participant is
asked to rate each explanation on a scale of 1-5.</p>
      <p>Each participant is shown 6 different movies in this step as
well, 3 generated by each of the two black box
recommenders, ordered randomly.</p>
      <p>To summarize, in Steps 2-4 we present to each participant
18 different recommendations: 7 recommendations from each
of the complex models, MF and VAE, and 4 additional popular
movies.</p>
      <p>Post Study Questionnaire. After finishing Step 4 above, the
subject is transferred to an online questionnaire. We first ask
a set of questions concerning demographic details. Then,
we ask the participants questions about their movie watching
habits, and their previous interactions with recommender
systems. Finally, we ask questions regarding their opinions about
the presented explanations.</p>
      <p>Results
We now discuss the user study results. We first review the
effects of explanations on the subject perception of a
recommended movie; then we discuss subject opinion over the
various explanation types. Finally, we discuss the fidelity of
the various explanation methods.</p>
      <p>Effects of Various Explanations on Movie Ratings
We now study the effect that the explanations had on the
subject opinion over the recommended movies, comparing
the average rating for movies without explanations and with
explanations. As explained above, in Step 3 there were 3
options for explanation presentation — hidden, requiring a click
on a button; teaser, showing only the beginning of the explanation;
and visible, fully presenting the explanation.</p>
      <p>Somewhat surprisingly, the participant clicked on the button
for only 24% of the recommendations in the first case, and
clicked on the teaser for only 14% of the recommendations in
the second case. That is, in most recommendations
the participants did not look at the explanations. In informal
discussions following the study, participants indicated that
they did not see the option to request an explanation, or did
not think that they needed an explanation to decide whether
the recommended movie is appropriate.</p>
      <p>Thus, we group together here both movies in Step 2 for which
no explanation was shown, and movies in Step 3 shown to
participants who did not click on an explanation button or teaser.
We compare this group to recommendations for which the
explanation was shown. Below, when we discuss significance,
we base our claims on a paired t-test.</p>
      <p>Table 1 compares the average rating for each one of the
plausible explanations, and without explanations. First, although
this is not the focus of the study, the VAE method produced
better recommendations than the MF method, which produced
better results than recommending a random popular movie.
Looking at the explanations, we can see that the user-item
content-based explanation was shown only 4 times, and hence
cannot be shown to be statistically different from the other
explanations. The popularity and the default explanations
result in lower ratings than all other explanations. That is,
movies presented with either CF or CB explanations produce
significantly higher ratings than the non-personal popularity
explanation and the non-informative default explanation.</p>
      <p>While the differences between the ratings can be attributed to
the presented explanation, there is another plausible reason
for these differences. It might be that recommended items
for which a specific recommendation type applies are better
recommendations. For example, it may be that when a
recommended item has a strong item-item Jaccard correlation
with an item in the user profile, it is considered as a better
recommendation for the user, whether we explicitly tell the
user about it or not.</p>
      <p>Table 2 shows the average rating over movies that we were
able to explain through one of the methods although the
explanation was not shown to the subject. This occurs either
in Step 2, or in Step 3 where the subject did not click on the
explanation button or teaser. As can be seen, similar to the
ratings in Table 1, movies for which a user based explanation
exists, as well as movies with similar descriptions, receive
a statistically significant (t-test p-value=0.046) higher user
rating than movies for which an item-item based explanation
holds. These, in turn, receive a statistically significant higher
rating than movies for which only the popularity explanation
holds. Finally, movies for which none of our explanation types
hold, receive the lowest rating.</p>
      <p>To conclude, on the one hand, it is unclear whether our
suggested explanations themselves truly affect the subject
behavior. On the other hand, it appears that these explanations
are well correlated with the way that participants perceive
a recommended movie, and decide whether to rate it higher.
As such, it may be that our explanations indeed capture a
part of the subject's decision process for her opinion over a
recommended movie.</p>
      <p>User Ratings for Explanations
As we explained above, in Step 4 we asked the participants to
rate various explanations for a given recommendation. Table 3
shows the explanation ratings provided by the participants.
Somewhat surprisingly, all explanations, including the default
explanation, received a relatively positive (above 3 on a
1-5 scale) rating. The only explanations that the participants
liked significantly less are the popularity explanation and the
user-item content-based explanation.</p>
      <p>The latter is especially surprising, given that movies for which
this explanation was shown, or for which this explanation
holds, receive the highest user ratings in the results reported
in Table 1 and Table 2. We believe that the relatively lower
subject opinion for this type of explanation may be attributed
not to its content, but rather to its length. As we discuss
below, in the post-study questionnaire, participants reported
that they prefer short explanations. This explanation is by far
the longest. Note that while the item-item content-based explanation
may also appear to be long, in practice it is not. For the content-based
explanations we report all properties (actors, genres,
director) that apply. A recommended movie typically has
more joint property values with the user profile, which contains
all the movies that the user has liked (i.e., UICB), than with
a single movie that the user has liked, and this entails longer
explanations for UICB.</p>
      <p>Explanation Fidelity
Finally, we evaluate the explanation fidelity — the portion of
recommended items for which each explanation type holds.
Table 4 shows the empirical fidelity of the various
explanations with respect to all recommended items in our study in
the Steps 2-4. We note that the fidelity is highly sensitive to
the thresholds that we set to decide which explanation is
sufficiently valid to be presented. We leave an automated careful
tuning of these thresholds to future research.</p>
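Empirical fidelity as defined here is simply the explained fraction; a small sketch (function and argument names are our assumptions):

```python
def fidelity(scores, threshold):
    """scores: one explanation score per recommended item.
    Fidelity is the portion of items whose score passes the
    method-specific threshold, i.e. for which the explanation holds."""
    if not scores:
        return 0.0
    return sum(s > threshold for s in scores) / len(scores)
```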
      <p>As can be seen, collaborative filtering fidelity is always higher
than its content-based counterpart, which is not surprising,
because the black box recommenders are collaborative filtering
methods. Item-item explanations have higher fidelity than
user-based explanations. This is not surprising, given the relatively
small user profiles that we use.</p>
      <p>
        It is especially interesting to look at the difference in
content-based fidelity between the MF method that we use and VAE.
Together with Table 2, this may explain the lower quality
of recommendations computed by our MF implementation.
The movies recommended by the MF method have very low
content similarity to the movies that the subject has liked, and
this may be the reason that participants rate them lower.
Overall, as can be seen in the bottom line of Table 2, 65% of
the recommended movies could be explained by at least one of
our suggested methods (except for the popularity explanation).
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] report a model fidelity of 84% at most for their created
association rules. Our model fidelity is sensitive to the
thresholds that we set to accept an explanation. We may be able to
increase the model fidelity with more accurate and personalized
tuning of these thresholds.</p>
      <p>Post Study Questionnaire Results
We now discuss the participant answers to the questions
concerning the explanations at the post study questionnaire. The
responses below are hence biased given the explanations
shown throughout the study, and may not reflect the subject
opinion prior to the study.
70% of the participants reported noticing the explanations in
our study, 24% noticed them only sometimes, and 6% reported
not noticing the explanations at all. 60% of the participants felt
that the explanations were mostly appropriate, 26% felt they
were sometimes appropriate, only 1% felt that the explanations
were always appropriate, and 3% felt that they were never
appropriate. 71% of the participants thought that explanations
can help understand the recommendation, and may influence
the decision on considering the recommended item. 23% of the
participants said that an explanation is interesting, but would
not change their opinion over the movie. 5% responded that an
explanation is not important at all, and 1% said they ignore all
recommendations and hence the explanations are not relevant.
Similar results were reported before for the importance of
explanations in recommendation systems [
        <xref ref-type="bibr" rid="ref12 ref27">12, 27</xref>
        ].
We also asked the participants, in an open, non-obligatory
question, to state the explanation that they liked best. 93 of the
participants chose to answer. We categorized their free text answers
into groups. 52% of the responses were related to
content-based explanations. 33% preferred the collaborative filtering
explanations. 10% liked the popularity explanations, and 4%
liked the default explanation. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] reports similar preference
for content-based explanations over CF explanations.
Figure 6 shows the participants’ responses about the
importance of various properties of an explanation. We can see
that the property that was deemed most important is that an
explanation should be easy to understand. Participants also
thought that an explanation should be accurate, convincing,
and short. We believe that this explains the relatively low
opinion of the participants concerning the content-based user-item
explanation which we reported above, as this explanation is
quite long.
      </p>
      <p>
        The only property that was not deemed as important by the
participants is whether the explanation fully explains the
recommendation mechanism. This is somewhat in conflict
with many research efforts in the recommender systems
community [
        <xref ref-type="bibr" rid="ref25 ref27">25, 27</xref>
        ] that focus on providing an explanation of the
way that the models operate. It appears that users, at least
the participants of our study, prefer an explanation that helps
them decide whether the recommended item is appropriate
for them, rather than one that reveals the mechanism behind the
recommendation engine.
      </p>
      <p>When asked if they would like to get such explanations in
a system that they use (e.g. Netflix), 62% answered positively,
31% answered maybe, and the rest (7%) answered no.
These findings, that 94% of the participants found many of our
explanations to be appropriate, and that most people would
have liked to see such explanations in a system that they use,
together with the relatively low importance of revealing the
recommendation engine behavior, further support our intuition,
that post-hoc explanations generated by simple methods can
provide valuable information that users appreciate.</p>
      <p>CONCLUSION
In this paper we suggest a simple method for generating
post-hoc explanations for recommendations generated by complex,
difficult to explain, models. We use a set of easy to explain
recommendation algorithms, and when their output agrees
with the recommendation of the complex model, consider the
explanation of the simple model as a valid explanation for the
recommended item. While these explanations are clearly not
transparent, we argue that they provide valuable information
for the users in making decisions concerning the recommended
items.</p>
      <p>We study two research questions. First, whether users prefer
our simple post-hoc explanations to explanations of the
mechanism of the neural network or the matrix factorization model.
Indeed, in our post study questionnaire, users stated that it is
more important for an explanation to be short and clear than to
fully explain the algorithm.</p>
      <p>Second, we checked whether presenting a post-hoc
explanation influences the behavior of users. For some of our
explanations, namely, the I2ICB explanation and the I2ID
explanation, the average rating was higher when an explanation was
presented than the average rating when no explanation was
presented. For other explanations, this did not hold. We
speculate that this was due to the explanation length and complexity.
Perhaps a future, simpler phrasing of the explanation would
lead to more pronounced effects.</p>
      <p>To support our claims, we conducted a user study in the movie
domain, showing that some explanations may affect the user
opinion over the recommended item. We also show that movies
that can be explained by our method may be better items to
recommend. We evaluate subject opinion over the different
explanations that we suggest, showing that participants preferred
item-item explanations to user-based explanations. The
subjects also stated that it is more important for an explanation to
be easy to understand, convincing, and short, than to uncover
the underlying operation of the recommendation engine.
Our method can be easily extended by using additional
explainable recommenders. In the future we will apply more
methods. We will also study methods for automatically
selecting a method-specific threshold for deciding if an explanation
is valid, instead of the manually tuned threshold that we
currently use.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , Ramakrishnan Srikant, and others.
          <source>1994</source>
          .
          <article-title>Fast algorithms for mining association rules</article-title>
          .
          <source>In Proc. 20th int. conf. very large data bases, VLDB</source>
          , Vol.
          <volume>1215</volume>
          .
          <fpage>487</fpage>
          -
          <lpage>499</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Fernando</given-names>
            <surname>Amat</surname>
          </string-name>
          , Ashok Chandrashekar, Tony Jebara, and
          <string-name>
            <given-names>Justin</given-names>
            <surname>Basilico</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Artwork personalization at netflix</article-title>
          .
          <source>In Proceedings of the 12th ACM Conference on Recommender Systems. ACM</source>
          ,
          <fpage>487</fpage>
          -
          <lpage>488</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Oren</given-names>
            <surname>Barkan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Noam</given-names>
            <surname>Koenigstein</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Item2vec: neural item embedding for collaborative filtering</article-title>
          .
          <source>In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)</source>
          .
          <source>IEEE</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Mustafa</given-names>
            <surname>Bilgic</surname>
          </string-name>
          and
          <string-name>
            <given-names>Raymond J</given-names>
            <surname>Mooney</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Explaining recommendations: Satisfaction vs promotion</article-title>
          .
          <source>In Beyond Personalization Workshop</source>
          , IUI, Vol.
          <volume>5</volume>
          . 153.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>John S</given-names>
            <surname>Breese</surname>
          </string-name>
          , David Heckerman, and
          <string-name>
            <given-names>Carl</given-names>
            <surname>Kadie</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Empirical analysis of predictive algorithms for collaborative filtering</article-title>
          .
          <source>In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence</source>
          . Morgan Kaufmann Publishers Inc.,
          <fpage>43</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Jingwu</given-names>
            <surname>Chen</surname>
          </string-name>
          , Fuzhen Zhuang, Xin Hong, Xiang Ao, Xing Xie, and
          <string-name>
            <given-names>Qing</given-names>
            <surname>He</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Attention-driven factor model for explainable personalized recommendation</article-title>
          .
          <source>In The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval</source>
          .
          <fpage>909</fpage>
          -
          <lpage>912</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Weiyu</given-names>
            <surname>Cheng</surname>
          </string-name>
          , Yanyan Shen,
          <string-name>
            <given-names>Linpeng</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yanmin</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Incorporating Interpretability into Latent Factor Models via Fast Influence Analysis</article-title>
          .
          <source>In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          .
          <fpage>885</fpage>
          -
          <lpage>893</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Hendrik</given-names>
            <surname>Drachsler</surname>
          </string-name>
          , Katrien Verbert, Olga C Santos, and
          <string-name>
            <given-names>Nikos</given-names>
            <surname>Manouselis</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Panorama of recommender systems to support learning</article-title>
          .
          <source>In Recommender systems handbook</source>
          . Springer,
          <fpage>421</fpage>
          -
          <lpage>451</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Upol</given-names>
            <surname>Ehsan</surname>
          </string-name>
          , Pradyumna Tambwekar, Larry Chan,
          <string-name>
            <given-names>Brent</given-names>
            <surname>Harrison</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark O</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Automated rationale generation: a technique for explainable AI and its effects on human perceptions</article-title>
          .
          <source>In Proceedings of the 24th International Conference on Intelligent User Interfaces</source>
          .
          <fpage>263</fpage>
          -
          <lpage>274</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Michael D</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          , John T Riedl, Joseph A Konstan, and others.
          <year>2011</year>
          .
          <article-title>Collaborative filtering recommender systems</article-title>
          .
          <source>Foundations and Trends® in Human-Computer Interaction 4</source>
          ,
          <issue>2</issue>
          (
          <year>2011</year>
          ),
          <fpage>81</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Bruce</given-names>
            <surname>Ferwerda</surname>
          </string-name>
          , Kevin Swelsen, and
          <string-name>
            <given-names>Emily</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Explaining Content-Based Recommendations</article-title>
          . New York (
          <year>2018</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Jonathan L</given-names>
            <surname>Herlocker</surname>
          </string-name>
          , Joseph A Konstan, and John Riedl.
          <year>2000</year>
          .
          <article-title>Explaining collaborative filtering recommendations</article-title>
          .
          <source>In Proceedings of the 2000 ACM conference on Computer supported cooperative work</source>
          .
          <fpage>241</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Yehuda</given-names>
            <surname>Koren</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Factorization meets the neighborhood: a multifaceted collaborative filtering model</article-title>
          .
          <source>In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM</source>
          ,
          <fpage>426</fpage>
          -
          <lpage>434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Yehuda</given-names>
            <surname>Koren</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Factor in the neighbors: Scalable and accurate collaborative filtering</article-title>
          .
          <source>ACM Transactions on Knowledge Discovery from Data (TKDD) 4</source>
          ,
          <issue>1</issue>
          (
          <year>2010</year>
          ),
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Yehuda</given-names>
            <surname>Koren</surname>
          </string-name>
          and
          <string-name>
            <given-names>Robert</given-names>
            <surname>Bell</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Advances in collaborative filtering</article-title>
          .
          <source>In Recommender systems handbook</source>
          . Springer,
          <fpage>77</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Dawen</given-names>
            <surname>Liang</surname>
          </string-name>
          , Rahul G. Krishnan, Matthew D. Hoffman, and
          <string-name>
            <given-names>Tony</given-names>
            <surname>Jebara</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Variational autoencoders for collaborative filtering</article-title>
          .
          <source>In Proceedings of the 2018 World Wide Web Conference</source>
          .
          <fpage>689</fpage>
          -
          <lpage>698</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Pasquale</given-names>
            <surname>Lops</surname>
          </string-name>
          , Marco De Gemmis, and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Content-based recommender systems: State of the art and trends</article-title>
          .
          <source>In Recommender systems handbook</source>
          . Springer,
          <fpage>73</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Jie</given-names>
            <surname>Lu</surname>
          </string-name>
          , Dianshuang Wu, Mingsong Mao,
          <string-name>
            <given-names>Wei</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Guangquan</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Recommender system application developments: a survey</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>74</volume>
          (
          <year>2015</year>
          ),
          <fpage>12</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>James</given-names>
            <surname>McInerney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Lacker</surname>
          </string-name>
          , Samantha Hansen, Karl Higley, Hugues Bouchard, Alois Gruson, and
          <string-name>
            <given-names>Rishabh</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits</article-title>
          .
          <source>In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys '18)</source>
          . Association for Computing Machinery, New York, NY, USA,
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          . DOI:http://dx.doi.org/10.1145/3240323.3240354
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Xia</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Desrosiers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>George</given-names>
            <surname>Karypis</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A comprehensive survey of neighborhood-based recommendation methods</article-title>
          .
          <source>In Recommender systems handbook</source>
          . Springer,
          <fpage>37</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Georgina</given-names>
            <surname>Peake</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Explanation mining: Post hoc interpretability of latent factor models for recommendation systems</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          .
          <fpage>2060</fpage>
          -
          <lpage>2069</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Ricci</surname>
          </string-name>
          , Lior Rokach, and
          <string-name>
            <given-names>Bracha</given-names>
            <surname>Shapira</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Recommender Systems: Introduction and Challenges</article-title>
          .
          <source>In Recommender Systems Handbook</source>
          .
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Badrul</given-names>
            <surname>Sarwar</surname>
          </string-name>
          , George Karypis, Joseph Konstan, and
          <string-name>
            <given-names>John</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Item-based collaborative filtering recommendation algorithms</article-title>
          .
          <source>In Proceedings of the 10th international conference on World Wide Web</source>
          .
          <fpage>285</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Guy</given-names>
            <surname>Shani</surname>
          </string-name>
          and
          <string-name>
            <given-names>Asela</given-names>
            <surname>Gunawardana</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Evaluating recommendation systems</article-title>
          .
          <source>In Recommender systems handbook</source>
          . Springer,
          <fpage>257</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Rashmi</given-names>
            <surname>Sinha</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kirsten</given-names>
            <surname>Swearingen</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>The role of transparency in recommender systems</article-title>
          .
          <source>In CHI'02 extended abstracts on Human factors in computing systems</source>
          .
          <fpage>830</fpage>
          -
          <lpage>831</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Brent</given-names>
            <surname>Smith</surname>
          </string-name>
          and
          <string-name>
            <given-names>Greg</given-names>
            <surname>Linden</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Two decades of recommender systems at Amazon.com</article-title>
          .
          <source>IEEE Internet Computing</source>
          <volume>21</volume>
          ,
          <issue>3</issue>
          (
          <year>2017</year>
          ),
          <fpage>12</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Nava</given-names>
            <surname>Tintarev</surname>
          </string-name>
          and
          <string-name>
            <given-names>Judith</given-names>
            <surname>Masthoff</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A survey of explanations in recommender systems</article-title>
          .
          <source>In 2007 IEEE 23rd international conference on data engineering workshop</source>
          . IEEE,
          <fpage>801</fpage>
          -
          <lpage>810</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Nava</given-names>
            <surname>Tintarev</surname>
          </string-name>
          and
          <string-name>
            <given-names>Judith</given-names>
            <surname>Masthoff</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Explaining Recommendations: Design and Evaluation</article-title>
          .
          <source>In Recommender Systems Handbook</source>
          , Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). Springer,
          <fpage>353</fpage>
          -
          <lpage>382</lpage>
          . DOI:http://dx.doi.org/10.1007/978-1-4899-7637-6_10
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Hao</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Naiyan</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dit-Yan</given-names>
            <surname>Yeung</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Collaborative deep learning for recommender systems</article-title>
          .
          <source>In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining</source>
          .
          <fpage>1235</fpage>
          -
          <lpage>1244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Xiting</given-names>
            <surname>Wang</surname>
          </string-name>
          , Yiru Chen, Jie Yang,
          <string-name>
            <given-names>Le</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhengtao</given-names>
            <surname>Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xing</given-names>
            <surname>Xie</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A Reinforcement Learning Framework for Explainable Recommendation</article-title>
          .
          <source>2018 IEEE International Conference on Data Mining (ICDM)</source>
          (
          <year>2018</year>
          ),
          <fpage>587</fpage>
          -
          <lpage>596</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Yongfeng</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Xu</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Explainable recommendation: A survey and new perspectives</article-title>
          . arXiv preprint arXiv:1804.11192 (2018).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Yongfeng</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping Ma.
          <year>2014</year>
          .
          <article-title>Explicit factor models for explainable recommendation based on phrase-level sentiment analysis</article-title>
          .
          <source>In Proceedings of the 37th international ACM SIGIR conference on Research &amp; development in information retrieval</source>
          .
          <fpage>83</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>