From Recommendation to Curation
When the System Becomes your Personal Docent

Nevena Dragovic*, MarkMonitor, Boise, Idaho, USA, nevena.dragovic@markmonitor.com
Ion Madrazo Azpiazu, People & Information Research Team, Boise State University, Boise, Idaho, USA, ionmadrazo@boisestate.edu
Maria Soledad Pera, People & Information Research Team, Boise State University, Boise, Idaho, USA, solepera@boisestate.edu

* Work conducted while the author was a student at Boise State.

ABSTRACT
Curation is the act of selecting, organizing, and presenting content. Some applications emulate this process by turning users into curators, while others use recommenders to select items, seldom achieving the focus or selectivity of human curators. We bridge this gap with a recommendation strategy that more closely mimics the objectives of human curators. We consider multiple data sources to enhance the recommendation process, as well as the quality and diversity of the provided suggestions. Further, we pair each suggestion with an explanation that showcases why a book was recommended, with the aim of easing the decision-making process for the user. Empirical studies using Social Book Search data demonstrate the effectiveness of the proposed methodology.

CCS CONCEPTS
• Information systems → Recommender systems;

KEYWORDS
Curation, Decision-making, Time series, Personalization, Diversity

ACM Reference Format:
Nevena Dragovic, Ion Madrazo Azpiazu, and Maria Soledad Pera. 2018. From Recommendation to Curation: When the System Becomes your Personal Docent. In Proceedings of Joint Workshop on Interfaces and Human Decision Making for Recommender Systems (IntRS) 2018 (IntRS Workshop). ACM, 8 pages.

1 INTRODUCTION
Recommenders have been studied for decades. Regardless of the domain, they influence businesses' success and users' satisfaction. From a commercial standpoint, recommenders enable companies to advertise items to potential buyers. From a user perspective, they enhance users' experience by easing the identification of items of interest while addressing information overload concerns. The degree of personalization recommenders offer, however, can be hindered by their limited ability to provide sufficiently diverse suggestions, restricting users' exposure to new, prospective items of interest. This occurs because common recommendation strategies rely on community data, thus suggesting the same items to similar users, which can be vague and impersonal.

Inspired by the results of the work conducted by Willemsen et al. [45], who demonstrated that "diverse, small item sets are just as satisfying and less effortful to choose from than Top-N recommendations," we argue that emulating the curation process—the act of selecting, organizing, and presenting content [10]—to suggest a small set of diverse items could lead to an enriched recommendation process. In this paper, we present the algorithmic foundation to make this possible and to facilitate future user studies. A number of applications, e.g., Pinterest, let the user be the decision maker: they offer individualized selections and then allow the user to play the role of a curator in choosing appealing items. However, given the ability of a recommender to collect and examine large amounts of data about users and items, the system itself can get to know users better—their interests and behaviors—and act as a curator.

Due to scope limitations, we use books as a case study and focus our research efforts on the techniques that lead to QBook, a curated book recommender (Figure 1). QBook does not only find books that are appealing to a user, but also presents a meaningful set of suggestions with corresponding explanations that pertain to the various preferences of the user. QBook takes the role of the curator upon itself, makes connections between suggested items and the reasons for their selection, and enriches the recommendation process by addressing issues that affect these systems: (1) Using historical data, we capture suitable candidate items for each user; (2) Understanding items' metadata, we access potentially relevant items that otherwise might be ignored by solely relying on ratings; (3) Considering user reviews, we infer which item features each user cares about and their degree of importance; (4) Examining experts' reviews, we ensure the overall quality of the books to be recommended; (5) Exploring the type of data sources a user favors, we learn about the user and understand why he could be interested in an item; (6) Analyzing users' changes in genre preferences over time, we better identify the current reading interests of individual users.

QBook can be seen as an interpretable diversification strategy for recommendation, where the evidence factors from (1)-(6) are combined using a diversification technique adapted to each user's interests. Further, explanations of why each book was selected are provided so that the user can better select the book he is interested in.

With this work, we improve research related to recommenders by combining traditional approaches with novel preference matching methods into a single strategy that offers suggestions containing information related to the interests of each user. Moreover, we explicitly address diversity and personalization—key aspects, as common collaborative filtering algorithms are known to not propagate users' preferences on diversity into their recommendations [8]—by exploring user reviews and time analysis to understand changes in reading preferences over time. To assess the effectiveness of QBook, we conduct experiments measuring the utility of individual components, as well as comparing the recommendation quality of the system as a whole with respect to state-of-the-art systems.


Figure 1: Overview of QBook

2 RELATED WORK
We discuss work pertaining to book recommenders, as well as the use of explanations and curation for recommendation purposes.

Recommenders & Books. A number of recommenders have been designed to generate suggestions that help users select suitable books to read [4, 17, 34]. They are based on purchasing or rating patterns, click-through data, content/tag analysis, or ontologies.

Some book recommenders are tailored towards specific groups of users: By emulating the readers' advisory service, the authors in [34] describe the process of recommending books for K-12 students based on the topics, contents, and literary elements that appeal to each individual user, whereas K3Rec [35] uses information about grade levels, contents, illustrations, and topics, together with length and writing style, to generate suggestions for emergent readers. Garrido and Ilarri [14] rely on content-based data for making book suggestions. Their proposed TMR [14] examines items' descriptions and reviews and relies on lexical and semantic resources to infer users' preferences. However, TMR can only work if descriptions and reviews are available, unlike QBook, for which these are only two of the multiple data points considered in the recommendation process. The authors in [4] present a strategy based on graph analysis and PageRank that exploits clicks, purchasing patterns, and book metadata. This strategy is constrained by the existence of a priori pairwise similarity between items, e.g., "similar products", which is not a requirement for QBook. The authors in [38] highlight the importance of book recommenders as library services, and thus propose a fuzzy analytical hierarchy process based on a priori rule mining that depends upon the existence of book-loan histories. The empirical analysis presented in [38] is based on a limited and private dataset, which constrains the task of verifying its applicability on large-scale benchmark datasets. More importantly, the proposed strategy is contingent on book-loan historical information that, due to privacy concerns, libraries rarely, if at all, make available.

Recommenders & Explanations. An ongoing challenge faced by recommenders is to get users to trust them, as they still operate as "black boxes" [18]. A powerful way to build successful relationships between users and recommenders is by providing information on how each system works and why items are suggested [39]. This can be accomplished by including explanations that justify the suggestions, which are known to provide transparency and enhance trust in the system [39]. Unfortunately, justifying the reasons why an item has been suggested to a user is not an easy task [16]. Recent works in this area focus on determining how different types of explanations influence users while making decisions [16]. Among the most common strategies, we highlight those based on exploring the previous activity of a user [5], information collected from user reviews [46], and content-based tag cloud explanations. The goal of QBook is to provide explanations that reveal the reasoning and data behind the recommendation process and contain other users' and experts' (objective) opinions on item characteristics that are of specific interest to each individual user. Many researchers consider sentiment-based explanations more effective, trustworthy, and persuasive than ones that capture the relationship between the previous activity of the user and the suggested items [9]. We, however, follow the premise presented in [12] and do not include sentiment, in order to make QBook look unbiased from users' perspectives, i.e., we do not select feature descriptions based on their polarity.

Recommenders & Curation. Few research works focus on simulating the curation process for recommendation purposes [21, 22, 37]. In [21], the authors discussed the development of a news application that learns from users' interactions with the system while they swipe through provided news articles and like them. In this research, the authors use social networks and users' browsing history to create and recommend crowd-curated content, but using users as curators. The work conducted by Saaya et al. [37], on the other hand, relies on a content-based strategy that considers information authors collect from users' profiles, which are then managed by the users themselves. The most recent study, conducted by Kislyuk et al. [22], combines a user-based curation method with a traditional collaborative filtering strategy to improve Pinterest's recommendation process. The aforementioned alternatives simply let users organize suggested content based on their personal preferences, since all three studies treat users as curators. However, no recommendation system takes the role of the curator upon itself. We take a different approach and allow the system to take the curator role using existing user and item data.

3 THE ANATOMY OF QBOOK
Each step of QBook's recommendation process addresses a particular research problem on its own: Can item metadata compensate for the lack of historical data (and vice versa)? How can a time component influence recommendation systems? Can experts' opinions align with readers' preferences? Are users' features of interest a valuable asset to a recommendation process? How does curation work as part of a recommendation process? Can personalized explanations aid users in selecting relevant suggestions?

3.1 Selecting Candidate Books
To initiate the recommendation process (described in Algorithm 1), QBook identifies a set of books CB from a given collection to be curated for a user U. The two strategies considered for candidate selection (i.e., matrix factorization and content-based filtering) complement each other and ensure diversity among the candidate books. While the former examines users' rating patterns, the latter focuses on book characteristics and does not require user-generated data. Moreover, by considering rating patterns and content, the novelty of the recommendations increases, as users are exposed to a variety of books to choose from. We anticipate QBook to handle data sparsity and cold start in this step, since even if candidate books do not have (sufficient) ratings assigned to them, they might still have tag descriptions that can help the recommender determine if they are indeed potentially of interest to a user, and vice versa.

Algorithm 1 The Recommendation Process of QBook
   Input: AB-Archived set of books, RS-Set of reviews, ER-Set of expert reviews, K-# of recommendations, RF-Trained Random Forest
   Terms: RS_U-Reviews in RS written by U, RS_b-Reviews in RS for b, ER_b-Reviews in ER for b, P_U-Set of books read by U
   CandidateSet, Recommendations = empty set
   Count = 0
   for each user U do
       UPref = Ranked list of preferred literary elements using RS_U
       CandidateSet = b ∈ AB with r_{U,b} > 3 OR Cr_{U,b} > 3
       for each book b in CandidateSet do
           BPref = Ranked list of preferred literary elements using RS_b
           Sim = Similarity(UPref, BPref)
           sWNr = Polarity(ER_b, SentiWordNet)
           sWNs = Polarity(lastSentence, ER_b, SentiWordNet)
           cNLPr = Polarity(ER_b, CoreNLP)
           cNLPs = Polarity(lastSentence, ER_b, CoreNLP)
           A_{U,b} = <r_{U,b}, Cr_{U,b}, Sim, sWNr, sWNs, cNLPr, cNLPs>
           AppealScore = GenerateRanking(RF, A_{U,b})
           Recommendations = Recommendations + <b, AppealScore>
       end for
       if U is active then
           DP = Identify most-correlated data point for U
           GenrePref = GenreDistribution(ARIMA, P_U)
           Recommendations = Sort(Recommendations, DP, GenrePref, K)
       else
           Recommendations = Sort(Recommendations)
       end if
       for each b in Recommendations do
           if Count++ <= K then
               print "b + Explanation(b, ER_b, RS_b, UPref, DP)"
           end if
       end for
   end for
Matrix Factorization. To take advantage of U's historical data, QBook adopts a strategy based on matrix factorization [24]. Specifically, it uses LensKit's [3] FunkSVD for candidate generation and includes in CB any book b for which its predicted rating for U (r_{U,b}) is above 3, ensuring the potential appeal of b to U.
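For concreteness, the following Python sketch illustrates FunkSVD-style candidate selection of the kind described above. It is a minimal stand-in for LensKit's implementation, not the actual library code; the factor count, learning rate, regularization, and epoch values are our assumptions.

# Sketch of FunkSVD-style candidate selection (illustrative stand-in for
# LensKit's FunkSVD; hyperparameters are assumptions, not the paper's).
import numpy as np

def funk_svd(ratings, n_users, n_items, n_factors=20, lr=0.005, reg=0.02, epochs=30):
    """Learn latent factors from (user, item, rating) triples via SGD."""
    rng = np.random.default_rng(42)
    P = rng.normal(0, 0.1, (n_users, n_factors))   # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

def candidate_books(P, Q, user, rated, threshold=3.0):
    """Include book b in CB when the predicted rating r_{U,b} exceeds 3."""
    preds = Q @ P[user]
    return [b for b, r in enumerate(preds) if r > threshold and b not in rated]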
Content Analysis. Content-based filtering methodologies create suggestions by comparing items' characteristics and users' preferences. Available content representations (e.g., metadata) are used to describe items, as well as users' profiles based on the items users favored in the past [36]. QBook uses tags, which capture books' content from diverse users' perspectives. Thereafter, it applies LensKit's implementation of the content-based algorithm [1] (based on the Vector Space Model and TF-IDF weighting scheme), and includes in CB any book b for which its similarity with respect to U's content preferences (Cr_{U,b}) is 3 or above.
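A minimal sketch of the tag-based filter follows, with scikit-learn standing in for LensKit's content-based implementation. Mapping the cosine similarity onto a 1-5 scale so that the Cr_{U,b} >= 3 cutoff applies is our assumption; the paper does not specify how Cr_{U,b} is scaled.

# Sketch of the tag-based content filter (TF-IDF over book tags; the
# rescaling of cosine similarity onto a 1-5 scale is an assumption).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def content_candidates(book_tags, user_profile_tags, threshold=3.0):
    """book_tags: one tag string per book in the collection.
    user_profile_tags: tags of books the user favored, as one string."""
    vec = TfidfVectorizer()
    book_vecs = vec.fit_transform(book_tags)          # Vector Space Model
    user_vec = vec.transform([user_profile_tags])
    sims = cosine_similarity(user_vec, book_vecs)[0]  # values in [0, 1]
    cr = 1 + 4 * sims                                 # map onto a 1-5 scale
    return [b for b, score in enumerate(cr) if score >= threshold]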
3.2 Getting to Know Users and Books
QBook aims to provide U with a set of appealing, personalized suggestions based on information he values. QBook examines the reviews written by U and identifies the set of literary elements (features) that he cares the most about¹. Thereafter, it determines the degree to which each book in CB addresses U's features of interest. To identify features of interest to U, QBook performs semantic analysis on reviews and considers the frequency of occurrence of the terms U employs in his reviews. By adopting the set of literary elements and the extraction strategy described in [34], QBook explores features that characterize book content, such as character descriptions or writing style. As defined in [34], each literary element (i.e., feature) is associated with a set of terms used to describe that element, since different words can be used to express similar book elements. A sample of literary elements and related terms is shown in Table 1.

¹ If U does not have reviews, then the most popular user features are treated as U's features of importance.

Table 1: Sample of terms associated with literary features
   Literary Element   Sample of Related Terms
   characters         stereotypes, detailed, distant, dramatic
   pace               fast, slow, leisurely, breakneck, compelling
   storyline          action-oriented, character-centered
   tone               happy, light, uplifting, dark, ironic, funny
   writing style      austere, candid, classic, colorful
   frame              descriptive, minimal, bleak, light-hearted

QBook computes the overall frequency of occurrence of each feature mentioned by U by normalizing the occurrence of the feature by the number of reviews written by U. This score captures the importance (i.e., weight) of each particular feature for U.

In the same manner, QBook examines the reviews available for b, following the process defined for identifying features of interest to U, in order to gain a deeper understanding of the literary elements that are most often used to describe b. This is done by analyzing the subjective opinions of all users who read and reviewed b.

QBook leverages U's preferences in the recommendation process by calculating the degree of similarity between U's feature preferences and b's most-discussed features as

   Sim(U, b) = (UV · BV) / (||UV|| × ||BV||)

where UV = <WF_{U,1}, ..., WF_{U,n}> and BV = <WF_{b,1}, ..., WF_{b,m}> are the vector representations associated with the feature discussions of U and b, n and m are the numbers of distinct features describing U and b, respectively, and WF_{U,i} and WF_{b,i} capture the weight, i.e., degree of importance, of the i-th feature for U and b, based on their normalized frequencies of occurrence (in reviews).

By using Sim(U, b), QBook triggers the generation of personalized suggestions, as it captures all the features of interest for U and compares them with the most-discussed features of b to determine how likely b is to be relevant to U.
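The weighting and similarity computations can be sketched as follows. The lexicon stub covers only a few of the Table 1 elements, and the simple token matching is a stand-in for the semantic analysis of [34], not QBook's actual extraction pipeline.

# Sketch of review-based feature weighting and Sim(U, b);
# LEXICON is a small stub of the Table 1 term sets.
from collections import Counter

LEXICON = {"characters": {"stereotypes", "detailed", "distant", "dramatic"},
           "pace": {"fast", "slow", "leisurely", "breakneck", "compelling"},
           "tone": {"happy", "light", "uplifting", "dark", "ironic", "funny"}}

def feature_weights(reviews):
    """Normalized frequency of each literary element across a set of reviews."""
    counts = Counter()
    for review in reviews:
        tokens = set(review.lower().split())
        for feature, terms in LEXICON.items():
            counts[feature] += len(tokens & terms)
    n = max(len(reviews), 1)
    return {f: c / n for f, c in counts.items()}

def sim(u_weights, b_weights):
    """Cosine similarity between the two weighted feature vectors."""
    features = set(u_weights) | set(b_weights)
    dot = sum(u_weights.get(f, 0) * b_weights.get(f, 0) for f in features)
    nu = sum(w * w for w in u_weights.values()) ** 0.5
    nb = sum(w * w for w in b_weights.values()) ** 0.5
    return dot / (nu * nb) if nu and nb else 0.0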
3.3 Considering Experts' Opinions
To further analyze b, QBook takes advantage of experts' reviews in order to consider unbiased and objective opinions as another data point in its recommendation process. Unlike the polarity-neutral strategy adopted to identify user/item features of interest, in the case of experts we explicitly examine the polarity of their opinions. By doing so, QBook leverages expert knowledge to ensure that recommended books are of good quality. QBook explores publicly available book critiques to determine experts' opinions on candidate books, performing semantic analysis to examine which books experts valued more. QBook examines ER, the set of expert reviews available for b, from two complementary perspectives: it captures sentiment at the word and sentence levels using two popular sentiment analysis tools. By involving experts' reviews in the recommendation process, QBook can help overcome the data sparsity issue, since some books do not have sufficient user-generated data but do have professional critiques, which provide valuable information.


Sentiment at Word Level. SentiWordNet [13] is a lexical resource for opinion mining that assigns a sentiment score to each WordNet synset. Using SentiWordNet, QBook determines ER's overall sentiment, denoted sWNr, by calculating an average score based on the sentiment of each word in ER. Based on the study described in [33], and our own analysis, we observe that reviewers often summarize their overall thoughts in the last sentence of their review. For this reason, QBook also analyzes the sentiment of the last sentence in each review in ER and calculates its average score, denoted sWNs, to ensure the real tone of the review is captured.
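A sketch of the word-level scores using NLTK's SentiWordNet interface is shown below. Taking the first synset per word and plain averaging are simplifying assumptions on our part; the paper does not detail its disambiguation or aggregation choices.

# Word-level polarity sketch with NLTK's SentiWordNet interface
# (requires nltk.download('wordnet') and nltk.download('sentiwordnet')).
from nltk.corpus import sentiwordnet as swn

def word_polarity(word):
    """Positive minus negative score of the word's first synset, if any."""
    synsets = list(swn.senti_synsets(word))
    if not synsets:
        return None
    return synsets[0].pos_score() - synsets[0].neg_score()

def review_polarity(review):
    """sWNr-style score: average word polarity over the whole review."""
    scores = [s for s in (word_polarity(w) for w in review.lower().split())
              if s is not None]
    return sum(scores) / len(scores) if scores else 0.0

def last_sentence_polarity(review):
    """sWNs-style score: polarity of the review's final sentence only."""
    last = review.rstrip(" .").split(".")[-1]
    return review_polarity(last)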
Sentiment at Sentence Level. In some cases, the polarity of a word on its own does not properly capture the intended polarity of a sentence. Thus, QBook uses CoreNLP [28], which builds up a representation of whole sentences based on their structure. QBook applies CoreNLP's parser to extract sentences from ER and calculates a sentiment score for each respective sentence. These scores are combined into a single (average) score, denoted cNLPr, which captures the overall sentiment of ER based on the sentiment of its individual sentences. Similar to the data points extracted at the word level, QBook also considers the average sentiment of the last sentence in each review in ER, denoted cNLPs.
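The sentence-level scores can be sketched against a locally running CoreNLP server with the sentiment annotator enabled; localhost:9000 is an assumed local endpoint, and the aggregation choices below are ours rather than QBook's documented ones.

# Sentence-level sentiment via a running Stanford CoreNLP server
# (the server returns one sentimentValue per sentence, 0=very negative
# through 4=very positive).
import json
import requests

CORENLP_URL = "http://localhost:9000/"  # assumed local server
PROPS = {"annotators": "sentiment", "outputFormat": "json"}

def sentence_sentiments(text):
    """Return one sentiment value per sentence of the input text."""
    resp = requests.post(CORENLP_URL,
                         params={"properties": json.dumps(PROPS)},
                         data=text.encode("utf-8"))
    resp.raise_for_status()
    return [int(s["sentimentValue"]) for s in resp.json()["sentences"]]

def cnlpr_cnlps(expert_reviews):
    """cNLPr: mean over all sentences; cNLPs: mean over last sentences."""
    all_scores, last_scores = [], []
    for review in expert_reviews:
        scores = sentence_sentiments(review)
        if scores:
            all_scores.extend(scores)
            last_scores.append(scores[-1])
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return avg(all_scores), avg(last_scores)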
3.4 Incorporating a Time-Based Component
To better serve their stakeholders, recommenders must predict readers' interests at any given time. Users' preferences, however, tend to evolve, which is why it is crucial to consider a time component to create suitable suggestions [44]. QBook examines genre, which is a category of literary composition determined by literary technique, tone, content, or even length, from a time-sensitive standpoint. Including this component provides the likelihood of a reader's interest in each genre based on its occurrences at specific points in the past, not only the most recent or the most frequently read one. QBook uses a genre prediction strategy² that examines a genre distribution and applies a time series analysis model, the Auto-Regressive Integrated Moving Average (ARIMA) [32]. In doing so, QBook can discover the different areas of U's reading preferences along with U's degree of interest in each of them.

² We first discussed the benefits of explicitly considering changes in user preferences over time in [11].

Predicting genre preferences to inform the recommendation process involves examining the genres read by U. We first obtain the genre distribution among the books read by U during continuous periods of time and estimate a significance score for each genre g_n for U at a specific time period t:

   GenreImportance(g_n, t) = |g_{n,t}| / |G_t|

where G_t is the set of books read in t, |g_{n,t}| is the frequency of occurrence of genre g_n among the books in G_t, and |G_t| is the size of G_t.
Since changes in reading activity between fixed and known periods of time are not constant, QBook applies non-seasonal ARIMA models. By doing this, QBook is able to determine a model tailored to each genre distribution and predict its importance for U in real time based on its previous occurrences. The ARIMA forecasting (i.e., temporal prediction) model uses a specific genre distribution to predict the likelihood of future occurrences of that genre based on its importance in previous periods of time. This is why our strategy conducts a search over possible ARIMA models that capture user preferences and selects the one with the best fit for a specific genre distribution in time for U—the one that best describes the pattern of the time series and explains how the past affects the future.

Using ARIMA and genre information about the books read by U, QBook can estimate the likelihood of occurrence of a given genre g_n at time frame TW, i.e., the recommendation time in our case. This information is used to determine the degree to which U is interested in reading each genre and, subsequently, the number of books in each genre that should be recommended to satisfy U's varied interests (see Section 3.5). For example, with the described time series genre prediction strategy, QBook is able to prioritize the recommendation of fantasy books for U (a genre U recently started reading more) over comedy books (a genre known to be favored by U in the past), even if proportionally U read more comedy than fantasy books. The described prediction approach provides an additional data point to further personalize the recommendation process.
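Both the per-window GenreImportance score defined above and the per-genre forecast can be sketched as follows, with statsmodels standing in for the ARIMA implementation. The small (p, d, q) grid and AIC-based selection are our assumptions about what "searching over possible ARIMA models" looks like in practice.

# Sketch of the per-genre time-series forecast; window bookkeeping and
# the model grid are assumptions, not the paper's exact search.
from collections import Counter
from statsmodels.tsa.arima.model import ARIMA

def genre_importance_series(reading_log, genre, n_windows):
    """reading_log: (window_index, genre) pairs for one user.
    Returns GenreImportance(g, t) = |g_t| / |G_t| for each window t."""
    series = []
    for t in range(n_windows):
        window = [g for w, g in reading_log if w == t]
        counts = Counter(window)
        series.append(counts[genre] / len(window) if window else 0.0)
    return series

def forecast_genre(series):
    """Fit a small grid of non-seasonal ARIMA(p, d, q) models and
    forecast the genre's importance in the next (recommendation) window."""
    best_fit, best_aic = None, float("inf")
    for p in range(3):
        for q in range(3):
            try:
                fit = ARIMA(series, order=(p, 1, q)).fit()
                if fit.aic < best_aic:
                    best_fit, best_aic = fit, fit.aic
            except Exception:
                continue  # some orders fail to converge on short series
    return float(best_fit.forecast(1)[0]) if best_fit else series[-1]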


3.5 Curating Book Suggestions
The last step of QBook's recommendation process focuses on curating CB to generate the top-K suggestions tailored to U. In this step, QBook's goal is to emulate the curation process (as defined in [6]) and become a personal docent that understands U and provides relevant books to read that appeal to his diverse, yet unique, preferences. To do so, QBook simultaneously considers different data points and builds a model that creates a single score quantifying the degree to which U prefers b ∈ CB. For model generation, QBook adopts the Random Forest³ algorithm [7].

³ We empirically verified that Random Forests are best suited for our curation task; analysis omitted due to page limitations.

As part of the curation process, QBook represents b ∈ CB as a vector A_{U,b} = <r_{U,b}, Cr_{U,b}, Sim(U, b), sWNr, sWNs, cNLPr, cNLPs>, which captures the degree of appeal of b for U from multiple perspectives and is used as an input instance to the trained Random Forest to generate the corresponding ranking score for b. Note that, unlike traditional recommenders that train a model for all the users in the community, in QBook a random forest model is trained per user. This allows the model to specifically learn each user's interests, similar to what a personal docent would do.
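A per-user model of this kind can be sketched with scikit-learn as a stand-in for the paper's Random Forest setup; the estimator count is an assumption, and the feature order follows the vector A_{U,b} above.

# Per-user ranking model sketch (scikit-learn stands in for the
# paper's Random Forest; one model is trained per user).
from sklearn.ensemble import RandomForestRegressor

def train_user_model(appeal_vectors, observed_ratings):
    """appeal_vectors: rows <r, Cr, Sim, sWNr, sWNs, cNLPr, cNLPs> for
    books U already rated; observed_ratings: U's actual ratings."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(appeal_vectors, observed_ratings)
    return model

def appeal_scores(model, candidate_vectors):
    """Ranking score for each candidate book in CB."""
    return model.predict(candidate_vectors)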
Reading Activity. Reading activity varies among users, influencing QBook's ability to manipulate the ranked candidate books for curation. For non-active readers (who rate fewer than 35 books⁴), the lack of available information can hinder the process of further personalizing suggestions. In this case, QBook generates the top-K recommendations for U by simply ordering the prediction scores obtained using the trained Random Forest model on the books in CB.

For active readers (who read at least 35 books⁴), it is important to identify what motivates their reading selections, which can vary among different readers. For example, some users are biased by experts' opinions, while others by the preferences of similar-minded individuals. QBook explicitly considers these individual biases in the book selection process for each active reader, leading to further personalized suggestions. If U is an active reader, then QBook captures correlations among the different data points involved in the process of creating A_{U,b} for U. QBook uses the Pearson correlation to indicate the extent to which variables fluctuate together (as illustrated in Figure 2). By exploring U's past rating behavior, QBook can determine the data point that has the most influence on U in the process of rating books, i.e., which data point yields the highest correlation with respect to U's ratings. This data point is treated as the most important one in terms of biasing U's decision-making process. QBook then re-orders the scores computed for each book in CB based on the score of the most influential data point, and thus provides top-K suggestions further tailored to U.

⁴ Analysis of recent statistics on the average number of books read by Americans on a yearly basis, along with an examination of rating distributions on development data, influenced our threshold selection for experimental purposes.

Figure 2: Correlation among data points in QEval; "actual" is the rating assigned by a user, "predicted" is the one estimated by QBook, color denotes correlation, and the size of the bubbles captures correlation strength.
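The bias-detection step can be sketched as follows; the data point names and the use of the absolute correlation are our reading of the procedure, not a verbatim reproduction of it.

# Sketch of the bias-detection step: correlate each data point in
# A_{U,b} with U's actual ratings, then re-order by the strongest one.
from scipy.stats import pearsonr

DATA_POINTS = ["r", "Cr", "Sim", "sWNr", "sWNs", "cNLPr", "cNLPs"]

def most_influential(appeal_vectors, actual_ratings):
    """Return the index of the data point with the highest |Pearson r|."""
    best_idx, best_corr = 0, -1.0
    for i in range(len(DATA_POINTS)):
        column = [v[i] for v in appeal_vectors]
        corr, _ = pearsonr(column, actual_ratings)
        if abs(corr) > best_corr:
            best_idx, best_corr = i, abs(corr)
    return best_idx

def reorder(candidates, vectors, dp_index):
    """Sort candidate books by the most influential data point's value."""
    return [b for b, _ in sorted(zip(candidates, vectors),
                                 key=lambda bv: bv[1][dp_index],
                                 reverse=True)]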
Genre Preference. In the case of active readers, the final step of QBook for curating their suggestions involves explicitly considering U's genre preferences. This is accomplished using the strategy described in Section 3.4. To determine the number of representative candidates from each genre that should be part of the final set of books presented to U, QBook relies on the genre preference distribution calculated using the ARIMA time series analysis and the process discussed in Section 3.4. In doing so, QBook can account for the degree to which U will likely be interested in reading each particular genre at the moment the recommendation is generated.

The final list of top-K suggestions for U is generated by considering not only ranking scores and user bias, but also by ensuring that the genre distribution of the K suggested books matches the genre distribution uniquely expected for U.

By performing this curation step, QBook fosters diversity among the suggestions by including books from all the different areas of the user's interests, and improves personalization by ordering suggestions based on a specific feature for U. Consequently, QBook enables U to choose books from an exhibit tailored solely to him in order to satisfy his reading needs at a given time.
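One way to realize this genre-matching constraint is a quota-style allocation, sketched below; the proportional rounding and the backfilling of leftover slots are our own assumptions about how the distribution is enforced.

# Sketch of the genre-aware curation step: allocate the K slots across
# genres per the forecast distribution, then fill each slot with the
# highest-scoring candidate of that genre.
def curate(ranked, book_genre, genre_pred, k=7):
    """ranked: books sorted by appeal score; book_genre: book -> genre;
    genre_pred: forecast genre distribution (sums to 1) for user U."""
    quota = {g: round(p * k) for g, p in genre_pred.items()}
    picks = []
    for b in ranked:
        g = book_genre.get(b)
        if quota.get(g, 0) > 0:
            picks.append(b)
            quota[g] -= 1
        if len(picks) == k:
            break
    for b in ranked:  # backfill if rounding left empty slots
        if len(picks) == k:
            break
        if b not in picks:
            picks.append(b)
    return picks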
Generating Explanations. To act as a curator, QBook cannot simply recommend books to read without justifying why the suggestions were generated. Thus, QBook pairs each recommendation with an explanation, enabling U to make the most informed and fastest decision when selecting a single item among the suggested ones. To generate explanations, QBook uses archived book reviews, experts' reviews, and the set of steps taken to generate the curated recommendations, and provides U with personalized and valuable information.

QBook creates explanations for a curated book b_c by extracting sentences in reviews pertaining to b_c that refer to the most important literary element of interest to U. Note that if there are multiple sentences describing the same feature, QBook arbitrarily selects one to be shown to U. More importantly, QBook does not emphasize the sentiment of the features, since QBook's intent is not to make U like one option more than another, but to save him time in identifying the information important to him when choosing a book to read. Along with other users' opinions, QBook includes in its explanations experts' opinions on the book's quality, as described in Section 3.3. In other words, QBook includes in the corresponding explanations a sentence from experts' reviews that also references U's top feature of interest. This way, U is provided with objective opinions, extracted from the sentences of experts' reviews that pertain to the feature U cares about. This increases U's trust in QBook, since U can read unbiased opinions that help him determine whether he would like to read a particular recommendation. As its final step in explanation generation, QBook looks into the steps taken in curating each book suggestion. For example, if b_c was selected based on U's rating history, then the corresponding explanation includes a simple sentence of the form "b_c was chosen since it has been highly rated by users with similar rating patterns to yours." If, instead, experts' opinions had a strong influence in the curation process, then QBook makes sure that the user is aware of it.

The explanation paired with b_c includes three sentences specifically selected for U. While we pick 3 for simplicity purposes, the number of sentences included in the explanation can be adjusted. By providing personalized explanations, QBook is able to tell U a story about how each suggestion is related to him, which increases users' trust in the system, as well as the system's transparency [40]. Unlike the majority of existing strategies, QBook does not act like a "black box" to the user, since it provides information regarding the selection and curation of the final set of books that are suggested. Therefore, with this step QBook acts as a personal docent for U.
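The assembly of a three-sentence explanation can be sketched as follows; the template wording, the provenance keys, and the naive period-based sentence splitting are illustrative assumptions, not QBook's exact phrasing or parser.

# Sketch of explanation assembly: one user-review sentence and one
# expert-review sentence mentioning U's top feature, plus a sentence
# describing the curation step (templates are hypothetical).
def sentences_with(texts, terms):
    for text in texts:
        for sent in text.split("."):
            if any(term in sent.lower() for term in terms):
                yield sent.strip()

def explain(book, user_reviews, expert_reviews, top_feature, lexicon, data_point):
    terms = lexicon[top_feature]
    user_sent = next(sentences_with(user_reviews, terms), None)
    expert_sent = next(sentences_with(expert_reviews, terms), None)
    provenance = {
        "r": f"{book} was chosen since it has been highly rated by users "
             "with similar rating patterns to yours.",
        "Sim": f"{book} is often described in terms of the {top_feature} "
               "you care about.",
    }
    parts = [s for s in (user_sent, expert_sent, provenance.get(data_point)) if s]
    return " ".join(parts[:3])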


4 EVALUATION
In this section, we discuss the results of the studies conducted to validate QBook's performance and design methodology.

4.1 Framework
Dataset. To the best of our knowledge, there is no benchmark that can be used for evaluating the performance of a curation recommendation system. Instead, we use resources from the Social Book Search (SBS) Suggestion Track [2], which consists of 2.7 million book titles along with user reviews and ratings, each with combined book metadata from Amazon.com and LibraryThing. We complement this collection with (i) overlapping catalog records from the Library of Congress and (ii) experts' reviews from known book critique websites, such as NPR and Editorial Reviews. We call this enhanced SBS dataset QData.

We split QData into three parts: 80% of the users were used for training, denoted QTrain; 10% for development, denoted QDevel; and the remaining 10% for evaluation, denoted QEval. To ensure a representative distribution for development and evaluation purposes, we first clustered users from QData based on the number of books read. Thereafter, we created QTrain, QDevel, and QEval by randomly selecting the same percentage of users from each cluster to "simulate" the real representation of QData in each partition.

Metrics. For recommendation validation, we used the well-known NDCG and RMSE. We also considered:

Coverage = |K ∩ R ∩ A| / |K ∩ R|, where K is the set of books of the collection known to a given user, R is the set of books relevant to the user, and A is the set of recommended books. This metric captures how many of the items from the dataset are being recommended to all users who get recommendations [15].

Novelty = |(R ∩ A) − K| / |R ∩ A|, where K, R, and A are defined as in Coverage. This metric captures how different a recommendation is with respect to what the user has already seen, along with the relevance of the recommended item [41].

Serendipity, which measures how surprising the recommendations are to a user; computed as in [15].

Table 2: Aims of explanations in a recommender system
   Aim              Definition
   Effectiveness    Help users make good decisions
   Efficiency       Help users make decisions faster
   Persuasiveness   Convince users to try or buy
   Satisfaction     Increase the ease of usability or enjoyment
   Scrutability     Allow users to tell the system it is wrong
   Transparency     Explain how the system works
   Trust            Increase users' confidence in the system

Table 3: Influence of genre preference change over time on the recommendation process; '*' significant for p<0.001, t-test
   Prediction Strategy               KL       Accuracy
   Without Time Series               0.663    0.826
   With Time Series                  0.623*   0.870*
   Without Time Series (3+ genres)   0.720    0.810
   With Time Series (3+ genres)      0.660*   0.857*
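For reference, the per-user Coverage and Novelty metrics defined in Section 4.1 can be computed as in the following sketch (aggregation across users is omitted, and the toy sets are ours):

# Sketch of per-user coverage and novelty, following the set
# definitions of K, R, and A given above.
def coverage(known, relevant, recommended):
    denom = known & relevant
    return len(known & relevant & recommended) / len(denom) if denom else 0.0

def novelty(known, relevant, recommended):
    hits = relevant & recommended
    return len(hits - known) / len(hits) if hits else 0.0

# Toy example: a user knows 3 books, 2 of them relevant, gets 2 suggestions.
K, R, A = {"b1", "b2", "b3"}, {"b2", "b3", "b4"}, {"b3", "b4"}
print(coverage(K, R, A), novelty(K, R, A))  # 0.5 0.5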
4.2 Results & Discussion
Temporal Analysis. Since time-based genre prediction and its influence on the recommendation process is a novel strategy, we evaluate it in isolation to demonstrate its effectiveness. To do so, we used a disjoint set of 1,214 randomly-selected users from the dataset introduced earlier in this section. We used KL-Divergence, which measures how well a distribution q generated by a prediction strategy approximates a distribution p, the ground truth, i.e., the distribution of genres read by a user over a period of time. We also used accuracy, a binary strategy that reflects whether the predicted genres correspond to the ones read by a user over a given period of time. In establishing the ground truth for evaluation purposes, we adopted the well-known N-1 strategy: the genres of the books rated by a user U in N-1 time frames are used for training U's genre prediction model, whereas the genres of the books rated by U in the N-th time frame are treated as "relevant". As a baseline for this initial assessment, we use a naive strategy that defines the importance of each genre for U in the current, i.e., N-th, time frame based on the genre distribution across the previous N-1 time frames.

As shown in Table 3, for N=11, the KL divergence scores indicate that the genre distribution predicted using time series better approximates the ground truth, thus leading to better performance. Furthermore, the probability of occurrence of each considered genre is closer to the real values when the time component is included in the prediction process. We observed differences among users who read different numbers of distinct genres. For users who read only one to two genres, the time-based prediction strategy does not perform better than the baseline. However, if a user reads three or more genres, our time-based genre prediction strategy outperforms the baseline on both metrics. This is not surprising, given that it is not hard to determine the area(s) of interest of a user who constantly reads only one or two book genres, which is why the baseline performs as well as the time-based prediction strategy. Given that users who read 3 or more genres represent 91% of the users in the dataset used in the remainder of the experiments presented in this section, the proposed strategy provides significant improvements in predicting preferred genres for the vast majority of readers.

Overall Performance. We evaluate the individual strategies that contribute to QBook's recommendation process and analyze how each of them influences the generation of book suggestions. We create top-7⁵ recommendations for each user in QDevel using the individual strategies defined in Section 3 and evaluate the effectiveness of these recommendations based on NDCG⁶.

As shown in Figure 3, the matrix factorization and content-based approaches are similarly effective in generating suggestions. However, when combined they slightly increase the value of NDCG. This improvement is statistically significant (p < 0.001; t-test), which means that, in general, users get more relevant recommendations when both strategies are considered in tandem. This can be explained by the fact that these two methodologies complement each other. Furthermore, we can see that the similarity between the literary features of interest to a user and the literary features most often used to describe a book has a positive influence on the recommendation process, as it increases NDCG by 2.5% when explicitly considered as part of the recommendation process. This is anticipated, since user-generated reviews hold a lot of information that allows us to gain knowledge about each user and personalize suggestions.

The most reliable data points, which not only achieve relatively high NDCG but are also widely applicable and do not depend on individual users, are the four strategies that analyze the sentiment of expert reviews. These strategies rely on information that is frequently available and thus are applicable to the majority of the books examined by QBook. Based on Figure 3, we can see that the data points calculated using sentence-level sentiment analysis provide slightly better recommendations compared to the ones generated using word-level sentiment analysis. Even though the individual strategies perform relatively well, we cannot assume that each data point can be calculated for every single book. QBook's improvements in terms of NDCG can be justified by its ability to: (i) simultaneously consider multiple data points, (ii) include a genre-prediction strategy, and (iii) more completely analyze different perspectives of user-book relations to provide recommendations even when some of the data points are unavailable. This is demonstrated by the fact that the NDCG of QBook is statistically significant with respect to the NDCG reported for the individual strategies (for p < 0.001).

⁵ K is set to 7, based on a study presented in [30], where the authors argue that the number of objects an average human can hold in working memory is 7 ± 2.
⁶ QDevel and QEval yield comparable scores, indicating consistency in performance regardless of the data used for experimentation, and no overfitting.




    Figure 3: Performance evaluation of individual recommendation strategies considered by QBook on QDevel and QEval.
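The NDCG@K scores used throughout this section can be computed with the standard formulation, sketched below; the log-2 discount and linear gain are the common choices, and the paper does not specify which exact variant it uses.

# NDCG@K sketch for scoring a ranked list against graded relevance.
import math

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_relevance, k=7):
    """ranked_relevance: relevance grades in recommended order."""
    top = ranked_relevance[:k]
    ideal = sorted(ranked_relevance, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(top) / ideal_dcg if ideal_dcg else 0.0

print(ndcg_at_k([3, 2, 0, 1], k=4))  # ~0.985 for this near-ideal ordering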

To further showcase the validity of QBook's design methodology, we compare its performance with two baselines: SVD (matrix factorization) and CB (content-based). For their respective implementations, we rely on LensKit. The significant (p < 0.01) NDCG improvement of QBook with respect to SVD (0.874) and CB (0.856) demonstrates that, in general, the recommendations provided by QBook are preferred over the ones provided by the baselines, which consider either rating patterns or content, but not both.
Recommendation Explanations. There is no gold standard to evaluate explanations offline. Thus, following the strategy in [16, 39], we conducted an initial qualitative evaluation to demonstrate the usefulness of the explanations generated by QBook. We rely on the criteria introduced in [39] and summarized in Table 2, which outlines the "characteristics" of good explanations for recommenders. QBook achieves five out of the seven criteria expected of explanations generated by recommenders. By suggesting curated books that are described based on users' preferred features of interest, showcasing the opinions of other users on those features, and describing curation steps, QBook addresses transparency. QBook inspires trust in its users, since it does not consider the sentiment connotation of the features to determine if they should be included in an explanation. Instead, QBook provides unbiased recommendations and explanations, which can increase users' confidence, as they know QBook offers a real depiction of each suggested book. Users are also able to make good and fast decisions in terms of selecting books among the suggested ones, since based on the provided explanations they can infer which books match their current preferences. With this, QBook increases its effectiveness. Given that users' overall satisfaction with a recommender is related to the perceived quality of its recommendations and explanations [16], QBook users can appreciate not having to spend more time researching books with characteristics important to them.

As per the study in [39] and assessments of several explanation-generation strategies [19, 31, 42, 46], we can report that, on average, only two (out of seven) criteria are satisfied. The only strategy comparable to QBook's is the one discussed in [46], which addresses five of the criteria. However, this strategy involves sentiment in the generation of the explanations, as opposed to QBook, which makes unbiased decisions when identifying users' features of preference and selecting which sentences to use to describe these features.
   Common Recommendation Issues. We showcase QBook’s                        analyzing the performance of different strategies in more detail, we
ability to address popular recommendation issues based on RMSE,             can see that Matrix Factorization strategies perform better, as in
in addition to adopting the evaluation framework presented in               the case of Free Lunch (with and without clustering) and SVD++.
[27] to simulate online evaluation using offline metrics: coverage,
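
As a rough illustration of the metrics used in this analysis, the sketch below implements common formulations of coverage, novelty, and serendipity (in the spirit of [15, 27, 41]), plus RMSE; the exact definitions and normalizations behind the reported scores may differ, and all inputs (recommendation lists, popularity counts, expected/relevant item sets) are hypothetical.

```python
# Illustrative formulations of the offline metrics referenced above.
# Inputs are hypothetical: rec_lists maps user -> list of recommended items,
# popularity maps item -> number of users who interacted with it,
# expected/relevant map user -> sets of anticipated / truly liked items.
import math

def rmse(predicted, actual):
    """Root mean squared error over paired rating predictions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def catalog_coverage(rec_lists, catalog):
    """Fraction of the catalog recommended to at least one user."""
    recommended = {item for recs in rec_lists.values() for item in recs}
    return len(recommended) / len(catalog)

def novelty(rec_lists, popularity, n_users):
    """Mean self-information of recommended items; rarer items score higher."""
    scores = [-math.log2(max(popularity.get(item, 0) / n_users, 1e-9))
              for recs in rec_lists.values() for item in recs]
    return sum(scores) / len(scores)

def serendipity(rec_lists, expected, relevant):
    """Share of recommendations that are unexpected yet relevant."""
    hits = total = 0
    for user, recs in rec_lists.items():
        for item in recs:
            total += 1
            if item not in expected.get(user, set()) and item in relevant.get(user, set()):
                hits += 1
    return hits / total
```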


However, QBook goes beyond matrix factorization by using a content-based approach as well as by incorporating different perspectives, including other users' and experts' reviews.
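
A minimal sketch of this combination follows, assuming hypothetical mf_predict, content_score, and review_score helpers and illustrative weights; it shows the blending idea only, not QBook's actual scoring.

```python
# Hypothetical blending of the three signals named above; the helper
# functions and weights are illustrative, not QBook's actual components.
def hybrid_score(user, book, mf_predict, content_score, review_score,
                 w_mf=0.5, w_content=0.3, w_review=0.2):
    """Weighted blend of rating-pattern, content, and review-based evidence."""
    return (w_mf * mf_predict(user, book)
            + w_content * content_score(user, book)
            + w_review * review_score(user, book))
```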
       Table 4: QBook vs. state-of-the-art recommenders                                         recommender systems by coverage and serendipity. In ACM RecSys, pages 257–
                                                                                                260. ACM, 2010.
Strategy    RMSE     Strategy                    RMSE
QBook       0.795
RMR         1.055    Free Lunch                  0.933
LDAMF       1.053    Free Lunch w/ Clustering    0.825
CTR         1.052    SVD++                       0.908
HFT         1.066    URRP                        1.104
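
For readers unfamiliar with these baselines, the following is a generic biased matrix-factorization rating predictor trained with stochastic gradient descent, a simplified stand-in for the SVD-style models in Table 4 (not the LensKit or SVD++ implementations used in the experiments).

```python
# Generic biased matrix factorization via SGD; a simplified stand-in for
# the SVD-style baselines above, not the implementations we evaluated.
import numpy as np

def train_biased_mf(ratings, n_users, n_items, k=20, lr=0.01, reg=0.05, epochs=20):
    """ratings: list of (user, item, rating) triples with integer ids.
    Returns a predict(user, item) function."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user latent factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item latent factors
    bu = np.zeros(n_users)                        # user biases
    bi = np.zeros(n_items)                        # item biases
    mu = np.mean([r for _, _, r in ratings])      # global mean rating
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Update both factor vectors simultaneously from their old values.
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return lambda u, i: mu + bu[u] + bi[i] + P[u] @ Q[i]
```

Predictions from such a model on held-out (user, item, rating) triples can then be scored with RMSE as in the earlier metric sketch.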
5   CONCLUSIONS & FUTURE WORK
We presented QBook, a book recommender that acts as a curator by showcasing tailored book selections that meet the reading needs of individual users. As part of its recommendation process, QBook examines different areas of user interest, not only the most dominant or recent ones, as well as varied data points. In doing so, QBook can yield a diverse set of suggestions, each paired with an explanation, to provide a user not only with the reasons why a book was included in the curated list of recommendations but also with how each recommendation was selected, with the objective of enhancing trust and transparency towards the user.
   We conducted a number of offline experiments to validate the performance of QBook using a popular dataset. We also demonstrated the importance of considering diverse data sources, beyond ratings or content, to enhance the recommendation process.
   With this work, we set the algorithmic foundations that will allow us to conduct in-depth online experiments in the future, in order to quantify the usability of QBook, the value of its explanations, and the degree to which its curation strategy can enrich the recommendation process from a user's perspective. Given the domain-independent nature of our strategies, we plan to validate QBook on datasets other than books to demonstrate its applicability to other domains. Our goal is to go one step further and enable our personal curator to generate suggestions in multiple domains, based on the complete virtual footprint available for a user.
REFERENCES
 [1] Content based recommendation system. http://eugenelin89.github.io/recommender_content_based/.
 [2] INEX Amazon/LibraryThing book corpus. http://social-book-search.humanities.uva.nl/data/ALT_Nondisclosure_Agreements.html. Accessed: 2016-02-07.
 [3] LensKit open-source tools for recommender systems. https://lenskit.org.
 [4] C. Benkoussas, A. Ollagnier, and P. Bellot. Book recommendation using information retrieval methods and graph analysis. In CLEF. CEUR, 2015.
 [5] R. Blanco, D. Ceccarelli, C. Lucchese, R. Perego, and F. Silvestri. You should read this! let me explain you why: explaining news recommendations to users. In ACM CIKM, pages 1995–1999. ACM, 2012.
 [6] C. Borrelli. Everybody's a curator. Chicago Tribune. https://goo.gl/hpTF3Q. Accessed: 2015-12-06.
 [7] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
 [8] S. Channamsetty and M. D. Ekstrand. Recommender response to diversity and popularity bias in user profiles. In AAAI FLAIRS, pages 657–660, 2017.
 [9] L. Chen and F. Wang. Sentiment-enhanced explanation of product recommendations. In WWW, pages 239–240. ACM, 2014.
[10] Dictionary. Oxford: Oxford University Press, 1989.
[11] N. Dragovic and M. S. Pera. Genre prediction to inform the recommendation process. In Proceedings of the Poster Track of the 10th ACM Conference on Recommender Systems, 2016.
[12] N. Dragovic and M. S. Pera. Exploiting reviews to generate personalized and justified recommendations to guide users' selection. In AAAI FLAIRS, pages 661–664, 2017.
[13] A. Esuli and F. Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In LREC, volume 6, pages 417–422, 2006.
[14] A. L. Garrido and S. Ilarri. TMR: a semantic recommender system using topic maps on the items' descriptions. In ESWC, pages 213–217. Springer, 2014.
[15] M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. In ACM RecSys, pages 257–260. ACM, 2010.
[16] F. Gedikli, D. Jannach, and M. Ge. How should I explain? A comparison of different explanation types for recommender systems. International Journal of Human-Computer Studies, 72(4):367–382, 2014.
[17] S. Givon and V. Lavrenko. Predicting social-tags for cold start book recommendations. In ACM RecSys, pages 333–336. ACM, 2009.
[18] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In ACM CSCW, pages 241–250. ACM, 2000.
[19] A. Hernando, J. Bobadilla, F. Ortega, and A. Gutiérrez. Trees for explaining recommendations made through collaborative filtering. Information Sciences, 239:1–17, 2013.
[20] M. Jiang, D. Song, L. Liao, and F. Zhu. A Bayesian recommender model for user rating and review profiling. Tsinghua Science & Technology, 20(6):634–643, 2015.
[21] G. Kazai, D. Clarke, I. Yusof, and M. Venanzi. A personalised reader for crowd curated content. In ACM RecSys, pages 325–326. ACM, 2015.
[22] D. Kislyuk, Y. Liu, D. Liu, E. Tzeng, and Y. Jing. Human curation and convnets: Powering item-to-item recommendations on Pinterest. arXiv preprint arXiv:1511.04003, 2015.
[23] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In ACM SIGKDD, pages 426–434. ACM, 2008.
[24] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
[25] G. Ling, M. R. Lyu, and I. King. Ratings meet reviews, a combined approach to recommend. In ACM RecSys, pages 105–112. ACM, 2014.
[26] B. Loni, A. Said, M. Larson, and A. Hanjalic. 'Free lunch' enhancement for collaborative filtering with factorization machines. In ACM RecSys, pages 281–284. ACM, 2014.
[27] A. Maksai, F. Garcin, and B. Faltings. Predicting online performance of news recommender systems through richer evaluation metrics. In ACM RecSys, pages 179–186. ACM, 2015.
[28] C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations), pages 55–60, 2014.
[29] J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In ACM RecSys, pages 165–172. ACM, 2013.
[30] G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2):81, 1956.
[31] J. Misztal and B. Indurkhya. Explaining contextual recommendations: Interaction design study and prototype implementation. In IntRS@RecSys, pages 13–20, 2015.
[32] R. Nau. Introduction to ARIMA models. Duke University. http://people.duke.edu/~rnau/411arim.htm. Accessed: 2016-05-06.
[33] B. Ohana and B. Tierney. Sentiment classification of reviews using SentiWordNet. In 9th IT & T Conference, 2009.
[34] M. S. Pera and Y.-K. Ng. Automating readers' advisory to make book recommendations for k-12 readers. In ACM RecSys, pages 9–16. ACM, 2014.
[35] M. S. Pera and Y.-K. Ng. Analyzing book-related features to recommend books for emergent readers. In ACM HT, pages 221–230. ACM, 2015.
[36] F. Ricci, L. Rokach, and B. Shapira. Introduction to Recommender Systems Handbook. Springer, 2011.
[37] Z. Saaya, R. Rafter, M. Schaal, and B. Smyth. The curated web: a recommendation challenge. In ACM RecSys, pages 101–104. ACM, 2013.
[38] Y. Teng, L. Zhang, Y. Tian, and X. Li. A novel FAHP based book recommendation method by fusing apriori rule mining. In ISKE, pages 237–243. IEEE, 2015.
[39] N. Tintarev and J. Masthoff. A survey of explanations in recommender systems. In IEEE ICDEW, pages 801–810, 2007.
[40] N. Tintarev and J. Masthoff. Evaluating recommender explanations: problems experienced and lessons learned for the evaluation of adaptive systems. In UCDEAS Workshop associated with UMAP. CEUR-WS, 2009.
[41] S. Vargas and P. Castells. Rank and relevance in novelty and diversity metrics for recommender systems. In ACM RecSys, pages 109–116. ACM, 2011.
[42] J. Vig, S. Sen, and J. Riedl. Tagsplanations: explaining recommendations using tags. In IUI, pages 47–56. ACM, 2009.
[43] C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific articles. In ACM SIGKDD, pages 448–456. ACM, 2011.
[44] J. Wang and Y. Zhang. Opportunity model for e-commerce recommendation: right product; right time. In ACM SIGIR, pages 303–312, 2013.
[45] M. C. Willemsen, M. P. Graus, and B. P. Knijnenburg. Understanding the role of latent feature diversification on choice difficulty and satisfaction. UMUAI, 26(4):347–389, 2016.
[46] Y. Zhang, G. Lai, M. Zhang, Y. Zhang, Y. Liu, and S. Ma. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In ACM SIGIR, pages 83–92. ACM, 2014.