=Paper=
{{Paper
|id=Vol-2225/paper6
|storemode=property
|title=From Recommendation to Curation: When the System Becomes your Personal Docent
|pdfUrl=https://ceur-ws.org/Vol-2225/paper6.pdf
|volume=Vol-2225
|authors=Nevena Dragovic,Ion Madrazo Azpiazu,Maria Soledad Pera
|dblpUrl=https://dblp.org/rec/conf/recsys/DragovicAP18
}}
==From Recommendation to Curation: When the System Becomes your Personal Docent==
Nevena Dragovic∗ (MarkMonitor, Boise, Idaho, USA; nevena.dragovic@markmonitor.com), Ion Madrazo Azpiazu (People & Information Research Team, Boise State University, Boise, Idaho, USA; ionmadrazo@boisestate.edu), Maria Soledad Pera (People & Information Research Team, Boise State University, Boise, Idaho, USA; solepera@boisestate.edu)

∗Work conducted while the author was a student at Boise State.

ABSTRACT

Curation is the act of selecting, organizing, and presenting content. Some applications emulate this process by turning users into curators, while others use recommenders to select items, seldom achieving the focus or selectivity of human curators. We bridge this gap with a recommendation strategy that more closely mimics the objectives of human curators. We consider multiple data sources to enhance the recommendation process, as well as the quality and diversity of the provided suggestions. Further, we pair each suggestion with an explanation that showcases why a book was recommended, with the aim of easing the decision making process for the user. Empirical studies using Social Book Search data demonstrate the effectiveness of the proposed methodology.

CCS CONCEPTS

• Information systems → Recommender systems

KEYWORDS

Curation, Decision-making, Time series, Personalization, Diversity

ACM Reference Format:
Nevena Dragovic, Ion Madrazo Azpiazu, and Maria Soledad Pera. 2018. From Recommendation to Curation: When the system becomes your personal docent. In Proceedings of the Joint Workshop on Interfaces and Human Decision Making for Recommender Systems (IntRS) 2018 (IntRS Workshop). ACM, 8 pages.

1 INTRODUCTION

Recommenders have been studied for decades. Regardless of the domain, they influence businesses' success and users' satisfaction. From a commercial standpoint, recommenders enable companies to advertise items to potential buyers. From a user perspective, they enhance users' experience by easing the identification of items of interest while addressing information overload concerns. The degree of personalization recommenders offer, however, can be hindered by their limited ability to provide diverse enough suggestions, restricting users' exposure to new, prospective items of interest. This occurs because common recommendation strategies rely on community data, thus suggesting the same items to similar users, which can be vague and impersonal.
Inspired by the results of the work conducted by Willemsen et al. [45], who demonstrated that "diverse, small item sets are just as satisfying and less effortful to choose from than Top-N recommendations," we argue that emulating the curation process—the act of selecting, organizing, and presenting content [10]—to suggest a small set of diverse items could lead to an enriched recommendation process. In this paper, we present the algorithmic foundation to make this possible and facilitate future user studies. A number of applications, e.g. Pinterest, let the user be the decision maker: they offer individualized selections and then allow the user to play the role of a curator in choosing appealing items. However, given the ability of a recommender to collect and examine large amounts of data about users and items, the system itself can get to know users better—their interests and behaviors—and act as a curator.

Due to scope limitations, we use books as a case study and focus our research efforts on the techniques that lead to QBook, a curated book recommender (Figure 1). QBook does not only find books that are appealing to a user, but also presents a meaningful set of suggestions with corresponding explanations that pertain to the various preferences of the user. QBook takes the role of the curator upon itself, makes connections between suggested items and the reasons for their selection, and enriches the recommendation process by addressing issues that affect these systems: (1) Using historical data, we capture suitable candidate items for each user; (2) Understanding items' metadata, we access potentially relevant items that otherwise might be ignored by solely relying on ratings; (3) Considering user reviews, we infer which item features each user cares about and their degree of importance; (4) Examining experts' reviews, we ensure the overall quality of books to be recommended; (5) Exploring the type of data sources a user favors, we learn about the user and understand why he could be interested in an item; (6) Analyzing users' change in genre preferences over time, we better identify the current reading interests of individual users.

QBook can be seen as an interpretable diversification strategy for recommendation, where evidence factors from (1)-(6) are combined using a diversification technique adapted to each user's interest. Further, explanations on why each book was selected are provided so that the user can better select the book he is interested in. With this work, we improve research related to recommenders by combining traditional approaches with novel preference matching methods into a single strategy that offers suggestions containing information related to the interests of each user. Moreover, we explicitly undertake diversity and personalization—key aspects, as common collaborative filtering algorithms are known to not propagate users' preferences on diversity into their recommendations [8]—by exploring user reviews and time analysis to understand changes of reading preference over time. To assess the effectiveness of QBook, we conduct experiments measuring the utility of individual components, as well as comparing the recommendation quality of the system as a whole with respect to state-of-the-art systems.

[Figure 1: Overview of QBook]

2 RELATED WORK

We discuss work pertaining to book recommenders, as well as the use of explanations and curation for recommendation purposes.
Recommenders & Books. A number of recommenders have been designed to generate suggestions that help users select suitable books to read [4, 17, 34]. They are based on purchasing or rating patterns, click-through data, content/tag analysis, or ontologies. Some book recommenders are tailored towards specific groups of users: by emulating the readers' advisory service, the authors in [34] describe the process of recommending books for K-12 students, based on the topics, contents, and literary elements that appeal to each individual user, whereas K3Rec [35] uses information about grade levels, contents, illustrations, and topics, together with length and writing style, to generate suggestions for emergent readers. Garrido and Ilarri [14] rely on content-based data for making book suggestions. Their proposed TMR [14] examines items' descriptions and reviews and relies on lexical and semantic resources to infer users' preferences. However, TMR can only work if descriptions and reviews are available, unlike QBook, for which these are only two of the multiple data points considered in the recommendation process. The authors in [4] present a strategy based on graph analysis and PageRank that exploits clicks, purchasing patterns, and book metadata. This strategy is constrained to the existence of a priori pairwise similarity between items, e.g. "similar products", which is not a requirement for QBook. The authors in [38] highlight the importance of book recommenders as library services, and thus propose a fuzzy analytical hierarchy process based on apriori rule mining that depends upon the existence of book-loan histories. The empirical analysis presented in [38] is based on a limited and private dataset, which constrains the task of verifying its applicability on large-scale benchmark datasets. More importantly, the proposed strategy is contingent on book-loan historical information that, due to privacy concerns, libraries rarely, if at all, make available.

Recommenders & Explanations. An ongoing challenge faced by recommenders is to get users to trust them, as they still operate as "black boxes" [18]. A powerful way to build successful relationships between users and recommenders is by providing information on how each system works and why items are suggested [39]. This can be accomplished by including explanations that justify the suggestions, which are known to provide transparency and enhance trust in the system [39]. Unfortunately, justifying the reasons why an item has been suggested to a user is not an easy task [16]. Recent works in this area focus on determining how different types of explanations influence users while making decisions [16]. Among the most common strategies we should highlight those based on exploring the previous activity of a user [5], information collected from user reviews [46], and content-based tag cloud explanations. The goal of QBook is to provide explanations that reveal the reasoning and data behind the recommendation process and contain other users' and experts' (objective) opinions on item characteristics that are of specific interest to each individual user. Many researchers consider sentiment-based explanations more effective, trustworthy, and persuasive than ones that capture the relationship between the previous activity of the user and suggested items [9]. We, however, follow the premise presented in [12] and do not include sentiment, in order to make QBook look unbiased from users' perspectives, i.e., we do not select feature descriptions based on their polarity.
Recommenders & Curation. Few research works focus on simulating the curation process for recommendation purposes [21, 22, 37]. In [21], the authors discussed the development of a news application that learns from users' interactions with the system while they swipe through provided news articles and like them. In this research, the authors use social networks and users' browsing history to create and recommend crowd-curated content, but using users as curators. The work conducted by Saaya et al. [37], on the other hand, relies on a content-based strategy that considers information authors collect from users' profiles, which are then managed by the users themselves. The most recent study, conducted by Kislyuk et al. [22], combines a user-based curation method with a traditional collaborative filtering strategy to improve Pinterest's recommendation process. The aforementioned alternatives simply let users organize suggested content based on their personal preferences, since all three studies treat users as curators. However, no recommendation system takes the role of the curator upon itself. We take a different approach and allow the system to take the curator role using existing user and item data.

3 THE ANATOMY OF QBOOK

Each step of QBook's recommendation process addresses a particular research problem on its own: Can item metadata complement the lack of historical data (and vice versa)? How can a time component influence recommendation systems? Can experts' opinions align with readers' preferences? Are users' features of interest a valuable asset to a recommendation process? How does curation work as part of a recommendation process? Can personalized explanations aid users in selecting relevant suggestions?

3.1 Selecting Candidate Books

To initiate the recommendation process (described in Algorithm 1), QBook identifies a set of books CB from a given collection to be curated for a user U. The two strategies considered for candidate selection (i.e., matrix factorization and content-based filtering) complement each other and ensure diversity among candidate books. While the former examines users' rating patterns, the latter focuses on books' characteristics and does not require user-generated data. Moreover, by considering rating patterns and content, the novelty of the recommendations increases, as users are exposed to a variety of books to choose from. We anticipate QBook to handle data sparsity and cold start in this step, since even if candidate books do not have (sufficient) ratings assigned to them, they might still have tag descriptions that can help the recommender determine if they are indeed potentially of interest to a user, and vice-versa.

Algorithm 1: The Recommendation Process of QBook
Input: AB (archived set of books), RS (set of reviews), ER (set of expert reviews), K (number of recommendations), RF (trained Random Forest)
Terms: RS_U (reviews in RS written by U), RS_b (reviews in RS for b), ER_b (reviews in ER for b), P_U (set of books read by U)

    CandidateSet, Recommendations = empty set
    Count = 0
    for each user U do
        UPref = ranked list of preferred literary elements using RS_U
        CandidateSet = {b ∈ AB with r_U,b > 3 OR Cr_U,b > 3}
        for each book b in CandidateSet do
            BPref = ranked list of preferred literary elements using RS_b
            Sim = Similarity(UPref, BPref)
            sWNr = Polarity(ER_b, SentiWordNet)
            sWNs = Polarity(lastSentence, ER_b, SentiWordNet)
            cNLPr = Polarity(ER_b, CoreNLP)
            cNLPs = Polarity(lastSentence, ER_b, CoreNLP)
            A_U,b = <r_U,b, Cr_U,b, Sim, sWNr, sWNs, cNLPr, cNLPs>
            AppealScore = GenerateRanking(RF, A_U,b)
            Recommendations = Recommendations + <b, AppealScore>
        end for
        if U is active then
            DP = identify most-correlated data point for U
            GenrePref = GenreDistribution(ARIMA, P_U)
            Recommendations = Sort(Recommendations, DP, GenrePref, K)
        else
            Recommendations = Sort(Recommendations)
        end if
        for each b in Recommendations do
            if Count++ <= K then
                print "b + Explanation(b, ER_b, RS_b, UPref, DP)"
            end if
        end for
    end for

Matrix Factorization. To take advantage of U's historical data, QBook adopts a strategy based on matrix factorization [24]. Specifically, it uses LensKit's [3] FunkSVD for candidate generation and includes in CB any book b for which its predicted rating for U (r_U,b) is above 3, ensuring the potential appeal of b to U.

Content Analysis. Content-based filtering methodologies create suggestions by comparing items' characteristics and users' preferences. Available content representations (e.g., metadata) are used to describe items, as well as users' profiles based on items users favored in the past [36]. QBook uses tags, which capture books' content from diverse users' perspectives. Thereafter, it applies LensKit's implementation of a content-based algorithm [1] (based on the Vector Space Model and TF-IDF weighting scheme), and includes in CB any book b for which its similarity with respect to U's content preferences (Cr_U,b) is 3 or above.
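As a rough illustration of this candidate-selection step (a minimal sketch, not the authors' implementation): the hypothetical predict_rating callable stands in for LensKit's FunkSVD predictor, the content side approximates the tag-based Vector Space Model with scikit-learn's TF-IDF, and rescaling cosine similarity onto the paper's 1-5 scale so that both signals share the "above 3" threshold is our assumption.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def select_candidates(user, books, predict_rating, book_tags, user_tag_profile):
        """Union of matrix-factorization and content-based candidates (sketch)."""
        # Content side: Vector Space Model over each book's tag text, TF-IDF weighted.
        vectorizer = TfidfVectorizer()
        book_matrix = vectorizer.fit_transform([book_tags[b] for b in books])
        profile = vectorizer.transform([user_tag_profile])
        # Rescale cosine similarity in [0, 1] to a 1-5 range (assumed mapping)
        # so the same "above 3" cutoff applies to both signals.
        content_scores = 1 + 4 * cosine_similarity(profile, book_matrix).ravel()
        candidates = set()
        for b, cr in zip(books, content_scores):
            r = predict_rating(user, b)  # stand-in for a FunkSVD-style predictor
            if r > 3 or cr > 3:
                candidates.add(b)
        return candidates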
3.2 Getting to Know Users and Books

QBook aims to provide U with a set of appealing, personalized suggestions based on information he values. QBook examines reviews written by U and identifies the set of literary elements (features) that he cares the most about. (If U does not have reviews, then the most popular user features are treated as U's features of importance.) Thereafter, it determines the degree to which each book in CB addresses U's features of interest. To identify features of interest to U, QBook performs semantic analysis on reviews and considers the frequency of occurrence of terms U employs in his reviews. By adopting the set of literary elements and the extraction strategy described in [34], QBook explores features that characterize book content, such as character descriptions or writing style. As defined in [34], each literary element (i.e., feature) is associated with a set of terms used to describe that element, since different words can be used to express similar book elements. A sample of literary elements and related terms is shown in Table 1.

Table 1: Sample of terms associated with literary features
Literary Element | Sample of Related Terms
characters | stereotypes, detailed, distant, dramatic
pace | fast, slow, leisurely, breakneck, compelling
storyline | action-oriented, character-centered
tone | happy, light, uplifting, dark, ironic, funny
writing style | austere, candid, classic, colorful
frame | descriptive, minimal, bleak, light-hearted

QBook computes the overall frequency of occurrence of each feature mentioned by U by normalizing the occurrence of the feature based on the number of reviews written by U. This score captures the importance (i.e., weight) of each particular feature for U. In the same manner, QBook examines the reviews available for b, following the process defined for identifying features of interest to U, in order to gain a deeper understanding of the literary elements that are often used to describe b. This is done by analyzing the subjective opinions of all users who read and reviewed b.

QBook leverages U's preferences in the recommendation process by calculating the degree of similarity between U's feature preferences and b's most-discussed features, as

    Sim(U, b) = (UV · BV) / (||UV|| × ||BV||),

where UV = <WF_U,1, ..., WF_U,n> and BV = <WF_b,1, ..., WF_b,m> are vector representations associated with the feature discussions of U and b, n and m are the numbers of distinct features describing U and b, respectively, and WF_U,i and WF_b,i capture the weight, i.e., degree of importance, of the i-th feature for U and b, based on their normalized frequencies of occurrence (in reviews). By using Sim(U, b), QBook triggers the generation of personalized suggestions, as it captures all features of interest for U and compares them with the most-discussed features of b to determine how likely b is to be relevant to U.
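A minimal sketch of this feature weighting and the Sim(U, b) computation, assuming features are detected by simple lookup of Table 1-style term lists (the paper's semantic analysis is richer); feature_terms maps each literary element to its related terms.

    import numpy as np

    def feature_weights(reviews, feature_terms):
        """Occurrence of each literary element, normalized by review count."""
        counts = dict.fromkeys(feature_terms, 0)
        for review in reviews:
            text = review.lower()
            for feature, terms in feature_terms.items():
                if any(term in text for term in terms):
                    counts[feature] += 1
        n = max(len(reviews), 1)
        return {feature: count / n for feature, count in counts.items()}

    def sim(user_weights, book_weights):
        """Cosine similarity between the user and book feature-weight vectors."""
        features = sorted(set(user_weights) | set(book_weights))
        u = np.array([user_weights.get(f, 0.0) for f in features])
        b = np.array([book_weights.get(f, 0.0) for f in features])
        denom = np.linalg.norm(u) * np.linalg.norm(b)
        return float(u @ b) / denom if denom else 0.0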
3.3 Considering Experts' Opinions

To further analyze b, QBook takes advantage of experts' reviews, in order to consider unbiased and objective opinions as another data point in its recommendation process. Unlike the polarity-neutral strategy adopted to identify user/item features of interest, in the case of experts we explicitly examine the polarity of their opinions. By doing so, QBook leverages expert knowledge to ensure that recommended books are of good quality. QBook explores publicly available book critiques to determine experts' opinions on candidate books by performing semantic analysis to examine which books experts valued more. QBook examines ER, the set of expert reviews available for b, from two complementary perspectives: it captures sentiment at the word and sentence levels using two popular sentiment analysis tools. By involving experts' reviews in the recommendation process, QBook can help overcome the data sparsity issue, since some books do not have sufficient user-generated data, but have professional critiques which provide valuable information.

Sentiment at Word Level. SentiWordNet [13] is a lexical resource for opinion mining that assigns a sentiment score to each WordNet synset. Using SentiWordNet, QBook determines ER's overall sentiment, denoted sWNr, by calculating an average score based on the sentiment of each word in ER. Based on the study described in [33], and our own analysis, we observe that reviewers often summarize their overall thoughts in the last sentence of their review. For this reason, QBook also analyses the sentiment of the last sentence in each review in ER and calculates its average score, denoted sWNs, to ensure the real tone of the review is captured.

Sentiment at Sentence Level. In some cases, the polarity of a word on its own does not properly capture the intended polarity of a sentence. Thus, QBook uses CoreNLP [28], which builds up a representation of whole sentences based on their structure. QBook applies CoreNLP's parser to extract sentences from ER and calculates a sentiment score for each respective sentence. These scores are combined into a single (average) score, denoted cNLPr, which captures the overall sentiment of ER based on the sentiment of its individual sentences. Similar to the data points extracted at the word level, QBook also considers the average sentiment of the last sentence in each review in ER, denoted cNLPs.
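The word-level side of this step can be sketched with NLTK's SentiWordNet interface as follows; averaging positive-minus-negative scores and taking a word's first listed sense are our simplifications (the paper does not specify sense disambiguation), and the sentence-level scores cNLPr/cNLPs would be computed analogously with a sentence-level tool such as CoreNLP.

    import nltk
    from nltk.corpus import sentiwordnet as swn

    # One-time setup: nltk.download('punkt'); nltk.download('wordnet');
    # nltk.download('sentiwordnet')

    def word_level_polarity(text):
        """Average (positive - negative) SentiWordNet score over scorable words."""
        scores = []
        for token in nltk.word_tokenize(text.lower()):
            senses = list(swn.senti_synsets(token))
            if senses:
                # First listed sense only: a simplifying assumption.
                scores.append(senses[0].pos_score() - senses[0].neg_score())
        return sum(scores) / len(scores) if scores else 0.0

    def expert_review_scores(expert_reviews):
        """sWNr over full reviews, sWNs over their last sentences."""
        swnr = sum(word_level_polarity(r) for r in expert_reviews) / len(expert_reviews)
        lasts = [nltk.sent_tokenize(r)[-1] for r in expert_reviews]
        swns = sum(word_level_polarity(s) for s in lasts) / len(lasts)
        return swnr, swns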
3.4 Incorporating a Time-Based Component

To better serve their stakeholders, recommenders must predict readers' interests at any given time. Users' preferences, however, tend to evolve, which is why it is crucial to consider a time component to create suitable suggestions [44]. QBook examines genre, which is a category of literary composition determined by literary technique, tone, content, or even length, from a time-sensitive standpoint. Including this component provides the likelihood of reader interest in each genre based on its occurrences at specific points in the past, not only the most recent or the most frequently read one. QBook uses a genre prediction strategy (we first discussed the benefits of explicitly considering changes in user preferences over time in [11]) that examines a genre distribution and applies a time series analysis model, the Auto-Regressive Integrated Moving Average (ARIMA) [32]. In doing so, QBook can discover different areas of U's reading preferences, along with U's degree of interest in each of them.

Predicting genre preference to inform the recommendation process involves examining the genres read by U. We first obtain the genre distribution among the books read by U during continuous periods of time and estimate a significance score for each genre g_n at a specific time period t:

    GenreImportance(g_n, t) = |g_n,t| / |G_t|,

where G_t is the set of books read in t, |g_n,t| is the frequency of occurrence of the genre g_n among books in G_t, and |G_t| is the size of G_t.

Since changes in reading activity between fixed and known periods of time are not constant, QBook applies non-seasonal ARIMA models. By doing this, QBook is able to determine a model tailored to each genre distribution to predict its importance for U in real time based on its previous occurrences. An ARIMA forecasting (i.e., temporal prediction) model uses a specific genre distribution to predict the likelihood of future occurrences of that genre based on its importance in previous periods of time. This is why our strategy conducts a search over possible ARIMA models that capture user preference and selects the one with the best fit for a specific genre distribution in time for U—the one that best describes the pattern of the time series and explains how the past affects the future.

Using ARIMA and genre information about books read by U, QBook can estimate the likelihood of occurrence of a given genre g_n at time frame TW, i.e., the recommendation time in our case. This information is used to determine the degree to which U is interested in reading each genre, and subsequently the number of books in each genre that should be recommended to satisfy U's varied interests (see Section 3.5). For example, with the described time series genre prediction strategy, QBook is able to prioritize the recommendation of fantasy books for U (a genre U recently started reading more) over comedy books (a genre known to be favored by U in the past), even if proportionally U read more comedy than fantasy books. The described prediction approach provides an additional data point to further personalize the recommendation process.
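A sketch of the per-genre time series and the ARIMA model search, using statsmodels; selecting the "best fit" by AIC is our assumption, since the paper does not name the selection criterion, and the genre attribute on books is illustrative.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    def genre_importance_series(books_by_period, genre):
        """GenreImportance(g_n, t) = |g_n,t| / |G_t| for each time period t."""
        return np.array([
            sum(1 for book in period if book.genre == genre) / max(len(period), 1)
            for period in books_by_period
        ])

    def forecast_genre_share(series, max_order=2):
        """Search small non-seasonal ARIMA(p, d, q) orders; keep the best fit."""
        best = None
        for p in range(max_order + 1):
            for d in range(2):
                for q in range(max_order + 1):
                    try:
                        fit = ARIMA(series, order=(p, d, q)).fit()
                    except Exception:
                        continue  # some orders fail to converge on short series
                    if best is None or fit.aic < best.aic:
                        best = fit
        # Predicted share of the user's next-period reading for this genre.
        return float(best.forecast(steps=1)[0]) if best else float(series.mean())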
3.5 Curating Book Suggestions

The last step of QBook's recommendation process focuses on curating CB to generate top-K suggestions tailored to U. In this step, QBook's goal is to emulate the curation process (as defined in [6]) and become a personal docent that understands U and provides relevant books to read that appeal to his diverse, yet unique, preferences. To do so, QBook simultaneously considers different data points and builds a model that creates a single score quantifying the degree to which U prefers b ∈ CB. For model generation, QBook adopts the Random Forest algorithm [7]. (We empirically verified that Random Forests are best suited for our curation task; the analysis is omitted due to page limitations.)

As part of the curation process, QBook represents b ∈ CB as a vector A_U,b = <r_U,b, Cr_U,b, Sim(U, b), sWNr, sWNs, cNLPr, cNLPs>, which captures the degree of appeal of b for U from multiple perspectives and is used as an input instance to the trained Random Forest to generate the corresponding ranking score for b. Note that, unlike traditional recommenders that train a model for all the users in the community, in QBook a random forest model is trained per user. This allows the model to specifically learn each user's interests, similar to what a personal docent would do.

Reading Activity. Reading activity varies among users, influencing QBook's ability to manipulate ranked candidate books for curation. For non-active readers—who rate fewer than 35 books—the lack of available information can hinder the process of further personalizing suggestions. In this case, QBook generates the top-K recommendations for U by simply ordering the prediction scores obtained using the trained Random Forest model on books in CB.

For active readers, who read at least 35 books (analysis of recent statistics on the average number of books read by Americans on a yearly basis, along with an examination of rating distributions on development data, influenced our threshold selection for experimental purposes), it is important to identify what motivates their reading selections, which can vary among different readers. For example, some users are biased by experts' opinions, while others by the preferences of similar-minded individuals. QBook explicitly considers these individual biases in the book selection process for each active reader, leading to further personalized suggestions. If U is an active reader, then QBook captures correlations among the different data points involved in the process of creating A_U,b for U. QBook uses Pearson correlation to indicate the extent to which variables fluctuate together (as illustrated in Figure 2). By exploring U's past rating behavior, QBook can determine the data point that has the most influence on U in the process of rating books, i.e., which data point yields the highest correlated value with respect to U's ratings. This data point is treated as the most important one, in terms of biasing U's decision making process. QBook further re-orders the scores computed for each book in CB based on the score of the most influential data point and thus provides top-K suggestions further tailored for U.

[Figure 2: Correlation among data points in QEval; "actual" is the rating assigned by a user, "predicted" is the one estimated by QBook, color denotes correlation, and the size of the bubbles captures correlation strength.]
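A compact sketch of the per-user model and the bias detection described above, using scikit-learn and SciPy; hyperparameters such as the number of trees are placeholders, not values from the paper.

    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.ensemble import RandomForestRegressor

    # Columns of the appeal vector A_U,b, following the paper.
    DATA_POINTS = ["r", "Cr", "Sim", "sWNr", "sWNs", "cNLPr", "cNLPs"]

    def train_user_model(appeal_vectors, ratings):
        """Per-user Random Forest mapping appeal vectors to observed ratings."""
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(appeal_vectors, ratings)
        return model

    def most_influential_data_point(appeal_vectors, ratings):
        """Data point most strongly (Pearson-)correlated with U's ratings."""
        X = np.asarray(appeal_vectors, dtype=float)
        correlations = [abs(pearsonr(X[:, i], ratings)[0]) for i in range(X.shape[1])]
        return DATA_POINTS[int(np.argmax(correlations))]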
Genre Preference. In the case of active readers, the final step of QBook for curating their suggestions involves explicitly considering U's genre preferences. This is accomplished using the strategy described in Section 3.4. To determine the number of representative candidates from each genre that should be part of the final set of books presented to U, QBook relies on the genre preference distribution calculated using the ARIMA time series analysis and the process discussed in Section 3.4. In doing so, QBook can account for the degree to which U will likely be interested in reading each particular genre at the moment the recommendation is generated. The final list of top-K suggestions for U is generated by considering not only ranking scores and user bias, but also by ensuring that the genre distribution among the K suggested books matches the genre distribution uniquely expected for U.

By performing this curation step, QBook looks for diversity among the suggestions by including books from all the different areas of a user's interests, and improves personalization by ordering suggestions based on a specific feature for U. Consequently, QBook enables U to choose books from an exhibit tailored solely to him in order to satisfy his reading needs at a given time.
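One way to realize this genre-matching constraint is a quota-based selection over the ranked candidates, sketched below under the assumption that rounded quotas are acceptable and that leftover slots fall back to the highest-scored remaining books (the paper does not spell out these tie-breaking details).

    def curate_top_k(ranked_books, genre_of, genre_shares, k=7):
        """Pick K books whose genre mix tracks the predicted distribution.

        ranked_books: (book, appeal_score) pairs sorted by score, best first.
        genre_shares: predicted share of each genre at recommendation time.
        """
        quotas = {genre: round(share * k) for genre, share in genre_shares.items()}
        picked, leftovers = [], []
        for book, _score in ranked_books:
            genre = genre_of(book)
            if quotas.get(genre, 0) > 0:
                picked.append(book)
                quotas[genre] -= 1
            else:
                leftovers.append(book)
            if len(picked) == k:
                return picked
        # Rounding can leave open slots; fill them with the best remaining books.
        return picked + leftovers[:k - len(picked)]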
domly selecting the same percentage of users from clusters to “sim- QBook creates explanations for a curated book bc by extracting ulate" real representation of QData in each partition. sentences in reviews pertaining to bc that refer to the most impor- Metrics. For recommendation validation, we used the well- tant literary element of interest to U . Note that if there are multiple known NDCG and Ñ RMSE. We also considered: |K R A | Ñ sentence describing the same feature, QBook arbitrarily selects one Coverage= |K Ñ R | , K is the set of books of the collection to be shown to U . More importantly, QBook does not emphasize known to a given user, R is the set of relevant books to a user and A IntRS Workshop, October 2018, Vancouver, Canada Dragovic et al. is the set of recommended books. This metric captures how many Table 2: Aims of explanations in a recommender system of the items from the dataset are being recommended to all users Aim Definition who get recommendations [15]. Ñ |(R ÑA)−K | Effectiveness Help users make good decisions Novelty= |R A | , where K, R, and A are defined as in Cover- Efficiency Help users make decisions faster age. This metric captures how different a recommendation is with Persuasiveness Convince users to try or buy respect to what the user has already seen along with the relevance Satisfaction Increase the ease of usability or enjoyment of the recommended item [41]. Scrutability Allow users to tell the system it is wrong Serendipity, which measures how surprising the recommenda- Transparency Explain how the system works tions are to a user; computed as in [15]. Trust Increase users’ confidence in the system Table 3: Influence of genre preference change over time on the recommendation process;‘*’ significant for p<0.001 t-test Prediction Strategy KL Accuracy Without Time Series 0.663 0.826 4.2 Results & Discussion With Time Series 0.623* 0.870* Without Time Series (3+ genres) 0.720 0.810 Temporal Analysis. Since time-based genre prediction and its With Time Series (3+ genres) 0.660* 0.857* influence in the recommendation process is a novel strategy, we evaluate it in isolation to demonstrate its effectiveness. To do so, We create top-75 recommendations for each user in QDevel us- we used a disjoint set of 1,214 randomly-selected users from the ing the individual strategies defined in Section 3 and evaluate the dataset introduced earlier in the section. We used KL-Divergence, effectiveness of these recommendations based on NDCG6 . which measures how well a distribution q generated by a predic- As shown in Figure 3, matrix factorization and content based ap- tion strategy approximates a distribution p, the ground truth, i.e., proaches are similarly effective in generating suggestions. However, distribution of genres read by a user over a period of time. We also when combined they slightly increase the value of NDCG. This im- used accuracy, a binary strategy that reflects if the predicted genres provement is statistically significant (p < 0.001; t-test), which means correspond to the ones read by a user over a given period of time. In that, in general, users get more relevant recommendations when establishing the ground truth for evaluation purposes, we adopted both strategies are considered in-tandem. This can be explained the well-known N -1 strategy: the genre of the books rated by a with the fact that these two methodologies complement each other. 
4.2 Results & Discussion

Temporal Analysis. Since time-based genre prediction and its influence on the recommendation process is a novel strategy, we evaluate it in isolation to demonstrate its effectiveness. To do so, we used a disjoint set of 1,214 randomly-selected users from the dataset introduced earlier in the section. We used KL-Divergence, which measures how well a distribution q generated by a prediction strategy approximates a distribution p, the ground truth, i.e., the distribution of genres read by a user over a period of time. We also used accuracy, a binary strategy that reflects whether the predicted genres correspond to the ones read by a user over a given period of time. In establishing the ground truth for evaluation purposes, we adopted the well-known N-1 strategy: the genres of the books rated by a user U in N-1 time frames are used for training U's genre prediction model, whereas the genres of the books rated by U in the N-th time frame are treated as "relevant". As a baseline for this initial assessment, we use a naive strategy that defines the importance of each genre for U in the current, i.e., N-th, time frame based on the genre distribution across the previous N-1 time frames.

Table 3: Influence of genre preference change over time on the recommendation process; '*' significant for p < 0.001 (t-test)
Prediction Strategy | KL | Accuracy
Without Time Series | 0.663 | 0.826
With Time Series | 0.623* | 0.870*
Without Time Series (3+ genres) | 0.720 | 0.810
With Time Series (3+ genres) | 0.660* | 0.857*

As shown in Table 3, for N=11 the KL divergence scores indicate that the genre distribution predicted using time series better approximates the ground truth, thus leading to better performance. Furthermore, the probability of occurrence of each considered genre is closer to the real values when the time component is included in the prediction process. We observed differences among users who read different numbers of distinct genres. For users who read only one to two genres, the time-based prediction strategy does not perform better than the baseline. However, if a user reads three or more genres, our time-based genre prediction strategy outperforms the baseline on both metrics. This is not surprising, given that it is not hard to determine the area(s) of interest of a user who constantly reads only one or two book genres, which is why the baseline performs as well as the time-based prediction strategy. Given that users who read 3 or more genres represent 91% of the users in the dataset used in the remainder of the experiments presented in this section, the proposed strategy provides significant improvements in predicting the preferred genres of the vast majority of readers.
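A sketch of how these two measures can be computed per user: the KL term follows the definition above, while the binary accuracy check is our reading of the paper (genres with non-trivial predicted mass must coincide with the genres actually read), with an illustrative threshold.

    import numpy as np
    from scipy.stats import entropy

    def kl_divergence(actual, predicted, eps=1e-9):
        """KL(p || q): how well the predicted genre distribution q approximates
        the observed one p; lower is better."""
        p = np.asarray(actual, dtype=float) + eps
        q = np.asarray(predicted, dtype=float) + eps
        return float(entropy(p / p.sum(), q / q.sum()))

    def genres_match(actual, predicted, threshold=0.05):
        """Binary accuracy (assumed definition): predicted genres above the
        threshold coincide with the genres read in the held-out time frame."""
        read = {i for i, share in enumerate(actual) if share > 0}
        expected = {i for i, share in enumerate(predicted) if share >= threshold}
        return read == expected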
Overall Performance. We evaluate the individual strategies that contribute to QBook's recommendation process and analyze how each of them influences the generation of book suggestions. We create top-7 recommendations (K is set to 7, based on a study presented in [30], where the authors argue that the number of objects an average human can hold in working memory is 7 ± 2) for each user in QDevel using the individual strategies defined in Section 3 and evaluate the effectiveness of these recommendations based on NDCG. (QDevel and QEval yield comparable scores, indicating consistency in performance regardless of the data used for experimentation, and no overfitting.)

[Figure 3: Performance evaluation of individual recommendation strategies considered by QBook on QDevel and QEval.]

As shown in Figure 3, the matrix factorization and content-based approaches are similarly effective in generating suggestions. However, when combined they slightly increase the value of NDCG. This improvement is statistically significant (p < 0.001; t-test), which means that, in general, users get more relevant recommendations when both strategies are considered in tandem. This can be explained by the fact that these two methodologies complement each other. Furthermore, we can see that the similarity between the literary features of interest to a user and the literary features most often used to describe a book has a positive influence on the recommendation process, as it increases NDCG by 2.5% when explicitly considered as part of the recommendation process. This is anticipated, since user-generated reviews hold a lot of information that can allow us to gain knowledge about each user and personalize suggestions. The most reliable data points, which not only achieve relatively high NDCG but are also widely applicable and do not depend on individual users, are the four strategies that analyze the sentiment of expert reviews. These strategies rely on information that is frequently available and thus are applicable to the majority of books examined by QBook. Based on Figure 3, we can see that the data points calculated using sentence-level sentiment analysis provide slightly better recommendations than the ones generated using word-level sentiment analysis. Even though the individual strategies perform relatively well, we cannot assume that each data point can be calculated for every single book. QBook's improvements in terms of NDCG can be justified by its ability to: (i) simultaneously consider multiple data points, (ii) include a genre-prediction strategy, and (iii) more completely analyze different perspectives of user-book relations to provide recommendations even when some of the data points are unavailable. This is demonstrated by the fact that the NDCG of QBook is statistically significantly higher than the NDCG reported for the individual strategies (for p < 0.001).

To further showcase the validity of QBook's design methodology, we compare its performance with two baselines: SVD (matrix factorization) and CB (content-based). For their respective implementations we rely on LensKit. The significant (p < 0.01) NDCG improvement of QBook with respect to SVD (0.874) and CB (0.856) demonstrates that, in general, recommendations provided by QBook are preferred over the ones provided by the baselines, which consider either rating patterns or content, but not both.
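For reference, a standard NDCG computation over a ranked list of graded relevance values, as could be used here; the exact gain formulation in the paper's evaluation is not specified, so this linear-gain variant is an assumption.

    import numpy as np

    def ndcg(relevances, k=7):
        """NDCG@k over a ranked list of graded relevance values (e.g., ratings)."""
        gains = np.asarray(relevances, dtype=float)[:k]
        if gains.sum() == 0:
            return 0.0
        discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
        dcg = float((gains * discounts).sum())
        ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
        idcg = float((ideal * discounts[:ideal.size]).sum())
        return dcg / idcg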
Recommendation Explanations. There is no gold standard to evaluate explanations offline. Thus, following the strategy in [16, 39], we conducted an initial qualitative evaluation to demonstrate the usefulness of the explanations generated by QBook. We rely on the criteria introduced in [39] and summarized in Table 2, which outlines the "characteristics" of good explanations for recommenders.

Table 2: Aims of explanations in a recommender system
Aim | Definition
Effectiveness | Help users make good decisions
Efficiency | Help users make decisions faster
Persuasiveness | Convince users to try or buy
Satisfaction | Increase the ease of usability or enjoyment
Scrutability | Allow users to tell the system it is wrong
Transparency | Explain how the system works
Trust | Increase users' confidence in the system

QBook achieves five out of the seven criteria expected of explanations generated by recommenders. By suggesting curated books which are described based on users' preferred features of interest, showcasing the opinions of other users on those features, and describing curation steps, QBook addresses transparency. QBook inspires trust in its users, since it does not consider the sentiment connotation of the features to determine if they should be included in an explanation. Instead, QBook provides unbiased recommendations and explanations, which can increase users' confidence, as they know QBook offers a real depiction of each suggested book. Users are also able to make good and fast decisions, in terms of selecting books among the suggested ones, since based on the provided explanations they can infer which books match their current preferences. With this, QBook increases its effectiveness. Given that users' overall satisfaction with a recommender is related to the perceived quality of its recommendations and explanations [16], QBook users can appreciate not having to spend more time researching books with characteristics important to them.

As per the study in [39] and assessments of several explanation-generation strategies [19, 31, 42, 46], we can report that, on average, only two (out of seven) criteria are satisfied. The only strategy comparable to QBook's is the one discussed in [46], which addresses five of the criteria. However, this strategy involves sentiment in the generation of the explanations, as opposed to QBook, which makes unbiased decisions when identifying users' features of preference and selecting which sentences to use to describe these features.

Common Recommendation Issues. We showcase QBook's ability to address popular recommendation issues based on RMSE, in addition to adopting the evaluation framework presented in [27] to simulate online evaluation using offline metrics: coverage, serendipity, and novelty. Based on the results of our analysis, we observe that the performance of QBook is consistent, regardless of the presence or absence of data points used in the recommendation process. QBook's RMSE (Table 4) indicates that its recommendation strategy can successfully predict users' degree of preference for books. QBook's coverage score (0.92) highlights that QBook considers a vast number of diverse books as potential recommendations, as opposed to popular ones. The novelty score (0.73) indicates that a user is provided with suggestions that differ from what he has already seen. This characteristic of QBook, together with its relatively high serendipity (0.68), indicates that new and unexpected, yet relevant, suggestions are generated.

State-of-the-art. We compare QBook with other book recommenders (optimal parameters were empirically defined). LDAMF [29] harnesses the information in review text by fitting an LDA model on the review text. CTR [43] uses a one-class collaborative filtering strategy; even though it was not specifically created for books, we consider it as it exploits metadata comparable to that of books. HFT [29] combines reviews with ratings and models the ratings using a matrix factorization model to link the stochastic topic distribution in review text and the latent vector in the ratings. SVD++ [23] refers to a matrix factorization model which makes use of implicit feedback information. URRP [20] is a Bayesian model that combines collaborative and content-based filtering to learn user rating and review preferences. 'Free Lunch' [26] leverages clusters based on information that is present in the user-item matrix, but not directly exploited during matrix factorization. RMR [25] combines baselines by using the information of both ratings and reviews.

Table 4: QBook vs. state-of-the-art recommenders
Strategy | RMSE
QBook | 0.795
RMR | 1.055
LDAMF | 1.053
CTR | 1.052
HFT | 1.066
URRP | 1.104
Free Lunch | 0.933
Free Lunch w/ Clustering | 0.825
SVD++ | 0.908

In Table 4 we summarize the results of the evaluation conducted using QEval in terms of RMSE. QBook outperforms the existing state-of-the-art book recommenders considered in this study in terms of predicting the degree to which a user would like each recommended book. The differences in RMSE computed for QBook with respect to the aforementioned state-of-the-art book recommenders are statistically significant with p < 0.001. The prediction power of QBook is evidenced by its ability to achieve the lowest RMSE among state-of-the-art approaches. When analyzing the performance of the different strategies in more detail, we can see that matrix factorization strategies perform better, as in the case of Free Lunch (with and without clustering) and SVD++. However, QBook goes beyond matrix factorization by using a content-based approach, as well as by involving different perspectives, including other users' and experts' reviews.
5 CONCLUSIONS & FUTURE WORK

We presented QBook, a book recommender that acts as a curator by showcasing tailored book selections that meet the reading needs of individual users. As part of its recommendation process, QBook examines different areas of user interest, not only the most dominant or recent ones, as well as varied data points. In doing so, QBook can yield a diverse set of suggestions, each paired with an explanation, to provide a user not only with the reasons why a book was included in the curated list of recommendations but also with how each recommendation was selected, with the objective of enhancing trust and transparency towards the user.

We conducted a number of offline experiments to validate the performance of QBook using a popular dataset. We also demonstrated the importance of considering diverse data sources, beyond ratings or content, to enhance the recommendation process.

With this work, we set the algorithmic foundations that will allow us to conduct in-depth online experiments in the future, in order to quantify the usability of QBook, the value of its explanations, and the degree to which its curation strategy can enrich the recommendation process from a user's perspective. Given the domain-independent nature of our strategies, we plan to validate QBook on datasets other than books to demonstrate its applicability in other domains. Our goal is to go one step further and enable our personal curator to generate suggestions in multiple domains, based on the complete virtual footprint available for a user.
REFERENCES

[1] Content based recommendation system. Available at: http://eugenelin89.github.io/recommender_content_based/.
[2] INEX Amazon/LibraryThing book corpus. http://social-book-search.humanities.uva.nl/data/ALT_Nondisclosure_Agreements.html. Accessed: 2016-02-07.
[3] LensKit open-source tools for recommender systems. https://lenskit.org.
[4] C. Benkoussas, A. Ollagnier, and P. Bellot. Book recommendation using information retrieval methods and graph analysis. In CLEF. CEUR, 2015.
[5] R. Blanco, D. Ceccarelli, C. Lucchese, R. Perego, and F. Silvestri. You should read this! Let me explain you why: explaining news recommendations to users. In CIKM, pages 1995-1999. ACM, 2012.
[6] C. Borrelli. Everybody's a curator. Chicago Tribune. https://goo.gl/hpTF3Q. Accessed: 2015-12-06.
[7] L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
[8] S. Channamsetty and M. D. Ekstrand. Recommender response to diversity and popularity bias in user profiles. In AAAI FLAIRS, pages 657-660, 2017.
[9] L. Chen and F. Wang. Sentiment-enhanced explanation of product recommendations. In WWW, pages 239-240. ACM, 2014.
[10] Dictionary. Oxford: Oxford University Press, 1989.
[11] N. Dragovic and M. S. Pera. Genre prediction to inform the recommendation process. In Proceedings of the Poster Track of the 10th ACM Conference on Recommender Systems, 2016.
[12] N. Dragovic and M. S. Pera. Exploiting reviews to generate personalized and justified recommendations to guide users' selection. In AAAI FLAIRS, pages 661-664, 2017.
[13] A. Esuli and F. Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In LREC, volume 6, pages 417-422, 2006.
[14] A. L. Garrido and S. Ilarri. TMR: a semantic recommender system using topic maps on the items' descriptions. In ESWC, pages 213-217. Springer, 2014.
[15] M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. In ACM RecSys, pages 257-260. ACM, 2010.
[16] F. Gedikli, D. Jannach, and M. Ge. How should I explain? A comparison of different explanation types for recommender systems. International Journal of Human-Computer Studies, 72(4):367-382, 2014.
[17] S. Givon and V. Lavrenko. Predicting social-tags for cold start book recommendations. In ACM RecSys, pages 333-336. ACM, 2009.
[18] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In ACM CSCW, pages 241-250. ACM, 2000.
[19] A. Hernando, J. Bobadilla, F. Ortega, and A. Gutiérrez. Trees for explaining recommendations made through collaborative filtering. Information Sciences, 239:1-17, 2013.
[20] M. Jiang, D. Song, L. Liao, and F. Zhu. A Bayesian recommender model for user rating and review profiling. Tsinghua Science & Technology, 20(6):634-643, 2015.
[21] G. Kazai, D. Clarke, I. Yusof, and M. Venanzi. A personalised reader for crowd curated content. In ACM RecSys, pages 325-326. ACM, 2015.
[22] D. Kislyuk, Y. Liu, D. Liu, E. Tzeng, and Y. Jing. Human curation and convnets: Powering item-to-item recommendations on Pinterest. arXiv preprint arXiv:1511.04003, 2015.
[23] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In ACM SIGKDD, pages 426-434. ACM, 2008.
[24] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, 2009.
[25] G. Ling, M. R. Lyu, and I. King. Ratings meet reviews, a combined approach to recommend. In ACM RecSys, pages 105-112. ACM, 2014.
[26] B. Loni, A. Said, M. Larson, and A. Hanjalic. 'Free lunch' enhancement for collaborative filtering with factorization machines. In ACM RecSys, pages 281-284. ACM, 2014.
[27] A. Maksai, F. Garcin, and B. Faltings. Predicting online performance of news recommender systems through richer evaluation metrics. In ACM RecSys, pages 179-186. ACM, 2015.
[28] C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations), pages 55-60, 2014.
[29] J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In ACM RecSys, pages 165-172. ACM, 2013.
[30] G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2):81, 1956.
[31] J. Misztal and B. Indurkhya. Explaining contextual recommendations: Interaction design study and prototype implementation. In IntRS@RecSys, pages 13-20, 2015.
[32] R. Nau. Introduction to ARIMA models. Duke University. http://people.duke.edu/~rnau/411arim.htm. Accessed: 2016-05-06.
[33] B. Ohana and B. Tierney. Sentiment classification of reviews using SentiWordNet. In 9th IT & T Conference, 2009.
[34] M. S. Pera and Y.-K. Ng. Automating readers' advisory to make book recommendations for K-12 readers. In ACM RecSys, pages 9-16. ACM, 2014.
[35] M. S. Pera and Y.-K. Ng. Analyzing book-related features to recommend books for emergent readers. In ACM HT, pages 221-230. ACM, 2015.
[36] F. Ricci, L. Rokach, and B. Shapira. Introduction to Recommender Systems Handbook. Springer, 2011.
[37] Z. Saaya, R. Rafter, M. Schaal, and B. Smyth. The curated web: a recommendation challenge. In ACM RecSys, pages 101-104. ACM, 2013.
[38] Y. Teng, L. Zhang, Y. Tian, and X. Li. A novel FAHP based book recommendation method by fusing apriori rule mining. In ISKE, pages 237-243. IEEE, 2015.
[39] N. Tintarev and J. Masthoff. A survey of explanations in recommender systems. In IEEE ICDEW, pages 801-810, 2007.
[40] N. Tintarev and J. Masthoff. Evaluating recommender explanations: problems experienced and lessons learned for the evaluation of adaptive systems. In UCDEAS Workshop associated with UMAP. CEUR-WS, 2009.
[41] S. Vargas and P. Castells. Rank and relevance in novelty and diversity metrics for recommender systems. In ACM RecSys, pages 109-116. ACM, 2011.
[42] J. Vig, S. Sen, and J. Riedl. Tagsplanations: explaining recommendations using tags. In IUI, pages 47-56. ACM, 2009.
[43] C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific articles. In ACM SIGKDD, pages 448-456. ACM, 2011.
[44] J. Wang and Y. Zhang. Opportunity model for e-commerce recommendation: right product; right time. In ACM SIGIR, pages 303-312, 2013.
[45] M. C. Willemsen, M. P. Graus, and B. P. Knijnenburg. Understanding the role of latent feature diversification on choice difficulty and satisfaction. UMUAI, 26(4):347-389, 2016.
[46] Y. Zhang, G. Lai, M. Zhang, Y. Zhang, Y. Liu, and S. Ma. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In ACM SIGIR, pages 83-92. ACM, 2014.