=Paper=
{{Paper
|id=Vol-2225/paper6
|storemode=property
|title=From Recommendation to Curation: When the System Becomes your Personal Docent
|pdfUrl=https://ceur-ws.org/Vol-2225/paper6.pdf
|volume=Vol-2225
|authors=Nevena Dragovic,Ion Madrazo Azpiazu,Maria Soledad Pera
|dblpUrl=https://dblp.org/rec/conf/recsys/DragovicAP18
}}
==From Recommendation to Curation: When the System Becomes your Personal Docent==
From Recommendation to Curation When the system becomes your personal docent Nevena Dragovic∗ Ion Madrazo Azpiazu Maria Soledad Pera MarkMonitor People & Information Research Team People & Information Research Team Boise, Idaho, USA Boise State University Boise State University nevena.dragovic@markmonitor.com Boise, Idaho, USA Boise, Idaho, USA ionmadrazo@boisestate.edu solepera@boisestate.edu ABSTRACT small set of diverse items could lead to an enriched recommenda- Curation is the act of selecting, organizing, and presenting con- tion process. In this paper, we present the algorithmic foundation tent. Some applications emulate this process by turning users into to make this possible, and facilitate future user studies. A number curators, while others use recommenders to select items, seldom of applications, e.g. Pinterest, let the user be the decision maker: achieving the focus or selectivity of human curators. We bridge they offer individualized selections and then allow the user play this gap with a recommendation strategy that more closely mim- the role of a curator in choosing appealing items. However, given ics the objectives of human curators. We consider multiple data the ability of a recommender to collect and examine large amounts sources to enhance the recommendation process, as well as the of data about users and items, the system itself can get to know quality and diversity of the provided suggestions. Further, we pair users better—their interests and behaviors— and act as a curator. each suggestion with an explanation that showcases why a book Due to scope limitations, we use books as a case study and was recommended with the aim of easing the decision making pro- focus our research efforts on the techniques that lead to QBook, a cess for the user. Empirical studies using Social Book Search data curated book recommender (Figure 1). QBook does not only find demonstrate the effectiveness of the proposed methodology. books that are appealing to a user, but also presents a meaningful set of suggestions with corresponding explanations that pertain CCS CONCEPTS to the various preferences of the user. QBook takes the role of the curator upon itself; make connections between suggested items and • Information systems → Recommender systems; the reasons for their selection; and enriches the recommendation process by addressing issues that affect these systems: (1) Using KEYWORDS historical data, we capture suitable candidate items for each user; Curation, Decision-making, Time series, Personalization, Diversity (2) Understanding items’ metadata, we access potentially relevant ACM Reference Format: items that otherwise might be ignored by solely relying on ratings; Nevena Dragovic, Ion Madrazo Azpiazu, and Maria Soledad Pera. 2018. From (3) Considering user reviews, we infer which item features each user Recommendation to Curation: When the system becomes your personal cares about and their degree of importance; (4) Examining experts’ docent. In Proceedings of Joint Workshop on Interfaces and Human Deci- reviews, we ensure overall quality of books to be recommended; sion Making for Recommender Systems (IntRS) 2018 (IntRS Workshop). ACM, (5) Exploring the type of data sources a user favors, we learn about 8 pages. the user and understand why he could be interested in an item;(6) 1 INTRODUCTION Analyzing users’ change in genre preferences over time, we better identify the current reading interests of individual users. Recommenders have been studied for decades. Regardless of the QBook can be seen as an interpretable diversification strategy for domain, they influence businesses’ success and users’ satisfaction. recommendation, where evidence factors from (1)-(6) are combined From a commercial standpoint, recommenders enable companies using a diversification technique adapted to each user’s interest. to advertise items to potential buyers. From a user perspective, Further, explanations on why each book was selected are provided they enhance users’ experience by easing identification of items so that the user can better select the book he is interested in. of interests while addressing information overload concerns. The With this work, we improve research related to recommenders by degrees of personalization recommenders offer, however, can be combining traditional approaches with novel preference matching hindered by their limited ability to provide diverse enough sug- methods into a single strategy that offers suggestions containing gestions, restricting users’ exposure to new, prospective items of information related to the interests of each user. Moreover, we interest. This occurs because common recommendation strategies explicitly undertake diversity and personalization–key aspects as rely on community data, thus suggesting the same items to similar common collaborative filtering algorithms are known to not prop- users, which can be vague and impersonal. agate users’ preferences on diversity into their recommendations Inspired by the results of the work conducted by Willemsen et [8]—by exploring user reviews and time analysis to understand al. [45], who demonstrated that “diverse, small item sets are just change of reading preference in time. To assess the effectivenesses as satisfying and less effortful to choose from than Top-N recom- of QBook we conduct experiments measuring the utility of individ- mendations," we argue that emulating the curation process—the act ual components as well as comparing the recommendation quality of selecting, organizing, and presenting content [10]—to suggest a of the system as a whole with respect to state-of-the-art systems. ∗ Work conducted while the author was a student at Boise State. IntRS Workshop, October 2018, Vancouver, Canada Dragovic et al. works in this area focus on determining how different types of explanations influence users while making decisions [16]. Among the most common strategies we should highlight those based on exploring previous activity of a user [5], information collected from user reviews [46], and content-based tag cloud explanations. The goal of QBook is to provide explanations that reveal reasoning and data behind the recommendation process and contain other users’ and experts’ (objective) opinions on item characteristics that are of a specific interest for each individual user. Many researchers consider sentiment-based explanations as more effective, trustworthy, and persuasive than the ones that capture relationship between previous activity of the user and suggested items [9]. We, however, follow the premise presented in [12] and do not include sentiment in order Figure 1: Overview of QBook to make QBook look unbiased from users’ perspectives, i.e., we do not select feature descriptions based on their polarity. 2 RELATED WORK Recommenders & Curation. Few research works focus on We discuss work pertaining to book recommenders, as well as the simulating the curation process for recommendation purposes use of explanations and curation for recommendation purposes. [21, 22, 37]. In [21], the authors discussed the development of a Recommenders & Books. A number of recommenders have been news application that learns from users’ interactions with the sys- designed to generate suggestions that help users select suitable tem while they swipe through provided news articles and like them. books to read [4, 17, 34]. They are based on purchasing or rating In this research, the authors use social networks and users’ brows- patterns, click-through data, content/tag analysis, or ontologies. ing history to create and recommend crowd curated content, but Some book recommenders are tailored towards specific group of using users as curators. The work conducted by Saaya et al. [37], users: By emulating the readers’ advisory service, the authors in on the other hand, relies on a content-based strategy that considers [34] describe the process of recommending books for K-12 students, information authors collect from users’ profiles, which are then based on the topics, contents, and literary elements that appeal to managed by the users themselves. The most recent study conducted each individual user, whereas K3Rec [35] uses information about by Kislyuk at al. [22], combines a user-based curation method along grade levels, contents, illustrations, and topics together with length with a traditional collaborative filtering strategy to improve Pin- and writing style, to generate suggestions for emergent readers. terest’s recommendation process. The aforementioned alternatives Garrido and Illarri [14] rely on content-based data for making book simply let users organize suggested content based on their personal suggestions. Their proposed TMR [14] examines items’ descriptions preferences, since all three studies treat users as curators. However, and reviews and relies on lexical and semantic resources to infer no recommendation system takes the role of the curator upon it- users’ preferences. However, TMR can only work if descriptions self. We take a different approach and allow the system to take the and reviews are available, unlike QBook, for which these are only curator role using existing user and item data. two of the multiple data points considered in the recommendation process. The authors in [4] present a strategy based on graph anal- 3 THE ANATOMY OF QBOOK ysis and PageRank that exploits clicks, purchasing patterns, and Each step of QBook’s recommendation process addresses a particu- book metadata. This strategy is constrained to the existence of a lar research problem on its own: Can item metadata complement priori pairwise similarity between items, e.g. “similar products", the lack of historical data (and vice versa)? How can a time compo- which is not a requirement for QBook. The authors in [38] highlight nent influence recommendation systems?, Can experts’ opinions the importance of book recommenders as library services, and thus align with readers’ preferences?, Are users’ features of interests a propose a fuzzy analytical hierarchy process based on a priori rules valuable asset to a recommendation process?, How does curation mining that depends upon the existence of book-loan histories. The work as a part of a recommendation process?, Can personalized empirical analysis presented in [38] is based on a limited and pri- explanation aid users in selecting relevant suggestions?. vate dataset, which constrains the task of verifying its applicability on large-scale benchmark datasets. More importantly, the proposed 3.1 Selecting Candidate Books strategy is contingent on book-loan historical information that due To initiate the recommendation process (described in Algorithm 1), to privacy concerns libraries rarely, if at all, make available. QBook identifies a set of books CB from a given collection to be Recommenders & Explanations. An ongoing challenge faced curated for a user U . The two strategies considered for candidate by recommenders is to get users to trust them, as they still operate as selection (i.e., matrix factorization and content-based filtering) com- “black boxes" [18]. A powerful way to build successful relationships plement each other and ensure diversity among candidate books. between users and recommenders is by providing information on While the former examines users’ rating patterns, the latter focuses how each system works and why items are suggested [39]. This on books characteristics and does not require user-generated data. can be accomplished by including explanations that justify the Moreover, by considering rating patterns and content, novelty of suggestions, which are known to provide transparency and enhance the recommendations increases, as users are exposed to a variety of trust on the system [39]. Unfortunately, justifying the reasons why books to chose from. We anticipate QBook to handle data sparsity an item has been suggested to a user is not an easy task [16]. Recent and cold start in this step, since even if candidate books do not IntRS Workshop, October 2018, Vancouver, Canada Algorithm 1 The Recommendation Process of QBook Table 1: Sample of terms associated with literary features Input: AB-Archived set of books, RS-Set of reviews, ER-Set of expert Reviews, K-# of recommenda- Literary Element Sample of Related Terms tions, RF-Trained Random Forest Terms: RSU -Reviews in RS written by U, RSb -Reviews in RS for b, ERb -Reviews in ER for b, characters stereotypes, detailed, distant, dramatic PU -Set of books read by U CandidateSet, Recommendations = empty set pace fast, slow, leisurely, breakneck, compelling Count=0 storyline action-oriented, character-centered for each user U do UPref=Ranked list of preferred literary elements using RSU tone happy, light, uplifting, dark, ironic, funny CandidateSet=b ∈ AB with rU ,b > 3 OR CrU ,b > 3 writing style austere, candid, classic, colorful for each book b in CandidateSet do BPref=Ranked list of preferred literary elements using RSb frame descriptive, minimal, bleak, light-hearted Sim= Similarity(UPref and BPref) sWNr=Polarity(ERb , SentiWordNet) sWNs=Polarity(lastSentence, ERb , SentiWordNet) writing style. As defined in [34], each literary element (i.e., feature) cNLPr=Polarity(ERb , CoreNLP) cNLPs=Polarity(lastSentence, ERb , CoreNLP) is associated with a set of terms used to describe that element, since AU®,b = < rU ,b , CrU ,b , Sim, sWNr, sWNs, cNLPr, cNLPr > different words can be used to express similar book elements. A AppealScore=GenerateRanking(RF , AU®,b ) sample of literary elements and related terms is shown in Table 1. Recommendations=Recommendations+ < b, AppealScore > QBook computes the overall frequency of occurrence of each fea- end for if U is active then ture mentioned by U by normalizing the occurrence of the feature DP=Identify most-correlated data point for U GenrePref=GenreDistribution(ARIMA, PU ) based on the number of reviews written by U . This score captures Recommendations=Sort(Recommendations, DP, GenrePref, K ) the importance (i.e., weight) of each particular feature for U . else Recommendations=Sort(Recommendations) On the same manner, QBook examines reviews available for b end if following the process defined for identifying features of interest to for each b in Recommendations do if Count++ <= K then U , in order to gain a deeper understanding of the literary elements print “b + Explanation(b, ERb , RSb , UPref, DP)" that are often used to describe b. This is done by analyzing the end if end for subjective opinions of all users who read and reviewed b. end for QBook leverages U ’s preferences in the recommendation process by calculating the degree of similarity between U ’s feature prefer- ences and b’s most-discussed features, as Sim(U , b) = U®V ·BV ® , have (sufficient) ratings assigned to them, they might still have tag U®V × BV ® descriptions that can help the recommender determine if they are indeed potentially of interest to a user and vice-versa. where UV® =and BV U ,1 U ,n ® = are b, 1 b,m Matrix Factorization. To take advantage of U ’s historical data, vector representations associated with feature discussions of U QBook adopts a strategy based on Matrix factorization [24]. Specif- and b, n and m are numbers of distinct features describing U and ically, it uses LensKit’s [3] FunkSVD for candidate generation and b, respectively, and WFU ,i and WFb,i capture the weight, i.e., de- includes in CB any book b for which its predicted rating for U (rU ,b ) gree of importance, of the i th feature for U and b, based on their is above 3–ensuring the potential appeal of b to U . normalized frequencies of occurrence (in reviews). Content Analysis. Content-based filtering methodologies cre- By using Sim(U , b), QBook triggers the generation of personal- ate suggestions by comparing items’ characteristics and users’ pref- ized suggestions, as it captures all features of interests for U and erences. Available content representations (e.g., metadata) are used compares them with the most-discussed features of b to determine to describe items, as well as users’ profiles based on items users how likely b is relevant to U . favored in the past [36]. QBooks uses tags, which capture books’ con- tent from diverse users’ perspectives. Thereafter, it applies Lenskit’s 3.3 Considering Experts’ Opinions implementation of the content-based algorithm [1] (based on the To further analyze b, QBook takes advantage of experts’ reviews in Vector Space Model and T F -IDF weighting scheme), and includes in order to consider unbiased and objective opinions as another data CB any book b for which its similarity with respect to U ’s content point in its recommendation process. Unlike the polarity-neutral preferences (CrU ,b ) is 3 or above. strategy adopted to identify user/item features of interest, in the case of experts we explicitly examine the polarity of their opin- 3.2 Getting to Know Users and Books ions. By doing so Qbook leverages expert knowledge to ensure that QBook aims to provide U with a set of appealing, personalized sug- recommended books are of good quality. QBook explores publicly gestions based on information he values. QBook examines reviews available book critiques to determine experts’ opinions on can- written by U and identifies the set of literary elements (features) didate books by performing semantic analysis to examine which that he cares the most about1 . Thereafter, it determines the degree books experts valued more. QBook examines ER, the set of expert to which each book in CB addresses U ’s features of interest. To reviews available for b, from two complementary perspectives: it identify features of interest to U , QBook performs semantic analysis captures sentiment at a word and sentence levels using two popular on reviews and considers the frequency of occurrence of terms U sentiment analysis tools. By involving experts’ reviews in the rec- employs in his reviews. By adopting the set of literary elements and ommendation process, QBook can help overcome the data sparsity the extraction strategy described in [34], QBook explores features issue, since some books do not have sufficient user-generated data, that characterize book content, such as character descriptions or but have professional critiques which provide valuable information. 1 If U does not have reviews, then the most popular user features are treated as U ’s Sentiment at Word Level. SentiWordNet [13] is a lexical re- features of importance. source for opinion mining that assigns a sentiment score to each IntRS Workshop, October 2018, Vancouver, Canada Dragovic et al. WordNet synset. Using SentiWordNet, QBook determines ER’s over- Using ARIMA and genre information about books read by U , all sentiment, denoted sW Nr , by calculating an average score based QBook can estimate the likelihood of occurrence of a given genre дn on the sentiment of each word in ER. Based on the study described at time frame TW , i.e., the recommendation time in our case. This in [33], and our own analysis, we observe that reviewers often sum- information is used to determine the degree to which U is interested marize their overall thoughts in the last sentence of their review. in reading each genre and subsequently the number of books in each For this reason, QBook also analyses the sentiment of the last sen- genre that should be recommended to satisfy U ’s varied interests tence in each review in ER and calculates its overage score, denoted (see Section 3.5). For example, with the described time series genre sW N s, to ensure the real tone of the review is captured. prediction strategy, QBook is able to prioritize the recommendation Sentiment at Sentence Level. In some cases, the polarity of of fantasy books for U (a genre U recently started reading more) a word on its own does not properly capture the intended polar- over comedy books (a genre known to be favored by U in the past), ity of a sentence. Thus, QBook uses CoreN LP [28] which builds even if proportionally U read more comedy than fantasy books. The up a representation of whole sentences based on their structure. described prediction approach provides an additional data point to QBook applies CoreN LP’ parser to extract sentences from ER and further personalize the recommendation process. calculates a sentiment score for each respective sentence. These scores are combined into a single (average) score, denoted cN LPr , 3.5 Curating Book Suggestions which captures the overall sentiment of ER based on the sentiment The last step of QBook’s recommendation process focuses on cu- of its individual sentences. Similar to the data points extracted at rating CB to generate top-K suggestions tailored to U . In this step, word level, QBook also considers the average sentiment of the last QBook’s goal is to emulate the curation process (as defined in [6]) sentence in each review in ER, denoted cN LPs. and become a personal docent that understands U and provides relevant books to read that appeal to his diverse, yet unique, pref- 3.4 Incorporating a Time-Based Component erences. To do so, QBook simultaneously considers different data To better serve their stakeholders, recommenders must predict read- points and builds a model that creates a single score that quantifies ers’ interest at any given time. Users’ preferences, however, tend to the degree to which U prefers b ∈ CB. For model generation, QBook evolve, which is why it is crucial to consider a time component to adopts the Random Forest3 algorithm [7]. create suitable suggestions [44]. QBook examines genre, which is a As part of the curation process, QBook represents b ∈ CB as a vec- category of literary composition, determined by literary technique, tor AU®,b =< rU ,b , CrU ,b , Sim(U , b), sW Nr, sW Ns, cN LPr, cN LPr > tone, content, or even length, from a time-sensitive standpoint. In- which captures the degree of appeal of b for U from multiple per- cluding this component provides the likelihood of reader(s) interest spectives and is used as an input instance to the trained Random in each genre based on its occurrences at a specific point in the Forest to generate the corresponding ranking score for b. Note that past, not only the most recent or the most frequently read one. unlike traditional recommenders that train a model for all the users QBook uses a genre prediction strategy2 that examines a genre dis- in the community, in QBook a random forest model is trained per tribution and applies a time series analysis model, Auto-Regressive user. This allows the model to specifically learn each user’s interests Integrated Moving Average (ARIMA) [32]. In doing so, QBook can similar to what a personal docent would do. discover different areas of U ’s reading preferences along with U ’s Reading Activity. Reading activity varies among users, influ- degree of interest on each of them. encing QBook’s ability to manipulate ranked candidate books for Predicting genre preference to inform the recommendation pro- curation. For non-active readers–who rate less than 35 books- the cess involves examining genres read by U . We first obtain the genre lack of available information can hinder the process of further per- distribution among the books read by U during continuous periods sonalizing suggestions. In this case, QBook generates the top-K of time and estimate a significance score for each genre дn or U recommendations for U by simply ordering the predictions scores |дn, t,b | obtained using the trained Random Forest model on books in CB. at a specific time period t: GenreImportance= |G | , where G t is t For active readers (who read at least 35 books4 ), it is important the set of books read in t дn,t is the frequency of occurrence of a to identify what motivates their reading selections, which can vary specific genre among books in G t , and |G t | is a size of G t . among different readers. For example, some users are biased by Since changes in reading activity between fixed and known peri- experts opinions while others by the preferences of similar-minded ods of time are not constant, QBook applies non-seasonal ARIMA individuals. QBook explicitly considers these individual biases in models. By doing this, QBook is able to determine a model tailored the book selection process for each active reader, leading to fur- to each genre distribution to predict its importance for U in real ther personalize suggestions. If U is an active reader, then QBook time based on its previous occurrences. ARIMA forecasting (i.e., captures correlations among different data points involved in the temporal prediction) model uses a specific genre distribution to predict the likelihood of future occurrences of that genre based on process of creating AU®,b for U . QBook uses Pearson correlation its importance in previous periods of time. This is why our strategy to indicate the extent to which variables fluctuate together (as il- conducts a search over possible ARIMA models that capture user lustrated in Figure 2). By exploring U past rating behavior, QBook preference and selects the one with a best fit for a specific genre can determine the data point that has the most influence on U in distribution in time for U —the one that best describes the pattern 3 Empirically verified that Random Forests are best suited for our curation task;analysis of the time series and explains how the past affects the future. omitted due to page limitations. 4 Analysis of recent statistics on average number of books read by Americans on 2 We first discussed the benefits of explicitly considering changes in user preferences a yearly basis along with examination of rating distributions on development data over time in [11]. influenced our threshold selection for experimental purposes. IntRS Workshop, October 2018, Vancouver, Canada the process of ratings books, i.e., which data point yield the high- the sentiment of the features, since QBook’s intent is not to make U est correlated value with respect to U ’s ratings. This data point is like one option more than another, but to save time on identifying treated as the most important one, in terms of biasing U ’s decision information important for him in the process of choosing the book making process. QBook further re-orders the scores computed for to read. Along with the other users’ opinions, QBook includes in its each book in CB based on the score of the most influential data explanations experts’ opinions on the book’s quality, as described point and thus provides top-K suggestions further tailored for U . in Section 3.3. In other words, QBook includes in the corresponding explanations a sentence from experts’ reviews that also reference users’ top feature of interest. This way, U is provided with objective opinions by extracting sentences from experts’ reviews pertaining to the feature of U . This increases U ’s trust in QBook, since U can read unbiased opinions that help him determine if he would like to read a particular recommendation or not. For its final step for ex- planation generation, QBook looks into the steps taken in curating each book suggestion. For example, if bc was selected based on U ’s rating history, then corresponding explanation includes a simple sentence of the form bc was chosen since it has been highly rated by users with similar rating patterns to yours. If, instead, experts’ opinion had a strong influence in the curation process, then QBook makes sure that the user is aware of it. Figure 2: Correlation among data points in QEval; “actual" The explanation paired with bc includes three sentences specifi- is the rating assigned by a user, “predicted" is the one esti- cally selected for U . While we pick 3 for simplicity purposes, the mated by QBook, color denotes correlation, and the size of number of sentences included in the explanation can be adjusted. the bubbles captures correlation strength. By providing personalized explanations, QBook is able to tell a story Genre Preference. In the case of active readers, the final step of to U about how each suggestion is related him, which increases QBook for curating their suggestions involves explicitly considering users’ trust in the system, as well as the system’s transparency [40]. U ’s genre preferences. This is accomplished using the strategy Unlike the majority of existing strategies, QBook does not act like a described in Section 3.4. To determine the number of representative “black box" to the user since it provides information regarding the candidates from each genre that should be part of the final set selection and curation of the final set of books that are suggested. of books presented to U , QBook relies on the genre preference Therefore, with this step QBook acts as a personal docent for U . distribution calculated using ARIMA time series analysis and the process discussed in Section 3.4. In doing so, QBook can account 4 EVALUATION for the degree to which U will be likely interested in reading each In this section, we discuss the results of the studies conducted to particular genre at the moment the recommendation is generated. validate QBook’s performance and design methodology. The final list of top-K suggestions for U are generated by consid- ering not only ranking scores and user bias, but also by ensuring 4.1 Framework that the genre distribution of books among the K suggested match Dataset. To the best of our knowledge, there is no benchmark that the genre distribution uniquely expected for U . can be used for evaluating the performance of a curation recommen- By performing this curation step, QBook looks for diversity dation systems. Instead we use resources from Social Book Search among the suggestions by including books from all different areas (SBS) Suggestion Track [2], which consists of 2.7 million book titles of users’ interests and improves personalization by ordering sug- along with user reviews and ratings, each with combined book gestions based on a specific feature for U . Consequently, QBook metadata from Amazon.com and LibraryThing. We complement enables U to choose books from the exhibit tailored solely to him this collection with (i) overlapping catalog records from the Library in order to satisfy his reading needs in a given time. of Congress and (ii) experts’ reviews from known book critiques’ Generating Explanations. In order to be a curator, QBook can websites, such as NPR and Editorial Reviews SBS. We called this not only recommend books to read without justifying why the enhanced SBS dataset QData. suggestions were generated. To do so, QBook pairs each recommen- We split QData in three parts: 80% of the users were used for dation with an explanation enabling U to make the most informed training, denoted QTrain, 10% for development, denoted QDevel, and fastest decisions in terms of selecting a single item among the and the remaining 10% for evaluation, denoted QEval. To ensure suggested ones. To generate explanations, QBook uses archived a representative distribution for development and evaluation pur- book reviews, experts’ reviews, and the set of steps taken to gen- poses, we first clustered users from QData based on number of erate curated recommendations and provides U with personalized books read. Thereafter, we created QTrain, QDevel, QEval by ran- and valuable information. domly selecting the same percentage of users from clusters to “sim- QBook creates explanations for a curated book bc by extracting ulate" real representation of QData in each partition. sentences in reviews pertaining to bc that refer to the most impor- Metrics. For recommendation validation, we used the well- tant literary element of interest to U . Note that if there are multiple known NDCG and Ñ RMSE. We also considered: |K R A | Ñ sentence describing the same feature, QBook arbitrarily selects one Coverage= |K Ñ R | , K is the set of books of the collection to be shown to U . More importantly, QBook does not emphasize known to a given user, R is the set of relevant books to a user and A IntRS Workshop, October 2018, Vancouver, Canada Dragovic et al. is the set of recommended books. This metric captures how many Table 2: Aims of explanations in a recommender system of the items from the dataset are being recommended to all users Aim Definition who get recommendations [15]. Ñ |(R ÑA)−K | Effectiveness Help users make good decisions Novelty= |R A | , where K, R, and A are defined as in Cover- Efficiency Help users make decisions faster age. This metric captures how different a recommendation is with Persuasiveness Convince users to try or buy respect to what the user has already seen along with the relevance Satisfaction Increase the ease of usability or enjoyment of the recommended item [41]. Scrutability Allow users to tell the system it is wrong Serendipity, which measures how surprising the recommenda- Transparency Explain how the system works tions are to a user; computed as in [15]. Trust Increase users’ confidence in the system Table 3: Influence of genre preference change over time on the recommendation process;‘*’ significant for p<0.001 t-test Prediction Strategy KL Accuracy Without Time Series 0.663 0.826 4.2 Results & Discussion With Time Series 0.623* 0.870* Without Time Series (3+ genres) 0.720 0.810 Temporal Analysis. Since time-based genre prediction and its With Time Series (3+ genres) 0.660* 0.857* influence in the recommendation process is a novel strategy, we evaluate it in isolation to demonstrate its effectiveness. To do so, We create top-75 recommendations for each user in QDevel us- we used a disjoint set of 1,214 randomly-selected users from the ing the individual strategies defined in Section 3 and evaluate the dataset introduced earlier in the section. We used KL-Divergence, effectiveness of these recommendations based on NDCG6 . which measures how well a distribution q generated by a predic- As shown in Figure 3, matrix factorization and content based ap- tion strategy approximates a distribution p, the ground truth, i.e., proaches are similarly effective in generating suggestions. However, distribution of genres read by a user over a period of time. We also when combined they slightly increase the value of NDCG. This im- used accuracy, a binary strategy that reflects if the predicted genres provement is statistically significant (p < 0.001; t-test), which means correspond to the ones read by a user over a given period of time. In that, in general, users get more relevant recommendations when establishing the ground truth for evaluation purposes, we adopted both strategies are considered in-tandem. This can be explained the well-known N -1 strategy: the genre of the books rated by a with the fact that these two methodologies complement each other. user U in N -1 time frames are used for training U ’s genre predic- Furthermore, we can see that the similarity between literary fea- tion model, whereas the genre of the books rated by U in the N th tures of interest to a user and literary features most often used to time frame are treated as “relevant". As a baseline for this initial describe a book, has a positive influence on the recommendation assessment, we use a naive strategy that defines the importance of process as it increases NDCG by 2.5 % when explicitly considered each genre for U on the current, i.e., N , time frame based on the as part of the recommendation process. This is anticipated, since genre distribution across the previous N -1 time frames. user-generated reviews hold a lot of information that can allow us As shown in Table 3, for N =11 KL divergence scores indicate that to gain knowledge about each user and personalize suggestions. genre distribution predicted using time-series better approximates The most reliable data points, which not only achieve relatively to the ground truth thus leading to better performance. Furthermore, high NDCG but also are widely applicable and do not depend on the probability of occurrence of each considered genre is closer individual users, are the four strategies that analyze sentiment of to the real values when the time component is included in the expert reviews. These strategies rely on information frequently prediction process. We observed differences among users who read available and thus are applicable to the majority of books examined different number of distinct genres. For users who read only one to by QBook. Based on Figure 3, we can see that data points calculated two genres, the time-based prediction strategy does not perform using sentence level sentiment analysis provide slightly better rec- better than the baseline. However, if a user reads three or more ommendations compared to the ones generated using word level genres, our time-based genre prediction strategy outperforms the sentiment analysis. Even though the individual strategies perform baseline in both metrics. This is not surprising, given that it is not relatively well, we cannot assume that each data point can be cal- hard to determine area(s) of interest for a user who constantly reads culated for every single book. QBook’s improvements in terms of only one or two book genres, which is why the baseline performs NDCG can be justified with its ability to: (i) simultaneously con- as good as time-based prediction strategy. Given that users that sider multiple data points, (ii) include genre-prediction strategy, read 3 or more genres represent 91% of the users in the dataset used and (iii) more completely analyze different perspectives of user- in the remaining of the experiments presented in this section, the book relations to provide recommendations even when some of the proposed strategy provides significant improvements in predicting data points are unavailable. This is demonstrated based on the fact preferred genre for the vast majority of readers. that NDCG of QBook is statistically significant with respect to the Overall Performance. We evaluate the individual strategies NDCG reported for the individual strategies (for p < 0.001). that contribute to QBook’s recommendation process and analyze 5 K is set to 7, based on a study presented in [30], where authors argue that a number how each of them influences the generation of book suggestions. of objects an average human can hold in working memory is 7 ± 2. 6 Q Devel and Q Eval yield comparable scores, indicating consistence in performance regardless of the data use for experimentation and no overfitting. IntRS Workshop, October 2018, Vancouver, Canada Figure 3: Performance evaluation of individual recommendation strategies considered by QBook on QDevel and QEval. To further showcase the validity of Qbook’s design methodol- serendipity and novelty. Based on the results of our analysis, we ogy, we compare its performance with two baselines: SVD (matrix observe that the performance of QBook is consistent, regardless of factorization) and CB (content-based). For their respective imple- the presence or absence of data points used in the recommendation mentations we rely on LensKit. The significant (p < 0.01) NDCG process. QBook’s RMSE (Table 4) indicates that its recommenda- improvement of QBook, with respect to SVD (0.874) and CB (0.856), tion strategy can successfully predict users’ degree of preference demonstrates that, in general, recommendations provided by QBook for books. QBook’s coverage score (0.92), highlights that QBook are preferred over the ones provided by the baselines, which either considers a vast number of diverse books as potential recommenda- consider ratings patterns or content, but not both. tion, as opposed to popular ones. The novelty score (0.73) depicts Recommendation Explanations. There is no gold-standard that a user is provided with suggestions that differ from what he to evaluate explanations offline. Thus, following the strategy in already saw. This characteristic of QBook, together with relatively [16, 39] we conducted an initial qualitative evaluation for demon- high serendipity (0.68), indicates that new and unexpected, yet strating the usefulness of the explanations generated by QBook. relevant, suggestions are generated. We rely on the criteria introduced in [39] and summarized in Table State-of-the-art. We compare QBook with other book recom- 2, which outlines the “characteristics" of good explanations for rec- menders (optimal parameters were empirically defined). ommenders. QBook achieves five out of the seven criteria expected LDAMF [29] harnesses the information in review text by fitting of explanations generated by recommenders.By suggesting curated an LDA model on the review text. books which are described based on users’ preferred features of CTR [43] uses a one-class collaborative filter strategy. Even interest, showcasing opinions of other users on those features and though it is not specifically created for books, we consider it as it describing curation steps, QBook addresses transparency. QBook exploits metadata comparable for that of books. inspires trust on its users, since it does not consider the sentiment HFT combines reviews with ratings [29] and models the rat- connotation of the features to determine if they should be included ings using a matrix factorization model to link the stochastic topic in an explanation. Instead, QBook provides unbiased recommen- distribution in review text and the latent vector in the ratings. dations and explanations; which can increase users’ confidence as SVD++ [23] refers to a matrix factorization model which makes they know QBook offers a real depiction of each suggested book. use of implicit feedback information. Users are also able to make good and fast decisions, in terms of URRP [20], is a Bayesian model that combines collaborative and selecting books among the suggested ones, since based on provided content-based filtering to learn user rating and review preferences. explanations they can infer which books match their current prefer- ‘Free Lunch’ [26] leverages clusters based on information that is ences. With this, QBook increases its effectiveness. Given that users’ present in the user-item matrix, but not directly exploited during overall satisfaction with a recommender is related to the perceived matrix factorization. quality of its recommendations and explanations [16], QBook users RMR [25], which combines baselines by using the information can appreciate not having to spend more time researching books of both ratings and reviews. with characteristics important to them. In Table 4 we summarize the results of the evaluation conducted As per the study in [39] and assessments on several explanation- using QEval in Table 4 in terms of RMSE. QBook outperforms generation strategies [19, 31, 42, 46], we can report that, on average, existing state-of-the-art book recommenders considered in this only two (out of seven) criteria are satisfied. The only strategy that is study in terms of predicting the degree of which a user would like to comparable to QBook’s is the one discussed in [46], which addresses read each recommended book. The difference on RMSE computed five of the criteria. However, this strategy involves sentiment in the for QBook with respect to aforementioned state-of-the-art book generation of the explanations, as opposed to QBook which makes recommenders are statistically significant with p < 0.001. unbiased decisions when identifying users’ features of preference The prediction power of QBook is evidenced by its ability to and selecting which sentences to use to describe these features. achieve lowest RMSE among state-of-the-art approaches. When Common Recommendation Issues. We showcase QBook’s analyzing the performance of different strategies in more detail, we ability to address popular recommendation issues based on RMSE, can see that Matrix Factorization strategies perform better, as in in addition to adopting the evaluation framework presented in the case of Free Lunch (with and without clustering) and SVD++. [27] to simulate online evaluation using offline metrics: coverage, IntRS Workshop, October 2018, Vancouver, Canada Dragovic et al. However, QBook goes beyond matrix factorization by using a con- [13] A. Esuli and F. Sebastiani. Sentiwordnet: A publicly available lexical resource for tent based approach as well as involving different perspectives, opinion mining. In LREC, volume 6, pages 417–422, 2006. [14] A. L. Garrido and S. Ilarri. Tmr: a semantic recommender system using topic including other users’ and experts’ reviews. maps on the items’ descriptions. In ESWC, pages 213–217. Springer, 2014. [15] M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating Table 4: QBook vs. state-of-the-art recommenders recommender systems by coverage and serendipity. In ACM RecSys, pages 257– 260. ACM, 2010. Strategy RMSE Strategy RMSE [16] F. Gedikli, D. Jannach, and M. Ge. How should i explain? a comparison of different explanation types for recommender systems. International Journal of QBook 0.795 Human-Computer Studies, 72(4):367–382, 2014. RMR 1.055 Free Lunch 0.933 [17] S. Givon and V. Lavrenko. Predicting social-tags for cold start book recommen- LDAMF 1.053 Free Lunch w/ Clustering 0.825 dations. In ACM RecSys, pages 333–336. ACM, 2009. CTR 1.052 SVD++ 0.908 [18] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In ACM CSCW, pages 241–250. ACM, 2000. HFT 1.066 URRP 1.104 [19] A. Hernando, J. Bobadilla, F. Ortega, and A. GutiéRrez. Trees for explaining recommendations made through collaborative filtering. Information Sciences, 239:1–17, 2013. 5 CONCLUSIONS & FUTURE WORK [20] M. Jiang, D. Song, L. Liao, and F. Zhu. A bayesian recommender model for user We presented QBook, a book recommender that acts as a curator by rating and review profiling. Tsinghua Science & Technology, 20(6):634–643, 2015. [21] G. Kazai, D. Clarke, I. Yusof, and M. Venanzi. A personalised reader for crowd showcasing tailored book selections that meet the reading needs of curated content. In ACM RecSys, pages 325–326. ACM, 2015. individual users.As part of its recommendation process, QBooks ex- [22] D. Kislyuk, Y. Liu, D. Liu, E. Tzeng, and Y. Jing. Human curation and con- vnets: Powering item-to-item recommendations on pinterest. arXiv preprint amines different areas of user interest, not only the most dominant arXiv:1511.04003, 2015. or recent ones, as well as varied data points. In doing so, QBook can [23] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative yield a diverse set of suggestions, each paired with an explanation, filtering model. In ACM SIGKDD, pages 426–434. ACM, 2008. [24] Y. Koren, R. Bell, C. Volinsky, et al. Matrix factorization techniques for recom- to provide a user not only with reasons why a book was included mender systems. Computer, 42(8):30–37, 2009. in the curated list of recommendations but also how each recom- [25] G. Ling, M. R. Lyu, and I. King. Ratings meet reviews, a combined approach to mendation was selected, with the objective of enhancing trust and recommend. In ACM RecSys, pages 105–112. ACM, 2014. [26] B. Loni, A. Said, M. Larson, and A. Hanjalic. ’free lunch’enhancement for col- transparency towards the user. laborative filtering with factorization machines. In ACM RecSys, pages 281–284. We conducted a number of offline experiments to validate the ACM, 2014. [27] A. Maksai, F. Garcin, and B. Faltings. Predicting online performance of news performance of QBook using a popular dataset. We also demon- recommender systems through richer evaluation metrics. In ACM RecSys, pages strated the importance of considering diverse data sources, beyond 179–186. ACM, 2015. ratings or content, to enhance the recommendation process. [28] C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky. The stanford corenlp natural language processing toolkit. In ACL (System Demon- With this work, we set the algorithmic foundations that will strations), pages 55–60, 2014. allow us to conduct in-depth online experiments in the future, in [29] J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding order to quantify the usability of QBook, the value of its expla- rating dimensions with review text. In ACM RecSys, pages 165–172. ACM, 2013. [30] G. A. Miller. The magical number seven, plus or minus two: Some limits on our nations, and the degree of which its curation strategy can enrich capacity for processing information. Psychological Review, 63(2):81, 1956. the recommendation process from a user’s perspective. Given the [31] J. Misztal and B. Indurkhya. Explaining contextual recommendations: Interaction design study and prototype implementation. In IntRS@RecSys, pages 13–20, 2015. domain-independent nature of our strategies, we plan to validate [32] R. Nau. Introduction to arima models.duke university. http://people.duke.edu/ QBook on datasets other than books to demonstrate its applicability ~rnau/411arim.htm. Accessed: 2016-05-06. on domains other than books. Our goal is to go one step further [33] B. Ohana and B. Tierney. Sentiment classification of reviews using sentiwordnet. 9th IT & T Conference, 2009. and enable our personal curator to generate suggestions in multiple [34] M. S. Pera and Y.-K. Ng. Automating readers’ advisory to make book recommen- domains, based on a complete virtual footprint available for a user. dations for k-12 readers. In ACM RecSys, pages 9–16. ACM, 2014. [35] M. S. Pera and Y.-K. Ng. Analyzing book-related features to recommend books for emergent readers. In ACM HT, pages 221–230. ACM, 2015. REFERENCES [36] F. Ricci, L. Rokach, and B. Shapira. Introduction to recommender systems handbook. [1] Content based recommendation system. Available at: http://eugenelin89.github. Springer, 2011. io/recommender_content_based/. [37] Z. Saaya, R. Rafter, M. Schaal, and B. Smyth. The curated web: a recommendation [2] INEX Amazon/LibraryThing book corpus. http://social-book-search.humanities. challenge. In ACM RecSys, pages 101–104. ACM, 2013. uva.nl/data/ALT_Nondisclosure_Agreements.html. Accessed: 2016-02-07. [38] Y. Teng, L. Zhang, Y. Tian, and X. Li. A novel fahp based book recommendation [3] LensKit open-source tools for recommender systems. https://lenskit.org. method by fusing apriori rule mining. In ISKE, pages 237–243. IEEE, 2015. [4] C. Benkoussas, A. Ollagnier, and P. Bellot. Book recommendation using informa- [39] N. Tintarev and J. Masthoff. A survey of explanations in recommender systems. tion retrieval methods and graph analysis. In CLEF. CEUR, 2015. In IEEE ICDEW, pages 801–810, 2007. [5] R. Blanco, D. Ceccarelli, C. Lucchese, R. Perego, and F. Silvestri. You should read [40] N. Tintarev and J. Masthoff. Evaluating recommender explanations: problems this! let me explain you why: explaining news recommendations to users. In experienced and lessons learned for the evaluation of adaptive systems. In ACM CIKM, pages 1995–1999. ACM, 2012. UCDEAS Workshop associated with UMAP. CEUR-WS, 2009. [6] C. Borrelli. Everybody’s a curator. chicago tribune. https://goo.gl/hpTF3Q. [41] S. Vargas and P. Castells. Rank and relevance in novelty and diversity metrics Accessed: 2015-12-06. for recommender systems. In ACM RecSys, pages 109–116. ACM, 2011. [7] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001. [42] J. Vig, S. Sen, and J. Riedl. Tagsplanations: explaining recommendations using [8] S. Channamsetty and M. D. Ekstrand. Recommender response to diversity and tags. In IUI, pages 47–56. ACM, 2009. popularity bias in user profiles. In AAAI FLAIRS, pages 657–660, 2017. [43] C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific [9] L. Chen and F. Wang. Sentiment-enhanced explanation of product recommenda- articles. In TACM SIGKDD, pages 448–456. ACM, 2011. tions. In WWW, pages 239–240. ACM, 2014. [44] J. Wang and Y. Zhang. Opportunity model for e-commerce recommendation: [10] Dictionary. Oxford: Oxford university press, 1989. right product; right time. In ACM SIGIR, pages 303–312, 2013. [11] N. Dragovic and M. S. Pera. Genre prediction to inform the recommendation pro- [45] M. C. Willemsen, M. P. Graus, and B. P. Knijnenburg. Understanding the role cess. Proceedings of the Poster Track of the 10th ACM Conference on Recommender of latent feature diversification on choice difficulty and satisfaction. UMUAI, Systems, 2016. 26(4):347–389, 2016. [12] N. Dragovic and M. S. Pera. Exploiting reviews to generate personalized and [46] Y. Zhang, G. Lai, M. Zhang, Y. Zhang, Y. Liu, and S. Ma. Explicit factor models justified recommendations to guide users’ selection. In AAAI FLAIRS, pages for explainable recommendation based on phrase-level sentiment analysis. In 661–664, 2017. ACM SIGIR, pages 83–92. ACM, 2014.