=Paper=
{{Paper
|id=Vol-1448/paper2
|storemode=property
|title=Exploiting Regression Trees as User Models for Intent-Aware Multi-attribute Diversity
|pdfUrl=https://ceur-ws.org/Vol-1448/paper2.pdf
|volume=Vol-1448
|dblpUrl=https://dblp.org/rec/conf/recsys/TomeoNGLSS15
}}
==Exploiting Regression Trees as User Models for Intent-Aware Multi-attribute Diversity==
Paolo Tomeo¹, Tommaso Di Noia¹, Marco de Gemmis², Pasquale Lops², Giovanni Semeraro², Eugenio Di Sciascio¹
¹ Polytechnic University of Bari – Via Orabona, 4 – 70125 Bari, Italy – {firstname.lastname}@poliba.it
² University of Bari Aldo Moro – Via Orabona, 4 – 70125 Bari, Italy – {firstname.lastname}@uniba.it

ABSTRACT

Diversity in a recommendation list has been recognized as one of the key factors to increase user satisfaction when interacting with a recommender system. Analogously to the modelling and exploitation of query intent in Information Retrieval, adopted to improve diversity in search results, in this paper we focus on eliciting the profile of a user, which is in turn exploited to represent her intents. The model is based on regression trees and is used to improve personalized diversification of the recommendation list in a multi-attribute setting. We tested the proposed approach and showed its effectiveness in two different domains, i.e. books and movies.

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Search and Retrieval

Keywords
Personalized diversity; Intent-aware diversification; Regression trees

CBRecSys 2015, September 20, 2015, Vienna, Austria. Copyright remains with the authors and/or original copyright holders.

1. INTRODUCTION

In recent years, diversification has gained more and more importance in the field of recommender systems. Engines able to achieve excellent results in terms of accuracy have been proved not to be effective when we consider other factors related to the quality of the user experience [10]. As a matter of fact, when interacting with a system exposing a recommendation service, the user perceives as good suggestions those that also show an appropriate degree of diversity, novelty or serendipity, to cite a few. The attitude of populating the recommendation list with similar items can exacerbate the over-specialization problem that content-based recommender systems tend to suffer from [9], even though it appears in collaborative-filtering approaches as well. Improving diversity is generally a good choice to foster user satisfaction, as it increases the odds of finding relevant recommendations [1].

Here our focus is on both individual (or intra-list) diversity, namely the degree of dissimilarity among all items in the list provided to a user, and aggregate diversity [3], namely the number and distribution of distinct items recommended across all users. The item-to-item dissimilarity can be evaluated using content-based attributes (e.g. genre in the movie and music domains, product category in e-commerce) [18] or statistical information (e.g. number of co-ratings) [23]. Usually, approaches to diversification take into account only one single attribute while, in the approach we present here, multiple attributes are selected to describe the items. The rationale behind this choice is that we believe there are numerous and heterogeneous item dimensions conditioning a user's interests and choices. Moreover, depending on the user, these dimensions may interact with each other, thus contributing to the creation of her intents. The question is how to tackle multiple attributes to address the diversification problem.

In this paper we use regression trees as a user modeling technique to infer individual interests, useful to provide an intent-aware diversification. Compared to approaches where item attributes are treated independently of each other, regression trees make it possible to represent user tastes as a combination of interrelated characteristics. For instance, a user could have a preference for horror movies of the 80s irrespective of the director, or for horror movies of the 90s directed by a specific director. In a regression tree, conditional probability allows building such inference rules about the user's preferences. We conducted experiments on the movie and book domains to empirically evaluate our approach. The performance was measured in terms of accuracy and of both individual and aggregate diversity.

The main contributions of this paper are:
• a novel intent-aware diversification approach able to combine multiple attributes. It is based on the use of regression trees (and rules) to infer and encode a model of users' interests;
• a novel method to combine different diversification approaches;
• an experimental evaluation which shows the performance of the proposed approaches with respect to both accuracy and diversity measures.

The paper is organized as follows. Section 2 describes the greedy approach to the diversification problem, the xQuAD algorithm and some evaluation metrics. We then continue in Section 3 by showing how to face multi-attribute diversification and how to leverage regression trees in the diversification process with xQuAD to provide more personalized recommendations. Section 4 describes the experimental configuration and the datasets used for the experiments, while Section 5 presents and discusses the experimental results, showing the competitive performance of the proposed approach. In Section 6 we review the related work to the best of our knowledge. Conclusions close the paper.

2. DIVERSITY IN RECOMMENDATIONS

The recommendation step can be followed by a re-ranking phase aimed at improving other qualities besides accuracy [3]. Some of the re-ranking approaches proposed so far are based on greedy algorithms designed to handle the balance between accuracy and diversity in a recommendation list [26]. Their scheme of work is shown in Algorithm 1, where P = ⟨1, ..., n⟩ is the recommendation list for user u generated using the predicted ratings, and the output is the re-ranked list S of recommendations, such that S ⊂ P and whose length is N ≤ n.

Algorithm 1: The greedy strategy
  Data: The original recommendation list P, N ≤ n
  Result: The re-ranked recommendation list S
  1  S = ⟨⟩;
  2  while |S| ≤ N do
  3      i* = argmax_{i ∈ P\S} f_obj(i, S);
  4      S = S ∘ i*;
  5      P = P \ {i*}
  6  end
  7  return S

At each iteration, the algorithm selects the item maximizing the objective function f_obj (line 3) – which in turn can be defined to deal with the trade-off between accuracy and diversity – and then adds it to the re-ranked list (line 4).

For our purpose, we focus on the intent-aware approach xQuAD (eXplicit Query Aspect Diversification), with the aim of diversifying over the user intents. It was proposed for search diversification in Information Retrieval by Santos et al. [15] as a probabilistic framework to explicitly model an ambiguous query as a set of sub-queries covering the potential aspects of the initial query. It was then adapted for recommendation diversification by Vargas and Castells [20], replacing query and aspects with user and item categories, respectively. Hereafter we refer to generic item features – such as categories – as features, considering the features as possible instances of a generic attribute.

More formally, xQuAD greedily selects diverse recommendations by maximizing the following objective function:

    f_obj(i, S, u) = λ · r*(u, i) + (1 − λ) · div(i, S, u)    (1)

with r*(u, i) being the score predicted by the baseline recommender and the λ parameter managing the accuracy-diversity balance: higher values give more weight to accuracy, lower values to diversity. The last component in Equation 1 promotes diversity, providing a measure of novelty with respect to the items already selected in S. As for the function div(i, S, u), the original formulation in [20] is:

    div_orig(i, S, u) = Σ_f p(i|f) · p(f|u) · Π_{s∈S} (1 − p(s|f))    (2)

where p(i|f) represents the likelihood of item i being chosen given the feature f, while p(f|u) represents the user interest in the feature.

A number of measures have been proposed to evaluate the diversity of a recommendation list. Smyth and McClave [17] proposed ILD (Intra-List Diversity), which computes the average distance between each pair of items in the list L:

    ILD(L) = 1 / (|L| · (|L| − 1)) · Σ_{i,j∈L, i≠j} (1 − sim(i, j))    (3)

The sim function is a configurable and application-dependent component which can use content-based item features or statistical information (e.g. number of co-ratings) to compute the similarity between items. We also used the metric α-nDCG, the redundancy-aware variant of Normalized Discounted Cumulative Gain proposed in [5]. We adopt the version adapted to recommendation proposed in [16]:

    α-nDCG(L, u) = (1 / α-iDCG) · Σ_{r=1}^{|L|} [ Σ_{f∈F(L_r)} (1 − α)^{cov(L, f, r−1)} ] / log₂(1 + r)    (4)

where cov(L, f, r−1) is the number of items ranked up to position r−1 containing the feature f, and F(L_r) represents the set of features of the r-th item. The α parameter is used to balance the emphasis between relevance and diversity. α-iDCG denotes the value of α-nDCG for the best "ideally" diversified list. Considering that the computation of the ideal value is NP-complete [5], we adopt a greedy approach: at each step we select the item with the highest value, regardless of the next steps.

3. INTENT-AWARE MULTI-ATTRIBUTE DIVERSITY

In this section we show how we address the intent-aware diversity problem when dealing with multi-attribute item descriptions. The presentation relies on content-based attributes (e.g. genres, years, etc. in the movie domain), but the proposed approach can be used independently of the attribute types. Therefore, one could also use statistical information as item attributes, e.g. popularity or rating variance. As explained in the previous section, we refer to features as possible instances of a generic attribute. We tried different reformulations of the div function in xQuAD (Equation 2) to deal with multi-attribute values. After an empirical evaluation, we chose the best div^ma (for multi-attribute) in terms of accuracy-diversity balance:

    div^ma(i, S, u) = Σ_{A∈𝒜} [ Σ_{f∈dom(A)} p(i|f) · p(f|u) · (1 − avg_{j∈S} p(j|f)) ] / [ Σ_{f∈dom(A)} p(f|u) ]    (5)

where:
• 𝒜 is the set of attributes;
• for each attribute A ∈ 𝒜 and each feature in the attribute domain f ∈ dom(A), p(i|f) represents the importance of f for the item i. It is computed as a binary function that returns 1 if the item contains f, 0 otherwise;
• p(f|u) represents the importance of the feature f for the user u and is computed as the relative frequency of the feature f over the items rated by the user u.

Hereafter we will refer to xQuAD using Equation 5 as basic xQuAD.
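The greedy scheme of Algorithm 1 combined with the objective of Equation 1 and the multi-attribute div of Equation 5 can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the dictionary-based data layout and all names (rerank, div_ma, item_feats, p_f_u) are our own assumptions.

```python
def rerank(candidates, predicted, div, lam, n):
    """Greedy re-ranking (Algorithm 1): pick n items maximizing
    f_obj = lam * r*(u,i) + (1 - lam) * div(i, S)."""
    pool = list(candidates)
    selected = []
    while pool and len(selected) < n:
        best = max(pool, key=lambda i: lam * predicted[i] + (1 - lam) * div(i, selected))
        selected.append(best)
        pool.remove(best)
    return selected

def div_ma(i, S, item_feats, p_f_u):
    """Multi-attribute diversity in the spirit of Equation 5.
    item_feats: item -> attribute -> set of features; p_f_u: attribute -> feature -> p(f|u)."""
    total = 0.0
    for attr, prefs in p_f_u.items():
        denom = sum(prefs.values())          # sum over f in dom(A) of p(f|u)
        if denom == 0:
            continue
        num = 0.0
        for f, pfu in prefs.items():
            if f not in item_feats[i].get(attr, set()):
                continue                     # binary p(i|f) = 0
            covered = sum(1 for j in S if f in item_feats[j].get(attr, set()))
            avg_cov = covered / len(S) if S else 0.0
            num += pfu * (1 - avg_cov)       # p(i|f) * p(f|u) * (1 - avg p(j|f))
        total += num / denom
    return total
```

With λ = 1 the re-ranking degenerates to the baseline order, while lower λ penalizes items whose features are already covered by the selected list S.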
Besides dealing with multi-attribute descriptions, the idea behind our approach is to infer and model the user profile by means of a regression tree, a predictive model where the user interest represents the target variable, which can take continuous values. Once a regression tree is produced for a user u, it is converted into a set of rules RT(u). Each rule maps the presence/absence of a categorical feature, or a constraint on a numerical one, to a value v in a continuous interval. The latter indicates the predicted interest of the user in the items satisfying the rule. In our implementation we used the interval [1, 5], since the value of the target variable has been calculated as the rating mean of the training instances classified by the inferred rule. Please note that the choice of a specific value interval for the target variable does not affect the overall approach. Each rule m then has the form

    body(m) ↦ interest = v

with body(m) = {c₁, ..., c_n}. An example of a set of rules produced for a user is shown in Figure 1.

Figure 1: Example of a set of rules generated via the regression tree
  1. {horror ∈ dom(genres), western ∉ dom(genres), DarioArgento ∈ dom(directors)} ↦ interest = 4.2
  2. {horror ∉ dom(genres), thriller ∈ dom(genres)} ↦ interest = 2.1
  3. {year > 1990, horror ∉ dom(genres), drama ∈ dom(genres), Aronofsky ∈ dom(directors)} ↦ interest = 4.0
  4. {year < 1990, drama ∈ dom(genres), AlPacino ∈ dom(actors)} ↦ interest = 3.9
  5. {horror ∉ dom(genres)} ↦ interest = 3.2

Eventually, under the assumption that they represent specific user interests, the computed rules are used in the re-ranking phase as item features to improve the intent-aware recommendation diversity. We also propose a div function for xQuAD so that each item is evaluated according to the rules it satisfies:

    div^rules(i, S, u) = Σ_{m∈M(u,i)} p(m|u) · (1 − avg_{j∈S} p(j|m))    (6)

Here M(u, i) represents the set of rules for the user u matched by the item i, while p(m|u) represents the importance of the rule m for u and is computed as:

    p(m|u) = interest_m / |M(u, i)|    (7)

In Equation 7, interest_m is the normalized predicted outcome of the regression tree for the rule m. Finally, the last component in Equation 6 indicates the complement of the coverage of the rule among the already selected recommendations. We propose two different versions of this adapted xQuAD.

• RT. p(j|m) is a binary function that returns 1 if the item j matches the rule, 0 otherwise.

• DivRT. p(j|m) is the average similarity between m and each rule covered by item j. More formally:

    p(j|m) = avg_{m'∈M(u,j)} sim(m, m')    (8)

The rationale behind this formulation is that some rules may be similar to each other, thus not bringing any actual diversification if considered separately. The computation of sim(m, m') takes into account the overlap between the rules m and m' as follows:

    sim(m, m') = [ Σ_{c_i∈body(m)} overlap(m, m', c_i) ] / max(|body(m)|, |body(m')|)

For instance, considering the attributes represented in Figure 1, we have for actor, genre and director:

    overlap(m, m', c_i) = 1 if c_i ∈ body(m) ∧ c_i ∈ body(m'), 0 otherwise

For the numerical attribute year we may adopt a different formulation of the function overlap(m, m', c_i). Here we compute, if any, the overlap between the interval in body(m) and the one in body(m'), normalized with respect to the maximum interval length. As an example, if year > 1990 is in body(m) and year < 2010 is in body(m'), we may define the overlapping function as overlap(m, m', c_i) = |1990 − 2010| / (max(dom(year)) − min(dom(year))). The functions introduced above have been used in the experimental setting to compute overlap(m, m', c_i) (see Section 4).

RT and DivRT can be used instead of basic xQuAD as diversification algorithms in the re-ranking phase. Alternatively, basic xQuAD and RT or DivRT can be pipelined to benefit from the strengths of both. For instance, one could use xQuAD to select 50 diversified recommendations and then RT to select 20 recommendations from those 50, or vice versa. Hereafter, we use the syntax X-after-Y, e.g. xQuAD-after-RT, to indicate that algorithm X is executed on the results of Y.
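The rule similarity behind DivRT, restricted to categorical conditions, can be sketched as below. Modelling a rule body as a set of condition strings is our own simplification (the interval-based overlap for the numerical year attribute is omitted), and the function names are illustrative.

```python
def rule_sim(body_m, body_n):
    """sim(m, m'): number of shared categorical conditions, normalized by
    the size of the larger rule body."""
    if not body_m or not body_n:
        return 0.0
    shared = sum(1 for c in body_m if c in body_n)  # binary overlap per condition
    return shared / max(len(body_m), len(body_n))

def p_j_given_m(m, rules_of_j, bodies):
    """DivRT's p(j|m): average similarity between rule m and each rule
    matched by item j (Equation 8)."""
    if not rules_of_j:
        return 0.0
    return sum(rule_sim(bodies[m], bodies[r]) for r in rules_of_j) / len(rules_of_j)
```

For the rules of Figure 1, rule 2 and rule 5 share the single condition "horror ∉ dom(genres)", giving sim = 1 / max(2, 1) = 0.5, so an item matching both rules contributes little extra diversity once one of them is covered.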
4. EXPERIMENTS

We carried out a number of experiments to evaluate the performance of the methods presented in Section 3 on two well-known datasets: MovieLens 1M and LibraryThing.

The MovieLens 1M dataset (available at http://grouplens.org/datasets/movielens) contains 1 million ratings from 6,040 users on 3,952 movies. The original dataset contains information about genres and year of release, and was enriched with further attribute information, such as actors and directors, extracted from DBpedia (http://dbpedia.org). More details about this DBpedia-enriched version of the dataset are available in [11]. Because not all movies have a corresponding resource in DBpedia, the final dataset contains 998,963 ratings from 6,040 users on 3,883 items. We built training and test sets by employing a 60%-40% temporal split for each user.

Moreover, we used the LibraryThing dataset (available at http://www.macle.nl/tud/LT), which contains more than 2 million ratings from 7,279 users on 37,232 books. As the dataset contains many duplicated ratings, when a user rated the same item more than once we selected her last rating. The unique ratings are 749,401. Also in this case, we enriched the dataset by mapping the books to BaseKB (http://basekb.com), the RDF version of Freebase (https://www.freebase.com), and then extracting three attributes: genre, author and subjects. The subjects in Freebase represent the topics of a book, for instance Pilot experiment, Education, Culture of Italy, Martin Luther King and so on. The dump of the mapping is available online (URL removed to guarantee anonymous submission). The final dataset contains 565,310 ratings from 7,278 users on 27,358 books. We built training and test sets by employing an 80%-20% hold-out split. The different ratio used for LibraryThing with respect to MovieLens (60%-40%) depends on its higher sparsity: holding out 80% to build the user profile ensures a sufficient number of ratings to train the system.

Table 1: Statistics about the two datasets
                         MovieLens    LibraryThing
  Number of users        6,040        7,278
  Number of items        3,883        27,358
  Number of ratings      998,963      565,310
  Data sparsity          95.7%        99.7%
  Avg users per item     275.57       20.66
  Avg items per user     165.39       77.68

Since the number of distinct values was too large for the year, actors and directors attributes in MovieLens, and for all the attributes in LibraryThing, we converted years into the corresponding decades and performed K-means clustering on the other attributes, on the basis of DBpedia categories (http://purl.org/dc/terms/subject) for MovieLens and Freebase categories (http://www.w3.org/1999/02/22-rdf-syntax-ns#type) for LibraryThing. Tables 2 and 3 report the number of attribute values and clusters. The number of clusters was decided according to the within-cluster sum of squares (withinss measure from the R Stats Package, version 2.15.3), that is, picking the value of K corresponding to an evident break in the distribution of the withinss measure against the number of clusters extracted.

Table 2: Statistics about MovieLens attributes
               Num. Values    Num. Clusters
  Genres       19             -
  Decades      10             -
  Actors       14736          20
  Directors    3194           20

Table 3: Statistics about LibraryThing attributes
               Num. Values    Num. Clusters
  Genres       270            30
  Authors      12868          22
  Subjects     2911           20

4.1 Experimental Configuration

For both datasets, we used the Bayesian Personalized Ranking Matrix Factorization algorithm (BPRMF) available in MyMediaLite (http://mymedialite.net/) as the baseline (using the default parameters). We performed experiments using other recommendation algorithms, but we do not report their results here since they are very similar to those obtained by BPRMF. We selected the top-200 recommendations for each user to generate the initial list P used for the re-ranking shown in Algorithm 1.

Accuracy is measured in terms of Precision, Recall and nDCG, but we only report nDCG values since the trend of the other two metrics is very similar. Individual diversity is measured using ILD and α-nDCG (see Section 2) with α = 0.5 to equally balance diversity and accuracy, while aggregate diversity is measured using both catalog coverage – computed as the percentage of items recommended to at least one user – and entropy – computed as in [3] to analyse the distribution of recommendations among all users. These last two metrics need to be considered together: coverage gives an indication of the ability of a recommender to cover the item catalog, while entropy shows the ability to spread the recommendations equally across all items. Hence, only an improvement of both metrics indicates a real increase of aggregate diversity, which in turn denotes a better personalization of the recommendations [3].

As similarity measure for computing the ILD metric (Equation 3) we used the Jaccard index. Considering that each item has several attributes, we computed the average of the Jaccard index values over the attributes shared between two items. α-nDCG is computed as the average of Equation 4 over the attributes.

As presented in Section 3, we propose two novel diversification approaches: RT and DivRT. We also propose a method to combine different algorithms in sequence by means of a two-phase re-ranking procedure, with the aim of benefiting from the strengths of both. Therefore we evaluated two further approaches: xQuAD-after-RT and RT-after-xQuAD, applying the second re-ranking phase to the set of 50 recommendations provided by the first phase. We have also evaluated the combination of xQuAD with DivRT, but the results are very similar to those obtained using RT, so they are not shown. To evaluate the performance, we compare the top-10 recommendation lists generated by all the approaches against basic xQuAD, varying the λ parameter in Equation 1 from 0 to 0.95 with a step of 0.05 (higher values of λ give more weight to accuracy, lower values to diversity).

The rules are produced using the M5Rules algorithm available in Weka (http://weka.sourceforge.net/doc.dev/weka/classifiers/rules/M5Rules.html), based on the M5 algorithm proposed by Quinlan [12] and improved by Wang and Witten [22]. M5Rules generates a list of rules for regression problems using a separate-and-conquer learning strategy. Iteratively, it builds a model tree using M5 and converts the best leaf into a rule.
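The multi-attribute ILD used in the evaluation can be sketched as follows. Averaging the Jaccard index over the union of the attributes present on either item is our reading of "the attributes shared between two items"; the data layout is an illustrative assumption.

```python
def jaccard(a, b):
    """Jaccard index of two feature sets; two empty sets are treated as
    dissimilar by convention."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def ild(items, feats):
    """ILD (Equation 3) with sim(i, j) = mean per-attribute Jaccard index.
    feats: item -> attribute -> set of feature values."""
    n = len(items)
    if n < 2:
        return 0.0
    total = 0.0
    for i in items:
        for j in items:
            if i == j:
                continue
            attrs = set(feats[i]) | set(feats[j])
            sim = sum(jaccard(feats[i].get(a, set()), feats[j].get(a, set()))
                      for a in attrs) / len(attrs)
            total += 1 - sim
    return total / (n * (n - 1))
```

For two movies sharing their genre but released in different decades, the per-pair distance is 1 − (1 + 0) / 2 = 0.5.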
We decided to use unpruned rules in order to have more rules matchable with the items. 4 http://basekb.com 5 https://www.freebase.com 6 9 URL removed to guarantee anonymous submission. http://mymedialite.net/ 7 10 http://purl.org/dc/terms/subject http://weka.sourceforge.net/doc.dev/weka/ 8 http://www.w3.org/1999/02/22-rdf-syntax-ns#type classifiers/rules/M5Rules.html 5. RESULTS DISCUSSION accurate recommendations, with λ > 0.65, but the differ- Results of the experiments on MovieLens and Library- ences are not statistically significant. In terms of individual Thing are reported in Figure 2 and 3, respectively. diversity, all of them are able to overcome the baseline (ILD MovieLens. xQuAD obtains the best results in terms = 0.4 and α-nDCG = 0.285) except when using the pure of ILD (Figure 2(a)) and α-nDCG (Figure 2(b)), though rule-based approaches in terms of ILD. However they are the xQuAD-after-RT results are very close and, with higher able to improve α-nDCG. For the latter two metrics, the λ values (namely giving more importance to the accuracy differences are always statistically significant (p < 0.001). factor), the differences between them are not significant. In terms of aggregate diversity, xQuAD does not improve This outcome is due to the fact that the diversity metrics the baseline result (coverage = 0.15 and α-nDCG = 0.77), are attribute-based and xQuAD operates directly diversi- while using the rules leads to better results. According to fying the attributes values, while the proposed rule-based a comprehensive analysis on LibraryThing, the pure rule- approaches do not take into account all the attributes val- based approaches may give more personalized recommenda- ues. This also explains why the pure rule-based approaches tions with a better diversity, especially using RT, with also (RT and DivRT) obtain the worst diversity results, while a small accuracy loss. 
Similarly to the analysis on Movie- the combined algorithms (xQuAD-after-RT and RT-after- Lens, the results on LibraryThing suggest that diversifying xQuAD) obtain better results. It is noteworthy that these with only the rules is a good choice when aggregate diver- last two configurations have no substantial difference with sity is more important than individual diversity, conversely ILD, but, in terms of α-nDCG, xQuAD-after-RT consider- xQuAD remains the best choice to improve the individual ably overcomes RT-after-xQuAD. This demonstrates that diversity and combined with the rule-based diversification the pipeline of xQuAD and the rule-based approach ob- improves also the aggregate diversity. tains good diversity. Considering coverage (Figure 2(c)) The final conclusions of this analysis are that using a re- and entropy (Figure 2(d)) to evaluate the aggregate diver- gression tree to infer rules representing user interests on sity, the results show that using the rules the recommen- multi-attribute values in the diversification process with dations are much more personalized. It is interesting to xQuAD leads to more personalized recommendations but note the compromise provided by xQuAD-after-RT, that with a less diversified list and that combining attribute- obtains equidistant results between xQuAD and the rule- based and rule-based diversifications in two phase re-ranking based algorithms, unlike RT-after-xQuAD that slightly over- is a good way for taking the advantages of both. The bet- comes xQuAD. With respect to the baseline, no configura- ter degree of personalization may depend on the fact that tion is able to give more accurate recommendations (nDCG the rules are different among the users since they represents = 0.14); all are able to increase the individual diversity their individual interests. The lower individual diversity val- (ILD = 0.34 and α-nDCG = 0.27). 
With nDCG and the ues with ILD and α-nDCG are due to the nature of these individual diversity, the differences are always statistically metrics which are based directly on the attributes values significant (p < 0.001), except using the pure ruled-based while the pure rule-based approaches do not take into ac- approaches with λ > 0.65. The situation is more complex count all the attributes values. in terms of aggregate diversity, since the coverage grows very little on the baseline (coverage = 0.29) and the entropy 6. RELATED WORK slightly decreases (entropy = 0.78) with higher λ values. Ac- There is a noteworthy effort by the research community in cording to a comprehensive analysis on MovieLens, the pure addressing the challenge of recommendation diversity. That rule-based approaches may give personalized and diversified interest arises from the necessity of avoiding monotony in recommendations, also with small accuracy loss. However, recommendations and controlling the balance between accu- when individual diversity is more important than aggregate racy and diversity, since increasing diversity inevitably puts diversity, combining xQuAD with a previous rule-based re- accuracy at risk [25]. However, a user study in the movie ranking gives a good compromise between individual and domain [7] demonstrates that user satisfaction is positively aggregate diversity. dependent on diversity and there may not be the intrinsic LibraryThing. At first glance, the LibraryThing results trade-off when considering user perception instead of tradi- appear similar to those on MovieLens. Although they are tional accuracy metrics. generally consistent, there are interesting differences. Also Typically, the proposed approaches aim to replace items in this case, xQuAD obtains the best diversity values, with in an already computed recommendation list, by minimizing ILD (Figure 3(a)) and α-nDCG (Figure 3(b)). However, the similarity among all items. 
Some approaches exploit a both the combined approaches obtain really interesting re- re-ranking phase with a greedy selection (see Section 2), for sults, very close to xQuAD, except for the lower λ val- instance [18], or with other techniques such us the Swap al- ues (namely giving more importance to the diversification gorithm [23], which starts with a list of K scoring items and factor). Unlike what happens on MovieLens, in this case swaps the item which contributes the least to the diversity RT-after-xQuAD obtains good results also in terms of α- of the entire set with the next highest scoring item among nDCG. The pure rule-based approaches still obtain worse the remaining items, by controlling the drop of the overall results. Considering coverage (Figure 3(c)) and entropy relevance by a pre-defined upper bound. (Figure 3(d)) to evaluate the aggregate diversity, the results Other types of approaches try to directly generate diver- show that using the rules the recommendations are much sified recommendation lists. For instance, [2] proposes a more personalized than using only xQuAD. The combined probabilistic neighborhood selection in collaborative filter- approaches are able to improve the aggregate diversity with ing for selecting diverse neighbors, while in [16], an adaptive respect to xQuAD, albeit they are still distant from the pure diversification approach is based on Latent Factor Portfolio rule-based approaches, especially in terms of coverage. With model for capturing the user interests range and the uncer- respect to the baseline, all configurations give a little more tainty of the user preferences by employing the variance of (a) (b) (c) (d) Figure 2: Accuracy-diversity curves on MovieLens at Top-10 obtained by varying the λ parameter from 0 to 0.95 (step 0.05). The statistical significance is measured based on the results from individual users, according the Wilcoxon signed-rank significance test. 
[Figure 2 caption, continued] For nDCG and ILD (Figure 2(a)), all the differences are statistically significant (p < 0.01), except for those between RT and DivRT. For α-nDCG (Figure 2(b)), the trend is the same, except for the differences between xQuAD and xQuAD-after-RT with λ > 0.7.

… the learned user latent factors. In [13] a hybrid method is proposed, based on evolutionary search following the Strength Pareto approach, for finding appropriate weights for the constituent algorithms, with the final aim of improving the balance between accuracy, diversity and novelty. [24] casts the problem of improving diversity while maintaining adequate accuracy as a binary optimization problem and proposes an approach based on solving a trust region relaxation. The advantage of this approach is that it seeks the best subset of items over all possible subsets, whereas greedy selection finds sub-optimal solutions.

Multi-attribute diversity has been substantially untreated in the recommender systems literature. A recent work [6] proposes an adaptive approach able to customize the degree of recommendation diversity of the top-N list, taking into account the user's inclination to diversity over different content-based item attributes. Specifically, entropy is employed as a measure of the diversity degree within user preferences, and it is used in conjunction with the user profile size to calibrate the degree of diversification.

Furthermore, increasing attention has been paid to intent-aware diversification, namely the process of increasing diversity while taking into account the user's interests. Some approaches are based on adaptations of algorithms proposed for the same purpose in Information Retrieval, such as IA-Select [4] and xQuAD [15]. An approach for extracting sub-profiles reflecting the user's interests has been proposed in [20], where a combination of sub-profile recommendations is generated with the aim of maximizing the number of user tastes represented while simultaneously avoiding redundancy in the top-N recommendations. A more recent approach [19], based on a binomial greedy re-ranking algorithm, combines global item genre distribution statistics and personalized user interests to satisfy coverage and non-redundancy of genres in the final list.

Aggregate diversity, also known as sales diversity, is considered another important factor in recommendation from both the business and the user perspective: the user may receive less obvious and more personalized recommendations, in line with the goal of helping users discover new content [21], and the business may increase its sales [8]. [3] introduces the concept of aggregate diversity as the ability of a system to recommend as many different items as possible across all users, and proposes efficient and parametrizable re-ranking techniques for improving aggregate diversity with a controlled accuracy loss. Those techniques are simply based on statistical information such as items' average ratings, average predicted rating values, and so on. [21] explores the impact on aggregate diversity and novelty of inverting the recommendation task, namely ranking users for items. Specifically, two approaches are proposed: one based on an inverted neighborhood formation and the other on a probabilistic formulation for recommending users to items. [14] proposes a k-furthest-neighbors collaborative filtering algorithm to mitigate the popularity bias and increase diversity, also considering other factors in a user-centric evaluation, such as novelty, serendipity, obviousness and usefulness.

Figure 3: Accuracy-diversity curves on LibraryThing at Top-10, obtained by varying the λ parameter from 0 to 0.95 (step 0.05). Statistical significance is measured on the results from individual users, according to the Wilcoxon signed-rank test. For nDCG, the differences between RT and DivRT are not significant with λ ∈ [0.2, 0.5]. For ILD (Figure 3(a)), all the differences are statistically significant (p < 0.001), except for those between RT and DivRT. For α-nDCG (Figure 3(b)), all the differences are statistically significant (p < 0.001).

7. CONCLUSIONS AND FUTURE WORK

This paper addresses the problem of intent-aware diversification in recommender systems in multi-attribute settings. The proposed approach builds on xQuAD [15], a relevant intent-aware diversification algorithm, and leverages regression trees as the user modeling technique. In their rule-based equivalent representation, the trees are exploited to foster the diversification of the recommendation results, both in terms of individual diversity and in terms of aggregate diversity.

The experimental evaluation on two datasets, in the movie and book domains, demonstrates that considering the rules generated from the different attributes available in an item description provides diversified and personalized recommendations with a small loss of accuracy. The analysis of the results suggests that a pure rule-based diversification is a good choice when aggregate diversity is needed more than individual diversity. Conversely, basic xQuAD remains the best choice for improving individual diversity, while its combination with the rule-based diversification also improves aggregate diversity.

For future work, we would like to evaluate the impact of our approach on recommendation novelty as well. One way to improve novelty could be the expansion of the rules by exploiting collaborative information.

Acknowledgements. The authors acknowledge partial support of PON02 00563 3470993 VINCENTE, PON04a2 E RES NOVAE, PON02 00563 3446857 KHIRA and PON01 03113 ERMES.
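To make the greedy re-ranking at the core of xQuAD-style intent-aware diversification concrete, the following is a minimal sketch, not the paper's implementation: the aspect probabilities P(a|u), which the paper derives from the regression-tree user model, are passed in here as plain dictionaries, and all names (`xquad_rerank`, `aspect_weights`, `item_aspects`) are illustrative.

```python
from typing import Dict, List

def xquad_rerank(candidates: Dict[str, float],
                 aspect_weights: Dict[str, float],
                 item_aspects: Dict[str, Dict[str, float]],
                 k: int, lam: float = 0.5) -> List[str]:
    """Greedy xQuAD-style re-ranking.

    candidates:     item -> relevance score from the base recommender
    aspect_weights: aspect -> P(a|u), the user's interest in each aspect
    item_aspects:   item -> {aspect -> P(i|a)}, aspect coverage per item
    lam:            trade-off between relevance and diversity
    """
    selected: List[str] = []
    # Running product of (1 - P(j|a)) over the items j already selected:
    # how much of each aspect is still uncovered by the current list.
    uncovered = {a: 1.0 for a in aspect_weights}
    pool = dict(candidates)
    while pool and len(selected) < k:
        def score(i: str) -> float:
            div = sum(aspect_weights[a]
                      * item_aspects.get(i, {}).get(a, 0.0)
                      * uncovered[a]
                      for a in aspect_weights)
            return (1 - lam) * pool[i] + lam * div
        best = max(pool, key=score)
        selected.append(best)
        for a in aspect_weights:
            uncovered[a] *= 1.0 - item_aspects.get(best, {}).get(a, 0.0)
        del pool[best]
    return selected
```

With a high λ, the second pick switches to a lower-relevance item that covers a still-uncovered aspect, which is exactly the intended greedy behavior.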
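The evaluation above relies on individual (intra-list) and aggregate diversity. A minimal sketch of how such measures are commonly computed (illustrative formulations, which may differ in detail from those used in the paper): ILD as the average pairwise dissimilarity over item attribute sets, aggregate diversity as the number of distinct items recommended across users, and Shannon entropy over a user's attribute distribution in the spirit of the adaptive approach of [6].

```python
from itertools import combinations
from math import log2

def ild(items, attrs):
    """Intra-list diversity: mean pairwise (1 - Jaccard) over attribute sets."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    def dist(a, b):
        sa, sb = attrs[a], attrs[b]
        return 1.0 - len(sa & sb) / len(sa | sb)
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def aggregate_diversity(rec_lists):
    """Aggregate diversity: distinct items recommended across all users."""
    return len({i for lst in rec_lists for i in lst})

def profile_entropy(counts):
    """Shannon entropy of a user's attribute distribution (e.g. genre counts)."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)
```

Two items sharing all attributes contribute 0 to ILD and two disjoint items contribute 1, so a list mixing genres scores higher than a single-genre list.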
8. REFERENCES

[1] P. Adamopoulos and A. Tuzhilin. On unexpectedness in recommender systems: Or how to expect the unexpected. In Proc. of the RecSys ’11 Intl. Workshop on Novelty and Diversity in Recommender Systems, 2011.
[2] P. Adamopoulos and A. Tuzhilin. On over-specialization and concentration bias of recommendations: Probabilistic neighborhood selection in collaborative filtering systems. In RecSys ’14, pages 153–160. ACM, 2014.
[3] G. Adomavicius and Y. Kwon. Improving aggregate recommendation diversity using ranking-based techniques. IEEE Trans. Knowl. Data Eng., 24(5):896–911, 2012.
[4] P. Castells, S. Vargas, and J. Wang. Novelty and diversity metrics for recommender systems: Choice, discovery and relevance. In International Workshop on Diversity in Document Retrieval (DDR 2011) at ECIR 2011, April 2011.
[5] C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR ’08, pages 659–666. ACM, 2008.
[6] T. Di Noia, V. C. Ostuni, J. Rosati, P. Tomeo, and E. Di Sciascio. An analysis of users’ propensity toward diversity in recommendations. In RecSys ’14, pages 285–288. ACM, 2014.
[7] M. D. Ekstrand, F. M. Harper, M. C. Willemsen, and J. A. Konstan. User perception of differences in recommender algorithms. In RecSys ’14, pages 161–168. ACM, 2014.
[8] D. Fleder and K. Hosanagar. Blockbuster culture’s next rise or fall: The impact of recommender systems on sales diversity. Management Science, 55(5):697–712, 2009.
[9] N. Hurley and M. Zhang. Novelty and diversity in top-N recommendation – analysis and evaluation. ACM TOIT, 10(4):14:1–14:30, 2011.
[10] S. M. McNee, J. Riedl, and J. A. Konstan. Being accurate is not enough: How accuracy metrics have hurt recommender systems. In CHI ’06 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’06, pages 1097–1101, 2006.
[11] V. C. Ostuni, T. Di Noia, E. Di Sciascio, and R. Mirizzi. Top-N recommendations from implicit feedback leveraging linked open data. In RecSys ’13, pages 85–92. ACM, 2013.
[12] R. J. Quinlan. Learning with continuous classes. In 5th Australian Joint Conference on Artificial Intelligence, pages 343–348, Singapore, 1992. World Scientific.
[13] M. T. Ribeiro, A. Lacerda, A. Veloso, and N. Ziviani. Pareto-efficient hybridization for multi-objective recommender systems. In RecSys ’12, pages 19–26. ACM, 2012.
[14] A. Said, B. Fields, B. J. Jain, and S. Albayrak. User-centric evaluation of a k-furthest neighbor collaborative filtering recommender algorithm. In CSCW ’13, pages 1399–1408. ACM, 2013.
[15] R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting query reformulations for web search result diversification. In WWW ’10, pages 881–890. ACM, 2010.
[16] Y. Shi, X. Zhao, J. Wang, M. Larson, and A. Hanjalic. Adaptive diversification of recommendation results via latent factor portfolio. In SIGIR ’12, pages 175–184. ACM, 2012.
[17] B. Smyth and P. McClave. Similarity vs. diversity. In ICCBR ’01, pages 347–361. Springer-Verlag, 2001.
[18] S. Vargas, L. Baltrunas, A. Karatzoglou, and P. Castells. Coverage, redundancy and size-awareness in genre diversity for recommender systems. In RecSys ’14, pages 209–216. ACM, 2014.
[19] S. Vargas, L. Baltrunas, A. Karatzoglou, and P. Castells. Coverage, redundancy and size-awareness in genre diversity for recommender systems. In RecSys ’14, pages 209–216. ACM, 2014.
[20] S. Vargas and P. Castells. Exploiting the diversity of user preferences for recommendation. In OAIR ’13, pages 129–136, 2013.
[21] S. Vargas and P. Castells. Improving sales diversity by recommending users to items. In RecSys ’14, pages 145–152. ACM, 2014.
[22] Y. Wang and I. H. Witten. Induction of model trees for predicting continuous classes. In Poster Papers of the 9th European Conference on Machine Learning. Springer, 1997.
[23] C. Yu, L. Lakshmanan, and S. Amer-Yahia. It takes variety to make a world: Diversification in recommender systems. In EDBT ’09, pages 368–378, 2009.
[24] M. Zhang and N. Hurley. Avoiding monotony: Improving the diversity of recommendation lists. In RecSys ’08, pages 123–130. ACM, 2008.
[25] T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J. R. Wakeling, and Y.-C. Zhang. Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences, 107:4511–4515, 2010.
[26] C. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. In WWW ’05, pages 22–32, 2005.