=Paper= {{Paper |id=Vol-1448/paper2 |storemode=property |title=Exploiting Regression Trees as User Models for Intent-Aware Multi-attribute Diversity |pdfUrl=https://ceur-ws.org/Vol-1448/paper2.pdf |volume=Vol-1448 |dblpUrl=https://dblp.org/rec/conf/recsys/TomeoNGLSS15 }} ==Exploiting Regression Trees as User Models for Intent-Aware Multi-attribute Diversity== https://ceur-ws.org/Vol-1448/paper2.pdf
              Exploiting Regression Trees as User Models for
                   Intent-Aware Multi-attribute Diversity

                   Paolo Tomeo1 , Tommaso Di Noia1 , Marco de Gemmis2 , Pasquale Lops2 ,
                                 Giovanni Semeraro2 , Eugenio Di Sciascio1
                                1
                                     Polytechnic University of Bari – Via Orabona, 4 – 70125 Bari, Italy
                                 2
                                     University of Bari Aldo Moro – Via Orabona, 4 – 70125 Bari, Italy
                          1
                              {firstname.lastname}@poliba.it 2 {firstname.lastname}@uniba.it

ABSTRACT                                                                foster user satisfaction as it increases the odds of finding
Diversity in a recommendation list has been recognized as               relevant recommendations [1].
one of the key factors to increase user’s satisfaction when                Here our focus is on both the individual (or intra-list) di-
interacting with a recommender system. Analogously to the               versity, namely the degree of dissimilarity among all items
modelling and exploitation of query intent in Information               in the list provided to a user, and the aggregate diversity
Retrieval adopted to improve diversity in search results, in            [3], namely the number and distribution of distinct items
this paper we focus on eliciting and using the profile of a             recommended across all users. The item-to-item dissimi-
user which is in turn exploited to represent her intents. The           larity can be evaluated by using content-based attributes
model is based on regression trees and is used to improve               (e.g. genre in movie and music domains, product category
personalized diversification of the recommendation list in a            in e-commerce) [18] or statistical information (e.g. number
multi-attribute setting. We tested the proposed approach                of co-ratings) [23]. Usually, approaches to the diversifica-
and showed its effectiveness in two different domains, i.e.             tion take into account only one single attribute while, in the
books and movies.                                                       approach we present here, multiple attributes are selected
                                                                        to describe the items. The rationale behind this choice is
                                                                        that we believe there are numerous and heterogeneous item
Categories and Subject Descriptors                                      dimensions conditioning user’s interests and choices. More-
H.3.3 [Information Systems]: Information Search and Re-                 over, depending on the user these dimensions may interact
trieval                                                                 with each other thus contributing to the creation of her in-
                                                                        tents. The question is how to tackle multiple attributes to
Keywords                                                                address the diversification problem.
                                                                           In this paper we use regression trees as a user modeling tech-
Personalized diversity; Intent-aware diversification; Regres-           nique to infer the individual interests, useful to provide an
sion Trees                                                              intent-aware diversification. Compared to approaches where
                                                                        item attributes are treated independently of each other,
1.    INTRODUCTION                                                      regression trees make it possible to represent user tastes as a
   In recent years, diversification has gained more and                 combination of interrelated characteristics. For instance, a
more importance in the field of recommender systems. En-                user could have a preference for horror movies of the 80s
gines able to get excellent results in terms of accuracy of             irrespective of the director, or for horror movies of the 90s
results have proved not to be effective when we con-                    directed by a specific director. In a regression tree, con-
sider other factors related to the quality of user experience           ditional probabilities let us build such inference rules about
[10]. As a matter of fact, when interacting with a system               user’s preferences. We conducted experiments on the movie
exposing a recommendation service, the user perceives as                and on the book domains to empirically evaluate our ap-
good suggestions those showing also an appropriate degree               proach. The performance was measured in terms of accuracy
of diversity, novelty or serendipity, just to cite a few. The           and both individual and aggregate diversity.
tendency to populate the recommendation list with sim-                  that we believe there are numerous and heterogeneous item
ilar items could exacerbate the over-specialization problem                • a novel intent-aware diversification approach able to
that content-based recommender systems tend to suffer from                 combine multiple attributes. It is based on the use of
[9], even though it also appears in collaborative-filtering ap-              regression trees (and rules) to infer and encode the
proaches. Improving diversity is generally a good choice to                  model of users’ interests;
                                                                           • a novel method to combine different diversification ap-
                                                                             proaches;
                                                                           • an experimental evaluation which shows the perfor-
                                                                             mance of the proposed approaches with respect to both
                                                                             accuracy and diversity measures.
                                                                          The paper is organized as follows. Section 2 describes the
CBRecSys 2015, September 20, 2015, Vienna, Austria.                     greedy approach to diversification problem, the xQuAD al-
Copyright remains with the authors and/or original copyright holders.   gorithm and some evaluation metrics. We then continue in
Section 3 by showing how to face the multi-attribute diver-             where p(i|f ) represents the likelihood of item i being chosen
sification and how to leverage regression trees in the diversi-         given the feature f while p(f|u) represents the user interest
fication process with xQuAD to provide more personalized                in the feature.
recommendations. Section 4 describes the experimental con-                A number of measures have been proposed to evaluate the
figuration and the datasets used for the experiments while              diversity in a recommendation list. Smyth and McClave [17]
Section 5 presents and describes the experimental results,              proposed the ILD (Intra-List Diversity), that computes the
showing the competitive performance of the proposed ap-                 average distance between each couple of items in the list L:
proach. In Section 6 we review the related work to the best
of our knowledge. Conclusions close the paper.

              ILD(L) = \frac{1}{|L|(|L| - 1)} \sum_{i,j \in L,\, i \neq j} (1 - sim(i, j))    (3)
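As an illustration, Equation 3 can be sketched in a few lines, with the sim function left pluggable; the Jaccard helper and the feature-set item representation are assumptions for this example, not the paper's exact setup:

```python
from itertools import permutations

def ild(items, sim):
    """Intra-List Diversity (Equation 3): average pairwise
    dissimilarity over all ordered pairs i != j in the list."""
    n = len(items)
    if n < 2:
        return 0.0
    total = sum(1.0 - sim(i, j) for i, j in permutations(items, 2))
    return total / (n * (n - 1))

def jaccard(a, b):
    """Illustrative content-based similarity over feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0
```

Any other application-dependent sim function with the same signature can be substituted.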

2.      DIVERSITY IN RECOMMENDATIONS                                    The sim function is a configurable and application-dependent
                                                                        component which can use content-based item features or sta-
   The recommendation step can be followed by a re-ranking
                                                                        tistical information (e.g. number of co-ratings) to compute
phase aimed at improving other qualities besides accuracy
                                                                        the similarity between items. We also used the metric α-
[3]. Some of the re-ranking approaches proposed so far are
                                                                        nDCG, that is the redundancy-aware variant of Normalized
based on greedy algorithms designed to handle the balance
                                                                        Discounted Cumulative Gain proposed in [5]. We adopt the
between accuracy and diversity in a recommendations list
                                                                        adapted version for recommendation proposed in [16]:
[26]. Their scheme of work is explained through Algorithm
1, where P = ⟨1, ..., n⟩ is the recommendation list for user u
generated using the predicted ratings and the output is the
re-ranked list S of recommendations, such that S ⊂ P and
whose length is N ≤ n.

              \alpha\text{-nDCG}(L, u) = \frac{1}{\alpha\text{-iDCG}} \sum_{r=1}^{|L|} \frac{\sum_{f \in F(L_r)} (1 - \alpha)^{cov(L, f, r-1)}}{\log_2(1 + r)}    (4)
where cov(L, f, r − 1) is the number of items ranked up to
position r − 1 containing the feature f . F (L_r ) represents
the set of features of the r-th item. The α parameter is used
to balance the emphasis between relevance and diversity. α-
iDCG denotes the value of α-nDCG for the best “ideally”
diversified list. Considering that the computation of the
ideal value is NP-complete [5], we adopt a greedy approach:
at each step we select solely the item with the highest value,
regardless of the next steps.

     Data: The original recommendation list P, N ≤ n
     Result: The re-ranked recommendation list S
  1  S = ⟨⟩;
  2  while |S| ≤ N do
  3      i* = argmax_{i ∈ P \ S} f_obj(i, S);
  4      S = S ◦ i*;
  5      P = P \ {i*}
  6  end
  7  return S.
                 Algorithm 1: The greedy strategy
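The greedy scheme of Algorithm 1 can be sketched as follows; a simplified sketch, where `f_obj` stands for the pluggable objective of Equation 1 and a strict `< N` bound is used so the returned list contains exactly N items:

```python
def greedy_rerank(P, N, f_obj):
    """Greedy re-ranking: repeatedly move the item maximizing
    f_obj(i, S) from the candidate list P into the re-ranked list S."""
    candidates = list(P)          # keep the original list intact
    S = []
    while len(S) < N and candidates:
        best = max(candidates, key=lambda i: f_obj(i, S))
        S.append(best)            # S = S ∘ i*
        candidates.remove(best)   # P = P \ {i*}
    return S
```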
                                                                        3.     INTENT-AWARE MULTI-ATTRIBUTE
   At each iteration, the algorithm selects the item maximiz-
ing the objective function fobj (line 3) – which in turn can                   DIVERSITY
be defined to deal with the trade-off between accuracy and                 In this section we show how we address the intent-aware
diversity – and then adds it to the re-ranked list (line 4).            diversity problem when dealing with multi-attribute item
   For our purpose, we focus on the intent-aware approach               descriptions. The presentation relies on content-based at-
xQuAD (eXplicit Query Aspect Diversification), with the                 tributes (e.g. genres, years, etc. in the movies domain),
aim of diversifying with respect to the user intents. It was proposed for search
diversification in information retrieval by Santos et al. [15],         the attributes types. Therefore, one could also use statisti-
as a probabilistic framework to explicitly model an ambigu-             cal information as item attributes, e.g., popularity or rating
ous query as a set of sub-queries that will cover the poten-            variance. As explained in the previous section, we refer to
tial aspects of the initial query. Then it was adapted for              features as possible instances of a generic attribute. We
recommendation diversification by Vargas and Castells [20],             tried different reformulations of the div function in xQuAD
replacing query and relative aspects with user and items                (Equation 2) to deal with multi-attribute values. After an
categories, respectively. Hereafter we refer to generic item            attribute) in terms of accuracy-diversity balance:
characteristics, such as categories, as features, considering a
feature as a possible instance of a generic attribute.
   More formally, xQuAD greedily selects diverse recommen-
dations maximizing the following objective function:

              f_{obj}(i, S, u) = \lambda\, r^*(u, i) + (1 - \lambda)\, div(i, S, u)    (1)

              div^{ma}(i, S, u) = \sum_{A \in \mathcal{A}} \frac{\sum_{f \in dom(A)} p(i|f)\, p(f|u)\, (1 - avg_{j \in S}\, p(j|f))}{\sum_{f \in dom(A)} p(f|u)}    (5)
where:
with r^*(u, i) being the score predicted by the baseline recom-
mender; the λ parameter manages the accuracy-                                       • A is the set of attributes;
diversity balance, where higher values give more weight to                    • for each attribute A ∈ A and each feature in the at-
accuracy, while lower values give more weight to diversity.                     tribute domain f ∈ dom(A), p(i|f ) represents the im-
The last component in Equation 1 promotes the diversity,                        portance of f for the item i. It is computed as a binary
providing a measure of novelty with respect to the items                        function that returns 1 if the item contains f , 0 other-
already selected in S. As for the function div(i, S, u), the                    wise;
original formulation in [20] is:
       div^{orig}(i, S, u) = \sum_{f} p(i|f)\, p(f|u) \prod_{s \in S} (1 - p(s|f))    (2)

                                                                                   • p(f |u) represents the importance of the feature f for
                                                                                     the user u and is computed as the relative frequency
                                                                                     of the feature f on the rated items of the user u.
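Under these definitions, Equation 5 can be sketched as follows; the input shapes are assumptions for illustration only: `item_feats` maps an item to a dict from attribute to its feature set, and `user_freq` maps an attribute to the relative frequencies p(f|u):

```python
def div_ma(i, S, item_feats, user_freq):
    """Multi-attribute diversity term (Equation 5): binary p(i|f),
    relative-frequency p(f|u), and an inner average that penalizes
    features already covered by the items selected in S."""
    total = 0.0
    for attr, freq in user_freq.items():
        denom = sum(freq.values())
        if denom == 0:
            continue
        num = 0.0
        for f, pfu in freq.items():
            p_if = 1.0 if f in item_feats[i].get(attr, set()) else 0.0
            if S:
                avg_cov = sum(
                    1.0 if f in item_feats[j].get(attr, set()) else 0.0
                    for j in S
                ) / len(S)
            else:
                avg_cov = 0.0
            num += p_if * pfu * (1.0 - avg_cov)
        total += num / denom
    return total
```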
Hereafter we will refer to xQuAD using Equation 5 as basic               • DivRT. p(j|m) is the average similarity between m
xQuAD.                                                                      and each rule covered by item j. More formally:

Besides dealing with multi-attribute descriptions, the idea                                p(j|m) = avg_{m' \in M(u,j)}\, sim(m, m')        (8)
behind our approach is to infer and model the user profile                  The rationale behind this formulation is that some
by means of a regression tree, a predictive model where the                 rules may be similar to each other, thus not bring-
user interest represents the target variable, which can take                ing any actual diversification if considered separately.
continuous values. Once a regression tree is produced for a                 The computation of sim(m, m0 ) takes into account the
user u, then it is converted into a set of rules RT (u). Each                overlapping between the rules m and m' as follows:

                    sim(m, m') = \frac{\sum_{c_i \in body(m)} overlap(m, m', c_i)}{\max(|body(m)|, |body(m')|)}

rule maps the presence/absence of a categorical feature or a
constraint on a numerical one to a value v in a continuous
interval. This latter indicates the predicted interest of the
user on the items satisfying the rule. In our implementation
we used the interval [1, 5] since the value of the target vari-             For instance, considering the attributes represented in
able has been calculated as the rating mean of the training                 Figure 1, we have for actor, genre and director:
instances classified by the inferred rule. Please note that the
choice of a specific value interval for the target variable does
not affect the overall approach. Each rule m has then the

                    overlap(m, m', c_i) = \begin{cases} 1, & c_i \in body(m) \wedge c_i \in body(m') \\ 0, & \text{otherwise} \end{cases}
form
                                                                            For the numerical attribute year we may adopt a dif-
                     body(m) ↦ interest = v
                                                                            ferent formulation for the function overlap(m, m0 , ci ).
with body(m) = {c1 , . . . , cn }. An example of a set of rules             Here we compute, if any, the overlap between the in-
produced for a user is shown in Figure 1.                                   terval in body(m) and the one in body(m0 ) normalized
                                                                            with respect to maximum interval’s length. As an ex-
   1.   {horror ∈ dom(genres), western ∉ dom(genres),
        DarioArgento ∈ dom(directors)} ↦ interest = 4.2
                                                                            ample, if year > 1990 is in body(m) and year < 2010
                                                                            is in body(m') we may define the overlapping function
                                                                            as overlap(m, m', c_i) = \frac{|1990 − 2010|}{max(dom(year)) − min(dom(year))}.
   2.   {horror ∉ dom(genres), thriller ∈ dom(genres)}
        ↦ interest = 2.1

   3.   {year > 1990, horror ∉ dom(genres),
        drama ∈ dom(genres), Aronofsky ∈ dom(directors)}                          The functions introduced above have been used in the
        ↦ interest = 4.0                                                          experimental setting in order to compute the function
                                                                                  overlap(m, m', c_i) (see Section 4).
   4.   {year < 1990, drama ∈ dom(genres),
        AlPacino ∈ dom(actors)} ↦ interest = 3.9
                                                                           RT and DivRT can be used instead of the basic xQuAD as
   5.   {horror ∉ dom(genres)} ↦ interest = 3.2                            diversification algorithms in the re-ranking phase. Alterna-
                                                                     tively, basic xQuAD and RT or DivRT can be pipelined to
Figure 1: Example of a set of rules generated via                    benefit from the strengths of them both. For instance, one
the regression tree                                                  could use xQuAD to select 50 diversified recommendations
                                                                     and then RT to select 20 recommendations from those 50,
   Finally, under the assumption that they represent spe-                  or vice versa. Hereafter, we use the syntax X-after-Y, e.g.
cific user interests, the computed rules are used in the re-         xQuAD-after-RT, to indicate that algorithm X is executed
ranking phase as item features to improve the intent-aware           on the results of Y.
recommendation diversity.
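Rules such as those in Figure 1 can be represented as plain data and matched against items. The following is a sketch under assumed structures; the `(attribute, operator, value)` triples and the dictionary encoding of items are hypothetical, not the authors' implementation:

```python
def matches(rule, item):
    """True if the item satisfies every condition in body(rule)."""
    for attr, op, value in rule["body"]:
        if op == "in":
            ok = value in item.get(attr, set())
        elif op == "not_in":
            ok = value not in item.get(attr, set())
        elif op == ">":
            ok = item.get(attr, float("-inf")) > value
        elif op == "<":
            ok = item.get(attr, float("inf")) < value
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True

# Rule 4 from Figure 1, in this representation
rule4 = {
    "body": [("year", "<", 1990),
             ("genres", "in", "drama"),
             ("actors", "in", "AlPacino")],
    "interest": 3.9,
}
```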
   We also propose a div function for xQuAD so that each             4.    EXPERIMENTS
item is evaluated according to the rules it satisfies.
                                                                        We carried out a number of experiments to evaluate the
       div^{rules}(i, S, u) = \sum_{m \in M(u,i)} p(m|u)\, (1 - avg_{j \in S}\, p(j|m))    (6)
                                                                          performance of the methods presented in Section 3 on
                                                                          two well-known datasets: MovieLens 1M and LibraryThing.
                                                                             The MovieLens 1M1 dataset contains 1 million ratings from
Here M (u, i) represents the set of rules for the user u matched     6,040 users on 3,952 movies. The original dataset contains
by the item i while p(m|u) represents the importance of the          information about genres and year of release, and was en-
rule m for u and is computed as:                                     riched with further attribute information such as actors and
                                                                     directors extracted from DBpedia2 . More details about this
                                                                     DBpedia enriched version of the dataset are available in [11].
                         p(m|u) = \frac{interest_m}{|M(u, i)|}                              (7)
                                                                          pedia, the final dataset contains 998,963 ratings from 6,040
In Equation 7, interestm is the normalized predicted out-            users on 3,883 items. We built training and test sets by
come of the regression tree for the rule m. Finally, the last        employing a 60%-40% temporal split for each user.
component in Equation 6 indicates the complement of the                 Moreover, we used the LibraryThing3 dataset, which con-
coverage of the rule among the already selected recommen-            tains more than 2 million ratings from 7,279 users on 37,232
dations. We propose two different versions of this adapted           books. As in the dataset there are many duplicated ratings,
xQuAD.                                                               1
                                                                       Available at http://grouplens.org/datasets/movielens
                                                                     2
   • RT. p(j|m) is a binary function that returns 1 if the             http://dbpedia.org
                                                                     3
     item j matches the rule, 0 otherwise.                             Available at http://www.macle.nl/tud/LT
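Equations 6 and 7 with the binary RT version of p(j|m) can be sketched as follows; `matched_rules` is a hypothetical helper standing in for M(u, ·) with the user fixed, and `interest[m]` holds the normalized rule outcome used in Equation 7:

```python
def div_rules(i, S, matched_rules, interest):
    """Rule-based diversity term (Equation 6), RT variant:
    p(j|m) is 1 if item j matches rule m, else 0."""
    M_ui = matched_rules(i)                 # M(u, i): rules matched by i
    if not M_ui:
        return 0.0
    score = 0.0
    for m in M_ui:
        p_mu = interest[m] / len(M_ui)      # Equation 7
        if S:
            avg_cov = sum(1.0 if m in matched_rules(j) else 0.0
                          for j in S) / len(S)
        else:
            avg_cov = 0.0
        score += p_mu * (1.0 - avg_cov)
    return score
```

The DivRT variant would replace the binary membership test with the rule-similarity average of Equation 8.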
when a user has rated the same item more than once, we se-       4.1   Experimental Configuration
lected her last rating. This leaves 749,401 unique ratings. Also          For both datasets, we used the Bayesian Personalized Rank-
in this case, we enriched the dataset by mapping the books       ing Matrix Factorization algorithm (BPRMF) available in
with BaseKB4 , the RDF version of Freebase5 and then ex-         MyMediaLite9 as baseline (using the default parameters).
tracting three attributes: genre, author and subjects. The       We performed experiments using other recommendation al-
subjects in Freebase represent the topic of the book, for in-    gorithms, but we do not report results here since they are
stance Pilot experiment, Education, Culture of Italy, Martin     very similar to those obtained by BPRMF.
Luther King and so on. The dump of the mapping is avail-            We selected the top-200 recommendations for each user to
able online6 . The final dataset contains 565,310 ratings from   generate the initial list P used for performing the re-ranking
7,278 users on 27,358 books. We built training and test sets     as shown in Algorithm 1.
by employing an 80%-20% hold-out split. The different ratio          Accuracy is measured in terms of Precision, Recall and
used for LibraryThing with respect to MovieLens (60%-40%)       nDCG, but we only report nDCG values since the trend of
depends on its higher sparsity: holding 80% to build the user      the other two metrics is very similar. Individual diversity
profile ensures a sufficient number of ratings to train the      is measured using ILD and α-nDCG (see Section 2) with
system.                                                          α = 0.5 to equally balance diversity and accuracy, while
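The per-user temporal split described above can be sketched as follows; an illustrative sketch assuming ratings are `(user, item, rating, timestamp)` tuples, not the authors' preprocessing code:

```python
from collections import defaultdict

def temporal_split(ratings, train_frac=0.6):
    """Per-user temporal hold-out: the earliest train_frac of each
    user's ratings form the training set, the rest the test set."""
    by_user = defaultdict(list)
    for r in ratings:
        by_user[r[0]].append(r)
    train, test = [], []
    for user, rows in by_user.items():
        rows.sort(key=lambda r: r[3])       # sort by timestamp
        cut = int(len(rows) * train_frac)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test
```

The same function with `train_frac=0.8` would mimic the LibraryThing ratio (though the paper describes that split as hold-out rather than temporal).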
                                                                 aggregate diversity is measured using both the catalog cov-
                           Movielens     LibraryThing            erage – computed as the percentage of items recommended
      Number of users        6,040            7,278              at least to one user – and the entropy – computed as in
      Number of items        3,883           27,358              [3] to analyse the distribution of recommendations among
      Number of ratings     998,963         565,310              all users. These two last metrics need to be considered to-
      Data sparsity          95.7%           99.7%               gether, since the coverage gives an indication about the ability
      Avg users per item     275.57           20.66              of a recommender to cover the items catalog and the entropy
      Avg items per user     165.39           77.68              shows the ability to equally spread out the recommendations
                                                                 across all the items. Hence, only an improvement of both
       Table 1: Statistics about the two datasets                those metrics indicates a real increase in aggregate diver-
                                                                 sity, that in turn denotes a better personalization of the
   Since the number of distinct values was too large for year,   recommendations [3].
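The two aggregate-diversity metrics can be sketched as follows; a simplified sketch with coverage as the fraction of catalog items recommended at least once and the entropy computed over the distribution of recommendations across items, in the spirit of [3]:

```python
import math
from collections import Counter

def catalog_coverage(rec_lists, catalog_size):
    """Fraction of catalog items recommended to at least one user."""
    distinct = set(i for recs in rec_lists for i in recs)
    return len(distinct) / catalog_size

def recommendation_entropy(rec_lists):
    """Shannon entropy of the distribution of recommendations over
    items: higher means recommendations are spread more evenly."""
    counts = Counter(i for recs in rec_lists for i in recs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())
```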
actors and director attributes in MovieLens and for all the         As similarity measure for computing the ILD metric (Equa-
attributes in LibraryThing, we converted years into the corre-       tion 3) we used the Jaccard index. Considering that there
sponding decades and performed a K-means clustering for          are more attributes for each item, we computed the average
other attributes on the basis of DBpedia categories7 for         of the Jaccard index value for each attribute shared between
MovieLens and Freebase categories8 for LibraryThing. Ta-         two items. α-nDCG is computed as the average of Equa-
bles 2 and 3 report the number of attribute values and clus-      tion 4 for each attribute.
ters. The number of clusters was decided according to the           As presented in Section 3, we propose two novel diver-
calculation of the within-cluster sum of squares (withinss       sification approaches: RT and DivRT. We also propose a
measure from the R Stats Package, version 2.15.3), that is       method to combine in sequence different algorithms by means
picking the value of K corresponding to an evident break in      of a two phase re-ranking procedure, with the aim of ben-
the distribution of the withinss measure against the number      efiting from the strengths of both. Therefore we evalu-
of clusters extracted.                                           ated two other approaches: xQuAD-after-RT and RT-after-
                    Num. Values      Num. Clusters               xQuAD, applying the second re-ranking phase on the set
        Genres           19                -                     of 50 recommendations provided from the first phase. We
        Decades          10                -                     of 50 recommendations provided by the first phase. We
        Actors         14736              20                     vRT, but the results are very similar using RT, so they will
                                                                 not be shown. To evaluate the performance, we compare
        Directors      3194               20
                                                                 the top-10 recommendation lists generated by all the ap-
                                                                 proaches with basic xQuAD, by varying the λ parameter
    Table 2: Statistics about MovieLens attributes
                                                                 from 0 to 0.95 with a step of 0.05 in Equation 1 (higher
                                                                 values of λ give more weight to accuracy, lower values to
                    Num. Values      Num. Clusters               diversity).
        Genres          270               30                        The rules are produced using the M5Rules10 algorithm avail-
        Authors        12868              22                     able in Weka based on the M5 algorithm proposed by Quin-
        Subjects       2911               20                     lan [12] and improved by Wang and Witten [22]. M5Rules
                                                                 generates a list of rules for regression problems using a
    Table 3: Statistics about LibraryThing attributes            separate-and-conquer learning strategy. It iteratively builds
                                                                 a model tree using M5 and converts the best leaf into a rule.
                                                                 We decided to use unpruned rules in order to have more
                                                                 rules matchable with the items.

4
  http://basekb.com
5
  https://www.freebase.com
6                                                                9
  URL removed to guarantee anonymous submission.                  http://mymedialite.net/
7                                                                10
  http://purl.org/dc/terms/subject                                http://weka.sourceforge.net/doc.dev/weka/
8
  http://www.w3.org/1999/02/22-rdf-syntax-ns#type                classifiers/rules/M5Rules.html
5.   RESULTS DISCUSSION                                          accurate recommendations, with λ > 0.65, but the differ-
   Results of the experiments on MovieLens and Library-          ences are not statistically significant. In terms of individual
Thing are reported in Figures 2 and 3, respectively.              diversity, all of them outperform the baseline (ILD
   MovieLens. xQuAD obtains the best results in terms            = 0.4 and α-nDCG = 0.285) except when using the pure
of ILD (Figure 2(a)) and α-nDCG (Figure 2(b)), though            rule-based approaches in terms of ILD. However they are
the xQuAD-after-RT results are very close and, with higher       able to improve α-nDCG. For the latter two metrics, the
λ values (namely giving more importance to the accuracy          differences are always statistically significant (p < 0.001).
factor), the differences between them are not significant.       In terms of aggregate diversity, xQuAD does not improve
This outcome is due to the fact that the diversity metrics       the baseline result (coverage = 0.15 and α-nDCG = 0.77),
are attribute-based and xQuAD operates directly diversi-         while using the rules leads to better results. According to
fying the attributes values, while the proposed rule-based       a comprehensive analysis on LibraryThing, the pure rule-
approaches do not take into account all the attributes val-      based approaches may give more personalized recommenda-
ues. This also explains why the pure rule-based approaches       tions with a better diversity, especially using RT, with also
(RT and DivRT) obtain the worst diversity results, while         a small accuracy loss. Similarly to the analysis on Movie-
the combined algorithms (xQuAD-after-RT and RT-after-            Lens, the results on LibraryThing suggest that diversifying
xQuAD) obtain better results. It is noteworthy that these        with only the rules is a good choice when aggregate diver-
last two configurations have no substantial difference with      sity is more important than individual diversity, conversely
ILD, but, in terms of α-nDCG, xQuAD-after-RT consider-           xQuAD remains the best choice to improve the individual
ably overcomes RT-after-xQuAD. This demonstrates that            diversity and combined with the rule-based diversification
the pipeline of xQuAD and the rule-based approach ob-            improves also the aggregate diversity.
tains good diversity. Considering coverage (Figure 2(c))            The final conclusions of this analysis are that using a re-
and entropy (Figure 2(d)) to evaluate the aggregate diver-       gression tree to infer rules representing user interests on
sity, the results show that using the rules the recommen-        multi-attribute values in the diversification process with
dations are much more personalized. It is interesting to         xQuAD leads to more personalized recommendations but
note the compromise provided by xQuAD-after-RT, that             with a less diversified list and that combining attribute-
obtains equidistant results between xQuAD and the rule-          based and rule-based diversifications in two phase re-ranking
based algorithms, unlike RT-after-xQuAD that slightly over-      is a good way for taking the advantages of both. The bet-
comes xQuAD. With respect to the baseline, no configura-         ter degree of personalization may depend on the fact that
tion is able to give more accurate recommendations (nDCG         the rules are different among the users since they represents
= 0.14); all are able to increase the individual diversity       their individual interests. The lower individual diversity val-
(ILD = 0.34 and α-nDCG = 0.27). With nDCG and the                ues with ILD and α-nDCG are due to the nature of these
individual diversity, the differences are always statistically   metrics which are based directly on the attributes values
significant (p < 0.001), except using the pure ruled-based       while the pure rule-based approaches do not take into ac-
approaches with λ > 0.65. The situation is more complex          count all the attributes values.
in terms of aggregate diversity, since the coverage grows
very little on the baseline (coverage = 0.29) and the entropy    6.   RELATED WORK
slightly decreases (entropy = 0.78) with higher λ values. Ac-       There is a noteworthy effort by the research community in
cording to a comprehensive analysis on MovieLens, the pure       addressing the challenge of recommendation diversity. That
rule-based approaches may give personalized and diversified      interest arises from the necessity of avoiding monotony in
recommendations, also with small accuracy loss. However,         recommendations and controlling the balance between accu-
when individual diversity is more important than aggregate       racy and diversity, since increasing diversity inevitably puts
diversity, combining xQuAD with a previous rule-based re-        accuracy at risk [25]. However, a user study in the movie
ranking gives a good compromise between individual and           domain [7] demonstrates that user satisfaction is positively
aggregate diversity.                                             dependent on diversity and there may not be the intrinsic
   LibraryThing. At first glance, the LibraryThing results       trade-off when considering user perception instead of tradi-
appear similar to those on MovieLens. Although they are          tional accuracy metrics.
generally consistent, there are interesting differences. Also       Typically, the proposed approaches aim to replace items
in this case, xQuAD obtains the best diversity values, with      in an already computed recommendation list, by minimizing
ILD (Figure 3(a)) and α-nDCG (Figure 3(b)). However,             the similarity among all items. Some approaches exploit a
both the combined approaches obtain really interesting re-       re-ranking phase with a greedy selection (see Section 2), for
sults, very close to xQuAD, except for the lower λ val-          instance [18], or with other techniques such us the Swap al-
ues (namely giving more importance to the diversification        gorithm [23], which starts with a list of K scoring items and
factor). Unlike what happens on MovieLens, in this case          swaps the item which contributes the least to the diversity
RT-after-xQuAD obtains good results also in terms of α-          of the entire set with the next highest scoring item among
nDCG. The pure rule-based approaches still obtain worse          the remaining items, by controlling the drop of the overall
results. Considering coverage (Figure 3(c)) and entropy          relevance by a pre-defined upper bound.
(Figure 3(d)) to evaluate the aggregate diversity, the results      Other types of approaches try to directly generate diver-
show that using the rules the recommendations are much           sified recommendation lists. For instance, [2] proposes a
more personalized than using only xQuAD. The combined            probabilistic neighborhood selection in collaborative filter-
approaches are able to improve the aggregate diversity with      ing for selecting diverse neighbors, while in [16], an adaptive
respect to xQuAD, albeit they are still distant from the pure    diversification approach is based on Latent Factor Portfolio
rule-based approaches, especially in terms of coverage. With     model for capturing the user interests range and the uncer-
respect to the baseline, all configurations give a little more   tainty of the user preferences by employing the variance of
Figure 2: Accuracy-diversity curves on MovieLens at Top-10, obtained by varying the λ parameter from 0 to 0.95 (step 0.05). Statistical significance is measured on the results from individual users, according to the Wilcoxon signed-rank test. For nDCG and ILD (2(a)), all the differences are statistically significant (p < 0.01), except for those between RT and DivRT. For α-nDCG (2(b)), the trend is the same, except for the differences between xQuAD and xQuAD-after-RT with λ > 0.7.


the learned user latent factors. In [13] a hybrid method is proposed, based on evolutionary search following the Strength Pareto approach, to find appropriate weights for the constituent algorithms with the final aim of improving the balance of accuracy, diversity and novelty. [24] casts the problem of improving diversity while maintaining adequate accuracy as a binary optimization problem and proposes an approach based on solving a trust region relaxation. The advantage of this approach is that it seeks the best subset of items over all possible subsets, while greedy selection finds sub-optimal solutions.
   Multi-attribute diversity has remained substantially untreated in the recommender systems literature. A recent work [6] proposes an adaptive approach able to customize the degree of recommendation diversity of the top-N list, taking into account the user's inclination to diversity over different content-based item attributes. Specifically, entropy is employed as a measure of the diversity degree within user preferences and is used in conjunction with the user profile dimension to calibrate the degree of diversification.
   Furthermore, increasing attention has been paid to intent-aware diversification, namely the process of increasing diversity while taking the user interests into account. Some approaches are based on adaptations of algorithms proposed for the same purpose in the Information Retrieval field, such as IA-Select [4] and xQuAD [15]. An approach for the extraction of sub-profiles reflecting the user interests has been proposed in [20]: there, a combination of sub-profile recommendations is generated, with the aim of maximizing the number of user tastes represented while simultaneously avoiding redundancy in the top-N recommendations. A more recent approach [19], based on a binomial greedy re-ranking algorithm, combines global item genre distribution statistics and personalized user interests to satisfy coverage and non-redundancy of genres in the final list.
   Aggregate diversity, also known as sales diversity, is considered another important factor in recommendation from both the business and the user perspective: the user may receive less obvious and more personalized recommendations, in line with the goal of helping users discover new content [21], and the business may increase sales [8]. [3] proposes the concept of aggregate diversity as the ability of a system to
Figure 3: Accuracy-diversity curves on LibraryThing at Top-10, obtained by varying the λ parameter from 0 to 0.95 (step 0.05). Statistical significance is measured on the results from individual users, according to the Wilcoxon signed-rank test. For nDCG, the differences between RT and DivRT are not significant with λ ∈ [0.2, 0.5]. For ILD (3(a)), all the differences are statistically significant (p < 0.001), except for those between RT and DivRT. For α-nDCG (3(b)), all the differences are statistically significant (p < 0.001).


recommend across all users as many different items as possible, and proposes efficient and parametrizable re-ranking techniques to improve aggregate diversity with controlled accuracy loss. Those techniques are simply based on statistical information such as items' average ratings, average predicted rating values, and so on. [21] explores the impact on aggregate diversity and novelty of inverting the recommendation task, namely ranking users for items. Specifically, two approaches have been proposed: one based on an inverted neighborhood formation and the other on a probabilistic formulation for recommending users to items. [14] proposed a k-furthest-neighbors collaborative filtering algorithm to mitigate the popularity bias and increase diversity, also considering other factors in user-centric evaluation, such as novelty, serendipity, obviousness and usefulness.

7.   CONCLUSIONS AND FUTURE WORK
   This paper addresses the problem of intent-aware diversification in recommender systems in multi-attribute settings. The proposed approach is based on xQuAD [15], a relevant intent-aware diversification algorithm, and leverages regression trees as a user modeling technique. In their rule-based equivalent representation, the trees are exploited to foster the diversification of recommendation results, both in terms of individual diversity and in terms of aggregate diversity.
   The experimental evaluation on two datasets in the movie and book domains demonstrates that considering the rules generated from the different attributes available in an item description provides diversified and personalized recommendations, with a small loss of accuracy. The analysis of the results suggests that a pure rule-based diversification is a good choice when aggregate diversity is more needed than individual diversity. Conversely, basic xQuAD remains the best choice to improve individual diversity, while its combination with the rule-based diversification also improves aggregate diversity.
   For future work, we would like to evaluate the impact of our approach also on recommendation novelty. A way to improve novelty could be the expansion of the rules by exploiting collaborative information.
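The two-phase re-ranking discussed above can be illustrated with a generic greedy (1 − λ)·relevance + λ·diversity selection. In the sketch below, `rule_div` and `attr_div` are hypothetical stand-ins for the rule-based and attribute-based (xQuAD-style) diversity terms, not the exact objective used in the experiments.

```python
# Illustrative two-phase re-ranking: a rule-based greedy pass followed by
# an attribute-based (xQuAD-style) pass. The diversity terms are assumed
# callables div(item, selected_so_far) -> float.

def greedy_rerank(candidates, rel, div, lam, k):
    """Greedily pick k items maximising (1 - lam) * rel + lam * div,
    where div is evaluated against the items selected so far."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda i: (1 - lam) * rel[i] + lam * div(i, selected))
        selected.append(best)
        pool.remove(best)
    return selected

def two_phase(candidates, rel, rule_div, attr_div, lam, k):
    """Phase 1: rule-based re-ranking over the full candidate set.
    Phase 2: attribute-based re-ranking of the phase-1 output."""
    phase1 = greedy_rerank(candidates, rel, rule_div, lam, len(candidates))
    return greedy_rerank(phase1, rel, attr_div, lam, k)
```

With λ = 0 the selection reduces to pure relevance ranking; raising λ trades relevance for diversity, which is exactly the trade-off swept in Figures 2 and 3.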
Acknowledgements. The authors acknowledge partial support of PON02 00563 3470993 VINCENTE, PON04a2 E RES NOVAE, PON02 00563 3446857 KHIRA and PON01 03113 ERMES.

8.   REFERENCES
 [1] P. Adamopoulos and A. Tuzhilin. On unexpectedness in recommender systems: Or how to expect the unexpected. In Proc. of the RecSys ’11 Intl. Workshop on Novelty and Diversity in Recommender Systems, 2011.
 [2] P. Adamopoulos and A. Tuzhilin. On over-specialization and concentration bias of recommendations: Probabilistic neighborhood selection in collaborative filtering systems. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys ’14, pages 153–160. ACM, 2014.
 [3] G. Adomavicius and Y. Kwon. Improving aggregate recommendation diversity using ranking-based techniques. IEEE Trans. Knowl. Data Eng., 24(5):896–911, 2012.
 [4] P. Castells, S. Vargas, and J. Wang. Novelty and diversity metrics for recommender systems: Choice, discovery and relevance. In International Workshop on Diversity in Document Retrieval (DDR 2011) at the 33rd European Conference on Information Retrieval (ECIR 2011), April 2011.
 [5] C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08, pages 659–666. ACM, 2008.
 [6] T. Di Noia, V. C. Ostuni, J. Rosati, P. Tomeo, and E. Di Sciascio. An analysis of users’ propensity toward diversity in recommendations. In RecSys ’14, pages 285–288. ACM, 2014.
 [7] M. D. Ekstrand, F. M. Harper, M. C. Willemsen, and J. A. Konstan. User perception of differences in recommender algorithms. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys ’14, pages 161–168. ACM, 2014.
 [8] D. Fleder and K. Hosanagar. Blockbuster culture’s next rise or fall: The impact of recommender systems on sales diversity. Management Science, 55(5):697–712, 2009.
 [9] N. Hurley and M. Zhang. Novelty and diversity in top-n recommendation – analysis and evaluation. ACM TOIT, 10(4):14:1–14:30, 2011.
[10] S. M. McNee, J. Riedl, and J. A. Konstan. Being accurate is not enough: How accuracy metrics have hurt recommender systems. In CHI ’06 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’06, pages 1097–1101, 2006.
[11] V. C. Ostuni, T. Di Noia, E. Di Sciascio, and R. Mirizzi. Top-n recommendations from implicit feedback leveraging linked open data. In ACM RecSys ’13, pages 85–92, 2013.
[12] R. J. Quinlan. Learning with continuous classes. In 5th Australian Joint Conference on Artificial Intelligence, pages 343–348, Singapore, 1992. World Scientific.
[13] M. T. Ribeiro, A. Lacerda, A. Veloso, and N. Ziviani. Pareto-efficient hybridization for multi-objective recommender systems. In RecSys ’12, pages 19–26. ACM, 2012.
[14] A. Said, B. Fields, B. J. Jain, and S. Albayrak. User-centric evaluation of a k-furthest neighbor collaborative filtering recommender algorithm. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, CSCW ’13, pages 1399–1408. ACM, 2013.
[15] R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting query reformulations for web search result diversification. In WWW ’10, pages 881–890. ACM, 2010.
[16] Y. Shi, X. Zhao, J. Wang, M. Larson, and A. Hanjalic. Adaptive diversification of recommendation results via latent factor portfolio. In ACM SIGIR ’12, pages 175–184, 2012.
[17] B. Smyth and P. McClave. Similarity vs. diversity. In Proceedings of the 4th International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development, ICCBR ’01, pages 347–361. Springer-Verlag, 2001.
[18] S. Vargas, L. Baltrunas, A. Karatzoglou, and P. Castells. Coverage, redundancy and size-awareness in genre diversity for recommender systems. In RecSys ’14, pages 209–216, 2014.
[19] S. Vargas, L. Baltrunas, A. Karatzoglou, and P. Castells. Coverage, redundancy and size-awareness in genre diversity for recommender systems. In RecSys ’14, pages 209–216. ACM, 2014.
[20] S. Vargas and P. Castells. Exploiting the diversity of user preferences for recommendation. In OAIR ’13, pages 129–136, 2013.
[21] S. Vargas and P. Castells. Improving sales diversity by recommending users to items. In RecSys ’14, pages 145–152. ACM, 2014.
[22] Y. Wang and I. H. Witten. Induction of model trees for predicting continuous classes. In Poster Papers of the 9th European Conference on Machine Learning. Springer, 1997.
[23] C. Yu, L. Lakshmanan, and S. Amer-Yahia. It takes variety to make a world: Diversification in recommender systems. In EDBT ’09, pages 368–378, 2009.
[24] M. Zhang and N. Hurley. Avoiding monotony: Improving the diversity of recommendation lists. In ACM RecSys ’08, pages 123–130, 2008.
[25] T. Zhou, Z. Kuscsik, J. G. Liu, M. Medo, J. R. Wakeling, and Y. C. Zhang. Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences, 107:4511–4515, 2010.
[26] C. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. In WWW ’05, pages 22–32, 2005.