An Improved Data Aggregation Strategy for Group Recommendations

Toon De Pessemier, Simon Dooms, Luc Martens
iMinds - Ghent University, G. Crommenlaan 8 box 201, B-9050 Ghent, Belgium
Toon.DePessemier@UGent.be, Simon.Dooms@UGent.be, Luc1.Martens@UGent.be


ABSTRACT
Although most recommender systems make suggestions for individual users, in many circumstances the selected items (e.g., movies) are not intended for personal usage but rather for consumption in group. Group recommendations can assist a group of users in finding and selecting interesting items, thereby considering the tastes of all group members. Traditionally, group recommendations are generated either by aggregating the group members' recommendations into a list of group recommendations, or by aggregating the group members' preferences (as expressed by ratings) into a group model, which is then used to calculate group recommendations. This paper presents a new data aggregation strategy for generating group recommendations by combining the two existing aggregation strategies. The proposed aggregation strategy outperforms each individual strategy for different group sizes and in combination with various recommendation algorithms.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information Filtering; H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces

General Terms
Algorithms, Experimentation

Keywords
group recommendations, aggregation strategy, combining techniques

1. INTRODUCTION
   Although the majority of currently deployed recommender systems are designed to generate personal suggestions for individual users, in many cases content is selected and consumed by groups of users rather than by individuals. This strengthens the need for group recommendations, which provide suggestions that consider the tastes of all group members. In the literature, group recommendations have mostly been generated by one of the following two data aggregation strategies [2].
   The first aggregation strategy (aggregating recommendations) generates recommendations for each individual user using a general recommendation algorithm. Subsequently, the recommendation lists of all group members are aggregated into a group recommendation list, which (hopefully) satisfies all group members. Different approaches to aggregate the recommendation lists have been proposed during the last decade, such as least misery and plurality voting [7]. Most of them make a decision based on the algorithm's prediction score, i.e., a prediction of the user's rating for the recommended item. One commonly used way to perform the aggregation is averaging the prediction scores of each member's recommendation list: the higher the average prediction score, the better the match between the group's preferences and the recommended item.
   The second aggregation strategy (aggregating preferences) combines the users' preferences into group preferences. This way, the opinions and preferences of the individual group members constitute a group preference model reflecting the interests of all members. Again, the members' preferences can be aggregated in different ways, e.g., by calculating the rating of the group as the average of the group members' ratings [7, 1]. After aggregating the members' preferences, the group's preference model is treated as a pseudo user in order to produce recommendations for the group using a traditional recommendation algorithm.
   This paper presents a new data aggregation strategy that combines the two existing strategies and outperforms each of them in terms of accuracy. For both individual data aggregation strategies, we used the average function to combine the individual preferences or recommendations. Although a switching scheme between the two aggregation strategies has already been investigated [2], the proposed combined strategy is the first to generate group recommendations by using both aggregation strategies at once, thereby making a more informed decision.
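To make the two strategies concrete, the following sketch illustrates both of them with the average function. The `predict` function stands for an arbitrary single-user recommendation algorithm and, like the other names in this sketch, is an illustrative placeholder rather than the implementation used in our experiments.

```python
def aggregate_recommendations(member_profiles, candidate_items, predict, top_n=5):
    """Aggregating recommendations: score the candidate items for every
    group member individually, then average the prediction scores."""
    scores = {
        item: sum(predict(profile, item) for profile in member_profiles)
              / len(member_profiles)
        for item in candidate_items
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def aggregate_preferences(member_profiles, candidate_items, predict, top_n=5):
    """Aggregating preferences: merge the members' ratings into a pseudo
    user (average rating per item), then recommend for that pseudo user."""
    merged = {}
    for profile in member_profiles:          # each profile: dict item -> rating
        for item, rating in profile.items():
            merged.setdefault(item, []).append(rating)
    pseudo_profile = {item: sum(r) / len(r) for item, r in merged.items()}
    scores = {item: predict(pseudo_profile, item) for item in candidate_items}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```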

2. EVALUATING GROUP RECOMMENDATIONS
   A major issue in the domain of group recommender systems is the evaluation of the accuracy, i.e., comparing the generated recommendations for a group with the true preferences of the group. Performing online evaluations or interviewing groups can be partial solutions, but these are not feasible on a large scale or for extensively testing alternative configurations. For example, in Section 5, five recommendation algorithms in combination with two data aggregation strategies are evaluated for twelve different group sizes, leading to 120 different experimental setups. Therefore, we are forced to perform an offline evaluation, in which synthetic groups are sampled from the users of a traditional single-user data set. Since movies are often watched in group, we used the MovieLens (100K) data set for this evaluation.
   In the literature, group recommendations have been evaluated several times by using a data set with simulated groups of users. Baltrunas et al. [1] used the MovieLens data set to simulate groups of different sizes (2, 3, 4, 8) and different degrees of similarity (high, random) with the aim of evaluating the effectiveness of group recommendations. Chen et al. [4] also used the MovieLens data set and simulated groups by randomly selecting the members of the group to evaluate their proposed group recommendation algorithm. They simulated group ratings by calculating a weighted average of the group members' ratings based on the users' opinion importance parameter. Quijano-Sánchez et al. [8] used synthetically generated data to simulate groups of people in order to test the accuracy of group recommendations for movies. In addition to this offline evaluation, they conducted an experiment with real users to validate the results obtained with the synthetic groups. One of the main conclusions of their study was that it is possible to realize trustworthy experiments with synthetic data, as the online user test confirmed the results of the experiment with synthetic data. This conclusion justifies the use of an offline evaluation with synthetic groups to evaluate the group recommendations in our experiment.
   This offline evaluation is based on the traditional procedure of dividing the data set into two parts: the training set, which is used as input for the algorithm to generate the recommendations, and the test set, which is used to evaluate the recommendations. In this experiment, we ordered the ratings chronologically and assigned the oldest 60% to the training set and the most recent 40% to the test set, as this best reflects a realistic scenario.
   The evaluation procedure was adopted from Baltrunas et al. [1] and is performed as follows. Firstly, synthetic groups are composed by selecting random users from the data set; all users are assigned to one group of a predefined size. Secondly, group recommendations are generated for each of these groups based on the ratings of the members in the training set. Since group recommendations are intended to be consumed in group and to simultaneously suit the preferences of all members of the group, all members receive the same recommendation list. Thirdly, since no group ratings are available, the recommendations are evaluated individually, as in the classical single-user case, by comparing (the rankings of) the recommendations with (the rankings of) the items in the test set of the user, using the Normalized Discounted Cumulative Gain (nDCG) at rank 5. The nDCG is a standard information retrieval measure that is used to evaluate recommendation lists [1].
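The following sketch shows how this procedure can be implemented: groups of a fixed size are composed at random, and each member's list is scored with nDCG at rank 5. The gain function (the user's test-set rating, 0 for unrated items) is an assumption, since the exact gain definition is not spelled out here, and all names are illustrative.

```python
import math
import random

def compose_groups(user_ids, group_size, seed=None):
    """Randomly partition all users into groups of a predefined size."""
    users = list(user_ids)
    random.Random(seed).shuffle(users)
    return [users[i:i + group_size] for i in range(0, len(users), group_size)]

def dcg(gains):
    """Discounted cumulative gain of a ranked list of gain values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg_at_k(recommended_items, test_ratings, k=5):
    """nDCG@k for one group member, using the member's test-set ratings
    as gains (items not rated in the test set contribute a gain of 0)."""
    gains = [test_ratings.get(item, 0.0) for item in recommended_items[:k]]
    ideal = sorted(test_ratings.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```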
                                                                         tions). In contrast for high-density profiles, the aggregating
                                                                         preferences strategy resulted in the lowest MAE, thereby
3.   RECOMMENDATION ALGORITHMS                                           outperforming the aggregating recommendations strategy in
   The effectiveness of the different aggregation strategies             terms of accuracy. Therefore, Berkovsky and Freyne pro-
is measured for different sizes of the group and in combi-               posed a switching scheme based on the profile density, which
nation with various state-of-the art recommendation algo-                yielded a small accuracy improvement compared to the in-
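The alternating combination performed by the hybrid recommender can be sketched as follows; the duplicate handling and function names are assumptions for illustration, since only the alternating behaviour is described above (this is not the Duine or Mahout API).

```python
from itertools import zip_longest

def alternate_merge(ibcf_list, cb_list, top_n=10):
    """Interleave the best IBCF and CB recommendations into one list,
    skipping items that were already added by the other algorithm."""
    merged, seen = [], set()
    for ibcf_item, cb_item in zip_longest(ibcf_list, cb_list):
        for item in (ibcf_item, cb_item):
            if item is not None and item not in seen:
                merged.append(item)
                seen.add(item)
                if len(merged) == top_n:
                    return merged
    return merged
```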
4. COMBINING STRATEGIES
   Previous research [5] has shown that the used aggregation strategy, in combination with the recommendation algorithm, has a major influence on the accuracy of the group recommendations. Certain algorithms (such as CB and UBCF) produce more accurate group recommendations when the aggregating preferences strategy is used, whereas other algorithms (such as IBCF and SVD) obtain a higher accuracy in combination with the aggregating recommendations strategy. So, the choice of the aggregation strategy is crucial for each algorithm in order to obtain the best group recommendations. Instead of selecting one individual aggregation strategy, traditional aggregation strategies can be combined with the aim of obtaining group recommendations that outperform the group recommendations of each individual aggregation strategy. In this context, Berkovsky and Freyne [2] observed that the aggregating recommendations strategy yields a lower MAE (Mean Absolute Error) than the aggregating preferences strategy if the user profiles have a low density (i.e., they contain a low number of consumptions). In contrast, for high-density profiles, the aggregating preferences strategy resulted in the lowest MAE, thereby outperforming the aggregating recommendations strategy in terms of accuracy. Therefore, Berkovsky and Freyne proposed a switching scheme based on the profile density, which yielded a small accuracy improvement compared to the individual strategies. However, their results were obtained in a very specific setting.
They only considered the accuracy of recommendations generated by a CF algorithm, the MAE metric was used to estimate the accuracy, and they focused on the specific use case of recipe recommendations using a rather small data set (approximately 3300 ratings). Because of these specific settings, we were not able to obtain an accuracy improvement by using such a switching scheme on the MovieLens data set.
   Therefore, we propose an advanced data aggregation strategy which combines both individual aggregation strategies, thereby yielding an accuracy gain compared to each individual aggregation strategy for different recommendation algorithms. This combination of strategies aggregates the preferences of the users as well as their recommendations, with the aim of merging the knowledge of the two aggregation strategies into a final group recommendation list. The idea is that if one of the aggregation strategies comes up with a less suitable or undesirable group recommendation, the other aggregation strategy can correct this mistake. This makes the group recommendations resulting from the combination of strategies more robust than group recommendations based on a single aggregation strategy.
   The two aggregation strategies are combined as follows. First, group recommendations are calculated by using the selected recommendation algorithm and the aggregating preferences strategy. The result is a list of all items, ordered according to their prediction score. In case of an individual aggregation strategy, the top-N items of that list are selected as suggestions for the group. After calculating the group recommendations using the aggregating preferences strategy, or in parallel with it, group recommendations are generated using the chosen algorithm and the aggregating recommendations strategy. Again, the result is an ordered list of items with their corresponding prediction scores.
   Both of these lists with group recommendations can still contain items that are less suitable for the group, even at the top of the list. The next phase tries to eliminate these items by comparing the two resulting recommendation lists. Items that are at the top of both lists are probably interesting recommendations, whereas items at the bottom of both lists are usually less suitable for the group. Less certainty exists about items that are at the top of the recommendation list generated by one of the aggregation strategies but in the middle or even at the bottom of the recommendation list produced by the other aggregation strategy. Therefore, both recommendation lists are adapted by eliminating these uncertain items, so that they contain only items that appear at the top of both recommendation lists, thereby reducing the risk of recommending undesirable or less suitable items to the group. So, items that are ranked below a certain threshold position in (at least) one of the recommendation lists generated by the two aggregation strategies are removed from both lists. If only one aggregation strategy is used, identifying uncertain items based on the results of a complementary recommendation list is not possible. In this experiment, we opted to exclude from the recommendation lists those items that are not in the top 5% of both recommendation lists (i.e., the top 84 recommended items for the MovieLens data set). As a result, the recommendation lists contain only items that are identified as 'the most suitable' by both aggregation strategies, ordered according to the prediction scores calculated using either the aggregating preferences strategy or the aggregating recommendations strategy.
   Subsequently, the two recommendation lists are combined into one recommendation list by combining the prediction scores of each aggregation strategy per item. In this experiment, we opted for the average as the method to combine the prediction scores. So, in the resulting recommendation list, each item's prediction score is the average of the item's prediction score generated by the aggregating preferences strategy and the item's prediction score produced by the aggregating recommendations strategy. Alternative combining methods are also possible, e.g., a weighted average of the prediction scores with weights depending on the performance of each individual aggregation strategy. Then, the items are ordered by their new prediction score in order to obtain the final list of group recommendations.
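Under the assumption that both aggregation strategies return a prediction score for every candidate item, the elimination and averaging steps described above can be sketched as follows (function and variable names are illustrative):

```python
def combine_strategies(pref_scores, rec_scores, top_fraction=0.05, top_n=5):
    """Combined aggregation strategy: keep only the items that appear in
    the top fraction of both lists, then re-rank them by the average of
    the two prediction scores."""
    cutoff = max(1, int(len(pref_scores) * top_fraction))  # top 5%, e.g. 84 items for MovieLens

    def top_items(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:cutoff])

    kept = top_items(pref_scores) & top_items(rec_scores)
    combined = {item: (pref_scores[item] + rec_scores[item]) / 2 for item in kept}
    return sorted(combined, key=combined.get, reverse=True)[:top_n]
```

A weighted average with weights reflecting the known performance of each individual strategy could be substituted for the plain average in the last step.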
5. RESULTS
   Our combined aggregation strategy is compared to the individual aggregation strategies in Figure 1. Since users are randomly combined into groups and the accuracy of group recommendations depends on the composition of the groups, the accuracy varies slightly for each partitioning of the users into groups (except for the partitioning of the users into groups of 1 member, which is only possible in one way). Therefore, the process of composing groups by taking a random selection of users is repeated 30 times, and an equal number of accuracy measurements is performed. The graph shows the mean accuracy of these measurements as an estimation of the quality of the group recommendations (on the vertical axis), as well as the 95% confidence interval of the mean value, in relation to the recommendation algorithm, the aggregation strategy, and the group size. The group size is indicated on the horizontal axis. The vertical axis crosses the horizontal axis at the quality level of the most-popular recommender. The prefix "Combined" of the bar series stands for the proposed aggregation strategy, which combines the aggregating preferences and aggregating recommendations strategies. The prefixes "Pref" and "Rec" indicate the accuracy of the two individual strategies, respectively the aggregating preferences and the aggregating recommendations strategy. For each algorithm, only the most accurate individual strategy is shown: aggregating preferences for UBCF and CB, aggregating recommendations for SVD, IBCF, and Hybrid [5].
   The non-overlapping confidence intervals indicate a significant improvement of the combined aggregation strategy compared to the best individual aggregation strategy. Table 1 shows the results of the statistical T-tests comparing the mean accuracy of the recommendations generated by the best individual aggregation strategy and by the combined aggregation strategy for groups with size = 5 (similar results are obtained for other group sizes). The null hypothesis H0 states that the mean accuracy of the recommendations generated by using the best individual aggregation strategy is equal to the mean accuracy of the recommendations generated by using the combined aggregation strategy. The small p-values (all smaller than 0.05) confirm the significant accuracy improvement of our proposed aggregation strategy.
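The reported degrees of freedom (t(58) in Table 1) are consistent with an independent two-sample t-test over the 30 accuracy measurements per strategy; a sketch of such a comparison, assuming SciPy is available, is shown below. The choice of test variant is an assumption based on the degrees of freedom.

```python
from scipy import stats

def compare_strategies(ndcg_individual, ndcg_combined):
    """Two-sample t-test on the 30 mean nDCG measurements obtained for the
    best individual aggregation strategy and for the combined strategy
    (30 + 30 - 2 = 58 degrees of freedom)."""
    t_stat, p_value = stats.ttest_ind(ndcg_individual, ndcg_combined)
    return t_stat, p_value
```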




[Figure 1 (bar chart): mean nDCG on the vertical axis (approximately 0.87 to 0.905) versus group size on the horizontal axis (1 to 20 users), with bar series RecSVD, CombinedSVD, RecHybrid, CombinedHybrid, RecIBCF, CombinedIBCF, PrefUBCF, CombinedUBCF, PrefCB, CombinedCB.]

Figure 1: The accuracy of the group recommendations calculated using the best individual aggregation strategy and the combined aggregation strategy


Table 1: Statistical T-test comparing the best individual aggregation strategy and the combined aggregation strategy for groups with size = 5

  Algorithm   t(58)   p-value
  SVD         -4.39   0.00
  Hybrid      -2.53   0.01
  IBCF        -2.33   0.02
  UBCF        -2.66   0.01
  CB          -3.55   0.00

6. CONCLUSIONS
   This paper presents a new strategy to aggregate the tastes of multiple users in order to generate group recommendations. Both existing data aggregation strategies are combined to make a more informed decision, thereby reducing the risk of recommending undesirable or less suitable items to the group. The results show that the combination of aggregation strategies outperforms the individual aggregation strategies for various group sizes and in combination with various recommendation algorithms. The proposed aggregation strategy can be used to increase the accuracy of (commercial) group recommender systems.

7. REFERENCES
[1] L. Baltrunas, T. Makcinskas, and F. Ricci. Group recommendations with rank aggregation and collaborative filtering. In Proceedings of the fourth ACM conference on Recommender systems, RecSys '10, pages 119–126, New York, NY, USA, 2010. ACM.
[2] S. Berkovsky and J. Freyne. Group-based recipe recommendations: analysis of data aggregation strategies. In Proceedings of the fourth ACM conference on Recommender systems, RecSys '10, pages 111–118, New York, NY, USA, 2010. ACM.
[3] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, UAI'98, pages 43–52, San Francisco, CA, USA, 1998.
[4] Y.-L. Chen, L.-C. Cheng, and C.-N. Chuang. A group recommendation system with consideration of interactions among group members. Expert Systems with Applications, 34(3):2082–2090, 2008.
[5] T. De Pessemier, S. Dooms, and L. Martens. Design and evaluation of a group recommender system. In Proceedings of the sixth ACM conference on Recommender systems, RecSys '12, pages 225–228, New York, NY, USA, 2012. ACM.
[6] S. Dooms, T. De Pessemier, and L. Martens. A user-centric evaluation of recommender algorithms for an event recommendation system. In Proceedings of the workshop on User-Centric Evaluation of Recommender Systems and Their Interfaces at ACM Conference on Recommender Systems (RECSYS), pages 67–73, 2011.
[7] J. Masthoff. Group modeling: Selecting a sequence of television items to suit a group of viewers. User Modeling and User-Adapted Interaction, 14:37–85, 2004.
[8] L. Quijano-Sanchez, J. A. Recio-Garcia, and B. Diaz-Agudo. Personality and social trust in group recommendations. In Proceedings of the 2010 22nd IEEE International Conference on Tools with Artificial Intelligence - Volume 02, ICTAI '10, pages 121–126, Washington, DC, USA, 2010. IEEE Computer Society.
[9] Telematica Instituut / Novay. Duine Framework, 2009. Available at http://duineframework.org/.
[10] The Apache Software Foundation. Apache Mahout, 2012. Available at http://mahout.apache.org/.



