=Paper=
{{Paper
|id=None
|storemode=property
|title=An Improved Data Aggregation Strategy for Group Recommendations
|pdfUrl=https://ceur-ws.org/Vol-1050/paper6.pdf
|volume=Vol-1050
|dblpUrl=https://dblp.org/rec/conf/recsys/PessemierDM13
}}
==An Improved Data Aggregation Strategy for Group Recommendations==
An Improved Data Aggregation Strategy for Group Recommendations

Toon De Pessemier, Simon Dooms, Luc Martens
iMinds - Ghent University, G. Crommenlaan 8 box 201, B-9050 Ghent, Belgium
Toon.DePessemier@UGent.be, Simon.Dooms@UGent.be, Luc1.Martens@UGent.be

ABSTRACT

Although most recommender systems make suggestions for individual users, in many circumstances the selected items (e.g., movies) are not intended for personal usage but rather for consumption in group. Group recommendations can assist a group of users in finding and selecting interesting items, thereby considering the tastes of all group members. Traditionally, group recommendations are generated either by aggregating the group members' recommendations into a list of group recommendations or by aggregating the group members' preferences (as expressed by ratings) into a group model, which is then used to calculate group recommendations. This paper presents a new data aggregation strategy for generating group recommendations by combining the two existing aggregation strategies. The proposed aggregation strategy outperforms each individual strategy for different group sizes and in combination with various recommendation algorithms.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Information Filtering; H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces

General Terms: Algorithms, Experimentation

Keywords: group recommendations, aggregation strategy, combining techniques

RecSys'13, October 12-16, 2013, Hong Kong, China. Paper presented at the 2013 Decisions@RecSys workshop in conjunction with the 7th ACM conference on Recommender Systems. Copyright 2013 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1. INTRODUCTION

Although the majority of the currently deployed recommender systems are designed to generate personal suggestions for individual users, in many cases content is selected and consumed by groups of users rather than by individuals. This strengthens the need for group recommendations, which provide suggestions that consider the tastes of all group members. In the literature, group recommendations have mostly been generated by one of the following two data aggregation strategies [2].

The first aggregation strategy (aggregating recommendations) generates recommendations for each individual user using a general recommendation algorithm. Subsequently, the recommendation lists of all group members are aggregated into a group recommendation list, which (hopefully) satisfies all group members. Different approaches to aggregate the recommendation lists have been proposed during the last decade, such as least misery and plurality voting [7]. Most of them make a decision based on the algorithm's prediction score, i.e., a prediction of the user's rating score for the recommended item. One commonly used way to perform the aggregation is averaging the prediction scores of each member's recommendation list: the higher the average prediction score, the better the match between the group's preferences and the recommended item.

The second aggregation strategy (aggregating preferences) combines the users' preferences into group preferences. This way, the opinions and preferences of individual group members constitute a group preference model reflecting the interests of all members. Again, the members' preferences can be aggregated in different ways, e.g., by calculating the rating of the group as the average of the group members' ratings [7, 1]. After aggregating the members' preferences, the group's preference model is treated as a pseudo user in order to produce recommendations for the group using a traditional recommendation algorithm.

This paper presents a new data aggregation strategy, which combines the two existing strategies and outperforms each of them in terms of accuracy. For both individual data aggregation strategies, we used the average function to combine the individual preferences or recommendations. Although a switching scheme between both aggregation strategies has already been investigated [2], the proposed combined strategy is the first to generate group recommendations by using both aggregation strategies at once, thereby making a more informed decision.

2. EVALUATING GROUP RECOMMENDATIONS

A major issue in the domain of group recommender systems is the evaluation of the accuracy, i.e., comparing the generated recommendations for a group with the true preferences of the group. Performing online evaluations or interviewing groups can be a partial solution but is not feasible on a large scale or for extensively testing alternative configurations. For example, in Section 5, five recommendation algorithms in combination with two data aggregation strategies are evaluated for twelve different group sizes, leading to 120 different setups of the experiment. Therefore, we are forced to perform an offline evaluation, in which synthetic groups are sampled from the users of a traditional single-user data set. Since movies are often watched in group, we used the MovieLens (100K) data set for this evaluation.

In the literature, group recommendations have been evaluated several times by using a data set with simulated groups of users. Baltrunas et al. [1] used the MovieLens data set to simulate groups of different sizes (2, 3, 4, 8) and different degrees of similarity (high, random) with the aim of evaluating the effectiveness of group recommendations. Chen et al. [4] also used the MovieLens data set and simulated groups by randomly selecting the members of the group to evaluate their proposed group recommendation algorithm. They simulated group ratings by calculating a weighted average of the group members' ratings based on the users' opinion importance parameter. Quijano-Sánchez et al. [8] used synthetically generated data to simulate groups of people in order to test the accuracy of group recommendations for movies. In addition to this offline evaluation, they conducted an experiment with real users to validate the results obtained with the synthetic groups. One of the main conclusions of their study was that it is possible to realize trustworthy experiments with synthetic data, as the online user test confirmed the results of the experiment with synthetic data. This conclusion justifies the use of an offline evaluation with synthetic groups to evaluate the group recommendations in our experiment.

This offline evaluation is based on the traditional procedure of dividing the data set into two parts: the training set, which is used as input for the algorithm to generate the recommendations, and the test set, which is used to evaluate the recommendations. In this experiment, we ordered the ratings chronologically and assigned the oldest 60% to the training set and the most recent 40% to the test set, as this best reflects a realistic scenario.

The evaluation procedure was adopted from Baltrunas et al. [1] and is performed as follows. Firstly, synthetic groups are composed by selecting random users from the data set; all users are assigned to one group of a predefined size. Secondly, group recommendations are generated for each of these groups based on the ratings of the members in the training set. Since group recommendations are intended to be consumed in group and to suit the preferences of all members simultaneously, all members receive the same recommendation list. Thirdly, since no group ratings are available, the recommendations are evaluated individually as in the classical single-user case, by comparing (the rankings of) the recommendations with (the rankings of) the items in the user's test set using the Normalized Discounted Cumulative Gain (nDCG) at rank 5. The nDCG is a standard information retrieval measure, used to evaluate the recommendation lists [1].

3. RECOMMENDATION ALGORITHMS

The effectiveness of the different aggregation strategies is measured for different group sizes and in combination with various state-of-the-art recommendation algorithms. The used implementation of Collaborative Filtering (CF) is based on the work of Breese et al. [3]. This nearest-neighbor CF uses the Pearson correlation metric for discovering similar users in the user-based approach (UBCF) or similar items in the item-based approach (IBCF), based on the rating behavior of the users. As Content-Based recommender (CB), the InterestLMS predictor of the open-source implementation of the Duine framework [9] is adopted (and extended to consider extra metadata attributes). Based on the actors, directors, and genres of the content items and the user's ratings for these items, the recommender builds a profile model for every user. This profile contains an estimation of the user's preference for each genre, actor, and director that is assigned to a rated item, and is used to predict the user's preference for unseen media items by matching the metadata of the items with the user's profile. The used hybrid recommender (Hybrid) combines the recommendations with the highest prediction score of the IBCF and the CB recommender into a new recommendation list. The result is an alternating list of the best recommendations originating from these two algorithms. A user-centric evaluation comparing different algorithms based on various characteristics showed that this straightforward combination of CF and CB recommendations outperforms both individual algorithms on almost every qualitative metric [6]. As recommender based on matrix factorization, we opted for the open-source implementation of the SVD recommender (SVD) of the Apache Mahout project [10]. This recommender is configured to use 19 features, which equals the number of genres in the MovieLens data set, and the number of iterations is set at 50. To compare the results of the various recommenders, the popular recommender was introduced as a baseline. This recommender generates for every user the same list of most-popular items, based on the number of received ratings and the mean rating of each item.

4. COMBINING STRATEGIES

Previous research [5] has shown that the aggregation strategy used in combination with the recommendation algorithm has a major influence on the accuracy of the group recommendations. Certain algorithms (such as CB and UBCF) produce more accurate group recommendations when the aggregating preferences strategy is used, whereas other algorithms (such as IBCF and SVD) obtain a higher accuracy in combination with the aggregating recommendations strategy. So, the choice of the aggregation strategy is crucial for each algorithm in order to obtain the best group recommendations. Instead of selecting one individual aggregation strategy, traditional aggregation strategies can be combined with the aim of obtaining group recommendations which outperform the group recommendations of each individual aggregation strategy.

In this context, Berkovsky and Freyne [2] observed that the aggregating recommendations strategy yields a lower MAE (Mean Absolute Error) than the aggregating preferences strategy if the user profiles have a low density (i.e., contain a low number of consumptions). In contrast, for high-density profiles the aggregating preferences strategy resulted in the lowest MAE, thereby outperforming the aggregating recommendations strategy in terms of accuracy. Therefore, Berkovsky and Freyne proposed a switching scheme based on the profile density, which yielded a small accuracy improvement compared to the individual strategies. However, their results were obtained in a very specific setting: they only considered the accuracy of recommendations generated by a CF algorithm, the MAE metric was used to estimate the accuracy, and they focused on the specific use case of recipe recommendations using a rather small data set (approximately 3300 ratings). Because of these specific settings, we were not able to obtain an accuracy improvement by using such a switching scheme on the MovieLens data set.

Therefore, we propose an advanced data aggregation strategy which combines both individual aggregation strategies, thereby yielding an accuracy gain compared to each individual aggregation strategy for different recommendation algorithms. This combination of strategies aggregates the preferences of the users as well as their recommendations, with the aim of merging the knowledge of the two aggregation strategies into a final group recommendation list. The idea is that if one of the aggregation strategies comes up with a less suitable or undesirable group recommendation, the other aggregation strategy can correct this mistake. This makes the group recommendations resulting from the combination of strategies more robust than the group recommendations based on a single aggregation strategy.

The two aggregation strategies are combined as follows. First, group recommendations are calculated by using the selected recommendation algorithm and the aggregating preferences strategy. The result is a list of all items, ordered according to their prediction score. (In case of an individual aggregation strategy, the top-N items of that list would be selected as suggestions for the group.) After calculating the group recommendations using the aggregating preferences strategy, or in parallel with it, group recommendations are generated using the chosen algorithm and the aggregating recommendations strategy. Again, the result is an ordered list of items with their corresponding prediction scores.

Both of these lists of group recommendations can still contain items that are less suitable for the group, even at the top of the list. The next phase tries to eliminate these items by comparing the two resulting recommendation lists. Items that are at the top of both lists are probably interesting recommendations, whereas items at the bottom of both lists are usually less suitable for the group. Less certainty exists about items that are at the top of the recommendation list generated by one of the aggregation strategies but in the middle or even at the bottom of the list produced by the other aggregation strategy. Therefore, both recommendation lists are adapted by eliminating these uncertain items, so that they contain only items that appear at the top of both recommendation lists, thereby reducing the risk of recommending undesirable or less suitable items to the group. So, items that are ranked below a certain threshold position in (at least) one of the recommendation lists generated by the two aggregation strategies are removed from both lists. (If only one aggregation strategy is used, identifying uncertain items based on the results of a complementary recommendation list is not possible.) In this experiment, we opted to exclude from the recommendation lists those items that are not in the top 5% of both lists (i.e., the top 84 recommended items for the MovieLens data set). As a result, the recommendation lists contain only items that are identified as 'the most suitable' by both aggregation strategies, ordered according to the prediction scores calculated using either the aggregating preferences strategy or the aggregating recommendations strategy.

Subsequently, the two recommendation lists are combined into one recommendation list by combining the prediction scores of each aggregation strategy per item. In this experiment, we opted for the average as the method to combine the prediction scores. So, in the resulting recommendation list, each item's prediction score is the average of the item's prediction score generated by the aggregating preferences strategy and the item's prediction score produced by the aggregating recommendations strategy. Alternative combining methods are also possible, e.g., a weighted average of the prediction scores with weights depending on the performance of each individual aggregation strategy. Then, the items are ordered by their new prediction score in order to obtain the final list of group recommendations.

5. RESULTS

Our combined aggregation strategy is compared to the individual aggregation strategies in Figure 1. Since users are randomly combined into groups and the accuracy of group recommendations depends on the composition of the groups, the accuracy varies slightly for each partitioning of the users into groups. (The exception is the partitioning of the users into groups of 1 member, which is possible in only one way.) Therefore, the process of composing groups by taking a random selection of users is repeated 30 times, and just as many measurements of the accuracy are performed. So, the graph shows the mean accuracy of these measurements as an estimation of the quality of the group recommendations (on the vertical axis), as well as the 95% confidence interval of the mean value, in relation to the recommendation algorithm, aggregation strategy, and group size. The group size is indicated on the horizontal axis. The vertical axis crosses the horizontal axis at the quality level of the most-popular recommender. The prefix "Combined" in the bar series stands for the proposed aggregation strategy, which combines the aggregating preferences and aggregating recommendations strategies. The prefixes "Pref" and "Rec" indicate the accuracy of the two individual strategies, respectively the aggregating preferences and the aggregating recommendations strategy. For each algorithm, only the most accurate individual strategy is shown: aggregating preferences for UBCF and CB; aggregating recommendations for SVD, IBCF, and Hybrid [5].

Figure 1: The accuracy (mean nDCG, roughly 0.87-0.905) of the group recommendations calculated using the best individual aggregation strategy and the combined aggregation strategy, for group sizes 1 to 20 and the algorithms SVD, Hybrid, IBCF, UBCF, and CB.

The non-overlapping confidence intervals indicate a significant improvement of the combined aggregation strategy compared to the best individual aggregation strategy. Table 1 shows the results of the statistical T-tests comparing the mean accuracy of the recommendations generated by the best individual aggregation strategy and by the combined aggregation strategy for groups with size = 5. (Similar results are obtained for other group sizes.) The null hypothesis H0 is that the mean accuracy of the recommendations generated by using the best individual aggregation strategy is equal to the mean accuracy of the recommendations generated by using the combined aggregation strategy. The small p-values (all smaller than 0.05) prove the significant accuracy improvement of our proposed aggregation strategy.

Table 1: Statistical T-test comparing the best individual aggregation strategy and the combined aggregation strategy for groups with size=5

  Algorithm   t(58)   p-value
  SVD         -4.39   0.00
  Hybrid      -2.53   0.01
  IBCF        -2.33   0.02
  UBCF        -2.66   0.01
  CB          -3.55   0.00

6. CONCLUSIONS

This paper presents a new strategy to aggregate the tastes of multiple users in order to generate group recommendations. Both existing data aggregation strategies are combined to make a more informed decision, thereby reducing the risk of recommending undesirable or less suitable items to the group. The results show that the combination of aggregation strategies outperforms the individual aggregation strategies for various group sizes and in combination with various recommendation algorithms. The proposed aggregation strategy can be used to increase the accuracy of (commercial) group recommender systems.

7. REFERENCES

[1] L. Baltrunas, T. Makcinskas, and F. Ricci. Group recommendations with rank aggregation and collaborative filtering. In Proceedings of the fourth ACM conference on Recommender systems, RecSys '10, pages 119-126, New York, NY, USA, 2010. ACM.
[2] S. Berkovsky and J. Freyne. Group-based recipe recommendations: analysis of data aggregation strategies. In Proceedings of the fourth ACM conference on Recommender systems, RecSys '10, pages 111-118, New York, NY, USA, 2010. ACM.
[3] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, UAI'98, pages 43-52, San Francisco, CA, USA, 1998.
[4] Y.-L. Chen, L.-C. Cheng, and C.-N. Chuang. A group recommendation system with consideration of interactions among group members. Expert Systems with Applications, 34(3):2082-2090, 2008.
[5] T. De Pessemier, S. Dooms, and L. Martens. Design and evaluation of a group recommender system. In Proceedings of the sixth ACM conference on Recommender systems, RecSys '12, pages 225-228, New York, NY, USA, 2012. ACM.
[6] S. Dooms, T. De Pessemier, and L. Martens. A user-centric evaluation of recommender algorithms for an event recommendation system. In Proceedings of the workshop on User-Centric Evaluation of Recommender Systems and Their Interfaces at the ACM Conference on Recommender Systems (RECSYS), pages 67-73, 2011.
[7] J. Masthoff. Group modeling: Selecting a sequence of television items to suit a group of viewers. User Modeling and User-Adapted Interaction, 14:37-85, 2004.
[8] L. Quijano-Sanchez, J. A. Recio-Garcia, and B. Diaz-Agudo. Personality and social trust in group recommendations. In Proceedings of the 2010 22nd IEEE International Conference on Tools with Artificial Intelligence - Volume 02, ICTAI '10, pages 121-126, Washington, DC, USA, 2010. IEEE Computer Society.
[9] Telematica Instituut / Novay. Duine Framework, 2009. Available at http://duineframework.org/.
[10] The Apache Software Foundation. Apache Mahout, 2012. Available at http://mahout.apache.org/.
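The two baseline data aggregation strategies (averaging members' ratings into a pseudo-user profile, and averaging members' predicted scores per item) can be sketched as follows. This is an illustrative sketch with toy data; the function names, the `predict` callback, and the rating values are our own assumptions, not the authors' code.

```python
from statistics import mean

# Hypothetical per-user rating profiles (user -> {item: rating}); toy values.
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0},
    "bob":   {"m1": 4.0, "m3": 2.0},
}

def aggregate_preferences(group, ratings):
    """Aggregating preferences: build a pseudo-user profile whose rating for
    each item is the average of the ratings given by the members who rated it.
    The pseudo user is then fed to an ordinary single-user recommender."""
    items = {i for u in group for i in ratings[u]}
    return {item: mean(ratings[u][item] for u in group if item in ratings[u])
            for item in items}

def aggregate_recommendations(group, predict, items):
    """Aggregating recommendations: average each member's predicted score per
    item, then rank items by that average (the paper's average function)."""
    scored = {i: mean(predict(u, i) for u in group) for i in items}
    return sorted(scored, key=scored.get, reverse=True)
```

In a real system `predict` would be the prediction function of one of the algorithms from Section 3 (UBCF, IBCF, CB, Hybrid, or SVD); here any callable `(user, item) -> float` works.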
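The combined strategy of Section 4 (keep only items in the top fraction of both lists, then re-rank by the average of the two prediction scores) can be sketched as below. A minimal sketch: the function name and the dictionary-based interface are assumptions; the paper uses a 5% threshold, which for MovieLens corresponds to the top 84 items.

```python
def combine_strategies(pref_scores, rec_scores, top_fraction=0.05):
    """Combine the aggregating-preferences and aggregating-recommendations
    lists: drop items ranked below the top_fraction cutoff in either list,
    then order the survivors by the average of their two prediction scores.
    pref_scores / rec_scores map item -> prediction score."""
    def top_items(scores):
        ranked = sorted(scores, key=scores.get, reverse=True)
        cutoff = max(1, int(len(ranked) * top_fraction))
        return set(ranked[:cutoff])

    # Items must be 'most suitable' according to BOTH strategies.
    keep = top_items(pref_scores) & top_items(rec_scores)
    combined = {i: (pref_scores[i] + rec_scores[i]) / 2 for i in keep}
    return sorted(combined, key=combined.get, reverse=True)
```

A weighted average (weights reflecting each strategy's measured accuracy) could replace the plain average, as the paper notes.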
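The evaluation metric of Section 2, nDCG at rank 5, can be sketched as follows. Note the hedge: this uses one common DCG formulation (relevance discounted by log2 of position); the exact variant used by Baltrunas et al. [1] and in the paper may differ in detail.

```python
import math

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain: sum of relevance values discounted by the
    log2 of their 1-based rank (one common formulation)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(recommended, test_ratings, k=5):
    """nDCG@k: DCG of the recommended ordering, normalized by the DCG of the
    ideal ordering of the user's test-set ratings. Items absent from the
    test set contribute zero gain."""
    gains = [test_ratings.get(item, 0.0) for item in recommended]
    ideal = sorted(test_ratings.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

In the paper's setup this score is computed per member against that member's test-set ratings and then averaged over the 30 random group partitionings.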