Probabilistic Game Theoretic Algorithms for Group Recommender Systems George Popescu Pearl Pu EPFL - HCI Group EPFL - HCI Group IC IIF Station 14 IC IIF Station 14 1015 Lausanne, Switzerland 1015 Lausanne, Switzerland +41 21 693 1246 +41 21 693 6081 george.popescu@epfl.ch pearl.pu@epfl.ch ABSTRACT distinct strategies based on these. Social choice theory aims to Aggregating users’ individual preference and recommending a offer an answer to “which strategy is most effective and will be common set of items for a group has become a challenging topic most liked by a group of users?” (Hastie and Kameda, 2005). in group recommender systems and social websites. This issue is With the purpose of determining what strategy people actually mainly concerned with the following three objectives: eliciting use, Masthoff (2004) found that individuals use computationally individual users' preferences, suggesting outcomes that maximize simple strategies mentioned above, particularly the average the overall satisfaction for all users and ensuring that the strategy, the average without misery and the least misery strategy. aggregation mechanism is resistant to individual users' However, there is no dominant strategy as people switch between manipulation. Firstly we show how our proposed probabilistic them given a different context. Fairness plays an important role in weighted-sum algorithm (PWS) works and emphasize on its decision making but members do not have a clear strategy for advantages. Then we compare PWS with related approaches applying it. implemented in similar systems using the case of our music Our main research question is to determine “which group recommender, GroupFun. We describe an experiment design to satisfaction rule best satisfy users expectations”. We propose 4 study users’ perceptions of the algorithms, their perceived fairness algorithms and investigate upon: “which algorithm is best suited and incentives to manipulate the final recommendation outcome. to meet users’ expectations” for our music recommender system, We expect our results to show that PWS will be perceived as fair GroupFun and “how users perceive the algorithms’ accuracy”. and diversity- and discovery-driven, thus enhancing the group's Next we present related work, then considered algorithms together satisfaction. Our future work will focus on the actual evaluation of with our implementation and future experiment design. GroupFun using the experiment design presented here. 2. BASELINE AND RELATED WORK Categories and Subject Descriptors H1.2 [User/Machine Systems]: human factors; H5.2 [User 2.1 MusicFX Interfaces]: evaluation/methodology, user-centered design. MusicFX is a music system offering best music match to employees working out in a fitness center (McCarthy and General terms Anagnost, 1998). The algorithm aims at selecting the most Experimentation, Human factors. preferred music genre that maximizes members’ listening pleasure. For this it computes a group preference index and sums Keywords squared individual preferences. Then it lists the most popular Quality measurement, usability evaluation, recommender systems, categories. The system also saves events in its history such as: quality of user experience, post-study questionnaire. member entrance, member exit, individual preference update, system parameter adjustment and maximum station play time 1. INTRODUCTION elapsed. Since some individual preference filters may not change, Group recommender systems use various aggregation strategies to the system opts for a different music configuration according to suggest a common list of items to a group of users. These two criteria: playing more the music which members like most strategies aim at increasing the group’s welfare and are based on and playing less the music which members like least. The users’ votes on items. The social welfare is an aggregate of weighted random selection operator is one strategy used to reduce individual utilities of all group members. Most common used the likelihood of starvation. Another strategy is limiting the period deterministic strategies are: plurality voting, utilitarian, approval of time for one genre to be played – regardless of how popular it voting, least misery, most pleasure, average without misery, is – before the selection algorithm is invoked in order to select a fairness, most respected person, Borda count, Copeland rule or new station. MusicFX has two important advantages: (1) it Kemeny scores (Masthoff, 2005). One can easily create other increases the variety of music and (2) it democratizes the music selection process. Thus it is adaptive to changing preferences of its users also proposing new songs for them. One drawback of the WOMRAD 2011 2ndWorkshop on Music Recommendation and system is that it changes music stations abruptly in the middle of Discovery, colocated with ACM RecSys 2011 (Chicago, US) the songs. Copyright ©. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, 2.2 PolyLens provided the original author and source are credited. PolyLens is a collaborative filtering recommender system which recommends movies to groups of people based on their individual preferences (O’Connor et al. 2001). It represents a group extension of the MovieLens recommender system with over outcomes are songs si that are selected in the common playlist. 80,000 users and their ratings for more than 3,500 movies (with a total of nearly 5 million votes). Users can create and manage We let each user a j submit a numerical vote score( s i , a j ) for groups, access individual and group recommendations and receive notification alerts for group invitation. The algorithm uses the each song si that reflects their preference for that song. These least misery strategy given that the groups formed to watch a votes are given as ratings on a 5-point Likert scale and normalized movie together tend to be small. As such the group is as happy as so that the scores given by each user sum to 1: its least happy member. The authors mention that “the social  , value function and algorithm are unlikely to work well for large  , =∑ (1) groups”. They further note that “it is still an open research   , question to understand the types of social value functions that best We then assign a joint score to each song that is computed as the satisfy large groups and to implement them algorithmically”. sum of the scores given by the individual users: 2.3 Voting strategies An extensive study on a group of television viewers aiming at    = ∑ ∈  ,  (2) finding which strategy people use was realized by Masthoff To choose the songs to be included in a playlist of length k, a (2005). In the experiment 10 deterministic group voting rules are compared: plurality voting, utilitarian strategy, Borda count, deterministic method is to choose the k songs with the highest Copeland rule, approval voting, least misery strategy, most joint rating: weighted sum (DWS): pleasure strategy, average without misery strategy, fairness    strategy and most respected person strategy. The experiment    = ∑ (3) shows that individuals do not use a clear dominant strategy, but  ∈    average, average without misery, and least misery are all plausible The probabilistic weighted sum (PWS) iteratively selects each of candidates for implementation. In a different experiment addressing how people judge the recommendation results the k songs randomly according to the probability distribution: multiplicative utilitarian strategy is the most promising strategy,    but the other strategies received close scores. In the study of   = ∑ (4) television viewers the hypothesis that social status influences  ∈    selection has no statistical dominance. Non-linear utility suits By comparison, the probabilistic sum (PS) method chooses the k better users’ expectations than a linear one. For instance quadratic songs with equal probability: rating scale is appropriate for implementation. Furthermore, there is strong evidence that human subjects use a series of simple strategies in different judgment contexts they face. For instance, if   = (5) |"| one takes the satisfaction of the group to be the average of the satisfaction of the individuals, then the average strategy performs The least misery (LM) method takes into account the minimum of best. Taking the minimum better corresponds to the predictions ratings for each user: which are made by individuals assessing their own needs. 3. ALGORITHMS min & , ',∀ ∈) (6) In the music domain many users usually form many groups and listen to many songs. Given the fact that the length of one song is 3.2 Example of 3 to 4 minutes users usually select a playlist containing several To illustrate how each algorithm works, we consider the following to lots of songs. This is not the case of movies selection when example. In the next table, user1, user2, and user 3 represent users need to agree on only one or few movie(s) they would like group members. The score distribution normalized to 1 for each of to consume given their limited time and the length of a movie: the users is displayed in the respective row, and the joint scores ~2h. Thus, the music domain presents both opportunities and are shown in the table below. challenges since the recommendation needs to focus on both diversity and accuracy. Table I. Item selection example using the 4 algorithms We propose the following 4 algorithms for comparison: User1 Song1: 0.1 Song2: 0.4 Song3: 0.4 Song4: 0.1 • PS (Probabilistic Sum): select the common playlist’ User2 Song1: __ Song2: 0.2 Song3: __ Song4: 0.8 songs probabilistically, each of them having the same probability to be selected User3 Song1: 0.4 Song2: 0.2 Song3: __ Song4: 0.4 • LM (Least Misery): select songs with the highest Total Song1: 0.5 Song2: 0.8 Song3: 0.4 Song4: 1.3 minimum individual ratings • DWS (Deterministic Weighted Sum): deterministically select songs with the highest score The least misery (LM) will choose song 2 and song 3 (each of them has the minimal rating 0.2). For all other songs the minimum • PWS (Probabilistic Weighted Sum): compute weighted score is 0.1. After normalizing the total scores by the sum of sum and select songs based on their score probabilities. scores, we obtain the following probability distribution for the set of outcomes. 3.1 General framework Let A be the set of all users and S the set of all possible outcomes that can be rated. In our group music recommendation setting, the Table II. Probability distribution 4. GROUPFUN P Song1: 0.16 Song2: 0.26 Song3: 0.13 Song 4: 0.43 GroupFun is a web application that helps a group of friends to agree on a common music playlist for a given event they will Considering the probability as the final score, the deterministic attend, e.g. a birthday party or a graduation ceremony. Firstly, it is weighted sum (DWS) will chose songs 4, 2, 1 and 3. Probabilistic implemented as a Facebook plugin connecting users to their weighted sum (PWS) will choose one song after another using this friends. Secondly, it is a music application that helps individuals probabilistic distribution. Compared to other social choice based to manage and share their favorite music with groups. In algorithms, PWS is incentive compatible. That is, it is to the best GroupFun users can listen to their own collection of songs as well interest of the individual to reveal his/her preferences truthfully. It as their friends’ music. With the collective music database, the application integrates friends’ music tastes and recommends a is in fact equivalent to a random dictator method, where the common playlists to them. Therefore, the application aims at dictator will choose a song randomly with the probabilities given satisfying music tastes of the whole group by aggregating by its degree of preference – a reasonable method since nobody individual preferences through the use of previously presented wants to hear the same song over and over again. This is because algorithms. the probability of a song si to be chosen can be written as:      ,    = || ∑ ∈ || (7) or, in other words, the probability of choosing user a j times the normalized score that user a j has given to song si . Indeed, User3’s preference for song 1 yields a significant probability that this song will be included in the playlist, relative to other songs. 3.3 Discussion The contribution of the PWS algorithm in the paper stands out with respect to group satisfaction. We expect users to be more satisfied using PWS than other algorithms given their expectations to discover the music of other members. Figure 1. “Home” page of GroupFun Advantages of PWS compared with the other algorithms: In the “Home” page users see 4 playlists: one from a recent event, one containing popular songs, one from a party and the last one 1. Users are free to choose the number of songs from an older event. They can listen to each song in each of the 2. Ratings are updated permanently playlists. 3. The algorithm is computationally simple 4. Users can negotiate their ratings and trade utility 5. Incentive-compatible truthful property is observed 6. The algorithm favors music diversity The disadvantages of PWS are: 1. It is difficult to quantify rating differences between distinct users. The weights given by each user cannot be compared with the ones given by another since users have different estimations of their utility. 2. Self-selection effect: most popular songs will receive most votes - not ideal if long tail distribution is desired. Since PWS can be interpreted as similar to the random scheme users have to test it in more recommendation rounds to understand its inner logic. PWS can be further developed to include the group Figure 2. "My Groups" page of GroupFun dynamics. One solution is to consider trust and other members’ comments on the songs rated by one user (e.g. “like”/”dislike”). In the “My Group” page users can create groups, upload and rate their music, invite friends and hear the group’s songs. Finally, in The PWS algorithm stands out with respect to allowing users to the “My Music” page users see their contribution to GroupFun: engage in trustful individual preference elicitation and music for each song is displayed the associated group, the rating and its discovery. By returning to the recommendation list the group will name and artist. Users can also listen to their individual find a different playlist every-time they are would like to listen to preferences in the same interface. One of the most important group music. By considering the probabilistic distribution of characteristics of GroupFun is that it combines music, friends and ratings and an extensive music library the algorithm will mostly groups together. In other words, GroupFun serves as a platform suggest songs strongly liked by most others. Sometimes it will allowing users to conveniently organize their individual music recommend unexpected, rating-wise, serendipitous items library, effectively communicate with friends and actively facilitating music discovery and group enjoyment. participate in social activities. 5. EXPERIMENT DESIGN Outcome change I was interested in changing the To compare how users perceive the 4 algorithms, we plan to carry intention outcome of the algorithm. on a between-groups user study. With the results of the experiment we will be able to make a judgment of the influence of 6. CONCLUSIONS AND FUTURE WORK both the algorithms and the design on users’ satisfaction. We plan In this paper we presented our research work on the algorithmic to collect solid user feedback regarding how an algorithm should development and evaluation of our music recommender system allow group members to arrive at a common decision in a music and Facebook application, GroupFun. The major contribution of recommendation setting. this paper is the demonstration of the applicability of the PWS Our hypotheses are that: (1) users will not reveal their preferences algorithm for group recommendation strategies and negotiation. In strategically as to influence the algorithm’s outcome; (2) they will this context, we analyzed different group recommendation prefer more PWS than DWS given the increased diversity of the approaches w.r.t. group satisfaction and discussed key satisfaction recommendations they receive and (3) the group will perceive issues to be taken into account. The PWS algorithm we proposed more overall satisfaction but less diversity using the LM calculates probabilities for songs to appear in groups’ playlists algorithm compared with PWS. favoring music diversity and discovery. Using PWS users state their preference truthfully. They align their decision to that of the 5.1 Procedure group. Furthermore, our current development of GroupFun allows First we recruit university students, friends who have Facebook users to create groups, rate and share their music profiles with accounts and other users on the Amazon Mechanical Turk their friends. platform. All of them will use their own computers in order to To understand how users’ perceive our algorithms and current connect to the GroupFun application. We consider 4 algorithms interface, we plan to conduct an experiment to compare the 4 implemented for 4 groups of 40 users together with 1 interface. algorithms in a between-subjects study. As such we will evaluate Each of the algorithms displays a common list of 10 songs to all user satisfaction for music group recommendations. Furthermore, group members. Users are asked to contribute their music to only to learn more about the perceived ease of use and perceived one of the groups and fill in an online post-study questionnaire usefulness of our system we plan to invite more members and assessing their satisfaction. The final music outcome is shown to analyze user feedback in terms of design and functionality. We all group members after they have finished their tasks. Users can also intend to develop a new version of the algorithm which will interact with the system in diverse ways such as: upload more or better match users’ behavior and expectations. less songs, change their ratings, see and hear to their friends’ playlists, etc. 7. ACKNOWLEDGMENTS Table III. Evaluation of 4 algorithms using the same interface This research was supported by EPFL and the Swiss National Science Foundation. We thank Elsa Friscira and Laurentiu PWS DWS LM PS Dascalu for their important contribution to the development of Interface/Algorithm GroupFun and GroupFun users for their answers and involvement 40 40 40 40 in the evaluation of our algorithms. Interface 5.2 Measurements 8. REFERENCES The first 40 users see the results of the probabilistic weighted sum [1] Hastie, R. and Kameda, T. 2005. The robust beauty of algorithm, the next those of deterministic weighted sum and so on. majority rules in group decisions. In Psychological Review, Since individuals who upload more songs expect to see their song Vol. 112, no. 2, 494-508. names more often in the final list they would prefer to know a [2] Masthoff, J. 2004. Group modeling: Selecting a sequence of priori the computation rule of the algorithm so that they would television items to suit a group of viewers. User Modeling adjust the number of songs they upload. Given users’ known self- and User-Adapted Interaction, Vol. 14, no. 1, 37-85. ratings and group ratings computed by the algorithms we expected our subjects to identify some differences between the 4 [3] Masthoff, J. 2005. The pursuit of satisfaction: affective state approaches. Some of the questions from the post-study in group recommender systems. In Computer Science, Vol. questionnaire are presented in the table below. They were 3538, 297-306. extracted from a well-known user evaluation model, named [4] McCarthy J.F. and Anagnost, T.D. 1998. MusicFX: an ResQue [7], that our group has developed. arbiter of group preferences for computer supported Table IV. Evaluation questions collaborative workouts. Proc. ACM Computer Supported Cooperative Work, 363-372. Measurements Questions [5] O’Connor, M., Cosley, D., Konstan, J.A., Riedl, J. 2001. Perceived The layout of the system's interface is PolyLens: A recommender system for groups of users. Proc. attractiveness attractive. ACM European Conference on Computer Supported Cooperative Work. Perceived satisfaction The items recommended to me [6] Popescu G. and Pu P. 2010. Group Recommender Systems as matched my interest. a Voting Problem. EPFL Technical report. Perceived helpfulness I took into account the ratings given by [7] Pu P. and Chen L. 2010. A User-Centric Evaluation my friends. Framework of Recommender Systems. In the 3rd ACM Conference on Recommender Systems, Barcelona, Spain