=Paper= {{Paper |id=None |storemode=property |title=Probabilistic Game Theoretic Algorithms for Group Recommender Systems |pdfUrl=https://ceur-ws.org/Vol-793/womrad2011_paper6.pdf |volume=Vol-793 }} ==Probabilistic Game Theoretic Algorithms for Group Recommender Systems== https://ceur-ws.org/Vol-793/womrad2011_paper6.pdf
                       Probabilistic Game Theoretic Algorithms
                          for Group Recommender Systems
                        George Popescu                                                               Pearl Pu
                       EPFL - HCI Group                                                       EPFL - HCI Group
                        IC IIF Station 14                                                      IC IIF Station 14
                   1015 Lausanne, Switzerland                                             1015 Lausanne, Switzerland
                        +41 21 693 1246                                                        +41 21 693 6081
                  george.popescu@epfl.ch                                                      pearl.pu@epfl.ch

ABSTRACT                                                                   distinct strategies based on these. Social choice theory aims to
Aggregating users’ individual preference and recommending a                offer an answer to “which strategy is most effective and will be
common set of items for a group has become a challenging topic             most liked by a group of users?” (Hastie and Kameda, 2005).
in group recommender systems and social websites. This issue is            With the purpose of determining what strategy people actually
mainly concerned with the following three objectives: eliciting            use, Masthoff (2004) found that individuals use computationally
individual users' preferences, suggesting outcomes that maximize           simple strategies mentioned above, particularly the average
the overall satisfaction for all users and ensuring that the               strategy, the average without misery and the least misery strategy.
aggregation mechanism is resistant to individual users'                    However, there is no dominant strategy as people switch between
manipulation. Firstly we show how our proposed probabilistic               them given a different context. Fairness plays an important role in
weighted-sum algorithm (PWS) works and emphasize on its                    decision making but members do not have a clear strategy for
advantages. Then we compare PWS with related approaches                    applying it.
implemented in similar systems using the case of our music                 Our main research question is to determine “which group
recommender, GroupFun. We describe an experiment design to                 satisfaction rule best satisfy users expectations”. We propose 4
study users’ perceptions of the algorithms, their perceived fairness       algorithms and investigate upon: “which algorithm is best suited
and incentives to manipulate the final recommendation outcome.             to meet users’ expectations” for our music recommender system,
We expect our results to show that PWS will be perceived as fair           GroupFun and “how users perceive the algorithms’ accuracy”.
and diversity- and discovery-driven, thus enhancing the group's            Next we present related work, then considered algorithms together
satisfaction. Our future work will focus on the actual evaluation of       with our implementation and future experiment design.
GroupFun using the experiment design presented here.
                                                                           2. BASELINE AND RELATED WORK
Categories and Subject Descriptors
H1.2 [User/Machine Systems]: human factors; H5.2 [User                     2.1 MusicFX
Interfaces]: evaluation/methodology, user-centered design.                 MusicFX is a music system offering best music match to
                                                                           employees working out in a fitness center (McCarthy and
General terms                                                              Anagnost, 1998). The algorithm aims at selecting the most
Experimentation, Human factors.                                            preferred music genre that maximizes members’ listening
                                                                           pleasure. For this it computes a group preference index and sums
Keywords                                                                   squared individual preferences. Then it lists the most popular
Quality measurement, usability evaluation, recommender systems,            categories. The system also saves events in its history such as:
quality of user experience, post-study questionnaire.                      member entrance, member exit, individual preference update,
                                                                           system parameter adjustment and maximum station play time
1. INTRODUCTION                                                            elapsed. Since some individual preference filters may not change,
Group recommender systems use various aggregation strategies to            the system opts for a different music configuration according to
suggest a common list of items to a group of users. These                  two criteria: playing more the music which members like most
strategies aim at increasing the group’s welfare and are based on          and playing less the music which members like least. The
users’ votes on items. The social welfare is an aggregate of               weighted random selection operator is one strategy used to reduce
individual utilities of all group members. Most common used                the likelihood of starvation. Another strategy is limiting the period
deterministic strategies are: plurality voting, utilitarian, approval      of time for one genre to be played – regardless of how popular it
voting, least misery, most pleasure, average without misery,               is – before the selection algorithm is invoked in order to select a
fairness, most respected person, Borda count, Copeland rule or             new station. MusicFX has two important advantages: (1) it
Kemeny scores (Masthoff, 2005). One can easily create other                increases the variety of music and (2) it democratizes the music
                                                                           selection process. Thus it is adaptive to changing preferences of
                                                                           its users also proposing new songs for them. One drawback of the
 WOMRAD 2011 2ndWorkshop on Music Recommendation and                       system is that it changes music stations abruptly in the middle of
 Discovery, colocated with ACM RecSys 2011 (Chicago, US)                   the songs.
 Copyright ©. This is an open-access article distributed under the terms
 of the Creative Commons Attribution License 3.0 Unported, which
 permits unrestricted use, distribution, and reproduction in any medium,
                                                                           2.2 PolyLens
 provided the original author and source are credited.                     PolyLens is a collaborative filtering recommender system which
                                                                           recommends movies to groups of people based on their individual
                                                                           preferences (O’Connor et al. 2001). It represents a group
extension of the MovieLens recommender system with over                 outcomes are songs si that are selected in the common playlist.
80,000 users and their ratings for more than 3,500 movies (with a
total of nearly 5 million votes). Users can create and manage           We let each user a j submit a numerical vote score( s i , a j ) for
groups, access individual and group recommendations and receive
notification alerts for group invitation. The algorithm uses the        each song si that reflects their preference for that song. These
least misery strategy given that the groups formed to watch a           votes are given as ratings on a 5-point Likert scale and normalized
movie together tend to be small. As such the group is as happy as       so that the scores given by each user sum to 1:
its least happy member. The authors mention that “the social                                                        ,
value function and algorithm are unlikely to work well for large                       ,         =∑                                (1)
groups”. They further note that “it is still an open research                                                       ,
question to understand the types of social value functions that best    We then assign a joint score to each song that is computed as the
satisfy large groups and to implement them algorithmically”.            sum of the scores given by the individual users:
2.3 Voting strategies
An extensive study on a group of television viewers aiming at
                                                                                      = ∑ ∈  ,                        (2)
finding which strategy people use was realized by Masthoff
                                                                        To choose the songs to be included in a playlist of length k, a
(2005). In the experiment 10 deterministic group voting rules are
compared: plurality voting, utilitarian strategy, Borda count,          deterministic method is to choose the k songs with the highest
Copeland rule, approval voting, least misery strategy, most             joint rating: weighted sum (DWS):
pleasure strategy, average without misery strategy, fairness
                                                                                                                 
strategy and most respected person strategy. The experiment                                = ∑                                    (3)
shows that individuals do not use a clear dominant strategy, but                                               ∈   
average, average without misery, and least misery are all plausible
                                                                        The probabilistic weighted sum (PWS) iteratively selects each of
candidates for implementation. In a different experiment
addressing how people judge the recommendation results                  the k songs randomly according to the probability distribution:
multiplicative utilitarian strategy is the most promising strategy,                                         
but the other strategies received close scores. In the study of                              = ∑                                      (4)
television viewers the hypothesis that social status influences                                           ∈   
selection has no statistical dominance. Non-linear utility suits        By comparison, the probabilistic sum (PS) method chooses the k
better users’ expectations than a linear one. For instance quadratic
                                                                        songs with equal probability:
rating scale is appropriate for implementation. Furthermore, there
is strong evidence that human subjects use a series of simple
strategies in different judgment contexts they face. For instance, if                                 =                               (5)
                                                                                                                   |"|
one takes the satisfaction of the group to be the average of the
satisfaction of the individuals, then the average strategy performs     The least misery (LM) method takes into account the minimum of
best. Taking the minimum better corresponds to the predictions          ratings for each user:
which are made by individuals assessing their own needs.

3. ALGORITHMS                                                                         min & ,              ',∀      ∈)             (6)
In the music domain many users usually form many groups and
listen to many songs. Given the fact that the length of one song is     3.2 Example
of 3 to 4 minutes users usually select a playlist containing several    To illustrate how each algorithm works, we consider the following
to lots of songs. This is not the case of movies selection when         example. In the next table, user1, user2, and user 3 represent
users need to agree on only one or few movie(s) they would like         group members. The score distribution normalized to 1 for each of
to consume given their limited time and the length of a movie:          the users is displayed in the respective row, and the joint scores
~2h. Thus, the music domain presents both opportunities and             are shown in the table below.
challenges since the recommendation needs to focus on both
diversity and accuracy.                                                     Table I. Item selection example using the 4 algorithms
We propose the following 4 algorithms for comparison:                   User1     Song1: 0.1     Song2: 0.4        Song3: 0.4      Song4: 0.1
    •   PS (Probabilistic Sum): select the common playlist’
                                                                        User2     Song1: __      Song2: 0.2        Song3: __       Song4: 0.8
        songs probabilistically, each of them having the same
        probability to be selected                                      User3     Song1: 0.4     Song2: 0.2        Song3: __       Song4: 0.4
    •   LM (Least Misery): select songs with the highest                Total     Song1: 0.5     Song2: 0.8        Song3: 0.4      Song4: 1.3
        minimum individual ratings
    •   DWS (Deterministic Weighted Sum): deterministically
        select songs with the highest score                             The least misery (LM) will choose song 2 and song 3 (each of
                                                                        them has the minimal rating 0.2). For all other songs the minimum
    •   PWS (Probabilistic Weighted Sum): compute weighted
                                                                        score is 0.1. After normalizing the total scores by the sum of
        sum and select songs based on their score probabilities.        scores, we obtain the following probability distribution for the set
                                                                        of outcomes.
3.1 General framework
Let A be the set of all users and S the set of all possible outcomes
that can be rated. In our group music recommendation setting, the
                Table II. Probability distribution                        4. GROUPFUN
 P     Song1: 0.16      Song2: 0.26     Song3: 0.13       Song 4: 0.43    GroupFun is a web application that helps a group of friends to
                                                                          agree on a common music playlist for a given event they will
Considering the probability as the final score, the deterministic         attend, e.g. a birthday party or a graduation ceremony. Firstly, it is
weighted sum (DWS) will chose songs 4, 2, 1 and 3. Probabilistic          implemented as a Facebook plugin connecting users to their
weighted sum (PWS) will choose one song after another using this          friends. Secondly, it is a music application that helps individuals
probabilistic distribution. Compared to other social choice based         to manage and share their favorite music with groups. In
algorithms, PWS is incentive compatible. That is, it is to the best       GroupFun users can listen to their own collection of songs as well
interest of the individual to reveal his/her preferences truthfully. It   as their friends’ music. With the collective music database, the
                                                                          application integrates friends’ music tastes and recommends a
is in fact equivalent to a random dictator method, where the              common playlists to them. Therefore, the application aims at
dictator will choose a song randomly with the probabilities given         satisfying music tastes of the whole group by aggregating
by its degree of preference – a reasonable method since nobody            individual preferences through the use of previously presented
wants to hear the same song over and over again. This is because          algorithms.
the probability of a song si to be chosen can be written as:

                                         , 
           =        ||
                                      ∑ ∈
                                                    ||
                                                                    (7)

or, in other words, the probability of choosing user a j times the

normalized score that user      a j has given to song si . Indeed,
User3’s preference for song 1 yields a significant probability that
this song will be included in the playlist, relative to other songs.

3.3 Discussion
The contribution of the PWS algorithm in the paper stands out
with respect to group satisfaction. We expect users to be more
satisfied using PWS than other algorithms given their expectations
to discover the music of other members.                                                 Figure 1. “Home” page of GroupFun
Advantages of PWS compared with the other algorithms:                     In the “Home” page users see 4 playlists: one from a recent event,
                                                                          one containing popular songs, one from a party and the last one
1.   Users are free to choose the number of songs                         from an older event. They can listen to each song in each of the
2.   Ratings are updated permanently                                      playlists.
3.   The algorithm is computationally simple
4.   Users can negotiate their ratings and trade utility
5.   Incentive-compatible truthful property is observed
6.   The algorithm favors music diversity
The disadvantages of PWS are:
1.   It is difficult to quantify rating differences between distinct
     users. The weights given by each user cannot be compared
     with the ones given by another since users have different
     estimations of their utility.
2.   Self-selection effect: most popular songs will receive most
     votes - not ideal if long tail distribution is desired.
Since PWS can be interpreted as similar to the random scheme
users have to test it in more recommendation rounds to understand
its inner logic. PWS can be further developed to include the group                   Figure 2. "My Groups" page of GroupFun
dynamics. One solution is to consider trust and other members’
comments on the songs rated by one user (e.g. “like”/”dislike”).          In the “My Group” page users can create groups, upload and rate
                                                                          their music, invite friends and hear the group’s songs. Finally, in
The PWS algorithm stands out with respect to allowing users to            the “My Music” page users see their contribution to GroupFun:
engage in trustful individual preference elicitation and music            for each song is displayed the associated group, the rating and its
discovery. By returning to the recommendation list the group will         name and artist. Users can also listen to their individual
find a different playlist every-time they are would like to listen to     preferences in the same interface. One of the most important
group music. By considering the probabilistic distribution of             characteristics of GroupFun is that it combines music, friends and
ratings and an extensive music library the algorithm will mostly          groups together. In other words, GroupFun serves as a platform
suggest songs strongly liked by most others. Sometimes it will            allowing users to conveniently organize their individual music
recommend unexpected, rating-wise, serendipitous items                    library, effectively communicate with friends and actively
facilitating music discovery and group enjoyment.                         participate in social activities.
5. EXPERIMENT DESIGN                                                    Outcome         change    I was interested in changing the
To compare how users perceive the 4 algorithms, we plan to carry        intention                 outcome of the algorithm.
on a between-groups user study. With the results of the
experiment we will be able to make a judgment of the influence of      6. CONCLUSIONS AND FUTURE WORK
both the algorithms and the design on users’ satisfaction. We plan     In this paper we presented our research work on the algorithmic
to collect solid user feedback regarding how an algorithm should       development and evaluation of our music recommender system
allow group members to arrive at a common decision in a music          and Facebook application, GroupFun. The major contribution of
recommendation setting.                                                this paper is the demonstration of the applicability of the PWS
Our hypotheses are that: (1) users will not reveal their preferences   algorithm for group recommendation strategies and negotiation. In
strategically as to influence the algorithm’s outcome; (2) they will   this context, we analyzed different group recommendation
prefer more PWS than DWS given the increased diversity of the          approaches w.r.t. group satisfaction and discussed key satisfaction
recommendations they receive and (3) the group will perceive           issues to be taken into account. The PWS algorithm we proposed
more overall satisfaction but less diversity using the LM              calculates probabilities for songs to appear in groups’ playlists
algorithm compared with PWS.                                           favoring music diversity and discovery. Using PWS users state
                                                                       their preference truthfully. They align their decision to that of the
5.1 Procedure                                                          group. Furthermore, our current development of GroupFun allows
First we recruit university students, friends who have Facebook        users to create groups, rate and share their music profiles with
accounts and other users on the Amazon Mechanical Turk                 their friends.
platform. All of them will use their own computers in order to         To understand how users’ perceive our algorithms and current
connect to the GroupFun application. We consider 4 algorithms          interface, we plan to conduct an experiment to compare the 4
implemented for 4 groups of 40 users together with 1 interface.        algorithms in a between-subjects study. As such we will evaluate
Each of the algorithms displays a common list of 10 songs to all       user satisfaction for music group recommendations. Furthermore,
group members. Users are asked to contribute their music to only       to learn more about the perceived ease of use and perceived
one of the groups and fill in an online post-study questionnaire       usefulness of our system we plan to invite more members and
assessing their satisfaction. The final music outcome is shown to      analyze user feedback in terms of design and functionality. We
all group members after they have finished their tasks. Users can      also intend to develop a new version of the algorithm which will
interact with the system in diverse ways such as: upload more or       better match users’ behavior and expectations.
less songs, change their ratings, see and hear to their friends’
playlists, etc.                                                        7. ACKNOWLEDGMENTS
Table III. Evaluation of 4 algorithms using the same interface         This research was supported by EPFL and the Swiss National
                                                                       Science Foundation. We thank Elsa Friscira and Laurentiu
                           PWS        DWS         LM       PS          Dascalu for their important contribution to the development of
Interface/Algorithm
                                                                       GroupFun and GroupFun users for their answers and involvement
                           40         40          40       40          in the evaluation of our algorithms.
Interface
5.2 Measurements                                                       8. REFERENCES
The first 40 users see the results of the probabilistic weighted sum   [1] Hastie, R. and Kameda, T. 2005. The robust beauty of
algorithm, the next those of deterministic weighted sum and so on.         majority rules in group decisions. In Psychological Review,
Since individuals who upload more songs expect to see their song           Vol. 112, no. 2, 494-508.
names more often in the final list they would prefer to know a         [2] Masthoff, J. 2004. Group modeling: Selecting a sequence of
priori the computation rule of the algorithm so that they would            television items to suit a group of viewers. User Modeling
adjust the number of songs they upload. Given users’ known self-           and User-Adapted Interaction, Vol. 14, no. 1, 37-85.
ratings and group ratings computed by the algorithms we expected
our subjects to identify some differences between the 4                [3] Masthoff, J. 2005. The pursuit of satisfaction: affective state
approaches. Some of the questions from the post-study                      in group recommender systems. In Computer Science, Vol.
questionnaire are presented in the table below. They were                  3538, 297-306.
extracted from a well-known user evaluation model, named               [4] McCarthy J.F. and Anagnost, T.D. 1998. MusicFX: an
ResQue [7], that our group has developed.                                  arbiter of group preferences for computer supported
                  Table IV. Evaluation questions                           collaborative workouts. Proc. ACM Computer Supported
                                                                           Cooperative Work, 363-372.
     Measurements                          Questions
                                                                       [5] O’Connor, M., Cosley, D., Konstan, J.A., Riedl, J. 2001.
 Perceived                 The layout of the system's interface is         PolyLens: A recommender system for groups of users. Proc.
 attractiveness            attractive.                                     ACM European Conference on Computer Supported
                                                                           Cooperative Work.
 Perceived satisfaction    The items recommended           to   me
                                                                       [6] Popescu G. and Pu P. 2010. Group Recommender Systems as
                           matched my interest.                            a Voting Problem. EPFL Technical report.
 Perceived helpfulness     I took into account the ratings given by    [7] Pu P. and Chen L. 2010. A User-Centric Evaluation
                           my friends.                                     Framework of Recommender Systems. In the 3rd ACM
                                                                           Conference on Recommender Systems, Barcelona, Spain