The Demographics of Cool: Popularity and Recommender Performance for Different Groups of Users

Michael D. Ekstrand and Maria Soledad Pera
People and Information Research Team
Dept. of Computer Science, Boise State University, Boise, Idaho, USA
{michaelekstrand,solepera}@boisestate.edu

RecSys 2017 Poster Proceedings, August 27–31, Como, Italy.
© 2017 Copyright held by the owner/author(s).

ABSTRACT
Typical recommender evaluations treat users as a homogeneous unit. However, user subgroups often differ in their tastes, which can result more broadly in diverse recommender needs. Thus, these groups may have different degrees of satisfaction with the provided recommendations. We explore the offline top-N performance of collaborative filtering algorithms across two domains. We find that several strategies achieve higher accuracy for dominant demographic groups, thus increasing the overall performance of the strategy, without providing increased benefits for other users.

CCS CONCEPTS
• Information systems → Recommender systems;

KEYWORDS
collaborative filtering, evaluation, popularity bias

1 INTRODUCTION
Recommender system evaluation—offline and online—typically focuses on the system's effectiveness, in aggregate over the entire user population. While individual user characteristics are sometimes taken into account, as in demographic-informed recommendation, evaluations typically still aggregate over all users [8]. In this work, we connect recent work that leverages user demographics to deepen understanding of different users' satisfaction with search engines [7] with the work of Bellogin et al. [1] on measuring recommenders' performance for different items, in order to examine recommender system accuracy for users in different demographic groups in an offline setting.

This attention is necessary because, by default, the largest subgroup of users will dominate overall statistics; if other subgroups have different needs, their satisfaction will carry less weight in the final analysis. This can result in an incomplete picture of the performance of the system and obscure the need to identify how to better serve specific demographic groups. To the well-known problems of popularity bias [2] and misclassified decoys [3, 5] (a good item recommendation counted as an error because the user has yet to interact with the item in the available data), we add a third consideration: demographic bias, where the satisfaction (approximated in offline settings by top-N accuracy) of some demographic groups is weighted more heavily than others. Demographic bias also has a complex expected interaction with popularity bias: the most active and numerous users will have a greater impact on popularity than other users, so popularity bias in evaluation will further encourage the selection of algorithms that perform well on the largest subgroup's tastes.

Our central research question is this: what changes about our assessment of relative or absolute recommender effectiveness when we consider performance for different subgroups of users—basically, when we consider all subgroups' satisfaction to be equally important? Does popularity bias exacerbate demographic bias effects? How do popularity bias mitigations affect demographic bias?

2 INITIAL ANALYSIS
We answer these questions with an offline analysis using LensKit [4]¹ and two datasets that provide user demographics of some form. MovieLens-1M² [6] contains 1M 5-star ratings of 3,900 movies by 6,040 users who joined MovieLens through 2000. Each user has self-reported age, gender, occupation, and zip code. LastFM contains data on 359,347 users who played 294,015 unique artists. The main record set consists of 17,559,530 tuples of the form ⟨user, artist, playCount⟩. For most users, gender, age, country, and sign-up date are provided. We employed several classical and widely-used collaborative filtering algorithms: (1) Popular (Pop), recommending the most frequently rated or played items; (2) Item-Item (II), an item-based collaborative filter using 20 neighbors and cosine similarity; (3) User-User (UU), a user-based collaborative filter configured to use 30 neighbors and cosine similarity; and (4) FunkSVD (MF), a gradient-descent matrix factorization technique with 40 latent features and 150 training iterations per feature. Each algorithm is tagged with its variant: '-E' are explicit-feedback recommenders (applicable only to MovieLens); '-B' are implicit-feedback recommenders that only consider whether an item was rated or played, disregarding its rating value or play count; '-C' are implicit-feedback recommenders that consider the number of times an artist was played as repeated implicit feedback (LastFM only). We applied 5-fold cross-validation, using two methods: (1) LensKit's default strategy, and (2) Bellogin's UAR method [1] for neutralizing popularity bias, which works like the default except that it picks test sets of items instead of users. An initial experiment revealed that the algorithms exhibit similar behavior regardless of the metric (Recall, Mean Reciprocal Rank (MRR), or Mean Average Precision), so we report our results using MRR.

¹ Code and scripts are available at https://doi.org/10.18122/B2ND8P
² Later MovieLens datasets do not include demographic information.
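As a concrete illustration of the metric we report, the following is a minimal sketch of computing per-user reciprocal rank from a table of ranked recommendations and a table of held-out test items. It is not the released experiment scripts (see footnote 1); the DataFrame names and column layout are assumptions for illustration.

import pandas as pd

def reciprocal_ranks(recs: pd.DataFrame, test: pd.DataFrame) -> pd.Series:
    """Per-user reciprocal rank of the first relevant recommendation.

    recs: columns (user, item, rank), with rank starting at 1.
    test: columns (user, item), the held-out items for each user.
    MRR over a set of users is simply the mean of the returned series.
    """
    # Keep only recommended items that appear in the user's test set.
    hits = recs.merge(test, on=['user', 'item'])
    # Reciprocal rank of the earliest hit; users with no hit score 0.
    rr = 1.0 / hits.groupby('user')['rank'].min()
    return rr.reindex(test['user'].unique(), fill_value=0.0)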
Demographic distribution and its impact on evaluation. Figure 1 shows the user gender distribution, with the majority of users reporting as male. The age distribution reveals some differences: the largest block of MovieLens users belongs to the [25–35] group, whereas a plurality of LastFM users belong to the [18–24] group.³

³ For consistency, we binned LastFM users into the same groups used in MovieLens-1M.

Figure 1: User distribution based on age and gender (proportion of users per age bin and gender group, for LastFM and ML-1M).

Standard Results. Figure 2 shows the MRR achieved by each algorithm, grouped by demographic group. For each demographic characteristic, All is the accuracy achieved by averaging across all users, and Bucketed is the result of first averaging within each demographic group and then averaging the groups' results (thus giving each group equal weight, instead of each user). The results across subgroups are broadly similar for both data sets, though the All analysis tracks most closely with the dominant group. However, if a decision is to be made based on "performs best", then the small differences become non-trivial, as they will affect the final decision. One example case emerges from our analysis: on LastFM, II performs better using play counts ("-C") for some age groups, while the "-B" variant is more effective for other age groups. While we cannot conclude, based on this ongoing study, which is the right decision, our preliminary analysis demonstrates the need for further exploration from a demographic perspective.

Figure 2: Results of basic run (MRR by algorithm and demographic group, for LastFM and ML-1M).
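The distinction between the All and Bucketed aggregations can be made precise with a small sketch that continues the assumptions above: a pandas Series rr of per-user reciprocal ranks and a Series groups of per-user demographic labels. This illustrates the averaging scheme only; it is not the exact analysis code.

import pandas as pd

def all_vs_bucketed(rr: pd.Series, groups: pd.Series) -> tuple:
    """rr: per-user reciprocal ranks, indexed by user id.
    groups: demographic label (e.g. age bin or gender), indexed by user id."""
    # "All": every user counts once, so large groups dominate the average.
    all_mrr = rr.mean()
    # "Bucketed": average within each group first, then across groups,
    # so every group counts once regardless of its size.
    bucketed_mrr = rr.groupby(groups).mean().mean()
    return all_mrr, bucketed_mrr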
Popularity Bias Mitigating Results. We also seek to understand how demographic bias interacts with mitigation techniques for other issues, such as popularity bias. To that end, we performed a version of our analysis using Bellogin's UAR technique [1]. We see (in Figure 3) that several of the smaller user groups have substantially higher accuracy measures than the larger groups, particularly on age. An analysis using this method would find that the recommender is delivering better recommendations to these groups.

Figure 3: Results of UAR experiment (MRR by algorithm and demographic group, for LastFM and ML-1M).

The differences obtained using UAR and traditional evaluations show that mitigating popularity bias comes with the cost of significantly changing the distribution of measured accuracy across user subgroups. (Analysis using 1R [1] did not produce results significantly different from Figure 2.) Which evaluation strategy better reflects actual user experience is still up for debate.
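To make the contrast between the two protocols concrete, here is a rough sketch of holding out a few interactions per user (the default-style split) versus holding out the interactions of a uniformly sampled set of items (the idea behind the item-sampled split). It illustrates the distinction described above under our own simplifying assumptions; it is not Bellogin's exact UAR procedure, and the function and column names are hypothetical.

import numpy as np
import pandas as pd

def split_per_user(ratings: pd.DataFrame, n_held_out: int = 5, seed: int = 0):
    """Default-style protocol: hold out a few interactions from every user."""
    test = (ratings.groupby('user', group_keys=False)
                   .apply(lambda df: df.sample(min(n_held_out, len(df)),
                                               random_state=seed)))
    return ratings.drop(test.index), test

def split_uniform_items(ratings: pd.DataFrame, n_test_items: int = 500, seed: int = 0):
    """Item-sampled protocol: choose test *items* uniformly at random, so
    popular items are no more likely to be held out than obscure ones."""
    rng = np.random.default_rng(seed)
    items = ratings['item'].unique()
    test_items = rng.choice(items, size=min(n_test_items, len(items)), replace=False)
    in_test = ratings['item'].isin(test_items)
    return ratings[~in_test], ratings[in_test]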
3 DISCUSSION AND FUTURE WORK
Our analysis showed that, unsurprisingly, a number of recommendation strategies achieve moderately higher accuracy metric values for dominant demographic groups. This can cause an algorithm's measured performance to increase without delivering benefit to smaller subgroups of the user population. In other words, the perceived satisfaction with a recommender may not be the same for the "cool" users—those in the dominant group—as it is for those in smaller groups.

Demographic bias in accuracy metric results also has a complex interaction with mitigation strategies for other offline evaluation ailments such as popularity bias. A uniform item strategy results in disproportionately higher accuracy values for users in some smaller subgroups. Further work is needed to understand which paradigm maps most closely to actual user experience or response.

Our findings highlight the need for careful and multi-faceted consideration of recommender system behavior across a range of both users and items. Just as prior work has found that recommenders are not equally good at recommending all items, we find that recommenders are not equally good for all users, in predictable and socially relevant ways. While the full social and business ramifications of our findings have yet to be explored, we encourage researchers and practitioners to pay attention to which users receive how much benefit from a particular recommender.

ACKNOWLEDGMENTS
We thank Ion Madrazo for helping with analysis, and the People and Information Research Team (PIReT) for their support.

REFERENCES
[1] A. Bellogin. Performance Prediction and Evaluation in Recommender Systems: An Information Retrieval Perspective. PhD thesis, UAM, 2012.
[2] A. Bellogin, P. Castells, and I. Cantador. Precision-oriented evaluation of recommender systems: An algorithmic comparison. In Proc. ACM RecSys '11, 2011.
[3] P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-N recommendation tasks. In Proc. ACM RecSys '10, pages 39–46, 2010.
[4] M. D. Ekstrand, M. Ludwig, J. A. Konstan, and J. T. Riedl. Rethinking the recommender research ecosystem: Reproducibility, openness, and LensKit. In Proc. ACM RecSys '11, 2011.
[5] M. D. Ekstrand and V. Mahant. Sturgeon and the cool kids: Problems with top-N recommender evaluation. In Proc. FLAIRS 30. AAAI Press, 2017.
[6] F. M. Harper and J. A. Konstan. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4):19, 2016.
[7] R. Mehrotra, A. Anderson, F. Diaz, A. Sharma, H. Wallach, and E. Yilmaz. Auditing search engines for differential satisfaction across demographics. In Proc. WWW '17 Companion, 2017.
[8] G. Shani and A. Gunawardana. Evaluating recommendation systems. In Recommender Systems Handbook, pages 257–297. Springer, 2011.