Recommender Popularity Controls: An Observational Study

F. Maxwell Harper
University of Minnesota
max@umn.edu

ABSTRACT
We describe an observational study of a recommender system that provides users with direct control over their personalization. Specifically, we allow users to tune a movie recommender towards more or less popular content. We report on 14 months of usage, which includes 6,846 users who visited the interface at least once. We find, surprisingly, that the popularity of items a user has interacted with historically is a poor predictor of their use of this interface.

KEYWORDS
recommender systems; popularity; observational study.

1 INTRODUCTION
In prior work [3], we reported on a lab study of how users interact with unlabelled controls that modify the popularity or recency of a movie recommender system. Among several results, we found that users who adjusted popularity reported much higher satisfaction with their recommendations. While most users tuned their recommender towards more popular content, there was a wide range of preferences among users. Therefore, we built a popularity tuner interface into the live system to develop a better understanding of how this feature would be used in practice, and to collect more data to experiment with modeling users' preferred settings.

Figure 1: Screenshot of the popularity tuner interface. When a user clicks "more popular" or "less popular", the list of 20 movies under "top picks" is immediately refreshed and his or her preferences are saved (and reflected site-wide).

2 OBSERVATIONAL STUDY
Our platform, MovieLens (http://movielens.org), allows users to choose among multiple recommendation algorithms [1]; we enable the popularity interface on the two most popular: item-item collaborative filtering [4] and Funk SVD [2]. Our recommendation algorithm in each case (described in detail in [3]) is a linear blend of predicted star rating with popularity. We use the number of ratings in the last year as our popularity metric. The default setting for both algorithms is a blend of 95% predicted rating with 5% popularity, which we chose based on the results of our lab study.

The popularity tuner interface is shown in Figure 1. On each "more popular" or "less popular" click, we use a binary search to locate new blending weights that will replace 4 of the user's top 20 recommendations. For example, a user clicking "more popular" triggers a binary search that might yield a new blend of 92.1% predicted rating with 7.9% popularity. We emphasize the changes visually in the interface to help users understand the result of their actions. To encourage use of the feature, we added a "configure" button next to the user's list of top recommendations at the top of the home page.
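To make this mechanism concrete, the following is a minimal illustrative sketch, in Python, of how such a blend and per-click weight search could be implemented. It is not the MovieLens production code: the movie representation, the function names, the search bounds, and the assumption that predicted ratings and popularity percentiles are on comparable scales are all ours.

# Illustrative sketch only (not the MovieLens production code).
# Each movie is a (movie_id, predicted_rating, popularity_percentile) tuple,
# with both scores assumed to be on comparable scales.

def top_n(movies, weight, n=20):
    """Return the ids of the top-n movies under a linear blend of predicted
    rating and popularity; weight=0.05 corresponds to the 95%/5% default."""
    def score(movie):
        return (1 - weight) * movie[1] + weight * movie[2]
    return {movie[0] for movie in sorted(movies, key=score, reverse=True)[:n]}

def next_weight(movies, weight, direction=+1, target_swaps=4, iters=30):
    """Binary-search a new blend weight (direction=+1 for "more popular",
    -1 for "less popular") that replaces ~target_swaps of the top 20."""
    current = top_n(movies, weight)
    lo, hi = (weight, 1.0) if direction > 0 else (0.0, weight)
    for _ in range(iters):
        mid = (lo + hi) / 2
        swaps = len(current - top_n(movies, mid))
        if swaps == target_swaps:
            break
        # Too few swaps: push the weight further from its old value;
        # too many: pull it back toward the old value.
        if (swaps < target_swaps) == (direction > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

Under this sketch, a "more popular" click from the default would call next_weight(movies, 0.05, direction=+1), which could plausibly return a weight near 0.079, i.e., the 92.1%/7.9% blend in the example above.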
We deployed the feature on April 11, 2016, and collected data through June 15, 2017. We consider two groups for analysis. First, for examining overall use of the popularity tuner, we look at all users who visited the interface one or more times (visited_sample, N=6,846, median logins=11, median ratings=168). Second, for examining users' preferred popularity settings, we look at users who took at least one action (interacted_sample, N=4,349, median logins=14, median ratings=205). Both samples are skewed toward power users as compared with all users who logged in during this period (N=31,371, median logins=3, median ratings=39).

Over the span of this observational study (or even the span of a single session), users might change settings multiple times. In fact, users in the visited_sample opened the tuning interface 24,657 times (3.6 times per user); users in the interacted_sample visited the page 19,465 times (4.5 times per user). Unless otherwise noted, user-level analysis examines the user's final configuration at the time of analysis. For example, a user might try the tuner interface three times and explore the system in between each change; our user-level analysis considers only the most recent configuration.

2.1 Descriptive Statistics
The distribution of per-user activity in online systems is often modeled by a power law distribution [5]; use of the tuner interface is no exception. We find that a few users use the feature heavily, while the vast majority of users pay it little attention — see Table 1 for an overview. For instance, 82% of users take 5 or fewer more/less/reset actions, and 78% of sessions are less than a minute in duration. On the other hand, power users have explored the feature thoroughly — one user has taken more than 5,500 actions.

percentile       # actions   # sessions   session length (sec.)
25th             0           1            3
50th (median)    1           2            5
80th             5           5            67
95th             14          10           251
99th             35          20           1428

Table 1: Descriptive statistics of activity levels by percentile in the observational study across users in the visited_sample. # actions and # sessions are user-level variables, while session length is a session-level variable with possibly multiple observations per user.

Within the interacted_sample, the most common popularity setting is the default one (1,390 users, 32.0% of the sample). That is, after exploring one or more new settings, many users return to the default, either by clicking the "reset" button or by taking the same number of "more popular" and "less popular" actions. More users (1,714, 39.4%) end with a setting more popular than the default than end with a setting less popular than the default (1,245, 28.6%). See Figure 2 for a visualization of this data, represented as number of steps from the default.

Figure 2: Histogram of users' final configurations (y-axis: number of users), represented as number of clicks from the default (0). E.g., a value of two means two net clicks on "more popular".

2.2 Analysis: Predicting Settings
It is possible that we can predict users' preferred popularity settings from their behavioral data. To explore this idea, we construct regression models using popularity-related predictor variables. In particular, we include a metric — "one-year popularity percentile" (pp1y) — that exactly mirrors the variable used in the popularity blending algorithm. We measure the pp1y of rated, wishlisted, and clicked items to capture a user's propensity to interact with more or less popular content. In addition, we use variables that capture the user's number of logins, number of ratings, and chosen prediction algorithm.
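As a concrete illustration of this model specification (not the analysis code used in the paper; the file and column names below are hypothetical stand-ins for the variables summarized in Table 2), an ordinary least squares fit in Python might look like:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical user-level export: one row per user in the interacted_sample,
# with their final popularity blending weight and the predictor variables.
users = pd.read_csv("tuner_users.csv")

# OLS over the predictors from Table 2; C(predictor) encodes the user's
# chosen algorithm (item-item vs. SVD) as a categorical variable.
model = smf.ols(
    "popularity_weight ~ n_logins + n_ratings + C(predictor)"
    " + pp1y_rated + pp1y_wishlisted + pp1y_clicked",
    data=users,
).fit()

print(model.summary())        # per-variable estimates and p-values (cf. Table 2)
print(model.rsquared_adj)     # overall fit; adjusted R-squared is 0.04 in our data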
Our regression model has several statistically significant variables, but overall it is a poor fit with users' popularity blending choices (adjusted R² = 0.04); see Table 2 for a summary and Figure 3 for a visualization of several variables. The model indicates that the popularity of a user's rated and clicked-on movies is positively correlated with the user's final configuration value. The model also shows a relationship between a user's choice of prediction algorithm and their chosen popularity blending weight (means: item-item=0.09, funkSVD=0.15). Number of ratings is positively correlated with the popularity setting, but with a tiny effect size; number of logins and the popularity of wishlisted content are not statistically significant.

variable                estimate
intercept              -1.160000 ***
# logins                0.000012
# ratings               0.000062 ***
predictor==SVD          0.068710 ***
mean pp1y, rated        0.818100 **
mean pp1y, wishlisted   0.064480
mean pp1y, clicked      0.401600 ***

Table 2: Summary of our regression model to predict users' choices, R² = 0.04. Statistically significant p-values are indicated as *** (p < 0.001) or ** (p < 0.01).

Figure 3: Scatterplots with regression lines showing the relationship between user-level variables (mean pp1y of rated items, mean pp1y of clicked items, and # ratings) and popularity blending values.

3 CONCLUSION
Thousands of users have tried the popularity tuner feature to configure their movie recommender. They have used the feature in very different ways, and it proves difficult to predict their choices from behavioral data. The poor overall fit of our regression model underscores the potential helpfulness of giving control to users. While it is possible that our model is too preliminary, or that our interface does not sufficiently elicit users' ideal preferences, it is also possible that users have fickle preferences for their recommenders that are difficult to get right through modeling alone.

REFERENCES
[1] Michael D. Ekstrand, Daniel Kluver, F. Maxwell Harper, and Joseph A. Konstan. 2015. Letting Users Choose Recommender Algorithms: An Experimental Study. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15). ACM, New York, NY, USA, 11–18. DOI:http://dx.doi.org/10.1145/2792838.2800195
[2] Simon Funk. 2006. Netflix Update: Try This at Home. (Dec. 2006). http://sifter.org/~simon/journal/20061211.html
[3] F. Maxwell Harper, Funing Xu, Harmanpreet Kaur, Kyle Condiff, Shuo Chang, and Loren Terveen. 2015. Putting Users in Control of Their Recommendations. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15). ACM, New York, NY, USA, 3–10. DOI:http://dx.doi.org/10.1145/2792838.2800179
[4] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web (WWW '01). ACM, New York, NY, USA, 285–295. DOI:http://dx.doi.org/10.1145/371920.372071
[5] Dennis M. Wilkinson. 2008. Strong Regularities in Online Peer Production. In Proceedings of the 9th ACM Conference on Electronic Commerce (EC '08). ACM, New York, NY, USA, 302–309. DOI:http://dx.doi.org/10.1145/1386790.1386837