      Recommender Popularity Controls: An Observational Study
                                                             F. Maxwell Harper
                                                            University of Minnesota
                                                                max@umn.edu
ABSTRACT
We describe an observational study of a recommender system that provides users with direct control over their personalization. Specifically, we allow users to tune a movie recommender towards more or less popular content. We report on 14 months of usage, which includes 6,846 users who visited the interface at least once. We find, surprisingly, that the popularity of items a user has interacted with historically is a poor predictor of their use of this interface.

KEYWORDS
recommender systems; popularity; observational study.

1    INTRODUCTION
In prior work [3], we reported on a lab study of how users interact with unlabelled controls that modify the popularity or recency of a movie recommender system. Among several results, we found that users who adjusted popularity reported much higher satisfaction with their recommendations. While most users tuned their recommender towards more popular content, there was a wide range of preferences among users. Therefore, we built a popularity tuner interface into the live system to develop a better understanding of how this feature would be used in practice, and to collect more data to experiment with modeling users’ preferred settings.

Figure 1: Screenshot of the popularity tuner interface. When a user clicks “more popular” or “less popular”, the list of 20 movies under “top picks” is immediately refreshed and the user’s preferences are saved (and reflected site-wide).

2    OBSERVATIONAL STUDY
Our platform, MovieLens (http://movielens.org), allows users to choose among multiple recommendation algorithms [1]; we enable the popularity interface on the two most popular: item-item collaborative filtering [4] and Funk SVD [2]. Our recommendation algorithm in each case (described in detail in [3]) is a linear blend of predicted star rating with popularity. We use the number of ratings in the last year as our popularity metric. The default setting for both algorithms is a blend of 95% predicted rating with 5% popularity, which we chose based on the results of our lab study.

The popularity tuner interface is shown in Figure 1. On each more/less popular click action, we use a binary search to locate new weights that will replace 4 of the user’s top 20 recommendations. For example, a user clicking “more popular” triggers a binary search that might yield a new blend of 92.1% predicted rating with 7.9% popularity. We emphasize the changes visually in the interface to help users understand the result of their actions. To encourage use of the feature, we added a “configure” button next to the user’s list of top recommendations at the top of the home page.

We deployed the feature on April 11, 2016, and collected data through June 15, 2017. We consider two groups for analysis. First, for examining overall use of the popularity tuner, we look at all users who visited the interface one or more times (visited_sample, N=6846, median logins=11, median ratings=168). Second, for examining users’ preferred popularity settings, we look at users who took at least one action (interacted_sample, N=4349, median logins=14, median ratings=205). Both samples are skewed toward power users as compared with all users who logged in during this period (N=31371, median logins=3, median ratings=39).

Over the span of this observational study (or even the span of a single session), users might change settings multiple times. In fact, users in the visited_sample opened the tuning interface 24,657 times (3.6 times per user); users in the interacted_sample visited the page 19,465 times (4.5 times per user). Unless otherwise noted, user-level analysis examines the user’s final configuration at the time of analysis. For example, a user might try the tuner interface three times and explore the system in between each change; our user-level analysis considers only the most recent configuration.

2.1    Descriptive Statistics
The distribution of per-user activity in online systems is often modeled by a power law distribution [5]; use of the tuner interface is no exception. We find that a few users use the feature heavily, while the vast majority of users pay it little attention; see Table 1 for an overview. For instance, 82% of users take 5 or fewer more/less/reset actions, and 78% of sessions are less than a minute in duration. On the other hand, power users have explored the feature thoroughly: one user has taken more than 5,500 actions.
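To make the tuning mechanism of Section 2 concrete, the following is a minimal sketch of the blend-and-binary-search step. It is illustrative only, not the MovieLens implementation: the function names, the assumption that predicted ratings and popularity percentiles are both pre-scaled to [0, 1], and the iteration cap are all ours.

```python
# Illustrative sketch of the popularity tuner's scoring step (Section 2).
# All names and scaling assumptions are hypothetical, not MovieLens code.

def blended_scores(predicted, pp1y, w):
    """Score each movie as a linear blend of predicted rating and one-year
    popularity percentile (pp1y), both assumed pre-scaled to [0, 1]."""
    return {m: (1 - w) * predicted[m] + w * pp1y[m] for m in predicted}

def top_k(scores, k=20):
    """Return the set of the k highest-scoring movie ids."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def more_popular(predicted, pp1y, w, target_changed=4, k=20, iters=30):
    """Binary-search a popularity weight above w that replaces roughly
    `target_changed` of the user's current top-k recommendations."""
    current = top_k(blended_scores(predicted, pp1y, w), k)
    lo, hi = w, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        changed = len(current - top_k(blended_scores(predicted, pp1y, mid), k))
        if changed < target_changed:
            lo = mid  # too little turnover in the list: push the weight up
        elif changed > target_changed:
            hi = mid  # too much turnover: pull the weight back down
        else:
            return mid
    return (lo + hi) / 2  # converged without hitting the target exactly
```

A “less popular” click would run the same search over weights below the current one. Because the number of replaced recommendations is an integer-valued step function of the weight, an exact turnover of 4 may not exist; the sketch then returns the converged midpoint.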
Within the interacted_sample, the most common popularity setting is the default one (1,390 users, 32.0% of sample). That is, after exploring one or more new settings, many users return to the default, either by clicking the “reset” button, or by sending the same number of “more popular” and “less popular” actions. More users (1,714, 39.4%) end with a setting more popular than default, as compared with the number that end less popular than default (1,245, 28.6%). See Figure 2 for a visualization of this data, represented as number of steps from the default.

  percentile       # actions   # sessions   session length (sec.)
  25th                     0            1                       3
  50th (median)            1            2                       5
  80th                     5            5                      67
  95th                    14           10                     251
  99th                    35           20                    1428
Table 1: Descriptive statistics of activity levels by percentile in the observational study across users in the visited_sample. # actions and # sessions are user-level variables, while session length is a session-level variable with possibly multiple observations per user.

Figure 2: Histogram of users’ final configurations, represented as number of clicks from the default (0). E.g., a value of two means two net clicks on “more popular”. (Axes: number of users vs. clicks from default, -5 to 10.)

2.2    Analysis: Predicting Settings
It is possible that we can predict users’ preferred popularity settings from their behavioral data. To explore this idea, we construct regression models using popularity-related predictor variables. In particular, we include a metric, “one-year popularity percentile” (pp1y), that exactly mirrors the variable used in the popularity blending algorithm. We measure pp1y of rated, wishlisted, and clicked items to capture a user’s propensity to interact with more or less popular content. In addition, we use variables that capture the user’s number of logins, number of ratings, and chosen prediction algorithm.

Our regression model has several statistically significant variables, but overall, it is a poor fit with users’ popularity blending choices (adjusted R² = 0.04); see Table 2 for a summary and Figure 3 for a visualization of several variables. The model indicates that the popularity of a user’s rated and clicked-on movies is positively correlated with the user’s final configuration value. Also, the model shows a relationship between a user’s choice of prediction algorithm and their chosen popularity blending weight (means: item-item=0.09, funkSVD=0.15). The model shows that the number of ratings is positively correlated with the popularity setting, but with a tiny effect size; the number of logins and the popularity of wishlisted content are not statistically significant.

  variable                  estimate
  intercept                -1.160000 ***
  # logins                  0.000012
  # ratings                 0.000062 ***
  predictor==SVD            0.068710 ***
  mean pp1y, rated          0.818100 **
  mean pp1y, wishlisted     0.064480
  mean pp1y, clicked        0.401600 ***
Table 2: Summary of our regression model to predict users’ choices, adjusted R² = 0.04. Statistically significant p-values indicated as *** (p < 0.001) or ** (p < 0.01).

Figure 3: Scatterplots with regression lines showing the relationship between user-level variables and popularity blending values. (Panels: mean pp1y, rated; mean pp1y, clicked; # ratings; y-axis: popularity blending value.)

3    CONCLUSION
Thousands of users have tried the popularity tuner feature to configure their movie recommender. They have used the feature in very different ways, and it proves difficult to predict their choices based on behavioral data. The poor overall fit of our regression model underscores the potential helpfulness of giving control to users. While it is possible that our model is too preliminary, or that our interface does not sufficiently elicit users’ ideal preferences, it is also possible that users have fickle preferences for their recommenders that are difficult to get right through modeling alone.

REFERENCES
[1] Michael D. Ekstrand, Daniel Kluver, F. Maxwell Harper, and Joseph A. Konstan. 2015. Letting Users Choose Recommender Algorithms: An Experimental Study. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys ’15). ACM, New York, NY, USA, 11–18. DOI:http://dx.doi.org/10.1145/2792838.2800195
[2] Simon Funk. 2006. Netflix update: Try this at home. (Dec. 2006). http://sifter.org/~simon/journal/20061211.html
[3] F. Maxwell Harper, Funing Xu, Harmanpreet Kaur, Kyle Condiff, Shuo Chang, and Loren Terveen. 2015. Putting Users in Control of Their Recommendations. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys ’15). ACM, New York, NY, USA, 3–10. DOI:http://dx.doi.org/10.1145/2792838.2800179
[4] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web (WWW ’01). ACM, New York, NY, USA, 285–295. DOI:http://dx.doi.org/10.1145/371920.372071
[5] Dennis M. Wilkinson. 2008. Strong Regularities in Online Peer Production. In Proceedings of the 9th ACM Conference on Electronic Commerce (EC ’08). ACM, New York, NY, USA, 302–309. DOI:http://dx.doi.org/10.1145/1386790.1386837

RecSys 2017 Poster Proceedings, August 27-31, Como, Italy