Recommender Popularity Controls: An Observational Study

F. Maxwell Harper
University of Minnesota
max@umn.edu

ABSTRACT
We describe an observational study of a recommender system that provides users with direct control over their personalization. Specifically, we allow users to tune a movie recommender towards more or less popular content. We report on 14 months of usage, which includes 6,846 users who visited the interface at least once. We find, surprisingly, that the popularity of items a user has interacted with historically is a poor predictor of their use of this interface.

KEYWORDS
recommender systems; popularity; observational study.

1 INTRODUCTION
In prior work [3], we reported on a lab study of how users interact with unlabelled controls that modify the popularity or recency of a movie recommender system. Among several results, we found that users who adjusted popularity reported much higher satisfaction with their recommendations. While most users tuned their recommender towards more popular content, there was a wide range of preferences among users. Therefore, we built a popularity tuner interface into the live system to develop a better understanding of how this feature would be used in practice, and to collect more data to experiment with modeling users' preferred settings.

Figure 1: Screenshot of the popularity tuner interface. When a user clicks "more popular" or "less popular", the list of 20 movies under "top picks" is immediately refreshed and his or her preferences are saved (and reflected site-wide).

2 OBSERVATIONAL STUDY
Our platform, MovieLens (http://movielens.org), allows users to choose among multiple recommendation algorithms [1]; we enable the popularity interface on the two most popular: item-item collaborative filtering [4] and Funk SVD [2]. Our recommendation algorithm in each case (described in detail in [3]) is a linear blend of predicted star rating with popularity. We use the number of ratings in the last year as our popularity metric. The default setting for both algorithms is a blend of 95% predicted rating with 5% popularity, which we chose based on the results of our lab study.

The popularity tuner interface is shown in Figure 1. On each "more popular" or "less popular" click, we use a binary search to locate new blending weights that will replace 4 of the user's top 20 recommendations. For example, a user clicking "more popular" triggers a binary search that might yield a new blend of 92.1% predicted rating with 7.9% popularity. We emphasize the changes visually in the interface to help users understand the result of their actions. To encourage use of the feature, we added a "configure" button next to the user's list of top recommendations at the top of the home page.
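To make this mechanism concrete, the following is a minimal illustrative sketch, in Python, of how such a blend and per-click weight search could be implemented. It is not the MovieLens production code: the movie representation, the function names, the search bounds, and the assumption that predicted ratings and popularity percentiles are on comparable scales are all ours.

# Illustrative sketch only (not the MovieLens production code).
# Each movie is a (movie_id, predicted_rating, popularity_percentile) tuple,
# with both scores assumed to be on comparable scales.

def top_n(movies, weight, n=20):
    """Return the ids of the top-n movies under a linear blend of predicted
    rating and popularity; weight=0.05 corresponds to the 95%/5% default."""
    def score(movie):
        return (1 - weight) * movie[1] + weight * movie[2]
    return {movie[0] for movie in sorted(movies, key=score, reverse=True)[:n]}

def next_weight(movies, weight, direction=+1, target_swaps=4, iters=30):
    """Binary-search a new blend weight (direction=+1 for "more popular",
    -1 for "less popular") that replaces ~target_swaps of the top 20."""
    current = top_n(movies, weight)
    lo, hi = (weight, 1.0) if direction > 0 else (0.0, weight)
    for _ in range(iters):
        mid = (lo + hi) / 2
        swaps = len(current - top_n(movies, mid))
        if swaps == target_swaps:
            break
        # Too few swaps: push the weight further from its old value;
        # too many: pull it back toward the old value.
        if (swaps < target_swaps) == (direction > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

Under this sketch, a "more popular" click from the default would call next_weight(movies, 0.05, direction=+1), which could plausibly return a weight near 0.079, i.e., the 92.1%/7.9% blend in the example above.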
We deployed the feature on April 11, 2016, and collected data through June 15, 2017. We consider two groups for analysis. First, for examining overall use of the popularity tuner, we look at all users who visited the interface one or more times (visited_sample, N=6,846, median logins=11, median ratings=168). Second, for examining users' preferred popularity settings, we look at users who took at least one action (interacted_sample, N=4,349, median logins=14, median ratings=205). Both samples are skewed toward power users as compared with all users who logged in during this period (N=31,371, median logins=3, median ratings=39).

Over the span of this observational study (or even the span of a single session), users might change settings multiple times. In fact, users in the visited_sample opened the tuning interface 24,657 times (3.6 times per user); users in the interacted_sample visited the page 19,465 times (4.5 times per user). Unless otherwise noted, user-level analysis examines the user's final configuration at the time of analysis. For example, a user might try the tuner interface three times and explore the system in between each change; our user-level analysis considers only the most recent configuration.

2.1 Descriptive Statistics
The distribution of per-user activity in online systems is often modeled by a power law distribution [5]; use of the tuner interface is no exception. We find that a few users use the feature heavily, while the vast majority of users pay it little attention — see Table 1 for an overview. For instance, 82% of users take 5 or fewer more/less/reset actions, and 78% of sessions are less than a minute in duration. On the other hand, power users have explored the feature thoroughly — one user has taken more than 5,500 actions.

percentile       # actions   # sessions   session length (sec.)
25th             0           1            3
50th (median)    1           2            5
80th             5           5            67
95th             14          10           251
99th             35          20           1428

Table 1: Descriptive statistics of activity levels by percentile in the observational study across users in the visited_sample. # actions and # sessions are user-level variables, while session length is a session-level variable with possibly multiple observations per user.

Within the interacted_sample, the most common popularity setting is the default one (1,390 users, 32.0% of the sample). That is, after exploring one or more new settings, many users return to the default, either by clicking the "reset" button or by taking the same number of "more popular" and "less popular" actions. More users (1,714, 39.4%) end with a setting more popular than the default than end with a setting less popular than the default (1,245, 28.6%). See Figure 2 for a visualization of this data, represented as number of steps from the default.

Figure 2: Histogram of users' final configurations (y-axis: number of users), represented as number of clicks from the default (0). E.g., a value of two means two net clicks on "more popular".

2.2 Analysis: Predicting Settings
It is possible that we can predict users' preferred popularity settings from their behavioral data. To explore this idea, we construct regression models using popularity-related predictor variables. In particular, we include a metric — "one-year popularity percentile" (pp1y) — that exactly mirrors the variable used in the popularity blending algorithm. We measure the pp1y of rated, wishlisted, and clicked items to capture a user's propensity to interact with more or less popular content. In addition, we use variables that capture the user's number of logins, number of ratings, and chosen prediction algorithm.
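As a concrete illustration of this model specification (not the analysis code used in the paper; the file and column names below are hypothetical stand-ins for the variables summarized in Table 2), an ordinary least squares fit in Python might look like:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical user-level export: one row per user in the interacted_sample,
# with their final popularity blending weight and the predictor variables.
users = pd.read_csv("tuner_users.csv")

# OLS over the predictors from Table 2; C(predictor) encodes the user's
# chosen algorithm (item-item vs. SVD) as a categorical variable.
model = smf.ols(
    "popularity_weight ~ n_logins + n_ratings + C(predictor)"
    " + pp1y_rated + pp1y_wishlisted + pp1y_clicked",
    data=users,
).fit()

print(model.summary())        # per-variable estimates and p-values (cf. Table 2)
print(model.rsquared_adj)     # overall fit; adjusted R-squared is 0.04 in our data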
Our regression model has several statistically significant variables, but overall it is a poor fit with users' popularity blending choices (adjusted R² = 0.04); see Table 2 for a summary and Figure 3 for a visualization of several variables. The model indicates that the popularity of a user's rated and clicked-on movies is positively correlated with the user's final configuration value. The model also shows a relationship between a user's choice of prediction algorithm and their chosen popularity blending weight (means: item-item=0.09, funkSVD=0.15). Number of ratings is positively correlated with the popularity setting, but with a tiny effect size; number of logins and the popularity of wishlisted content are not statistically significant.

variable                estimate
intercept              -1.160000 ***
# logins                0.000012
# ratings               0.000062 ***
predictor==SVD          0.068710 ***
mean pp1y, rated        0.818100 **
mean pp1y, wishlisted   0.064480
mean pp1y, clicked      0.401600 ***

Table 2: Summary of our regression model to predict users' choices, R² = 0.04. Statistically significant p-values are indicated as *** (p < 0.001) or ** (p < 0.01).

Figure 3: Scatterplots with regression lines showing the relationship between user-level variables (mean pp1y of rated items, mean pp1y of clicked items, and # ratings) and popularity blending values.

3 CONCLUSION
Thousands of users have tried the popularity tuner feature to configure their movie recommender. They have used the feature in very different ways, and it proves difficult to predict their choices from behavioral data. The poor overall fit of our regression model underscores the potential helpfulness of giving control to users. While it is possible that our model is too preliminary, or that our interface does not sufficiently elicit users' ideal preferences, it is also possible that users have fickle preferences for their recommenders that are difficult to get right through modeling alone.

REFERENCES
[1] Michael D. Ekstrand, Daniel Kluver, F. Maxwell Harper, and Joseph A. Konstan. 2015. Letting Users Choose Recommender Algorithms: An Experimental Study. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15). ACM, New York, NY, USA, 11–18. DOI:http://dx.doi.org/10.1145/2792838.2800195
[2] Simon Funk. 2006. Netflix Update: Try This at Home. (Dec. 2006). http://sifter.org/~simon/journal/20061211.html
[3] F. Maxwell Harper, Funing Xu, Harmanpreet Kaur, Kyle Condiff, Shuo Chang, and Loren Terveen. 2015. Putting Users in Control of Their Recommendations. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15). ACM, New York, NY, USA, 3–10. DOI:http://dx.doi.org/10.1145/2792838.2800179
[4] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web (WWW '01). ACM, New York, NY, USA, 285–295. DOI:http://dx.doi.org/10.1145/371920.372071
[5] Dennis M. Wilkinson. 2008. Strong Regularities in Online Peer Production. In Proceedings of the 9th ACM Conference on Electronic Commerce (EC '08). ACM, New York, NY, USA, 302–309. DOI:http://dx.doi.org/10.1145/1386790.1386837