User Segmentation for Controlling Recommendation Diversity

Farzad Eskandanian, Bamshad Mobasher, Robin Burke
Center for Web Intelligence, DePaul University, Chicago, IL 60604
feskanda@depaul.edu, mobasher@cs.depaul.edu, rburke@cs.depaul.edu

ABSTRACT
The quality of recommendations is known to be affected by diversity and novelty in addition to accuracy. Recent work has focused on methods that increase the diversity of recommendation lists. However, these methods assume that the preference for diversity is constant across all users. In this paper, we show that users' propensity towards diversity varies greatly and argue that the diversity of recommendation lists should be consistent with the level of user interest in diverse recommendations. We introduce a user segmentation approach in order to personalize recommendations according to user preference for diversity. We show that recommendations generated using these segments match the diversity preferences of users in each segment. We also discuss the impact of this segmentation on the novelty of recommendations.

Keywords
Recommendation diversity, Performance evaluation metrics, Novelty, Collaborative Filtering

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
© 2016 Copyright held by the owner/author(s). RecSys 2016 Poster Proceedings, September 15–19, 2016, Boston, MA, USA. 978-1-4503-2138-9.

1. INTRODUCTION
Although there are many methods in the literature that can be used to increase diversity in recommendations [1], only a few have mentioned the varying degrees of interest users have for diverse recommendation results [2]. One can imagine two extreme cases of this interest: one user likes to receive as recommendations only science fiction movies made within the last 10 years; another user likes a more diverse set of movies from many genres in her recommendation list. Obviously, any attempt to increase the diversity of the recommendation list is likely to generate poor results for the first user with limited interests.

We measure a user's preference for diversity as a function of the diversity of items that the user has rated, and segment the users into groups based on their scores. Recommendations for each group can be generated independently using any one of a variety of standard recommendation techniques. We show that such recommendations have a level of diversity that matches the interest of the segment's users.

2. DEFINITIONS
Let $U$ and $I$ be the sets of users and items, respectively. The set of recommendation lists is denoted $R$. $R_u$ is the list of items recommended to user $u \in U$, and the user profile $I_u$ is the list of items that $u$ has rated.

Diversity is the measure of dissimilarity between items in a set. For this purpose, we use the average pairwise distance of items in a set, the Intra-List Distance (ILD) [4]:

\[
\mathit{ILD}(L) = \frac{1}{|L|(|L| - 1)} \sum_{i \in L} \sum_{j \in L} d(i, j) \tag{1}
\]

In addition to diversity, we can measure the impact of user segmentation on the novelty or catalog coverage of recommendation lists. We define novelty as the average distance from the items in the user profile to the items in the recommendations:

\[
\mathit{Nov}(I_u, R_u) = \frac{1}{|R_u||I_u| - \min(|R_u|, |I_u|)} \sum_{i \in R_u} \sum_{j \in I_u} d(i, j) \tag{2}
\]

We also consider the popularity of items in the recommendation lists. The popularity of an item $i$ is defined by

\[
\mathit{Pop}(i) = \frac{|U_i|}{\max_{j \in I} |U_j|}
\]

where $U_i$ is the set of users who have rated item $i$.

3. EXPERIMENTS AND DISCUSSIONS
We used the MovieLens 1M¹ data set for analysis and evaluation of the proposed method. We create a term vector for each movie using the genre information in the dataset and measure the distance between movies, $d(i, j)$, as the cosine distance between the two genre vectors.

¹ http://grouplens.org/datasets/movielens/

After the ILD value for each user has been computed, the next step is to define intervals for segmenting the user profiles. Figure 1 shows the distribution of ILD values across the MovieLens user profiles. The figure shows that there is a relatively small number of users with low ILD values, rising to a peak around 0.74 and falling off rapidly thereafter. We divided the range of ILD values into four segments, shown graphically in Figure 1 and also in Table 1. The figure shows the boundaries of each segment and the mean ILD, $\mu_{s_k}$, for $k = 1, 2, 3, 4$. Note that segment 3 is larger than the others, which reflects the large number of users with this range of diversity in their profiles.

Figure 1: ILD Distribution of User Profiles.

Table 1: User Segments

We generated our recommendations using three recommendation models (two neighborhood-based models and one using matrix factorization, BPRMF (Bayesian Personalized Ranking with Matrix Factorization) [3]), using the whole dataset as well as each segment separately. Table 2 shows the results of these experiments in terms of precision and recall, diversity, novelty, and popularity.

Table 2: Recommendation Results

We expected to find that diversity would increase for segments with a higher preference for diversity, and that effect is clearly present in the ILD values for all the recommendation algorithms. As we move from segments with low diversity to those with higher diversity, the ILD values of the resulting recommendations increase monotonically.

We expected to find that popularity is monotonically decreasing. That is, the segments containing users with diverse profiles would produce recommendations outside of the "short head" of highly popular items, and the more diverse the users, the more obscure the recommendations. This effect is not seen. Instead, popularity increases between segments 1 and 2 and decreases afterwards. One explanation for this phenomenon is that segment 1 users are actually niche users with a strong interest in a single movie genre and, as a result, their profiles do not contain many of the typical "blockbuster" films. Outside of segment 1, the expected effect is seen across all remaining segments. We will explore this phenomenon further in future work.

A trade-off between precision and recall is observed in Table 2. As $\mathit{ILD}_{S_i}$ increases, $\mathit{Precision}_{S_i}$ increases and $\mathit{Recall}_{S_i}$ decreases. An increase in $\mathit{ILD}_u$ suggests that user $u$ is interested in movies from a variety of genres. The number of hits (relevant items in the recommendation list) for this user also increases because more movies are considered relevant. However, there are more movies in the catalog that can match the user's interests, so achieving good recall of just those items that the user rated is more difficult.

Table 2 also shows that the novelty of recommended items increases along with the increase in diversity. So segmentation based on diversity not only preserves the user's propensity towards diverse recommendations, but also results in a corresponding change in the level of recommendation novelty.

4. CONCLUSIONS
This work examines the consequences of segmenting user populations by diversity, as a means of personalizing user interest in and tolerance for diversity. We show that interest in diversity varies widely across users, with a distinct peak and users with preferences both low and high.

Our division of the user population into four segments is a simple but effective method for increasing diversity for those segments of the population interested in such diversity and decreasing it for those with less interest. The expected effects on diversity and novelty are seen across three different recommendation algorithms.

We plan to explore these effects in future work with additional datasets and algorithms, as well as alternate methods for personalizing diversity.

5. REFERENCES
[1] G. Adomavicius and Y. Kwon. Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering, 24(5):896–911, 2012.
[2] K. Kapoor, V. Kumar, L. Terveen, J. A. Konstan, and P. Schrater. I like to explore sometimes: Adapting to dynamic user novelty preferences. In Proceedings of the 9th ACM Conference on Recommender Systems, pages 19–26. ACM, 2015.
[3] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 452–461. AUAI Press, 2009.
[4] C.-N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pages 22–32. ACM, 2005.
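
The metrics of Section 2 and the ILD-based segmentation of Section 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that d(i, j) is one minus the cosine similarity of binary genre vectors, and every function name, the toy genre vectors, and the segment boundaries are hypothetical; the paper's actual boundaries are those reported in its Table 1.

```python
from bisect import bisect_right
from itertools import permutations
from math import sqrt


def cosine_distance(a, b):
    """Assumed d(i, j): 1 - cosine similarity of two genre vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    # Assumption: a vector with no genres is maximally distant from anything.
    return 1.0 - dot / norm if norm else 1.0


def ild(items, dist):
    """Equation (1): average distance over ordered pairs of distinct items.

    Pairs with i == j are skipped; this matches the double sum whenever
    d(i, i) = 0.
    """
    n = len(items)
    if n < 2:
        return 0.0
    return sum(dist(i, j) for i, j in permutations(items, 2)) / (n * (n - 1))


def novelty(profile, recs, dist):
    """Equation (2): profile-to-recommendation distance, as normalized there."""
    denom = len(recs) * len(profile) - min(len(recs), len(profile))
    if denom <= 0:
        return 0.0
    return sum(dist(i, j) for i in recs for j in profile) / denom


def popularity(raters_by_item):
    """Pop(i): rater count of item i divided by the largest rater count."""
    max_count = max(len(users) for users in raters_by_item.values())
    return {i: len(users) / max_count for i, users in raters_by_item.items()}


def segment(ild_value, boundaries):
    """Map a profile ILD to a 1-based segment index given sorted cut points."""
    return bisect_right(boundaries, ild_value) + 1


if __name__ == "__main__":
    # Toy genre vectors (hypothetical): columns are three made-up genres.
    genres = {"A": (1, 0, 0), "B": (1, 0, 0), "C": (0, 1, 0), "D": (1, 1, 0)}

    def d(i, j):
        return cosine_distance(genres[i], genres[j])

    hypothetical_boundaries = (0.5, 0.7, 0.8)  # three cuts give four segments
    profile = ["A", "C", "D"]
    score = ild(profile, d)
    print("profile ILD:", score)
    print("segment:", segment(score, hypothetical_boundaries))
    print("novelty:", novelty(profile, ["B", "D"], d))
```

Each user's profile ILD is computed once, the user is assigned to a segment, and a standard recommender is then trained on each segment's ratings separately, as Section 3 describes.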