Correcting Popularity Bias by Enhancing Recommendation Neutrality

Toshihiro Kamishima, Shotaro Akaho, and Hideki Asoh
National Institute of Advanced Industrial Science and Technology (AIST)
AIST Tsukuba Central 2, Umezono 1-1-1, Tsukuba, Ibaraki, 305-8568 Japan
mail@kamishima.net, s.akaho@aist.go.jp, h.asoh@aist.go.jp

Jun Sakuma
University of Tsukuba
1-1-1 Tennodai, Tsukuba, 305-8577 Japan
jun@cs.tsukuba.ac.jp

ABSTRACT
In this paper, we attempt to correct a popularity bias, the tendency for popular items to be recommended more frequently, by enhancing recommendation neutrality. Recommendation neutrality involves excluding specified information from the prediction process of a recommender. We formalize this neutrality as statistical independence between a recommendation result and the specified information, and we developed a recommendation algorithm that satisfies this independence constraint. We correct the popularity bias by enhancing neutrality with respect to information about whether candidate items are popular or not. We empirically show that a popularity bias in predicted preference scores can be corrected.

Keywords
recommender system, neutrality, fairness, popularity bias, probabilistic matrix factorization, information theory

1. RECOMMENDATION NEUTRALITY AND POPULARITY BIAS
We proposed the notion of recommendation neutrality: a recommendation is neutral with respect to a specified viewpoint if no information about the viewpoint is exploited when generating the recommendation results [3]. In information-theoretic terms, this notion can be formalized as the condition that the mutual information between a recommendation result and a viewpoint is zero, which further implies statistical independence between them. We developed information-neutral recommender systems (INRS) that predict users' preference scores while satisfying this statistical-independence constraint [3, 4]. An INRS can be useful for avoiding biased recommendations, treating content providers fairly, or adhering to laws and regulations.

In this paper, we use the proposed INRS to avoid the well-known popularity bias, which is the tendency for popular items to be recommended more frequently [1]. When users have no interest in the popularity of items and wish to ignore this information, they can obtain recommendations that are neutral with respect to item popularity by specifying the volume of consumption as a viewpoint.

The popularity bias has previously been corrected by diversifying recommended items [5]. Specifically, instead of the most popular and preferred items, slightly less preferred but more diverse items are recommended. This diversification approach differs from our approach of enhancing recommendation neutrality. While diversity is a property of a set of recommendations, neutrality is a relation between recommendations and a specified viewpoint. Many notions of diversity have been proposed, but all of them target a set of recommendations; thus, a bias cannot be corrected within a single recommendation. On the other hand, a single recommendation can be neutral in its predicted ratings with respect to a specified viewpoint. This is useful, for example, when attaching predicted ratings to a list of items that match a user's query. Therefore, our INRS can be used to correct the popularity bias in each predicted score.

2. EXPERIMENTS
We applied our INRS, mean-match [4], to show that our approach is effective in correcting a popularity bias. Simply speaking, this algorithm is a variant of the probabilistic matrix factorization model [6] that adopts a constraint term for enhancing neutrality.
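As a rough sketch of this idea (our own illustrative reading with hypothetical function names, not the authors' exact mean-match formulation), a matrix-factorization loss can be augmented with a penalty, weighted by η, on the gap between the mean predicted ratings of the two viewpoint groups, so that minimizing the loss pushes predictions for popular and unpopular items toward the same mean:

```python
def mf_predict(mu, b_u, b_i, p_u, q_i):
    """Predicted rating: global mean + user bias + item bias + latent dot product."""
    return mu + b_u + b_i + sum(pu * qi for pu, qi in zip(p_u, q_i))

def neutrality_penalized_loss(preds, truths, viewpoints, eta):
    """Squared prediction error plus eta times a mean-matching neutrality penalty.

    viewpoints[k] is 0 (long-tail) or 1 (short-head) for rating k; the penalty
    is the squared gap between the two groups' mean predictions, so a large eta
    favors predicting similar ratings regardless of item popularity.
    Illustrative sketch only, not the exact constraint term of [4].
    """
    sq_err = sum((p - t) ** 2 for p, t in zip(preds, truths))
    g0 = [p for p, v in zip(preds, viewpoints) if v == 0]
    g1 = [p for p, v in zip(preds, viewpoints) if v == 1]
    gap = sum(g0) / len(g0) - sum(g1) / len(g1)
    return sq_err + eta * gap ** 2
```

With η = 0 this reduces to the ordinary squared-error objective; increasing η trades prediction accuracy for neutrality, which is the trade-off examined below.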
We evaluated the experimental results in terms of prediction error and degree of neutrality. Prediction error was measured by the mean absolute error (MAE), defined as the mean of the absolute differences between observed and predicted rating values; a smaller value indicates better prediction accuracy. To measure the degree of neutrality, we adopted normalized mutual information (NMI) [4], defined as the mutual information between the predicted ratings and the viewpoint values, normalized into the range [0, 1]; a smaller NMI indicates a higher level of neutrality. Note that the distribution of scores is modeled by a multinomial distribution after discretizing the predicted scores. We performed a five-fold cross-validation procedure to obtain these evaluation indices.

Copyright is held by the author/owner(s). RecSys 2014 Poster Proceedings, October 6–10, 2014, Foster City, Silicon Valley, USA.

Figure 1: Changes in the accuracy and degree of neutrality accompanying an increase in the neutrality parameter η. (a) Prediction error (MAE); (b) Degree of neutrality (NMI).

Figure 2: Distribution of the predicted ratings (from dislike to like) for short-head and long-tail items. (a) standard; (b) neutrality enhanced.

The data set was the Flixster data set¹ [2]. The total numbers of users and movies were 147,612 and 48,794, respectively, and the data set consisted of 8,196,077 ratings. Ratings are represented on a ten-point scale whose domain is 0.5 to 5.0 in 0.5 increments. To correct a popularity bias, we adopted the popularity of items as a viewpoint.
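The two evaluation indices described above can be sketched as follows. Continuous predictions are assumed to be discretized (e.g. rounded onto the 0.5-step rating grid) before computing the NMI, and the normalizer used here, the smaller of the two marginal entropies, is our assumption, since several normalizations of mutual information are in common use:

```python
from collections import Counter
from math import log2

def mae(preds, truths):
    """Mean absolute error between predicted and observed ratings."""
    return sum(abs(p - t) for p, t in zip(preds, truths)) / len(preds)

def nmi(ratings, viewpoints):
    """Normalized mutual information between discretized ratings and a
    binary viewpoint (0 = long-tail, 1 = short-head), divided by the
    smaller marginal entropy so the result lies in [0, 1]."""
    n = len(ratings)
    joint = Counter(zip(ratings, viewpoints))   # empirical joint distribution
    pr = Counter(ratings)                       # marginal over rating values
    pv = Counter(viewpoints)                    # marginal over viewpoint values
    mi = sum(c / n * log2((c / n) / ((pr[r] / n) * (pv[v] / n)))
             for (r, v), c in joint.items())
    def entropy(counts):
        return -sum(c / n * log2(c / n) for c in counts.values())
    denom = min(entropy(pr), entropy(pv))
    return mi / denom if denom > 0 else 0.0
```

An NMI of 0 means the predicted ratings carry no information about the viewpoint (perfect neutrality), while an NMI of 1 means the viewpoint is fully determined by the predicted rating.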
Candidate movies were first sorted in descending order by the number of users who rated them, and the viewpoint represented whether or not a movie was in the top 1% of this list. We call the top 1% of items the short-head items and the remaining items the long-tail items.

Figure 1(a) shows the change in prediction error measured by the MAE on a linear scale, and Figure 1(b) shows the change in NMI on a logarithmic scale. The X-axes of these figures represent the value of the neutrality parameter, η, which balances prediction accuracy and neutrality. The parameter was varied from 0.01, at which the neutrality term was almost completely ignored, to 100, at which neutrality was strongly enhanced.

We first compared these results with two baselines. The MAE was 0.871 when the offered rating was held constant at 3.61, the mean rating over all sample ratings in the training data. This approximately simulates the case of randomly recommending items and can be considered the most unbiased and neutral recommendation; however, its prediction error was clearly worse than those in Figure 1(a). On the other hand, when the original probabilistic matrix factorization model was applied, the MAE was 0.652. Although the trade-off for enhancing neutrality generally worsens prediction accuracy, the errors in Figure 1(a) were not significantly worse. This is very positive, indicating that prediction accuracy was hardly degraded even when the popularity bias was corrected.

We then observed the changes in MAE and NMI accompanying an increase in the neutrality parameter, η. Overall, the increase in MAE with increasing η was small. Turning to Figure 1(b), we see that recommendation neutrality was successfully enhanced; that is, predicted scores were less influenced by whether candidate items were short-head or long-tail. In summary, our INRS successfully corrected a popularity bias without seriously sacrificing prediction accuracy.

To illustrate the influence of correcting a popularity bias, Figure 2 shows the distributions of predicted ratings for short-head and long-tail items. Black and white bars show the distributions of ratings for short-head and long-tail items, respectively. In Figure 2(a), ratings are predicted by a standard recommendation algorithm, and short-head items are highly rated. After correcting the popularity bias (η = 100), as in Figure 2(b), the distributions of ratings for short-head and long-tail items become much closer; that is to say, the predicted ratings are less influenced by items' popularity. It follows from this figure that our INRS successfully corrected the popularity bias.

3. CONCLUSIONS
We corrected a popularity bias by enhancing recommendation neutrality and empirically showed the effectiveness of our approach. We plan to improve the efficiency of our information-neutral recommendation algorithm and to adopt a more sophisticated model for expressing popularity.

4. ACKNOWLEDGMENTS
We would like to thank Dr. Mohsen Jamali for providing the data set. This work is supported by MEXT/JSPS KAKENHI Grant Numbers 16700157, 21500154, 24500194, and 25540094.

5. REFERENCES
[1] Ò. Celma and P. Cano. From hits to niches?: or how popular artists can bias music recommendation and discovery. In Proc. of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition, 2008.
[2] M. Jamali and M. Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In Proc. of the 4th ACM Conf. on Recommender Systems, pages 135–142, 2010.
[3] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Enhancement of the neutrality in recommendation. In Proc. of the 2nd Workshop on Human Decision Making in Recommender Systems, pages 8–14, 2012.
[4] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Efficiency improvement of neutrality-enhanced recommendation. In Proc. of the 3rd Workshop on Human Decision Making in Recommender Systems, pages 1–8, 2013.
[5] M. Levy and K. Bosteels. Music recommendation and the long tail. In WOMRAD 2010: RecSys 2010 Workshop on Music Recommendation and Discovery, 2010.
[6] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems 20, pages 1257–1264, 2008.

¹ http://www.sfu.ca/~sja25/datasets/