Correcting Popularity Bias by Enhancing Recommendation Neutrality

Toshihiro Kamishima, Shotaro Akaho, and Hideki Asoh
National Institute of Advanced Industrial Science and Technology (AIST)
AIST Tsukuba Central 2, Umezono 1-1-1, Tsukuba, Ibaraki, 305-8568 Japan
mail@kamishima.net, s.akaho@aist.go.jp, h.asoh@aist.go.jp

Jun Sakuma
University of Tsukuba
1-1-1 Tennodai, Tsukuba, 305-8577 Japan
jun@cs.tsukuba.ac.jp

ABSTRACT
In this paper, we attempt to correct a popularity bias, the tendency for popular items to be recommended more frequently, by enhancing recommendation neutrality. Recommendation neutrality involves excluding specified information from the prediction process of a recommender. We formalize this neutrality as statistical independence between a recommendation result and the specified information, and we developed a recommendation algorithm that satisfies this independence constraint. We correct the popularity bias by enhancing neutrality with respect to information about whether candidate items are popular or not. We empirically show that a popularity bias in predicted preference scores can be corrected.

Keywords
recommender system, neutrality, fairness, popularity bias, probabilistic matrix factorization, information theory

1. RECOMMENDATION NEUTRALITY AND POPULARITY BIAS
We proposed the notion of recommendation neutrality: a recommendation is neutral with respect to a specified viewpoint if no information about the viewpoint is exploited when generating the recommendation results [3]. In information-theoretic terms, this notion can be formalized as the condition that the mutual information between a recommendation result and a viewpoint is zero, which further implies statistical independence between them. We developed information-neutral recommender systems (INRS) that predict users' preference scores while satisfying this statistical-independence constraint [3, 4]. An INRS can be useful for avoiding biased recommendations, treating content providers fairly, or adhering to laws and regulations.

In this paper, we use the proposed INRS to avoid the well-known popularity bias, which is the tendency for popular items to be recommended more frequently [1]. When users have no interest in the popularity of items and wish to ignore this information, they can obtain recommendations that are neutral with respect to item popularity by specifying the volume of consumption as a viewpoint.

The popularity bias has previously been corrected by diversifying recommended items [5]. Specifically, instead of the most popular and preferred items, slightly less preferred but more diverse items are recommended. This diversification approach differs from our approach of enhancing recommendation neutrality. While diversity is a property of a set of recommendations, neutrality is a relation between recommendations and a specified viewpoint. Many notions of diversity have been proposed, but all of them target a set of recommendations; thus, a bias cannot be corrected within a single recommendation. On the other hand, a single recommendation can be neutral in its predicted ratings with respect to a specified viewpoint. This is useful, for example, when attaching predicted ratings to a list of items that match a user's query. Therefore, our INRS can be used to correct the popularity bias in each predicted score.

2. EXPERIMENTS
We applied our INRS, mean-match [4], to show that our approach is effective in correcting a popularity bias. Simply speaking, this algorithm is a variant of the probabilistic matrix factorization model [6] that adopts a constraint term for enhancing neutrality.
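As a rough sketch of this idea (our own illustrative reading with hypothetical function names, not the authors' exact mean-match formulation), a matrix-factorization loss can be augmented with a penalty, weighted by η, on the gap between the mean predicted ratings of the two viewpoint groups, so that minimizing the loss pushes predictions for popular and unpopular items toward the same mean:

```python
def mf_predict(mu, b_u, b_i, p_u, q_i):
    """Predicted rating: global mean + user bias + item bias + latent dot product."""
    return mu + b_u + b_i + sum(pu * qi for pu, qi in zip(p_u, q_i))

def neutrality_penalized_loss(preds, truths, viewpoints, eta):
    """Squared prediction error plus eta times a mean-matching neutrality penalty.

    viewpoints[k] is 0 (long-tail) or 1 (short-head) for rating k; the penalty
    is the squared gap between the two groups' mean predictions, so a large eta
    favors predicting similar ratings regardless of item popularity.
    Illustrative sketch only, not the exact constraint term of [4].
    """
    sq_err = sum((p - t) ** 2 for p, t in zip(preds, truths))
    g0 = [p for p, v in zip(preds, viewpoints) if v == 0]
    g1 = [p for p, v in zip(preds, viewpoints) if v == 1]
    gap = sum(g0) / len(g0) - sum(g1) / len(g1)
    return sq_err + eta * gap ** 2
```

With η = 0 this reduces to the ordinary squared-error objective; increasing η trades prediction accuracy for neutrality, which is the trade-off examined below.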
We evaluated the experimental results in terms of prediction error and degree of neutrality. Prediction error was measured by the mean absolute error (MAE), defined as the mean of the absolute differences between observed and predicted rating values; a smaller value indicates better prediction accuracy. To measure the degree of neutrality, we adopted normalized mutual information (NMI) [4], defined as the mutual information between the predicted ratings and the viewpoint values, normalized into the range [0, 1]; a smaller NMI indicates a higher level of neutrality. Note that the distribution of scores is modeled by a multinomial distribution after discretizing the predicted scores. We performed a five-fold cross-validation procedure to obtain these evaluation indices.

Copyright is held by the author/owner(s). RecSys 2014 Poster Proceedings, October 6–10, 2014, Foster City, Silicon Valley, USA.

Figure 1: Changes in the accuracy and degree of neutrality accompanying an increase in the neutrality parameter η. (a) Prediction error (MAE); (b) Degree of neutrality (NMI).

Figure 2: Distribution of the predicted ratings (from dislike to like) for short-head and long-tail items. (a) standard; (b) neutrality enhanced.

The data set was the Flixster data set¹ [2]. The total numbers of users and movies were 147,612 and 48,794, respectively, and the data set consisted of 8,196,077 ratings. Ratings are represented on a ten-point scale whose domain is 0.5 to 5.0 in 0.5 increments. To correct a popularity bias, we adopted the popularity of items as a viewpoint.
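The two evaluation indices described above can be sketched as follows. Continuous predictions are assumed to be discretized (e.g. rounded onto the 0.5-step rating grid) before computing the NMI, and the normalizer used here, the smaller of the two marginal entropies, is our assumption, since several normalizations of mutual information are in common use:

```python
from collections import Counter
from math import log2

def mae(preds, truths):
    """Mean absolute error between predicted and observed ratings."""
    return sum(abs(p - t) for p, t in zip(preds, truths)) / len(preds)

def nmi(ratings, viewpoints):
    """Normalized mutual information between discretized ratings and a
    binary viewpoint (0 = long-tail, 1 = short-head), divided by the
    smaller marginal entropy so the result lies in [0, 1]."""
    n = len(ratings)
    joint = Counter(zip(ratings, viewpoints))   # empirical joint distribution
    pr = Counter(ratings)                       # marginal over rating values
    pv = Counter(viewpoints)                    # marginal over viewpoint values
    mi = sum(c / n * log2((c / n) / ((pr[r] / n) * (pv[v] / n)))
             for (r, v), c in joint.items())
    def entropy(counts):
        return -sum(c / n * log2(c / n) for c in counts.values())
    denom = min(entropy(pr), entropy(pv))
    return mi / denom if denom > 0 else 0.0
```

An NMI of 0 means the predicted ratings carry no information about the viewpoint (perfect neutrality), while an NMI of 1 means the viewpoint is fully determined by the predicted rating.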
Candidate movies were first sorted in descending order by the number of users who rated them, and the viewpoint represented whether or not a movie was in the top 1% of this list. We call the top 1% of items the short-head items and the remaining items the long-tail items.

Figure 1(a) shows the change in prediction error measured by the MAE on a linear scale, and Figure 1(b) shows the change in NMI on a logarithmic scale. The X-axes of these figures represent the value of the neutrality parameter, η, which balances prediction accuracy and neutrality. The parameter was varied from 0.01, at which the neutrality term was almost completely ignored, to 100, at which neutrality was strongly enhanced.

We first compared these results with two baselines. The MAE was 0.871 when the offered rating was held constant at 3.61, the mean rating over all sample ratings in the training data. This approximately simulates the case of randomly recommending items and can be considered the most unbiased and neutral recommendation; however, its prediction error was clearly worse than those in Figure 1(a). On the other hand, when the original probabilistic matrix factorization model was applied, the MAE was 0.652. Although the trade-off for enhancing neutrality generally worsens prediction accuracy, the errors in Figure 1(a) were not significantly worse. This is very positive, indicating that prediction accuracy was hardly degraded even when the popularity bias was corrected.

We then observed the changes in MAE and NMI accompanying an increase in the neutrality parameter, η. Overall, the increase in MAE with increasing η was small. Turning to Figure 1(b), we see that recommendation neutrality was successfully enhanced; that is, predicted scores were less influenced by whether candidate items were short-head or long-tail. In summary, our INRS successfully corrected a popularity bias without seriously sacrificing prediction accuracy.

To illustrate the influence of correcting a popularity bias, Figure 2 shows the distributions of predicted ratings for short-head and long-tail items. Black and white bars show the distributions of ratings for short-head and long-tail items, respectively. In Figure 2(a), ratings are predicted by a standard recommendation algorithm, and short-head items are highly rated. After correcting the popularity bias (η = 100), as in Figure 2(b), the distributions of ratings for short-head and long-tail items become much closer; that is to say, the predicted ratings are less influenced by items' popularity. It follows from this figure that our INRS successfully corrected the popularity bias.

3. CONCLUSIONS
We corrected a popularity bias by enhancing recommendation neutrality and empirically showed the effectiveness of our approach. We plan to improve the efficiency of our information-neutral recommendation algorithm and to adopt a more sophisticated model for expressing popularity.

4. ACKNOWLEDGMENTS
We would like to thank Dr. Mohsen Jamali for providing the data set. This work is supported by MEXT/JSPS KAKENHI Grant Numbers 16700157, 21500154, 24500194, and 25540094.

5. REFERENCES
[1] Ò. Celma and P. Cano. From hits to niches?: or how popular artists can bias music recommendation and discovery. In Proc. of the 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition, 2008.
[2] M. Jamali and M. Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In Proc. of the 4th ACM Conf. on Recommender Systems, pages 135–142, 2010.
[3] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Enhancement of the neutrality in recommendation. In Proc. of the 2nd Workshop on Human Decision Making in Recommender Systems, pages 8–14, 2012.
[4] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma. Efficiency improvement of neutrality-enhanced recommendation. In Proc. of the 3rd Workshop on Human Decision Making in Recommender Systems, pages 1–8, 2013.
[5] M. Levy and K. Bosteels. Music recommendation and the long tail. In WOMRAD 2010: RecSys 2010 Workshop on Music Recommendation and Discovery, 2010.
[6] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems 20, pages 1257–1264, 2008.

¹ http://www.sfu.ca/~sja25/datasets/