<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Collaborative Filtering for Popularity-Aware Recommendation (Discussion Paper)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ettore Ritacco</string-name>
          <email>ettore.ritacco@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luciano Caroprese</string-name>
          <email>luciano.caroprese@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Manco</string-name>
          <email>giuseppe.manco@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Minici</string-name>
          <email>marco.minici@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Sergio Pisani</string-name>
          <email>francescosergio.pisani@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ICAR-CNR</institution>, <addr-line>Via Bucci, 8-9c, Rende</addr-line>, <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We analyze the behavior of recommender systems relative to the popularity of the items to recommend. Our findings show that most popular ranking-based recommenders are biased towards popular items, thus affecting the quality of recommendation. Based on these observations, we propose a new deep learning architecture with an improved learning strategy that significantly improves the performance of such recommenders on low-popular items. The proposed technique is based on two main aspects: resampling of negatives and ensembling of multiple instances of the algorithm. Experimental results on traditional benchmark datasets show that the proposed approach substantially improves the recommendation ability by balancing accurate contributions almost independently from the popularity of the items to recommend.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Collaborative Filtering</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Big Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The massive amount of information available in the form of digital catalogs and online
services allows users to access and consume online content, but at the same time it poses a choice
paradox. Purchasing items on e-commerce sites, selecting movies on streaming platforms or
connecting with other peers on social networks implies making choices among a huge amount
of elements. Recommender Systems play a crucial role within this context since they provide
users with a better experience through the recommendation of new content and data that users
are likely to appreciate.</p>
      <p>Recommendations can have a disruptive impact: if on one side, they assist users in their
choices, on the other side they can influence the choices themselves. Indeed, Recommender
systems can favor particular categories of items or particular brands over others, thus introducing
a bias within the catalog [1]. A typical bias is intrinsic to the nature of recommendation. In
systems involving interaction between significant amounts of users and items, we observe the
presence of very few very popular items and many less popular others. This distribution follows
the so-called 80-20 rule that refers to the fact that 80% of users express preferences for only 20%
of the available items. In practice, user preferences follow a long tail distribution. Recommender
Systems based on collaborative filtering [2], which tend to characterize user preferences from
interactions, are affected by a relevant problem: they tend to suggest very popular items and
neglect less popular ones. This way, there is a reinforcement effect because the most popular items
will become more and more popular than the less known ones. This phenomenon is called
popularity bias.</p>
      <p>The quality of the recommendations is inevitably affected by the popularity bias, since most
recommendations are prone to contain objects that the user will consider trivial and well known.
By contrast, the capability of recommending items belonging to the long tail (i.e., less popular)
can disclose new perspectives: niche items, little-known objects, hidden gems that satisfy the
user’s preferences can greatly improve user engagement, provided that the recommended items
are consistent with user’s taste.</p>
      <p>The current literature has extensively studied the problems of fairness and bias [3, 4, 5] within
recommender systems, with particular emphasis on popularity bias [6, 7, 8]. Notably, in [9], it is
shown that unfair recommendations are concentrated on groups of users interested in long-tail
and less popular items. In this paper we focus our attention on recommender systems based on
ranking [10, 11], which are extremely flexible and as a consequence particularly interesting to
unbias. Typical approaches [12, 13] consider techniques to increase the representation of less
popular items, by either post-processing techniques or constraining the recommendation score.
Despite such recent efforts, unbiasing the recommendation from popular items is still an open
problem.</p>
      <p>In this paper we devise a ranking-based recommender system for implicit feedback (RVAE),
based on a variational autoencoder architecture. The proposed model is a substantial extension of
the MVAE model proposed in [14] and in fact it inherits its accuracy and computational efficiency.
We analyze the behavior of RVAE and show that the model is characterized by the popularity
bias problem. We then propose an experimental study of specific techniques to overcome it.
The proposed technique is based on two main concepts: resampling/reweighting items and
ensembling of multiple instances of the algorithm. Our experiments show that these simple
strategies allow us to unbias the algorithm and hence provide more effective recommendations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Basic Framework</title>
      <p>We start by setting the notation that we shall use throughout the paper. In the context of
collaborative filtering, u ∈ U = {1, …, M} indexes a user and i, j ∈ I = {1, …, N} index items
for which the user can express a preference. We model implicit feedback, thus assuming a
preference matrix X ∈ {0, 1}^{M×N}, so that x_{u,i} = 1 whenever user u expressed a preference for
item i, and x_{u,i} = 0 otherwise. Also, x_u is the (binary) row indexed at u, representing all the item
preferences for user u. Given x_u, we define I_u = {i ∈ I | x_{u,i} = 1} (with n_u = |I_u|). The preference
matrix induces a natural ordering between items: i ≺_u j has the meaning that u prefers i to j, i.e.
x_{u,i} &gt; x_{u,j} in the rating matrix. Our objective is to devise a model for such an ordering.
Preference Modeling. We consider a general framework where preferences are modeled as
the effect of latent factors ultimately characterizing users and/or items. We shall consider two
basic instantiations of this general idea, and will provide a unified framework.</p>
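      <p>To make the notation concrete, the following minimal sketch (Python with numpy; the toy matrix and the helper names are ours, for illustration only) builds a preference matrix X and derives I_u and the induced ordering:</p>

```python
import numpy as np

# Toy implicit-feedback preference matrix X (M users x N items).
X = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 0]])

# I_u: the set of items user u expressed a preference for; n_u = |I_u|.
def positive_items(X, u):
    return set(np.flatnonzero(X[u]))

I_0 = positive_items(X, 0)
n_0 = len(I_0)

# The ordering i ≺_u j holds whenever x_{u,i} exceeds x_{u,j}, i.e. i is a
# positive item for u while j is not.
def prefers(X, u, i, j):
    return X[u, i] > X[u, j]
```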
      <p>The first situation we consider is the Multinomial Variational Autoencoder (MVAE) framework
proposed in [14]. Within this framework, for a given user u the related x_u is modeled as the
effect of a multinomial distribution governed by a prior z, i.e.</p>
      <p>x_u ∼ Discrete(π(z)),   π(z) ∝ exp{f_θ(z)}.
Here, f_θ(⋅) represents a neural network parameterized by θ. The latent variable z is modeled by
a prior p(z) (typically a gaussian distribution). Thus, the probability of preferences for a given
user can be expressed as
p(x_u) = ∫ p(x_u|z) p(z) dz.
Due to the intractability of the above integral, [15] devise a variational approach based on a
proposal q(z|x_u) that approximates the posterior distribution. Again, q is modeled as a gaussian
distribution</p>
      <p>q(z|x_u) = N(z; μ_u, σ_u),
where μ_u, σ_u = g_φ(x_u) and g_φ is a neural network parameterized by φ. By exploiting the inequality
log p(x_u) ≥ E_{z∼q(⋅|x_u)}[log p(x_u|z)] − KL(q(z|x_u) ‖ p(z)),
we can finally learn the θ, φ parameters by optimizing the loss</p>
      <p>ℓ(θ, φ) = ∑_u { E_{z∼q_φ(⋅|x_u)}[log p_θ(x_u|z)] − KL(q_φ(z|x_u) ‖ p(z)) }.
The overall framework is hence based on a regularized encoder-decoder scheme, where q_φ(z|x_u)
represents the encoder, p_θ(x_u|z) represents the decoder and the KL term acts as a
regularizer. In the training phase, for each u a latent variable z ∼ q_φ(⋅|x_u) is devised. Next, z is
exploited to devise the probability p_θ(x_u|z). Users with low probability are penalized within the
loss and the network parameters can be updated accordingly.</p>
      <p>Prediction for new items is accomplished by resorting to the learned functions g_φ and f_θ:
given a (partial) user history x_u, we compute z as the mean μ_u returned by g_φ(x_u), and then
devise the probabilities for the whole item set through π(z). Unseen items can then be ranked
according to their associated probabilities.</p>
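      <p>The prediction step can be sketched as follows. This is a minimal numpy illustration, with randomly initialized linear maps standing in for the trained networks g_φ and f_θ (all names and shapes are our own assumptions, not the paper's implementation):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 3  # number of items, latent dimensionality

# Stand-ins for the trained networks: a linear encoder returning the mean
# mu_u (prediction uses the mean, not a sample) and a linear decoder
# producing one logit per item.
W_enc = rng.normal(size=(N, D))
W_dec = rng.normal(size=(D, N))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def recommend(x_u, k=2):
    z = x_u @ W_enc                       # z = mu_u = g_phi(x_u)
    pi = softmax(z @ W_dec)               # pi(z) proportional to exp{f_theta(z)}
    pi = np.where(x_u == 1, -np.inf, pi)  # rank only unseen items
    return np.argsort(pi)[::-1][:k]       # top-k unseen items by probability

x_u = np.array([1., 0., 1., 0., 0., 0.])
top = recommend(x_u)
```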
      <p>The second formulation we consider is inspired by the Bayesian Personalized Ranking (BPR)
model introduced in [10]. The idea underlying this model is that a preference i ≺_u j can be
directly explained as closeness in a latent space where both items and users can be mapped.
Mathematically this can be devised by computing a factorization rank p_u^T q_i for each pair (u, i),
and modeling preferences by means of a Bernoulli process:
i ≺_u j ∼ Bernoulli(π),   π = σ(p_u^T (q_i − q_j)),
where σ(x) = (1 + e^{−x})^{−1} is the logistic function. The optimal embeddings P and Q can hence
be obtained by optimizing the loss
ℓ(P, Q) ≈ ∑_u ∑_{i ≺_u j} log σ(p_u^T (q_i − q_j)).</p>
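      <p>The BPR preference probability can be illustrated in a few lines of numpy (the embeddings here are random stand-ins; training would maximize the pairwise log-likelihood over all sampled pairs):</p>

```python
import numpy as np

def sigmoid(x):
    """Logistic function sigma(x) = (1 + e^{-x})^{-1}."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical user/item embeddings p_u, q_i, q_j of dimension D.
rng = np.random.default_rng(1)
D = 4
p_u, q_i, q_j = rng.normal(size=(3, D))

# Probability that u prefers i over j, and the per-pair BPR log-likelihood
# contribution that training maximizes.
prob = sigmoid(p_u @ (q_i - q_j))
pair_loglik = np.log(prob)
```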
      <p>We combine the two frameworks by adapting the BPR loss to the MVAE model. In particular,
instead of modeling p(x_u|z), we directly model p(i ≺_u j | z) within a similar variational framework.
In short, the current preferences are encoded within a latent variable z that is further exploited
to decode all ranks:
i ≺_u j ∼ Bernoulli(π_{i,j}),   π_{i,j} = σ(r_i − r_j),   r = f_θ(z).</p>
      <p>Here, r represents the output of a neural network parameterized by θ. For a given item i, the
value r_i represents then the associated rank, which can be used to sort all preferences. The model
can be obtained by optimizing the loss:
ℓ(θ, φ) = ∑_u { E_{z∼q_φ(⋅|x_u)}[ ∑_{(i,j)∈D_u} log p_θ(i ≺_u j | z) ] − KL(q_φ(z|x_u) ‖ p(z)) }.   (1)
Since the set of all preference pairs is too large to enumerate,
it is usually customary to only consider a subset D_u ⊂ {(i, j) | i, j ∈ I; i ≺_u j}. The sampling of
D_u is critical for determining the behavior of any predictive model; the most used approach
in literature is to uniformly sample, for each user u and item i (called positive item), a fixed
number ν of items {j_1, …, j_ν} ⊂ I − I_u, with the underlying assumption that ∀k: i ≺_u j_k. Thus, Eq.
(1) can be rewritten as:
ℓ(θ, φ) = ∑_u { E_{z∼q_φ(⋅|x_u)}[ ∑_{i∈I_u} ∑_{k=1}^{ν} log p_θ(i ≺_u j_k | z) ] − KL(q_φ(z|x_u) ‖ p(z)) }.   (2)</p>
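      <p>The uniform negative-sampling scheme just described can be sketched as follows (pure Python; function and variable names are ours):</p>

```python
import random

def sample_pairs(I_u, I, nu, rng=random.Random(0)):
    """For each positive item i in I_u, uniformly sample nu negatives j from
    I - I_u, under the standard assumption that i ≺_u j for every sampled j."""
    negatives = list(I - I_u)
    D_u = []
    for i in I_u:
        for j in rng.sample(negatives, nu):
            D_u.append((i, j))
    return D_u

I = set(range(10))
I_u = {0, 3, 7}
D_u = sample_pairs(I_u, I, nu=4)  # |D_u| = |I_u| * nu pairs
```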
    </sec>
    <sec id="sec-3">
      <title>3. The Impact of Popularity</title>
      <p>We start our analysis on the following popular benchmark datasets: i) Movielens, a time
series dataset containing user-item ratings pairs along with the corresponding timestamp; ii)
Pinterest, based on the social media that allows users to save or pin an image (item) to their
board. The dataset denotes as 1 the pinned images for each user; iii) CiteUlike, a dataset
obtained from the homonymous service which provides a digital catalog to save and share
academic papers.</p>
      <p>[Figure 1: for each dataset, all items plotted by increasing popularity, partitioned into
low-, mid- and top-popular classes with their average popularity.]
Within fig. 1 we plot all the items within the datasets, by increasing popularity.
For each dataset we identify three classes: low, mid and high popular items.</p>
      <p>We then study the behavior of the RVAE model with respect to the popularity classes defined.
To do so, we adopt the following protocol. For each dataset, 70% of the users are randomly sampled
with all of their items. Each such user u is associated with x_u and a set D_u of positive/negative
item pairs. In particular, we consider all positive items within x_u, and for each positive item
we sample ν = 4 negative items. The remaining 30% of the users are uniformly split into validation
and test. In particular, for each user u we consider a random subset T_u ⊂ I_u representing
30% of the positive items, and a set E_u of 100 negative items sampled from I − I_u.
The vector x_u is masked to remove all elements in T_u. We then feed the masked x_u to obtain
the score vector r_u. Now, for a given cutoff k, let us consider the k − 1 negative items in E_u
for which RVAE gives the highest score and, among them, the item j having the minimum
score. There is a hit with cutoff k, for the user u and the item i ∈ T_u, if r_{u,i} ≥ r_{u,j}. Let h_u^k be the
number of hits for the user u with cutoff k. We define the Hit-Rate at k as HR@k = ∑_u h_u^k / ∑_u |T_u|.
We can trivially specialize this definition to the low, medium and high popularity classes,
by considering only items in T_u that belong to the specific class. The results of the evaluation
are summarized in table 1a. We can see that the model suffers mainly on low-popular items.
As a matter of fact, the overexposure of popular items is predominant and the model learns to
predict essentially those items. The fact is that popular items are easy to predict. However, it is
on the mid and especially on the low popular items that the most interesting predictions can take
place: niche items are difficult to discover by an end user and hence their accurate suggestion
can greatly improve user engagement. The research question is hence: how can we boost the
model to improve the performance on low-popular items?</p>
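      <p>The Hit-Rate computation described above can be sketched as follows (pure Python; the helper names are ours and the protocol is reduced to the scoring step):</p>

```python
def hit(score_i, neg_scores, k):
    """A held-out positive item i is a hit at cutoff k if its score is at least
    the minimum score among the k-1 highest-scoring negatives, i.e. i would
    enter a top-k list built against the sampled negatives."""
    top = sorted(neg_scores, reverse=True)[:k - 1]
    return score_i >= top[-1] if top else True

def hit_rate_at_k(users, k):
    """users: one (pos_scores, neg_scores) pair per test user, holding the
    model scores of the held-out positives T_u and of the negatives E_u.
    Returns HR@k = sum_u h_u^k / sum_u |T_u|."""
    hits = total = 0
    for pos_scores, neg_scores in users:
        hits += sum(hit(s, neg_scores, k) for s in pos_scores)
        total += len(pos_scores)
    return hits / total

# One user with two held-out positives and three negatives.
hr = hit_rate_at_k([([0.9, 0.1], [0.5, 0.4, 0.3])], k=2)
```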
    </sec>
    <sec id="sec-4">
      <title>4. Unbiased Recommendation</title>
      <p>In a simple experiment, we retrain the model by only considering pairs (i, j) ∈ D_u such that
i is in the low popular class. Compared to the results in table 1a,
the results for this restricted model (shown in table 1b) show that if attention is placed on low
popular items, their prediction accuracy can be improved. Similar results can be
observed for the mid popular items. Thus, in order to unbias the model, we need to rebalance
the exposure of the popularity classes during training. To
achieve this goal, we study three different strategies.</p>
      <p>The first strategy consists in weighting, for each pair (i, j) ∈ D_u, the contribution to the loss
with a factor inversely proportional to the popularity of the positive item i:
w_i = ( (1 + e^{(f_i − c)/s})^{−1} + α ) / (α + 1).
Here, f_i represents the number of occurrences of i, and α, c, s are the parameters representing the
steep, center and scale of the decay of high popular items. We experiment with α = 0.01, with c
representing the average frequency of mid-popular items and s = 100, and call this variant
the reweighted RVAE. The above strategy has the advantage of reweighing the contributions of low and mid
popular items with respect to high popular ones: the ratio between the weight of the most popular item and that of the
least popular one is approximately α. However, this weighting scheme has a main disadvantage.
The weight is relative to a positive item i, but it is associated with pairs (i, j) ∈ D_u. That is,
besides weighting the contribution of i, this scheme also overexposes (or dually underexposes)
the contribution of the negative item j. To avoid this, an alternative strategy consists in changing
the sampling scheme that produces D_u. In practice, rather than uniformly sampling, for each
positive item, a fixed number ν of negative items, we can apply an inversely stratified sampling
where ν_i negatives are sampled, with ν_i being inversely proportional to the popularity of the
item: ν_i = ν ⋅ ⌈max(f)/f_i⌉.</p>
      <p>The rationale of the above formula is to provide the same visibility to each positive item in the
loss function. Thus, the most popular item will be associated with exactly ν pairs. By contrast,
low and mid popular items will be overexposed in the comparison. We call this variant the resampled RVAE.</p>
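      <p>Both rebalancing devices can be sketched in a few lines (a reconstruction under our reading of the formulas; the defaults follow the values reported above, with the center c set to an arbitrary example value):</p>

```python
import math

def pair_weight(f_i, alpha=0.01, c=50.0, s=100.0):
    """Sigmoid-decay loss weight for a positive item occurring f_i times:
    close to 1 for rare items, close to alpha/(1+alpha) for popular ones
    (c would be the average frequency of mid-popular items)."""
    return (1.0 / (1.0 + math.exp((f_i - c) / s)) + alpha) / (alpha + 1.0)

def negatives_per_item(f_i, f_max, nu=4):
    """Inversely stratified sampling budget nu_i = nu * ceil(max(f)/f_i):
    the most popular item keeps exactly nu negatives, rarer items get
    proportionally more pairs and are thus overexposed in the loss."""
    return nu * math.ceil(f_max / f_i)
```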
      <p>The third strategy consists in combining the baseline RVAE model with a low-popularity-oriented
variant of it. For a user u, let r_u be the score vector produced by RVAE, and r′_u the score produced
by the variant. We define the ensemble as the model that produces the score
s_u = Softmax(r_u) + λ ⋅ m ⊙ Softmax(r′_u).</p>
      <p>The vector m masks all items but the low popular ones. The scores are normalized (via the Softmax
function) to make the two models comparable. Finally, λ is a weight aimed at tuning the boost
for the low popular items. We experimentally found an optimal tuning
with λ = 0.4.</p>
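      <p>The ensemble combination can be sketched as follows (numpy; the score vectors and the mask are toy values, and the second model stands for whichever low-popularity variant is being combined):</p>

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def ensemble_scores(r_base, r_low, low_mask, lam=0.4):
    """s_u = Softmax(r_u) + lambda * m ⊙ Softmax(r'_u): the low-popularity
    model contributes only on the items selected by the mask m."""
    return softmax(r_base) + lam * low_mask * softmax(r_low)

r_base = np.array([2.0, 1.0, 0.5, 0.1])   # baseline RVAE scores
r_low  = np.array([0.1, 0.2, 2.5, 2.0])   # low-popularity variant scores
m = np.array([0.0, 0.0, 1.0, 1.0])        # items 2 and 3 are low popular
s = ensemble_scores(r_base, r_low, m)
```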
      <p>s RVAE
le RVAE
n
e
io RVAE
v
M RVAE
t RVAE
se RVAE
r
iten RVAE
P RVAE
e RVAE
lik RVAE
i RVAE
U
e
t
C RVAE
considerably improve the response of the model to the low popular items. However, RVAE has
a low response on the high popular items. By contrast, RVAE succeeds in boosting performance
on both the low popular and the mid popular items. Overall, the ensemble RVAE provides the
best response, by boosting low popular items without substantially degrading over the other
classes.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The approach proposed in this paper is a preliminary study: we introduce a ranking-based
collaborative filtering algorithm (RVAE) and study how the algorithm is affected by popularity bias.
Next, we show how simple techniques based on reweighting/resampling and/or ensembling
can recalibrate the recommendations. There are several aspects that are worth further
investigation. First of all, both the weighting and the inverse stratified sampling schemes are based on
hyperparameters that need to be carefully tuned. Also, the ensemble strategies are simple, and
more complex schemes that also take into account other model instantiations can be studied.
We reserve the attention to these challenges in a future work.</p>
      <p>[1] V. Tsintzou, E. Pitoura, P. Tsaparas, Bias disparity in recommendation systems, CoRR
abs/1811.01461 (2018). URL: http://arxiv.org/abs/1811.01461.
[2] C. C. Aggarwal, Recommender Systems, Springer, 2016.
[3] B. Friedman, H. Nissenbaum, Bias in computer systems, ACM Trans. Inf. Syst. 14 (1996) 330–347.
[4] Z. Zhu, X. Hu, J. Caverlee, Fairness-aware tensor-based recommendation, in: ACM
International Conference on Information and Knowledge Management, CIKM ’18, 2018, pp. 1153–1162.
[5] Y. Deldjoo, V. W. Anelli, H. Zamani, A. Bellogin, T. Di Noia, A flexible framework for
evaluating user and item fairness in recommender systems, User Modeling and User-Adapted
Interaction (2021).
[6] O. Celma, P. Cano, From hits to niches?: Or how popular artists can bias music
recommendation and discovery, in: 2nd KDD Workshop on Large-Scale Recommender Systems
and the Netflix Prize Competition, 2008.
[7] H. Steck, Item popularity and recommendation accuracy, in: Proceedings of the Fifth
ACM Conference on Recommender Systems, RecSys ’11, 2011, pp. 125–132.
[8] R. Borges, K. Stefanidis, On measuring popularity bias in collaborative filtering data, in:
EDBT Workshop on BigVis 2020: Big Data Visual Exploration and Analytics, 2020.
[9] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher, The unfairness of popularity bias
in recommendation, CoRR abs/1907.13286 (2019). URL: http://arxiv.org/abs/1907.13286.
[10] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized
ranking from implicit feedback, in: Conf. on Uncertainty in Artificial Intelligence, UAI ’09,
2009, pp. 452–461.
[11] T. Ebesu, B. Shen, Y. Fang, Collaborative memory network for recommendation systems,
in: ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR
’18, 2018.
[12] Z. Zhu, J. Wang, J. Caverlee, Measuring and mitigating item under-recommendation bias in
personalized ranking systems, in: ACM SIGIR Conference on Research and Development
in Information Retrieval, SIGIR ’20, 2020, pp. 449–458.
[13] H. Abdollahpouri, R. Burke, B. Mobasher, Managing popularity bias in recommender
systems with personalized re-ranking, CoRR abs/1901.07555 (2019). URL: http://arxiv.org/abs/1901.07555.
[14] D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, Variational autoencoders for collaborative
filtering, in: ACM Conf. on World Wide Web, WWW ’18, 2018, pp. 689–698.
[15] D. P. Kingma, M. Welling, Auto-encoding variational bayes, in: 2nd
International Conference on Learning Representations, ICLR ’14, 2014. URL: http://arxiv.org/abs/1312.6114v10.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>