Unbiasing Collaborative Filtering for Popularity-Aware Recommendation (Discussion Paper)

Luciano Caroprese1, Giuseppe Manco1, Marco Minici1, Francesco Sergio Pisani1 and Ettore Ritacco1
1 ICAR-CNR, Via Bucci, 8-9c, Rende (Italy)

Abstract
We analyze the behavior of recommender systems relative to the popularity of the items to recommend. Our findings show that most popular ranking-based recommenders are biased towards popular items, thus affecting the quality of recommendation. Based on these observations, we propose a new deep learning architecture with an improved learning strategy that significantly improves the performance of such recommenders on low-popularity items. The proposed technique is based on two main aspects: resampling of negatives and ensembling of multiple instances of the algorithm. Experimental results on traditional benchmark datasets show that the proposed approach substantially improves the recommendation ability by balancing accurate contributions almost independently of the popularity of the items to recommend.

Keywords
Recommender Systems, Collaborative Filtering, Deep Learning, Big Data

1. Introduction

The massive amount of information available in the form of digital catalogs and online services allows users to access and consume online content, but at the same time it poses a choice paradox. Purchasing items on e-commerce sites, selecting movies on streaming platforms or connecting with other peers on social networks implies making choices among a huge number of elements. Recommender Systems play a crucial role within this context, since they provide users with a better experience through the recommendation of new content and data that users are likely to appreciate. Recommendations can have a disruptive impact: on one side they assist users in their choices; on the other side they can influence the choices themselves.
Indeed, recommender systems can favor particular categories of items or particular brands over others, thus introducing a bias within the catalog [1]. A typical bias is intrinsic to the nature of recommendation. In systems involving interaction between significant amounts of users and items, we observe the presence of very few very popular items and many less popular ones. This distribution follows the so-called 80-20 rule, which refers to the fact that 80% of users express preferences for only 20% of the available items. In practice, user preferences follow a long-tail distribution. Recommender systems based on collaborative filtering [2], which tend to characterize user preferences from interactions, are affected by a relevant problem: they tend to suggest very popular items and neglect less popular ones. This creates a reinforcement effect, because the most popular items will become more and more popular than the less known ones. This phenomenon is called popularity bias. The quality of the recommendations is inevitably affected by the popularity bias, since most recommendations are prone to contain objects that the user will consider trivial and well known.

SEBD 2021: The 29th Italian Symposium on Advanced Database Systems, September 5-9, 2021, Pizzo Calabro (VV), Italy
luciano.caroprese@icar.cnr.it (L. Caroprese); giuseppe.manco@icar.cnr.it (G. Manco); marco.minici@icar.cnr.it (M. Minici); francescosergio.pisani@icar.cnr.it (F. S. Pisani); ettore.ritacco@icar.cnr.it (E. Ritacco)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
By contrast, the capability of recommending items belonging to the long tail (i.e., less popular items) can disclose new perspectives: niche items, little-known objects and hidden gems that satisfy the user's preferences can greatly improve user engagement, provided that the recommended items are consistent with the user's taste. The current literature has extensively studied the problems of fairness and bias [3, 4, 5] within recommender systems, with particular emphasis on popularity bias [6, 7, 8]. Notably, in [9] it is shown that unfair recommendations are concentrated on groups of users interested in long-tail and less popular items. In this paper we focus our attention on recommender systems based on ranking [10, 11], which are extremely flexible and as a consequence particularly interesting to unbias. Typical approaches [12, 13] consider techniques to increase the representation of less popular items, either by post-processing techniques or by constraining the recommendation score. Despite such recent efforts, unbiasing the recommendation from popular items is still an open problem.

In this paper we devise a ranking-based recommender system for implicit feedback (RVAE), based on a variational autoencoder architecture. The proposed model is a substantial extension of the MVAE model proposed in [14] and in fact it inherits its accuracy and computational efficiency. We analyze the behavior of RVAE and show that the model is affected by the popularity bias problem. We then propose an experimental study of specific techniques to overcome it. The proposed technique is based on two main concepts: resampling/reweighting items and ensembling multiple instances of the algorithm. Our experiments show that these simple strategies allow us to unbias the algorithm and hence provide more effective recommendations.

2. Basic Framework

We start by setting the notation that we shall use throughout the paper.
In the context of collaborative filtering, 𝑒 ∈ π‘ˆ = {1, … , 𝑀} indexes a user and 𝑖, 𝑗 ∈ 𝐼 = {1, … , 𝑁} index items for which the user can express a preference. We model implicit feedback, thus assuming a preference matrix X ∈ {0, 1}^{𝑀×𝑁}, so that π‘₯𝑒,𝑖 = 1 whenever user 𝑒 expressed a preference for item 𝑖, and π‘₯𝑒,𝑖 = 0 otherwise. Also, x𝑒 is the (binary) row indexed at 𝑒, representing all the item preferences for user 𝑒. Given x𝑒, we define 𝐼𝑒 = {𝑖 ∈ 𝐼 | π‘₯𝑒,𝑖 = 1} (with 𝑁𝑒 = |𝐼𝑒|). The preference matrix induces a natural ordering between items: 𝑖 ≺𝑒 𝑗 means that 𝑒 prefers 𝑖 to 𝑗, i.e. π‘₯𝑒,𝑖 > π‘₯𝑒,𝑗 in the rating matrix. Our objective is to devise a model for such an ordering.

Preference Modeling. We consider a general framework where preferences are modeled as the effect of latent factors ultimately characterizing users and/or items. We shall consider two basic instantiations of this general idea, and will provide a unified framework. The first situation we consider is the Multinomial Variational Autoencoder (MVAE) framework proposed in [14]. Within this framework, for a given user 𝑒 the related x𝑒 is modeled as the effect of a multinomial distribution governed by a prior z, i.e.

    x𝑒 ∼ Discrete(πœ‹(z)),    πœ‹(z) ∝ exp{π‘“πœ™(z)}

Here, π‘“πœ™(β‹…) represents a neural network parameterized by πœ™. The latent variable z is modeled by a prior 𝑃(z) (typically a Gaussian distribution). Thus, the probability of preferences for a given user can be expressed as

    𝑃(x𝑒) = ∫ 𝑃(x𝑒|z) 𝑃(z) dz

Due to the intractability of the above integral, [15] devises a variational approach based on a proposal 𝑄(z|x𝑒) that approximates the posterior distribution. Again, 𝑄 is modeled as a Gaussian distribution 𝑄(z|x𝑒) = 𝒩(z; πœ‡π‘’, πœŽπ‘’), where πœ‡π‘’, πœŽπ‘’ = π‘”πœƒ(x𝑒) and π‘”πœƒ is a neural network parameterized by πœƒ.
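As an illustration, the encode/decode pipeline above can be sketched with toy linear maps standing in for the networks π‘”πœƒ and π‘“πœ™; the dimensions, weights and preference vector below are all hypothetical and chosen only to show the flow of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, D_LATENT = 6, 2

# Toy stand-ins for the neural networks g_theta (encoder) and f_phi (decoder):
# plain linear maps, purely for illustration.
W_enc_mu = rng.normal(size=(N_ITEMS, D_LATENT))
W_enc_logvar = rng.normal(size=(N_ITEMS, D_LATENT))
W_dec = rng.normal(size=(D_LATENT, N_ITEMS))

def encode(x_u):
    """Q(z|x_u) = N(mu_u, sigma_u): parameters produced by the encoder."""
    return x_u @ W_enc_mu, x_u @ W_enc_logvar

def decode(z):
    """pi(z) proportional to exp(f_phi(z)): softmax over the item logits."""
    logits = z @ W_dec
    e = np.exp(logits - logits.max())
    return e / e.sum()

x_u = np.array([1., 0., 1., 1., 0., 0.])  # binary preference row for user u
mu, logvar = encode(x_u)
# reparameterized sample z ~ Q(.|x_u)
z = mu + np.exp(0.5 * logvar) * rng.normal(size=D_LATENT)
pi = decode(z)

# multinomial log-likelihood of the observed preferences under pi(z)
log_lik = float((x_u * np.log(pi)).sum())
```

At prediction time one would set z = πœ‡π‘’ (no sampling) and rank unseen items by πœ‹(z), as described above.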
By exploiting the inequality

    log 𝑃(x𝑒) β‰₯ 𝔼_{zβˆΌπ‘„(β‹…|x𝑒)}[log 𝑃(x𝑒|z)] βˆ’ 𝕂𝕃[𝑄(z|x𝑒) β€– 𝑃(z)]

we can finally learn the πœ™, πœƒ parameters by optimizing the loss

    ℓ_{𝑀𝑉𝐴𝐸}(πœ™, πœƒ) = βˆ‘_𝑒 {𝔼_{zβˆΌπ‘„πœƒ(β‹…|x𝑒)}[log π‘ƒπœ™(x𝑒|z)] βˆ’ 𝕂𝕃[π‘„πœƒ(z|x𝑒) β€– 𝑃(z)]}

The overall framework is hence based on a regularized encoder-decoder scheme, where π‘„πœƒ(z|x𝑒) represents the encoder, π‘ƒπœ™(x𝑒|z) represents the decoder and the term 𝕂𝕃[π‘„πœƒ(z|x𝑒) β€– 𝑃(z)] acts as a regularizer. In the training phase, for each 𝑒 a latent variable z ∼ π‘„πœƒ(β‹…|x𝑒) is devised. Next, z is exploited to devise the probability π‘ƒπœ™(x𝑒|z). Users with low probability are penalized within the loss and the network parameters can be updated accordingly. Prediction for new items is accomplished by resorting to the learned functions π‘ƒπœ™ and π‘„πœƒ: given a (partial) user history x𝑒, we compute z = πœ‡π‘’ and then devise the probabilities for the whole item set through πœ‹(z). Unseen items can then be ranked according to their associated probabilities.

The second formulation we consider is inspired by the Bayesian Personalized Ranking (BPR) model introduced in [10]. The idea underlying this model is that a preference 𝑖 ≺𝑒 𝑗 can be directly explained as closeness in a latent space onto which both items and users can be mapped. Mathematically, this can be devised by computing a factorization score p^𝑇_𝑒 q𝑖 for each pair (𝑒, 𝑖), and modeling preferences by means of a Bernoulli process:

    𝑖 ≺𝑒 𝑗 ∼ Bernoulli(𝑝),    𝑝 = 𝜎(p^𝑇_𝑒(q𝑖 βˆ’ q𝑗))

where 𝜎(π‘Ž) = (1 + 𝑒^{βˆ’π‘Ž})^{βˆ’1} is the logistic function. The optimal embeddings P and Q can hence be obtained by optimizing the loss

    ℓ_{𝐡𝑃𝑅}(P, Q) β‰ˆ βˆ‘_𝑒 βˆ‘_{𝑖,𝑗 : 𝑖≺𝑒𝑗} log 𝜎(p^𝑇_𝑒(q𝑖 βˆ’ q𝑗))

We combine the two frameworks by adapting the BPR loss to the MVAE model. In particular, instead of modeling 𝑃(x𝑒|z), we directly model 𝑃(𝑖 ≺𝑒 𝑗|z) within a similar variational framework.
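A minimal sketch of the BPR pairwise log-likelihood above, with a hypothetical two-dimensional user embedding and two items; it only illustrates that the loss rewards embeddings respecting the observed ordering.

```python
import math

def sigmoid(a):
    """Logistic function sigma(a) = (1 + e^{-a})^{-1}."""
    return 1.0 / (1.0 + math.exp(-a))

def bpr_log_likelihood(p_u, Q, pairs):
    """Sum of log sigma(p_u^T (q_i - q_j)) over the observed pairs i <_u j."""
    total = 0.0
    for i, j in pairs:
        score = sum(pu * (qi - qj) for pu, qi, qj in zip(p_u, Q[i], Q[j]))
        total += math.log(sigmoid(score))
    return total

p_u = [1.0, 0.5]                      # toy user embedding
Q = {0: [2.0, 1.0], 1: [0.0, 0.0]}    # item 0 close to u, item 1 far
better = bpr_log_likelihood(p_u, Q, [(0, 1)])  # u prefers 0 over 1
worse = bpr_log_likelihood(p_u, Q, [(1, 0)])   # reversed preference
```

Here `better > worse`: the log-likelihood is higher when the latent geometry agrees with the preference 𝑖 ≺𝑒 𝑗.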
In short, the current preferences are encoded within a latent variable z that is further exploited to decode all ranks:

    𝑖 ≺𝑒 𝑗 ∼ Bernoulli(𝑝𝑖,𝑗),    𝑝𝑖,𝑗 = 𝜎(πœπ‘– βˆ’ πœπ‘—),    𝜁 = π‘“πœ™(z)

Here, 𝜁 represents the output of a neural network parameterized by πœ™. For a given item 𝑖, the value πœπ‘– then represents the associated rank, which can be used to sort all preferences. The model can be obtained by optimizing the loss:

    ℓ_{𝑅𝑉𝐴𝐸}(πœ™, πœƒ) = βˆ‘_𝑒 {𝔼_{zβˆΌπ‘„πœƒ(β‹…|x𝑒)}[βˆ‘_{𝑖,𝑗 : 𝑖≺𝑒𝑗} log π‘ƒπœ™(𝑖 ≺𝑒 𝑗|z)] βˆ’ 𝕂𝕃[π‘„πœƒ(z|x𝑒) β€– 𝑃(z)]}    (1)

We call this model Ranking Variational Autoencoder (RVAE).

Learning by Negative Sampling. In the above formulation there are some details worth further discussion. When learning the RVAE model, optimizing the likelihood requires that all pairs of items are considered within Eq. (1). This is unrealistic with large item bases, and it is usually customary to only consider a subset π’Ÿπ‘’ βŠ‚ {(𝑖, 𝑗) | 𝑖, 𝑗 ∈ 𝐼; 𝑖 ≺𝑒 𝑗}. The sampling of π’Ÿπ‘’ is critical for determining the behavior of any predictive model; the most used approach in the literature is to uniformly sample, for each user 𝑒 and item 𝑖 (called positive item), a fixed number of items {𝑗1, … , 𝑗𝑛} βŠ‚ 𝐼 βˆ’ 𝐼𝑒 with the underlying assumption that βˆ€π‘˜ ∢ 𝑖 ≺𝑒 π‘—π‘˜. Thus, Eq. (1) can be rewritten as:

    ℓ_{𝑅𝑉𝐴𝐸}(πœ™, πœƒ) = βˆ‘_𝑒 {𝔼_{zβˆΌπ‘„πœƒ(β‹…|x𝑒)}[βˆ‘_{(𝑖,𝑗)βˆˆπ’Ÿπ‘’} log π‘ƒπœ™(𝑖 ≺𝑒 𝑗|z)] βˆ’ 𝕂𝕃[π‘„πœƒ(z|x𝑒) β€– 𝑃(z)]}    (2)

This will be the basis loss upon which we develop our study next.

3. The Impact of Popularity

We start our analysis on the following popular benchmark datasets: i) Movielens, a time series dataset containing user-item rating pairs along with the corresponding timestamps; ii) Pinterest, based on the social media platform that allows users to save or pin an image (item) to their board.
The dataset denotes as 1 the pinned images for each user; iii) CiteUlike, a dataset obtained from the homonymous service, which provides a digital catalog to save and share academic papers.

[Figure 1: Item popularity distributions (popularity on a log scale). (a) ML1M: 1007 low-popular items (avg pop: 6), 2367 mid-popular items (avg pop: 164), 142 top-popular items (avg pop: 1239). (b) Pinterest: 1767 low-popular items (avg pop: 18), 7620 mid-popular items (avg pop: 153), 529 top-popular items (avg pop: 568). (c) CiteUlike: 3783 low-popular items (avg pop: 4), 12693 mid-popular items (avg pop: 12), 504 top-popular items (avg pop: 61).]

In fig. 1 we plot all the items within the datasets, by increasing popularity. For each dataset we identify three classes: low, mid and high popular items. We then study the behavior of the RVAE model with respect to the popularity classes so defined. To do so, we adopt the following protocol. For each dataset, 70% of users are randomly sampled with all of each user's items. Each such user is associated with x𝑒 and a set π’Ÿπ‘’ of positive/negative item pairs. In particular, we consider all positive items within x𝑒, and for each positive item 𝑖 we sample 𝑛 = 4 negative items. The remaining 30% of users are uniformly split into validation and test sets. In particular, for each user 𝑒 we consider a random subset 𝑃𝑒 βŠ‚ 𝐼𝑒 representing 30% of the positive items, and a set 𝑁𝑒 of 100 negative items sampled from 𝐼 βˆ’ 𝐼𝑒. The vector x𝑒 is masked to remove all elements in 𝑃𝑒. We then feed the masked x𝑒 to the model to obtain the score vector πœπ‘’. Now, for a given cutoff value 𝑐, let us consider the 𝑐 βˆ’ 1 negative items for 𝑒 to which RVAE gives the highest scores and, among them, the item 𝑗 having the minimum score.
There is a hit with cutoff 𝑐, for the user 𝑒 and the item 𝑖 ∈ 𝑃𝑒, if πœπ‘’,𝑖 β‰₯ πœπ‘’,𝑗. Let 𝐻^𝑐_𝑒 be the number of hits for the user 𝑒 with cutoff 𝑐. We define the Hit-Rate at 𝑐 on the test set 𝑇 as

    HR@𝑐 = (βˆ‘_{π‘’βˆˆπ‘‡} 𝐻^𝑐_𝑒) / (βˆ‘_{π‘’βˆˆπ‘‡} |𝑃𝑒|)

We can trivially specialize this definition for items within the low, medium and high popularity classes, by considering only items in 𝑃𝑒 that belong to the specific class. The results of the evaluation are summarized in table 1a. We can see that the model suffers mainly on low-popular items. As a matter of fact, the overexposure of popular items is predominant and the model essentially learns to predict those items. The fact is that popular items are easy to predict. However, it is on the mid and especially on the low popular items that the most interesting predictions can take place: niche items are difficult to discover by an end user and hence their accurate suggestion can greatly improve user engagement. The research question is hence: how can we boost the model to improve the performance on low-popular items?

4. Unbiased Recommendation

In a simple experiment, we retrain the model by only considering pairs (𝑖, 𝑗) ∈ π’Ÿπ‘’ such that 𝑖 is in the low popular class. We call this model RVAE_𝐿. Compared to the results in table 1a, the results for this restricted model (shown in table 1b) show that if attention is placed on low popular items, their prediction accuracy can be improved. Similar results can be observed for the mid popular items.
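The hit-rate protocol can be sketched as follows. The rank-based test used here (a positive is a hit when fewer than 𝑐 negatives score strictly higher) is an equivalent reading of the cutoff definition above; the scores and item sets are toy values.

```python
def hr_at_c(scores, positives, negatives, c):
    """HR@c over a test set: a positive item i counts as a hit when fewer
    than c of the user's sampled negatives score strictly higher than it
    (equivalently, i enters the user's top-c recommendation list)."""
    hits, total = 0, 0
    for u in positives:
        for i in positives[u]:
            rank = sum(1 for j in negatives[u] if scores[u][j] > scores[u][i])
            hits += (rank < c)
            total += 1
    return hits / total

# toy example: one user, two held-out positives, three sampled negatives
scores = {0: {0: 0.9, 1: 0.1, 2: 0.5, 3: 0.4, 4: 0.3}}
P = {0: [0, 1]}   # held-out positives P_u
N = {0: [2, 3, 4]}  # sampled negatives

hr1 = hr_at_c(scores, P, N, c=1)   # only item 0 tops every negative
hr10 = hr_at_c(scores, P, N, c=10)  # large cutoff: every positive hits
```

Restricting the inner loop to positives of a given popularity class yields the per-class hit-rates reported in the tables.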
(a) RVAE
                 HR@1                      HR@5                      HR@10
Dataset    Global Low  Med  High     Global Low  Med  High     Global Low  Med  High
Movielens  0.2510 0.00 0.16 0.43     0.5853 0.01 0.47 0.83     0.7443 0.05 0.65 0.94
Pinterest  0.2731 0.13 0.23 0.46     0.6992 0.46 0.66 0.86     0.8764 0.65 0.86 0.95
CiteULike  0.2875 0.07 0.22 0.67     0.6021 0.29 0.57 0.90     0.7638 0.48 0.75 0.95

(b) RVAE_𝐿
                 HR@1            HR@5            HR@10
Dataset    Global  Low     Global  Low     Global  Low
Movielens  0.0012  0.15    0.0042  0.50    0.0063  0.74
Pinterest  0.0089  0.43    0.0166  0.81    0.0192  0.93
CiteULike  0.0254  0.32    0.0561  0.71    0.0703  0.89

Table 1: Predictive accuracy.

Thus, in order to unbias the model we need to rebalance the contribution of low (and mid) popular items with respect to the high popular ones. To achieve this goal, we study three different strategies. The first strategy consists in weighting, for each pair (𝑖, 𝑗) ∈ π’Ÿπ‘’, the contribution to the loss with a factor inversely proportional to the popularity of the item 𝑖:

    𝑀𝑖 = (𝛾 (1 + 𝑒^{𝛼(𝑓𝑖 βˆ’ 𝛽)})^{βˆ’1} + 1) / (𝛾 + 1)

Here, 𝑓𝑖 represents the number of occurrences of 𝑖, and 𝛼, 𝛽, 𝛾 are parameters representing the steepness, center and scale of the decay for highly popular items. We experiment with 𝛼 = 0.01, 𝛽 equal to the average frequency of mid-popular items and 𝛾 = 100, and call this variant RVAE_π‘Š. The above strategy has the advantage of reweighting the contributions of low and mid popular items with respect to high popular ones: the ratio between the weights of the most popular and the least popular items is approximately 1/𝛾. However, this weighting scheme has a main disadvantage. The weight is relative to a positive item 𝑖, but it is associated with pairs (𝑖, 𝑗) ∈ π’Ÿπ‘’. That is, besides weighting the contribution of 𝑖, this scheme also overexposes (or, dually, underexposes) the contribution of the negative item 𝑗. To avoid this, an alternative strategy consists in changing the sampling scheme that produces π’Ÿπ‘’.
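Assuming the reconstruction of the weight formula above, a minimal sketch follows; 𝛽 = 150 is an illustrative stand-in for the mid-class average frequency, not a value from the paper.

```python
import math

def weight(f_i, alpha=0.01, beta=150.0, gamma=100.0):
    """w_i = (gamma * (1 + e^{alpha*(f_i - beta)})^{-1} + 1) / (gamma + 1):
    a sigmoid-shaped decay that down-weights highly popular items.
    beta here is illustrative; the paper centers it on the mid-class
    average frequency of each dataset."""
    return (gamma / (1.0 + math.exp(alpha * (f_i - beta))) + 1.0) / (gamma + 1.0)

w_low = weight(5)      # a rarely seen item keeps a weight close to 1
w_high = weight(5000)  # a blockbuster item is scaled down to ~1/(gamma+1)
```

The weight decays smoothly from roughly 1 for low-popular items down to roughly 1/𝛾 for the most popular ones, matching the ratio discussed above.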
In practice, rather than uniformly sampling, for each positive item, a fixed number 𝑛 of negative items, we can apply an inversely stratified sampling where 𝑛𝑖 negatives are sampled, with 𝑛𝑖 inversely proportional to the popularity of the item:

    𝑛𝑖 = 𝑛 β‹… ⌈max(f) / π‘“π‘–βŒ‰

The rationale behind the above formula is to provide the same visibility to each positive item in the loss function. Thus, the most popular item will be associated with exactly 𝑛 pairs. By contrast, low and mid popular items will be overexposed in the comparison. We call this variant RVAE_𝑆.

The third strategy consists in combining the baseline RVAE model with RVAE_𝐿. For a user 𝑒, let 𝜁𝐡 be the score vector produced by RVAE, and 𝜁𝐿 the score vector produced by RVAE_𝐿. We define RVAE_𝐸 as the model that produces the score 𝜁𝐸 defined as

    𝜁𝐸 = Softmax(𝜁𝐡) + 𝛿 m β‹… Softmax(𝜁𝐿)

The vector m masks all items but the low popular ones. The scores are normalized (via the Softmax function) to make the two models comparable. Finally, 𝛿 is a weight aimed at tuning the boost for the low popular items, as devised by RVAE_𝐿. We experimentally found an optimal tuning with 𝛿 = 0.4.
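Both the inverse stratified sampling and the ensemble score can be sketched in a few lines; the frequencies, logits and mask below are toy values chosen only to show the mechanics.

```python
import math

def negatives_per_positive(freq, n=4):
    """Inverse stratified sampling: n_i = n * ceil(max(f) / f_i), so the most
    popular item keeps exactly n pairs and rarer items get more negatives."""
    f_max = max(freq.values())
    return {i: n * math.ceil(f_max / f_i) for i, f_i in freq.items()}

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def ensemble_scores(zeta_B, zeta_L, low_mask, delta=0.4):
    """zeta_E = Softmax(zeta_B) + delta * m . Softmax(zeta_L), with the
    mask m zeroing every item outside the low-popularity class."""
    sB, sL = softmax(zeta_B), softmax(zeta_L)
    return [b + delta * m * l for b, m, l in zip(sB, low_mask, sL)]

freq = {"hit": 1000, "mid": 100, "niche": 10}  # toy item frequencies
n_i = negatives_per_positive(freq, n=4)

zeta_B, zeta_L = [2.0, 1.0, 0.0], [0.0, 0.0, 3.0]  # toy score vectors
mask = [0, 0, 1]  # only the last item is low-popular
zeta_E = ensemble_scores(zeta_B, zeta_L, mask)
```

With these toy values the most popular item keeps 𝑛 = 4 pairs while the niche item gets 400, and the ensemble boosts only the masked low-popular item, leaving the other scores untouched.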
                      HR@1                      HR@5                      HR@10
            Global Low  Med  High     Global Low  Med  High     Global Low  Med  High
Movielens
  RVAE      0.2510 0.00 0.16 0.43     0.5853 0.01 0.47 0.83     0.7443 0.05 0.65 0.94
  RVAE_π‘Š    0.0854 0.01 0.13 0.01     0.2958 0.07 0.40 0.10     0.4661 0.11 0.59 0.23
  RVAE_𝑆    0.1245 0.06 0.12 0.13     0.3875 0.16 0.38 0.40     0.5825 0.28 0.57 0.61
  RVAE_𝐸    0.2466 0.05 0.16 0.42     0.5506 0.22 0.43 0.80     0.6965 0.37 0.59 0.91
Pinterest
  RVAE      0.2731 0.13 0.23 0.46     0.6992 0.46 0.66 0.86     0.8764 0.65 0.86 0.95
  RVAE_π‘Š    0.2196 0.29 0.23 0.18     0.6544 0.68 0.66 0.64     0.8592 0.80 0.86 0.87
  RVAE_𝑆    0.2482 0.11 0.23 0.35     0.6770 0.45 0.65 0.80     0.8664 0.64 0.86 0.93
  RVAE_𝐸    0.2726 0.26 0.22 0.46     0.6954 0.63 0.65 0.86     0.8620 0.80 0.84 0.94
CiteUlike
  RVAE      0.2875 0.07 0.22 0.67     0.6021 0.29 0.57 0.90     0.7638 0.48 0.75 0.95
  RVAE_π‘Š    0.2711 0.09 0.25 0.46     0.6156 0.37 0.61 0.76     0.7698 0.55 0.77 0.87
  RVAE_𝑆    0.3890 0.32 0.39 0.43     0.7264 0.61 0.73 0.76     0.8399 0.73 0.84 0.90
  RVAE_𝐸    0.2805 0.16 0.21 0.66     0.5634 0.50 0.50 0.87     0.6941 0.71 0.64 0.92

Table 2: Comparative analysis.

Table 2 summarizes the results of the evaluation. We see that, in general, all the strategies considerably improve the response of the model on the low popular items. However, RVAE_π‘Š has a low response on the high popular items. By contrast, RVAE_𝑆 succeeds in boosting performance on both the low popular and the mid popular items. Overall, the ensemble RVAE_𝐸 provides the best response, boosting low popular items without substantially degrading over the other classes.

5. Conclusions

The approach proposed in this paper is a preliminary study: we introduce a ranking collaborative filtering algorithm (RVAE) and study how the algorithm is affected by popularity bias. Next, we show how simple techniques based on reweighting/resampling and/or ensembling can recalibrate the recommendations. There are several aspects that are worth further investigation.
First of all, both the weighting and the inverse stratified sampling schemes are based on hyperparameters that need to be carefully tuned. Also, the ensemble strategies are simple, and more complex schemes that also take into account other model instantiations can be studied. We leave these challenges for future work.

References

[1] V. Tsintzou, E. Pitoura, P. Tsaparas, Bias disparity in recommendation systems, CoRR abs/1811.01461 (2018). URL: http://arxiv.org/abs/1811.01461.
[2] C. C. Aggarwal, Recommender Systems, Springer, 2016.
[3] B. Friedman, H. Nissenbaum, Bias in computer systems, ACM Trans. Inf. Syst. 14 (1996) 330–347.
[4] Z. Zhu, X. Hu, J. Caverlee, Fairness-aware tensor-based recommendation, in: ACM International Conference on Information and Knowledge Management, CIKM '18, 2018, pp. 1153–1162.
[5] Y. Deldjoo, V. W. Anelli, H. Zamani, A. Bellogin, T. Di Noia, A flexible framework for evaluating user and item fairness in recommender systems, User Modeling and User-Adapted Interaction (2021).
[6] O. Celma, P. Cano, From hits to niches?: Or how popular artists can bias music recommendation and discovery, in: 2nd KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition, 2008.
[7] H. Steck, Item popularity and recommendation accuracy, in: Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys '11, 2011, pp. 125–132.
[8] R. Borges, K. Stefanidis, On measuring popularity bias in collaborative filtering data, in: EDBT Workshop BigVis 2020: Big Data Visual Exploration and Analytics, 2020.
[9] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher, The unfairness of popularity bias in recommendation, CoRR abs/1907.13286 (2019). URL: http://arxiv.org/abs/1907.13286.
[10] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: Conference on Uncertainty in Artificial Intelligence, UAI '09, 2009, pp. 452–461.
[11] T. Ebesu, B. Shen, Y. Fang, Collaborative memory network for recommendation systems, in: ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '18, 2018.
[12] Z. Zhu, J. Wang, J. Caverlee, Measuring and mitigating item under-recommendation bias in personalized ranking systems, in: ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '20, 2020, pp. 449–458.
[13] H. Abdollahpouri, R. Burke, B. Mobasher, Managing popularity bias in recommender systems with personalized re-ranking, CoRR abs/1901.07555 (2019). URL: http://arxiv.org/abs/1901.07555.
[14] D. Liang, R. G. Krishnan, M. Hoffman, T. Jebara, Variational autoencoders for collaborative filtering, in: ACM Conference on World Wide Web, WWW '18, 2018, pp. 689–698.
[15] D. P. Kingma, M. Welling, Auto-encoding variational Bayes, in: 2nd International Conference on Learning Representations, ICLR '14, 2014. arXiv:1312.6114.