<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Collaborative Filtering for Popularity-Aware Recommendation (Discussion Paper)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ettore Ritacco</string-name>
          <email>ettore.ritacco@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luciano Caroprese</string-name>
          <email>luciano.caroprese@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Manco</string-name>
          <email>giuseppe.manco@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Minici</string-name>
          <email>marco.minici@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Sergio Pisani</string-name>
          <email>francescosergio.pisani@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ICAR-CNR</institution>, <addr-line>Via Bucci, 8-9c, Rende</addr-line>, <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We analyze the behavior of recommender systems relative to the popularity of the items to recommend. Our findings show that most popular ranking-based recommenders are biased towards popular items, thus affecting the quality of recommendation. Based on these observations, we propose a new deep learning architecture with an improved learning strategy that significantly improves the performance of such recommenders on low-popular items. The proposed technique is based on two main aspects: resampling of negatives and ensembling of multiple instances of the algorithm. Experimental results on traditional benchmark datasets show that the proposed approach substantially improves the recommendation ability by balancing accurate contributions almost independently from the popularity of the items to recommend.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Collaborative Filtering</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Big Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The massive amount of information available in the form of digital catalogs and online
services allows users to access and consume online content, but at the same time it poses a choice
paradox. Purchasing items on e-commerce sites, selecting movies on streaming platforms or
connecting with other peers on social networks implies making choices among a huge amount
of elements. Recommender Systems play a crucial role within this context since they provide
users with a better experience through the recommendation of new content and data that users
are likely to appreciate.</p>
      <p>Recommendations can have a disruptive impact: if on one side, they assist users in their
choices, on the other side they can influence the choices themselves. Indeed, Recommender
systems can favor particular categories of items or particular brands over others, thus introducing
a bias within the catalog [1]. A typical bias is intrinsic to the nature of recommendation. In
systems involving interaction between significant amounts of users and items, we observe the
presence of very few very popular items and many less popular others. This distribution follows
the so-called 80-20 rule that refers to the fact that 80% of users express preferences for only 20%
of the available items. In practice, user preferences follow a long tail distribution. Recommender
Systems based on collaborative filtering [2], which tend to characterize user preferences from
interactions, are affected by a relevant problem: they tend to suggest very popular items and
neglect less popular ones. This way, there is a reinforcement effect because the most popular items
will become more and more popular than the less known ones. This phenomenon is called
popularity bias.</p>
      <p>The quality of the recommendations is inevitably affected by the popularity bias, since most
recommendations are prone to contain objects that the user will consider trivial and well known.
By contrast, the capability of recommending items belonging to the long tail (i.e., less popular)
can disclose new perspectives: niche items, little-known objects, hidden gems that satisfy the
user’s preferences can greatly improve user engagement, provided that the recommended items
are consistent with user’s taste.</p>
      <p>The current literature has extensively studied the problems of fairness and bias [3, 4, 5] within
recommender systems, with particular emphasis on popularity bias [6, 7, 8]. Notably, in [9], it is
shown that unfair recommendations are concentrated on groups of users interested in long-tail
and less popular items. In this paper we focus our attention on recommender systems based on
ranking [10, 11], which are extremely flexible and as a consequence particularly interesting to
unbias. Typical approaches [12, 13] consider techniques to increase the representation of less
popular items, by either post-processing techniques or constraining the recommendation score.
Despite such recent efforts, unbiasing the recommendation from popular items is still an open
problem.</p>
      <p>In this paper we devise a ranking-based recommender system for implicit feedback (RVAE),
based on a variational autoencoder architecture. The proposed model is a substantial extension of
the MVAE model proposed in [14] and in fact it inherits its accuracy and computational efficiency.
We analyze the behavior of RVAE and show that the model is characterized by the popularity
bias problem. We then propose an experimental study of specific techniques to overcome it.
The proposed technique is based on two main concepts: resampling/reweighting items and
ensembling of multiple instances of the algorithm. Our experiments show that these simple
strategies allow us to unbias the algorithm and hence provide more effective recommendations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Basic Framework</title>
      <p>We start by setting the notation that we shall use throughout the paper. In the context of
collaborative filtering, u ∈ U = {1, …, M} indexes a user and i, j ∈ I = {1, …, N} index items
for which the user can express a preference. We model implicit feedback, thus assuming a
preference matrix X ∈ {0, 1}^{M×N}, so that x_{u,i} = 1 whenever user u expressed a preference for
item i, and x_{u,i} = 0 otherwise. Also, x_u is the (binary) row indexed at u, representing all the item
preferences for user u. Given x_u, we define I_u = {i ∈ I | x_{u,i} = 1} (with n_u = |I_u|). The preference
matrix induces a natural ordering between items: i ≺_u j has the meaning that u prefers i to j, i.e.
x_{u,i} &gt; x_{u,j} in the rating matrix. Our objective is to devise a model for such an ordering.
Preference Modeling. We consider a general framework where preferences are modeled as
the effect of latent factors ultimately characterizing users and/or items. We shall consider two
basic instantiations of this general idea, and will provide a unified framework.</p>
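      <p>To make the notation concrete, the following minimal sketch (Python with numpy; the toy matrix and the helper names are ours, for illustration only) builds a preference matrix X and derives I_u and the induced ordering:</p>

```python
import numpy as np

# Toy implicit-feedback preference matrix X (M users x N items).
X = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 0]])

# I_u: the set of items user u expressed a preference for; n_u = |I_u|.
def positive_items(X, u):
    return set(np.flatnonzero(X[u]))

I_0 = positive_items(X, 0)
n_0 = len(I_0)

# The ordering i ≺_u j holds whenever x_{u,i} exceeds x_{u,j}, i.e. i is a
# positive item for u while j is not.
def prefers(X, u, i, j):
    return X[u, i] > X[u, j]
```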
      <p>The first situation we consider is the Multinomial Variational Autoencoder (MVAE) framework
proposed in [14]. Within this framework, for a given user u the related x_u is modeled as the
effect of a multinomial distribution governed by a prior z, i.e.</p>
      <p>x_u ∼ Discrete(π(z)),   π(z) ∝ exp{f_θ(z)}.
Here, f_θ(⋅) represents a neural network parameterized by θ. The latent variable z is modeled by
a prior p(z) (typically a gaussian distribution). Thus, the probability of preferences for a given
user can be expressed as
p(x_u) = ∫ p(x_u|z) p(z) dz.
Due to the intractability of the above integral, [15] devise a variational approach based on a
proposal q(z|x_u) that approximates the posterior distribution. Again, q is modeled as a gaussian
distribution</p>
      <p>q(z|x_u) = N(z; μ_u, σ_u),
where μ_u, σ_u = g_φ(x_u) and g_φ is a neural network parameterized by φ. By exploiting the inequality
log p(x_u) ≥ E_{z∼q(⋅|x_u)}[log p(x_u|z)] − KL(q(z|x_u) ‖ p(z)),
we can finally learn the θ, φ parameters by optimizing the loss</p>
      <p>ℓ(θ, φ) = ∑_u { E_{z∼q_φ(⋅|x_u)}[log p_θ(x_u|z)] − KL(q_φ(z|x_u) ‖ p(z)) }.
The overall framework is hence based on a regularized encoder-decoder scheme, where q_φ(z|x_u)
represents the encoder, p_θ(x_u|z) represents the decoder and the KL term acts as a
regularizer. In the training phase, for each u a latent variable z ∼ q_φ(⋅|x_u) is devised. Next, z is
exploited to devise the probability p_θ(x_u|z). Users with low probability are penalized within the
loss and the network parameters can be updated accordingly.</p>
      <p>Prediction for new items is accomplished by resorting to the learned functions g_φ and f_θ:
given a (partial) user history x_u, we compute z as the mean μ_u returned by g_φ(x_u), and then
devise the probabilities for the whole item set through π(z). Unseen items can then be ranked
according to their associated probabilities.</p>
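      <p>The prediction step can be sketched as follows. This is a minimal numpy illustration, with randomly initialized linear maps standing in for the trained networks g_φ and f_θ (all names and shapes are our own assumptions, not the paper's implementation):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 3  # number of items, latent dimensionality

# Stand-ins for the trained networks: a linear encoder returning the mean
# mu_u (prediction uses the mean, not a sample) and a linear decoder
# producing one logit per item.
W_enc = rng.normal(size=(N, D))
W_dec = rng.normal(size=(D, N))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def recommend(x_u, k=2):
    z = x_u @ W_enc                       # z = mu_u = g_phi(x_u)
    pi = softmax(z @ W_dec)               # pi(z) proportional to exp{f_theta(z)}
    pi = np.where(x_u == 1, -np.inf, pi)  # rank only unseen items
    return np.argsort(pi)[::-1][:k]       # top-k unseen items by probability

x_u = np.array([1., 0., 1., 0., 0., 0.])
top = recommend(x_u)
```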
      <p>The second formulation we consider is inspired by the Bayesian Personalized Ranking (BPR)
model introduced in [10]. The idea underlying this model is that a preference i ≺_u j can be
directly explained as closeness in a latent space where both items and users can be mapped.
Mathematically this can be devised by computing a factorization rank p_u^T q_i for each pair (u, i),
and modeling preferences by means of a Bernoulli process:
i ≺_u j ∼ Bernoulli(π),   π = σ(p_u^T (q_i − q_j)),
where σ(x) = (1 + e^{−x})^{−1} is the logistic function. The optimal embeddings P and Q can hence
be obtained by optimizing the loss
ℓ(P, Q) ≈ ∑_u ∑_{i ≺_u j} log σ(p_u^T (q_i − q_j)).</p>
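      <p>The BPR preference probability can be illustrated in a few lines of numpy (the embeddings here are random stand-ins; training would maximize the pairwise log-likelihood over all sampled pairs):</p>

```python
import numpy as np

def sigmoid(x):
    """Logistic function sigma(x) = (1 + e^{-x})^{-1}."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical user/item embeddings p_u, q_i, q_j of dimension D.
rng = np.random.default_rng(1)
D = 4
p_u, q_i, q_j = rng.normal(size=(3, D))

# Probability that u prefers i over j, and the per-pair BPR log-likelihood
# contribution that training maximizes.
prob = sigmoid(p_u @ (q_i - q_j))
pair_loglik = np.log(prob)
```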
      <p>We combine the two frameworks by adapting the BPR loss to the MVAE model. In particular,
instead of modeling p(x_u|z), we directly model p(i ≺_u j | z) within a similar variational framework.
In short, the current preferences are encoded within a latent variable z that is further exploited
to decode all ranks:
i ≺_u j ∼ Bernoulli(π_{i,j}),   π_{i,j} = σ(r_i − r_j),   r = f_θ(z).</p>
      <p>Here, r represents the output of a neural network parameterized by θ. For a given item i, the
value r_i represents then the associated rank, which can be used to sort all preferences. The model
can be obtained by optimizing the loss:
ℓ(θ, φ) = ∑_u { E_{z∼q_φ(⋅|x_u)}[ ∑_{(i,j)∈D_u} log p_θ(i ≺_u j | z) ] − KL(q_φ(z|x_u) ‖ p(z)) }.   (1)
Since the set of all preference pairs is too large to enumerate,
it is usually customary to only consider a subset D_u ⊂ {(i, j) | i, j ∈ I; i ≺_u j}. The sampling of
D_u is critical for determining the behavior of any predictive model; the most used approach
in literature is to uniformly sample, for each user u and item i (called positive item), a fixed
number ν of items {j_1, …, j_ν} ⊂ I − I_u, with the underlying assumption that ∀k: i ≺_u j_k. Thus, Eq.
(1) can be rewritten as:
ℓ(θ, φ) = ∑_u { E_{z∼q_φ(⋅|x_u)}[ ∑_{i∈I_u} ∑_{k=1}^{ν} log p_θ(i ≺_u j_k | z) ] − KL(q_φ(z|x_u) ‖ p(z)) }.   (2)</p>
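      <p>The uniform negative-sampling scheme just described can be sketched as follows (pure Python; function and variable names are ours):</p>

```python
import random

def sample_pairs(I_u, I, nu, rng=random.Random(0)):
    """For each positive item i in I_u, uniformly sample nu negatives j from
    I - I_u, under the standard assumption that i ≺_u j for every sampled j."""
    negatives = list(I - I_u)
    D_u = []
    for i in I_u:
        for j in rng.sample(negatives, nu):
            D_u.append((i, j))
    return D_u

I = set(range(10))
I_u = {0, 3, 7}
D_u = sample_pairs(I_u, I, nu=4)  # |D_u| = |I_u| * nu pairs
```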
    </sec>
    <sec id="sec-3">
      <title>3. The Impact of Popularity</title>
      <p>We start our analysis on the following popular benchmark datasets: i) Movielens, a time
series dataset containing user-item ratings pairs along with the corresponding timestamp; ii)
Pinterest, based on the social media that allows users to save or pin an image (item) to their
board. The dataset denotes as 1 the pinned images for each user; iii) CiteUlike, a dataset
obtained from the homonymous service which provides a digital catalog to save and share
academic papers.</p>
      <p>[Figure 1: for each dataset, all items plotted by increasing popularity, partitioned into
low-, mid- and top-popular classes with their average popularity.]
Within fig. 1 we plot all the items within the datasets, by increasing popularity.
For each dataset we identify three classes: low, mid and high popular items.</p>
      <p>We then study the behavior of the RVAE model with respect to the popularity classes defined.
To do so, we adopt the following protocol. For each dataset, 70% of the users are randomly sampled
with all of their items. Each such user u is associated with x_u and a set D_u of positive/negative
item pairs. In particular, we consider all positive items within x_u, and for each positive item
we sample ν = 4 negative items. The remaining 30% of the users are uniformly split into validation
and test. In particular, for each user u we consider a random subset T_u ⊂ I_u representing
30% of the positive items, and a set E_u of 100 negative items sampled from I − I_u.
The vector x_u is masked to remove all elements in T_u. We then feed the masked x_u to obtain
the score vector r_u. Now, for a given cutoff k, let us consider the k − 1 negative items in E_u
for which RVAE gives the highest score and, among them, the item j having the minimum
score. There is a hit with cutoff k, for the user u and the item i ∈ T_u, if r_{u,i} ≥ r_{u,j}. Let h_u^k be the
number of hits for the user u with cutoff k. We define the Hit-Rate at k as HR@k = ∑_u h_u^k / ∑_u |T_u|.
We can trivially specialize this definition to the low, medium and high popularity classes,
by considering only items in T_u that belong to the specific class. The results of the evaluation
are summarized in table 1a. We can see that the model suffers mainly on low-popular items.
As a matter of fact, the overexposure of popular items is predominant and the model learns to
predict essentially those items. The fact is that popular items are easy to predict. However, it is
on the mid and especially on the low popular items that the most interesting predictions can take
place: niche items are difficult to discover by an end user and hence their accurate suggestion
can greatly improve user engagement. The research question is hence: how can we boost the
model to improve the performance on low-popular items?</p>
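      <p>The Hit-Rate computation described above can be sketched as follows (pure Python; the helper names are ours and the protocol is reduced to the scoring step):</p>

```python
def hit(score_i, neg_scores, k):
    """A held-out positive item i is a hit at cutoff k if its score is at least
    the minimum score among the k-1 highest-scoring negatives, i.e. i would
    enter a top-k list built against the sampled negatives."""
    top = sorted(neg_scores, reverse=True)[:k - 1]
    return score_i >= top[-1] if top else True

def hit_rate_at_k(users, k):
    """users: one (pos_scores, neg_scores) pair per test user, holding the
    model scores of the held-out positives T_u and of the negatives E_u.
    Returns HR@k = sum_u h_u^k / sum_u |T_u|."""
    hits = total = 0
    for pos_scores, neg_scores in users:
        hits += sum(hit(s, neg_scores, k) for s in pos_scores)
        total += len(pos_scores)
    return hits / total

# One user with two held-out positives and three negatives.
hr = hit_rate_at_k([([0.9, 0.1], [0.5, 0.4, 0.3])], k=2)
```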
    </sec>
    <sec id="sec-4">
      <title>4. Unbiased Recommendation</title>
      <p>In a simple experiment, we retrain the model by only considering pairs (i, j) ∈ D_u such that
i is in the low popular class. Compared to the results in table 1a,
the results for this restricted model (shown in table 1b) show that if attention is placed on low
popular items, their prediction accuracy can be improved. Similar results can be
observed for the mid popular items. Thus, in order to unbias the model, we need to rebalance
the exposure of the popularity classes during training. To
achieve this goal, we study three different strategies.</p>
      <p>The first strategy consists in weighting, for each pair (i, j) ∈ D_u, the contribution to the loss
with a factor inversely proportional to the popularity of the positive item i:
w_i = ( (1 + e^{(f_i − c)/s})^{−1} + α ) / (α + 1).
Here, f_i represents the number of occurrences of i, and α, c, s are the parameters representing the
steep, center and scale of the decay of high popular items. We experiment with α = 0.01, with c
representing the average frequency of mid-popular items and s = 100, and call this variant
the reweighted RVAE. The above strategy has the advantage of reweighing the contributions of low and mid
popular items with respect to high popular ones: the ratio between the weight of the most popular item and that of the
least popular one is approximately α. However, this weighting scheme has a main disadvantage.
The weight is relative to a positive item i, but it is associated with pairs (i, j) ∈ D_u. That is,
besides weighting the contribution of i, this scheme also overexposes (or dually underexposes)
the contribution of the negative item j. To avoid this, an alternative strategy consists in changing
the sampling scheme that produces D_u. In practice, rather than uniformly sampling, for each
positive item, a fixed number ν of negative items, we can apply an inversely stratified sampling
where ν_i negatives are sampled, with ν_i being inversely proportional to the popularity of the
item: ν_i = ν ⋅ ⌈max(f)/f_i⌉.</p>
      <p>The rationale of the above formula is to provide the same visibility to each positive item in the
loss function. Thus, the most popular item will be associated with exactly ν pairs. By contrast,
low and mid popular items will be overexposed in the comparison. We call this variant the resampled RVAE.</p>
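      <p>Both rebalancing devices can be sketched in a few lines (a reconstruction under our reading of the formulas; the defaults follow the values reported above, with the center c set to an arbitrary example value):</p>

```python
import math

def pair_weight(f_i, alpha=0.01, c=50.0, s=100.0):
    """Sigmoid-decay loss weight for a positive item occurring f_i times:
    close to 1 for rare items, close to alpha/(1+alpha) for popular ones
    (c would be the average frequency of mid-popular items)."""
    return (1.0 / (1.0 + math.exp((f_i - c) / s)) + alpha) / (alpha + 1.0)

def negatives_per_item(f_i, f_max, nu=4):
    """Inversely stratified sampling budget nu_i = nu * ceil(max(f)/f_i):
    the most popular item keeps exactly nu negatives, rarer items get
    proportionally more pairs and are thus overexposed in the loss."""
    return nu * math.ceil(f_max / f_i)
```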
      <p>The third strategy consists in combining the baseline RVAE model with a low-popularity-oriented
variant of it. For a user u, let r_u be the score vector produced by RVAE, and r′_u the score produced
by the variant. We define the ensemble as the model that produces the score
s_u = Softmax(r_u) + λ ⋅ m ⊙ Softmax(r′_u).</p>
      <p>The vector m masks all items but the low popular ones. The scores are normalized (via the Softmax
function) to make the two models comparable. Finally, λ is a weight aimed at tuning the boost
for the low popular items. We experimentally found an optimal tuning
with λ = 0.4.</p>
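      <p>The ensemble combination can be sketched as follows (numpy; the score vectors and the mask are toy values, and the second model stands for whichever low-popularity variant is being combined):</p>

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def ensemble_scores(r_base, r_low, low_mask, lam=0.4):
    """s_u = Softmax(r_u) + lambda * m ⊙ Softmax(r'_u): the low-popularity
    model contributes only on the items selected by the mask m."""
    return softmax(r_base) + lam * low_mask * softmax(r_low)

r_base = np.array([2.0, 1.0, 0.5, 0.1])   # baseline RVAE scores
r_low  = np.array([0.1, 0.2, 2.5, 2.0])   # low-popularity variant scores
m = np.array([0.0, 0.0, 1.0, 1.0])        # items 2 and 3 are low popular
s = ensemble_scores(r_base, r_low, m)
```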
      <p>s RVAE
le RVAE
n
e
io RVAE
v
M RVAE
t RVAE
se RVAE
r
iten RVAE
P RVAE
e RVAE
lik RVAE
i RVAE
U
e
t
C RVAE
considerably improve the response of the model to the low popular items. However, RVAE has
a low response on the high popular items. By contrast, RVAE succeeds in boosting performance
on both the low popular and the mid popular items. Overall, the ensemble RVAE provides the
best response, by boosting low popular items without substantially degrading over the other
classes.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The approach proposed in this paper is a preliminary study: we introduce a ranking-based
collaborative filtering algorithm (RVAE) and study how the algorithm is affected by popularity bias.
Next, we show how simple techniques based on reweighting/resampling and/or ensembling
can recalibrate the recommendations. There are several aspects that are worth further
investigation. First of all, both the weighting and the inverse stratified sampling schemes are based on
hyperparameters that need to be carefully tuned. Also, the ensemble strategies are simple, and
more complex schemes that also take into account other model instantiations can be studied.
We reserve the attention to these challenges in a future work.</p>
      <p>[1] V. Tsintzou, E. Pitoura, P. Tsaparas, Bias disparity in recommendation systems, CoRR
abs/1811.01461 (2018). URL: http://arxiv.org/abs/1811.01461.
[2] C. C. Aggarwal, Recommender Systems, Springer, 2016.
[3] B. Friedman, H. Nissenbaum, Bias in computer systems, ACM Trans. Inf. Syst. 14 (1996) 330–347.
[4] Z. Zhu, X. Hu, J. Caverlee, Fairness-aware tensor-based recommendation, in: ACM
International Conference on Information and Knowledge Management, CIKM ’18, 2018, pp. 1153–1162.
[5] Y. Deldjoo, V. W. Anelli, H. Zamani, A. Bellogin, T. Di Noia, A flexible framework for
evaluating user and item fairness in recommender systems, User Modeling and User-Adapted
Interaction (2021).
[6] O. Celma, P. Cano, From hits to niches?: Or how popular artists can bias music
recommendation and discovery, in: 2nd KDD Workshop on Large-Scale Recommender Systems
and the Netflix Prize Competition, 2008.
[7] H. Steck, Item popularity and recommendation accuracy, in: Proceedings of the Fifth
ACM Conference on Recommender Systems, RecSys ’11, 2011, pp. 125–132.
[8] R. Borges, K. Stefanidis, On measuring popularity bias in collaborative filtering data, in:
EDBT Workshop on BigVis 2020: Big Data Visual Exploration and Analytics, 2020.
[9] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher, The unfairness of popularity bias
in recommendation, CoRR abs/1907.13286 (2019). URL: http://arxiv.org/abs/1907.13286.
[10] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized
ranking from implicit feedback, in: Conf. on Uncertainty in Artificial Intelligence, UAI ’09,
2009, pp. 452–461.
[11] T. Ebesu, B. Shen, Y. Fang, Collaborative memory network for recommendation systems,
in: ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR
’18, 2018.
[12] Z. Zhu, J. Wang, J. Caverlee, Measuring and mitigating item under-recommendation bias in
personalized ranking systems, in: ACM SIGIR Conference on Research and Development
in Information Retrieval, SIGIR ’20, 2020, pp. 449–458.
[13] H. Abdollahpouri, R. Burke, B. Mobasher, Managing popularity bias in recommender
systems with personalized re-ranking, CoRR abs/1901.07555 (2019). URL: http://arxiv.org/abs/1901.07555.
[14] D. Liang, R. G. Krishnan, M. D. Hoffman, T. Jebara, Variational autoencoders for collaborative
filtering, in: ACM Conf. on World Wide Web, WWW ’18, 2018, pp. 689–698.
[15] D. P. Kingma, M. Welling, Auto-encoding variational bayes, in: 2nd
International Conference on Learning Representations, ICLR ’14, 2014. URL: http://arxiv.org/abs/1312.6114v10.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>