Exploring social network effects on popularity biases in recommender systems

Rocío Cañamares and Pablo Castells
Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática
{rocio.canamares,pablo.castells}@uam.es

Proceedings of the 6th Workshop on Recommender Systems and the Social Web (RSWeb 2014), co-located with ACM RecSys 2014, October 6, 2014, Foster City, CA, USA. Copyright held by the authors.

ABSTRACT
Recommending items ranked by popularity has been found to be a fairly competitive approach in the top-N recommendation task. In this paper we explore whether popularity should always be expected to be an effective approach for recommendation, and what causes, factors and conditions determine and explain such effectiveness. We focus on two fundamental potential sources of bias in rating data which determine the answer to these questions: item discovery by users, and rating decisions. We research the role of social communication as a major source of item discovery biases (and therefore rating biases). We undertake the study by defining a probabilistic model of such factors, and running simulations where we analyze the relationships between the effectiveness of popularity and different configurations of social behavior.

Keywords
Popularity, social networks, evaluation, viral propagation.

1. INTRODUCTION
Recommending items ranked by popularity has been found to be a fairly competitive approach in the top-N recommendation task [5]. It may come as some initial surprise that a trivial and non-personalized recommendation method can be this effective, somewhat contradicting the implicit intuition underlying the recommender systems field that personalized recommendations should have the potential to maximize overall user satisfaction, by achieving an optimal fit of users' needs on an individual basis, as opposed to a one-size-fits-all approach.

Some authors have analyzed this issue recently [8,11,12,13,14], and have proposed specific techniques to account for the biases in the distribution of missing ratings, in both the recommendation algorithms and the evaluation methodology and metrics. The question has also been addressed from the perspective of the actual utility of recommendation: recommending popular items has the obvious shortcoming of a lack of surprise for the user, approximating (by definition) the worst possible results in terms of the novelty dimension [4]. Despite this obvious shortcoming, popular recommendations appear to be reasonably effective in practice (e.g. as a fallback option), item popularity is actually an (intentional or accidental) ingredient of many state-of-the-art recommendation algorithms, and commercial applications seem to be using it among other signals in recommendation functionalities. However, we barely find in the literature a clear analysis of the causes and characteristics of popularity biases, and of the relationship between the popularity distribution and its potential consequences in the performance and evaluation of recommendation algorithms.

It is natural to wonder a) whether popularity should always be expected to be an effective approach (or partial signal) for recommendation, b) what causes, factors and conditions determine and explain such effectiveness, and c) whether the apparent effectiveness actually reflects true effectiveness, or is the result of a distortion of some sort in the evaluation methodologies. We address such questions in this paper.

Popularity-based recommendation exploits biases in the distribution of available observed ratings among items, or equivalently, in the distribution of missing ratings. Studying the properties of popularity is thus essentially the same as studying the characteristics of rating distributions and their biases. In unbiased situations (where ratings are uniformly distributed), popularity is equivalent to random recommendation and makes no particular sense as a recommendation strategy. Popularity therefore makes sense when rating data is biased or, in other words, missing not at random [7,11].

In this paper we focus on two fundamental potential sources of bias in rating data: item discovery biases, and rating decision biases. The latter refers to the factors that determine whether or not a user decides to rate an item he has interacted with; for instance, in many cases users may typically be more prone to rate items they have liked than items they have not. The former refers to the fact that in order to rate an item, a user needs first to become aware that the item exists. Biases in the item discovery distribution then naturally result in biases in which items ultimately get more or fewer ratings.

Discovery biases are determined by the sources by which users discover items. People get to know items through a variety of channels, such as direct user searches, advertisement from providers, random encounter, suggestions from a recommender system, etc. Beyond this and foremost, our social environment is a key source of information and discovery, on which people place a particular reliance and trust compared to other channels. The perspective of the role word of mouth plays in the distribution of ratings connects the problem at hand to an issue of network propagation: the items that propagate faster and farther in the social network will tend to get more ratings.
Propagation phenomena have been extensively studied in the area of complex networks, and social networks in particular (for diseases, rumors, viral effects, etc.) [1,6,9,10], but with scarce exceptions [3] the connection to biases in the user rating distribution has barely been examined before. Yet we find that network effects can be a major explanatory factor for recommendation data biases and popularity effects.

In this paper we address this perspective. We posit in particular the following potential key factors in creating popularity biases, determining whether or not popularity becomes a good strategy to achieve recommendation effectiveness:
• User behavior in communication with peers, in particular the bias towards positive or negative experiences when sharing one's experiences with others, and the overall frequency with which users intercommunicate in social networks.
• User behavior in rating decisions, in particular biases towards rating positive or negative experiences.
• Social network structure, in particular link density and clustering structure.

We undertake this study by representing the involved factors in a probabilistic model defined by random variables subject to interdependent distributions. Based on the model, the problem can be approached, complementarily, by a formal analysis, or by empirical observation through simulations. In this paper we pursue the latter path. We identify the key variables, parameters and dependencies describing the factors we aim to focus on, and we explore, through simulation based on the proposed model, the resulting effects on the effectiveness of popularity, aiming to identify different situations and uncover potential explanations thereof.
2. A SOCIAL RATING GENERATION MODEL
We start our analysis by formalizing the fundamental actions, events and variables involved in the rating generation process, upon which we identify and formalize the key factors for the phenomena we aim to observe (user behavior trends and related network processes), and their relations to the resulting effects (data biases and effectiveness variations in popularity-based recommendation), in the form of probabilistic dependencies and model parameters.

In order to generate input data for an item, a user needs to become aware that the item exists, decide to interact with the item, and then decide to rate it. Popularity biases in recommender system input data can therefore be related to two main factors: a) biases in the items that users discover: some items become known to many more users than others; and b) biases in the items that users decide to rate (or consume or interact with): once a user experiences an item, there may be some systematic reason why users decide to rate certain items and not others.

The primary necessary steps by which a rating value is generated can thus be identified as follows:
1. A user discovers an item, i.e. he becomes aware that the item exists.
2. The user decides to interact with (consume, click, play, etc.) the item.
3. The user decides to rate the item.
Moreover, in a social environment, we consider an additional relevant action by users on items:
4. The user shares with some of his friends his experience with the item. This brings each person informed by the user about the item back to step 1 (discovery).

The distinction between steps 2 and 3 is not clear-cut, or is simply inexistent, in common applications where users do not enter explicit ratings and user-item interaction data are used as input by recommendation algorithms instead; for this reason, and for the sake of simplicity, we shall ignore the difference in our model.

These steps thus create a cycle by which users become aware of items or, from the item perspective, items progressively traverse the social network of users, becoming known to the users they come across, and becoming rated by some of them. How far an item reaches in the network, and which regions it reaches, depends on the intrinsic communication patterns of users in the network, the dependence of the latter on characteristics of the items, and the shape and connectivity of the network, which is known to affect the development of network propagation phenomena [6].
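To make the cycle concrete, the following minimal sketch (our own illustration, not the authors' implementation) encodes the state progression that steps 1-4 induce for a user-item pair; all names are ours.

```python
from enum import Enum

class ItemState(Enum):
    UNKNOWN = 0     # the user is not aware the item exists
    DISCOVERED = 1  # step 1: the user has become aware of the item
    RATED = 2       # steps 2-3 (collapsed, as noted above): a rating exists

def on_discovery(state: ItemState) -> ItemState:
    """Step 1: discovery, through a friend or an extrinsic source."""
    return ItemState.DISCOVERED if state is ItemState.UNKNOWN else state

def on_rating(state: ItemState) -> ItemState:
    """Steps 2-3: a discovered item may get rated; step 4 (sharing) does
    not change the pair's state, it triggers step 1 for a friend."""
    return ItemState.RATED if state is ItemState.DISCOVERED else state
```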
2.1 Random variables and parameters
We formally model the described process in terms of a set of binary random variables defined upon different sample spaces combining the set 𝒰 of all users, the set ℐ of all items, and the set 𝒯 of all time points we may consider in the model, as follows:
• Rating: rated: ℐ × 𝒰 × 𝒯 → {0,1} takes value 1 for a sampled element (i, u, t) if user u ∈ 𝒰 has rated item i ∈ ℐ by or before time t ∈ 𝒯, and 0 otherwise.
• Relevance: relevant: ℐ × 𝒰 → {0,1} takes value 1 if the sampled user likes the sampled item, and 0 otherwise. Notice that this variable is not observed unless it becomes visible to the system when the user rates the item. Note also that we assume, as a simplification, that relevance is a static condition which does not change with time or context.
• Discovery: seen: ℐ × 𝒰 × 𝒯 → {0,1} is 1 if the user is aware the item exists by or before the time point at hand, and 0 otherwise. As with relevance, this variable is not observed unless a user rates an item, in which case we know he must have seen it to begin with. As mentioned before, we ignore the distinction between knowing an item exists (e.g. we have seen a movie title on a billboard) and actually experiencing the item (e.g. we actually watch the movie). The difference can be worth considering, as it involves a decision on the part of the user, but it is not required for the focus of our present analysis.
• Communication: tell: 𝒰 × ℐ × 𝒰 × 𝒯 → {0,1} is 1 if a user tells a given friend about a given item at a given time when both friends talk to each other, and 0 otherwise. We consider that users only share experiences with people they are connected with in a social network. This does not involve any loss of generality, as we do not make any assumption on the nature of the network at this point. The simplifying restriction will be made in our experiments, where we will use or simulate specific social network structures and assume (as a simplification) they embody full knowledge of all connections between users.

The key distributions and dependencies which capture the relevant factors in the behavior of the defined model can be expressed in terms of conditional probabilities. We mainly foresee two such key dependencies: the propensity of users to rate items they like vs. items they do not like, and their inclination to share positive vs. negative experiences. This can be expressed by four conditional distributions:

p(tell | seen, relevant, u, i, v, t)
p(tell | seen, ¬relevant, u, i, v, t)
p(rated | seen, relevant, i, u, t)
p(rated | seen, ¬relevant, i, u, t)

If we make the simplifying assumption that the decisions to rate and share mainly depend on the relevance of the item, and we ignore for a moment the differences between users in this respect, as well as the possible variations in user behavior over time, we may consider the approximation p(tell | seen, relevant, u, i, v, t) ≈ p(tell | seen, relevant), and the same for the other three distributions, in such a way that we have four conditional probabilities defining two behavioral dimensions, which may act as configuration parameters of the model:
• Communication/relevance bias: p(tell | seen, relevant) and p(tell | seen, ¬relevant).
• Rating/relevance bias: p(rated | seen, relevant) and p(rated | seen, ¬relevant).

We are interested in studying how these parameters affect the effectiveness of popularity-based recommendation. For that purpose, we shall simulate an environment whose dynamics are based on the proposed model, run recommendations in that environment using the generated ratings, and measure their effectiveness according to the simulated data. The model defines a few rules by which changes of state in the environment should be governed, but in order to simulate the environment dynamics we need to define a set of triggering actions and events, and the order in which they take place.
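These four probabilities are the only behavioral knobs of the simplified model. A minimal sketch of how they can be encoded, together with the two relevance-conditioned Bernoulli decisions they drive (our own illustration; all names are assumptions):

```python
import random
from dataclasses import dataclass

@dataclass
class BehaviorParams:
    """The four conditional probabilities acting as model configuration."""
    tell_rel: float      # p(tell  | seen, relevant)
    tell_not_rel: float  # p(tell  | seen, ¬relevant)
    rate_rel: float      # p(rated | seen, relevant)
    rate_not_rel: float  # p(rated | seen, ¬relevant)

def decides_to_tell(params: BehaviorParams, likes_item: bool) -> bool:
    """Bernoulli draw for the sharing decision, conditioned on relevance."""
    p = params.tell_rel if likes_item else params.tell_not_rel
    return random.random() < p

def decides_to_rate(params: BehaviorParams, likes_item: bool) -> bool:
    """Bernoulli draw for the rating decision, conditioned on relevance."""
    p = params.rate_rel if likes_item else params.rate_not_rel
    return random.random() < p
```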
2.2 Model dynamics
We consider the following simplified scenario that represents how items may become known to users and eventually rated. We have a population of users and a set of items. Based on the model defined in the previous subsection, user-item pairs undergo a sequence of states, from unknown to discovered to rated, in this order, where the two latter states may or may not ever be reached. Items are discovered by users through friends: at certain points in time, users choose a friend and an item they have discovered, and decide whether or not to tell the friend about the item. If communication takes place, the friend discovers the item the user talks about (if he had not discovered it already). Communication takes place as a dialog, which means that the friend will in turn choose some discovered item and (under the same relevance-based communication probability pattern) talk back about it to the first user, who will then discover this item. In our chosen configuration, users talk about an item on their own initiative only once at most, but they can talk about it any number of times when asked.

Users thus discover items by communication through the social network. However, initially all items are unknown to all users. In order to bootstrap the system, we may either define an initial state where an arbitrary set of user-item pairs are in the discovered state (e.g. n random users have discovered each item), or include an additional discovery source, extrinsic to the social network, through which items may also become known. In our current implementation we choose the second option. The source may represent e.g. catalog browsing and searching, item advertisement, etc., and can be implemented as random sampling (as slow and infrequent as we desire with respect to the overall simulation time flow) of user-item pairs for discovery, or biased sampling by some arbitrary distribution, or even a recommender system. In our case we choose random sampling at a ratio of 0.1% of the simulation time step (that is, on average every 1 out of 1,000 simulation steps, users discover an item at random with replacement, i.e. we do not force the sampled item to be unknown, and it may have been discovered already).

The decision to rate items and to share the experience with friends can be required of the simulated users in different ways and orders. As a simplification, the decision to rate an item or not is made at the time when the user discovers the item. If the user does not rate it, the decision is not reconsidered anymore. Regarding communication, in our chosen configuration each user is given a chance to talk about an item to a friend once every simulation time unit; or, inversely, the time unit is defined as an iteration where every user is given the chance to speak to a friend. The item is chosen uniformly at random (without replacement if the user took the initiative) among the ones the user has discovered, and the friend is sampled uniformly at random (with replacement) from all the user's social contacts.

The rating and sharing decisions, when the user is faced with them as explained in the previous paragraph, are taken based on the probabilistic model described in the previous subsection. That is, when a user discovers an item, he will rate it with probability p(rated | seen, relevant) if he likes the item, and with probability p(rated | seen, ¬relevant) if he does not. Analogously, the decision whether to talk to a friend about an item is taken (once the friend and the item have been sampled) according to p(tell | seen, relevant) if the user likes the item, and p(tell | seen, ¬relevant) if he does not.
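Putting the dynamics together, one simulation time unit might look as follows. This is a sketch under our own reading of the protocol just described, reusing the decision helpers from the Section 2.1 sketch; the per-user state container and all names are assumptions, items are assumed to be integer identifiers, and `rng` a random.Random instance.

```python
from dataclasses import dataclass, field

@dataclass
class UserState:
    likes: set = field(default_factory=set)       # hidden relevance knowledge
    discovered: set = field(default_factory=set)  # items the user is aware of
    initiated: set = field(default_factory=set)   # items shared on own initiative
    rated: set = field(default_factory=set)       # rating decided at discovery time

def discover(users, u, item, params, rng):
    """Discovery; the (never reconsidered) rating decision is made here."""
    if item not in users[u].discovered:
        users[u].discovered.add(item)
        if decides_to_rate(params, item in users[u].likes):
            users[u].rated.add(item)

def simulation_step(users, friends, params, rng):
    """One time unit: every user gets one chance to talk to one friend."""
    for u in users:
        candidates = users[u].discovered - users[u].initiated
        if not friends[u] or not candidates:
            continue
        item = rng.choice(sorted(candidates))    # own initiative: once per item
        friend = rng.choice(sorted(friends[u]))  # with replacement across turns
        users[u].initiated.add(item)
        if decides_to_tell(params, item in users[u].likes):
            discover(users, friend, item, params, rng)
        # the dialog: the friend talks back about one of his discovered items
        if users[friend].discovered:
            back = rng.choice(sorted(users[friend].discovered))
            if decides_to_tell(params, back in users[friend].likes):
                discover(users, u, back, params, rng)
```

The extrinsic bootstrapping source (one random user-item discovery per 1,000 steps on average) would be interleaved with these steps; we omit it here.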
In order to carry out the above simulated actions, it is apparent that we should know whether a given user likes a given item at the time when this determines the probabilities of the user's decisions. Relevance is in general an unobserved variable for the system (until a user rates an item) and for the user himself (until he discovers an item). We deal with this lack of observation by simulating relevance knowledge as a certain user-item relevance distribution. This knowledge will remain hidden to the system (in particular to the recommender systems we will run in our experiments), but will be made "visible" to a) the simulated users when they discover an item, and b) the computation of recommendation effectiveness metrics, as we will explain shortly.

Our model does not make any assumption about the relevance distribution, but in our experiments we assume the number of users who like an item (which is equal to p(relevant | i) multiplied by the number of users) has a long-tail distribution shape. This is an arbitrary decision in our work at this point, which could be contrasted by means of a poll of some kind in a real setting. It does not seem to be a critical aspect of the model in our simulations, though. In order to obtain the long-tail shape we use an adjusted power law defined by

p_α(relevant | i_k) = c_1 + β (k + c_2)^(-α)

for the k-th most liked item. The parameter α ∈ [0, ∞) defines the steepness of the relevance distribution, where α = 0 gives a uniform distribution. We adjust the remaining parameters c_1, c_2 and β in such a way that (we omit the details) the distribution adheres to a given prior p(relevant), and the extremes of the curve for the most and least liked items (i_1 and i_|ℐ|) behave as one would expect, that is:

lim_{α→∞} p_α(relevant | i_1) = 1
lim_{α→∞} p_α(relevant | i_|ℐ|) = 0
lim_{α→0} p_α(relevant | i_1) = lim_{α→0} p_α(relevant | i_|ℐ|) = p(relevant)

Figure 1 shows the shape of the curve for α = 1 and p(relevant) = 0.2 with 3,700 items.

[Figure 1. Simulated relevance distribution for α = 1 with prior p(relevant) = 0.2 for 3,700 items. Items are sorted from most to least liked on the x axis, and the line represents the ratio of users who like each item.]

Given a distribution thus defined, we generate the sequence of the numbers of users who like each item, and we assign these numbers randomly to the items. Then, for each item, we assign the corresponding number of users liking the item by randomly sampling the users. We thus create a scenario where each user likes a set of items beforehand, although he does not know he likes an item until he discovers it. If the user rates the item, the system will also know whether or not the user likes it: if he does, the rating value will be positive, and negative otherwise. We thus consider, as a simplification, that relevance is an immutable condition which does not change with time nor context.
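Since the paper omits the fitting details, the following sketch shows one plausible construction (our own, not the authors'): we fix c_1 and c_2 to hypothetical defaults and solve β so that the average of the curve matches the prior, then sample the sets of users who like each item.

```python
import numpy as np

def relevance_curve(n_items, alpha, prior, c1=0.0, c2=1.0):
    """Adjusted power law p_alpha(relevant | i_k) = c1 + beta * (k + c2)^(-alpha).
    c1 and c2 are fixed here (hypothetical defaults; the paper does not state
    its fitting scheme), and beta is solved so the mean equals the prior."""
    k = np.arange(1, n_items + 1)
    base = (k + c2) ** (-float(alpha))
    beta = (prior - c1) / base.mean()        # E_k[c1 + beta * base_k] = prior
    return np.clip(c1 + beta * base, 0.0, 1.0)

def assign_liked_sets(p_relevant, n_users, rng):
    """For each item, draw the corresponding number of fans uniformly at random."""
    liked_by = []
    for p_item in p_relevant:
        n_fans = int(round(p_item * n_users))
        liked_by.append(set(rng.choice(n_users, size=n_fans, replace=False)))
    return liked_by

# e.g. the configuration of Figure 1: 3,700 items, alpha = 1, prior 0.2
curve = relevance_curve(3700, alpha=1.0, prior=0.2)
fans = assign_liked_sets(curve, n_users=4039, rng=np.random.default_rng(0))
```

Note that with alpha = 0 the curve is constant at the prior, recovering the uniform case mentioned above.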
3. ITEM POPULARITY RECOMMENDATION AND ITS EFFECTIVENESS
Before we get into the empirical analysis of popularity effectiveness, we cast precise definitions of popularity and of the metrics to assess its effectiveness in recommendation. In the usual definition of popularity-based recommendation one finds in the literature, items are ranked by their total number of ratings, regardless of whether the rating values express a positive or negative preference [5]. Given the role of relevance we are analyzing in our model, we find it worthwhile to also consider a variant of popularity recommendation where only positive ratings are counted. We therefore study both variants in our experiments, whose scoring functions are defined as:

Simple popularity: s(u, i) = |{v ∈ 𝒰 : r(v, i) ≠ ∅}|
Relevant popularity: s(u, i) = |{v ∈ 𝒰 : r(v, i) ≥ τ}|

where r(v, i) is the rating assigned to i by v, and τ is the threshold value at or above which a rating expresses a positive preference.

As a metric of recommendation effectiveness we shall take precision P@k, which is a simple-to-compute and analyze, yet representative, metric. We shall distinguish between the observed precision P_obs, in which a recommended item is considered relevant if it has been rated by the target user with a positive value, and the true precision P_true, in which we use all the simulated relevance knowledge, including that which has not become known to the system in the form of ratings:

P_obs@k = avg_{u ∈ 𝒰} |{i ∈ R_u^k : r(u, i) ≥ τ}| / k
P_true@k = avg_{u ∈ 𝒰} |{i ∈ R_u^k : u likes i}| / k

where R_u^k is the set of top k items recommended to u. Naturally, P_obs is what offline experimental evaluations commonly report. This will allow us to contrast precision as it is usually measured in offline recommender system experiments with the true precision defined by the full underlying user preferences which have determined the generation of ratings and their distribution in our model.
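A direct transcription of these definitions (our own sketch; `ratings` is assumed to map (user, item) pairs to rating values, and the hit test is swapped to obtain the observed or the true variant):

```python
from collections import Counter

def simple_popularity(ratings):
    """s(u, i) = |{v : r(v, i) exists}|; the ranking is the same for every user."""
    return Counter(i for (_v, i) in ratings)

def relevant_popularity(ratings, tau):
    """s(u, i) = |{v : r(v, i) >= tau}|: only positive ratings are counted."""
    return Counter(i for (_v, i), value in ratings.items() if value >= tau)

def precision_at_k(recommended, is_hit, k):
    """P@k for one user. For P_obs, is_hit(i) tests r(u, i) >= tau;
    for P_true, is_hit(i) tests whether u likes i (simulated knowledge)."""
    return sum(1 for i in recommended[:k] if is_hit(i)) / k
```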
4. SIMULATION-BASED EMPIRICAL OBSERVATIONS
The proposed model allows representing different user behavior patterns by different model configurations using different parameter settings. In order to analyze the effect that such configurations produce on the effectiveness of popularity-based recommendation, we implement a simulation framework that runs the model dynamics described in the previous section. The framework supports the integration of an arbitrary set of recommendation algorithms by just having them adhere to a simple abstract API. At each simulation time step, the framework generates a temporal split of rating data with a 0.5/0.5 ratio of training/test data, runs all the recommendation algorithms, and evaluates the observed and true precision for each of them. This way we can monitor how the performance of recommenders evolves along the simulation. For the focus of the present paper we only observe three recommenders: popularity, relevant popularity, and random recommendation.

The basic research questions we aim to shed light on in our experiments are the following:
RQ1. How does the observed and true precision of popularity-based recommendation depend on the users' social communication patterns, and in particular on the bias towards sharing positive vs. negative experiences?
RQ2. How does the observed and true precision of popularity-based recommendation depend on the user bias towards rating liked vs. non-liked items?
RQ3. Can certain social network characteristics and effects (such as its topology and viral phenomena) alter the dependencies between user behavior and popularity precision?
RQ4. May the observed and true precision disagree in terms of how popularity compares to random recommendation, as a consequence of particular social behavior patterns?

4.1 Experimental setup
For most of the observations we report next, we use the social network data from Facebook made available by J. Leskovec [9], containing 88,234 social connections among 4,039 users. Taking inspiration from the order of scale of MovieLens 1M, we take a total set of 3,700 items. We simulate a relevance distribution with prior p(relevant) = 0.2 and steepness α = 1, which results in the distribution shown earlier in Figure 1. For discovery bootstrapping, in addition to word of mouth, users discover an item at random 1 out of every 1,000 simulation steps. We run all simulations until 500,000 ratings have been generated, which is half the size of MovieLens 1M. The reason for not running the simulations longer is to avoid the distorting saturation effects that eventually start to appear as a consequence of the implicit closed-world assumption involved in having a fixed set of users and items all along. E.g. in the limit, all users end up consuming and/or sharing most items regardless of their preferences and behavior patterns, just by reason of exhaustion of any better remaining option. In the results we present here, we run each simulation only once, in order to show how the observed patterns can be perceived even without averaging out random effects (having informally checked that the variation is moderate when averaging).

In order to study the effects of the relevance biases in sharing and rating user behavior, we shall fix certain parameters and observe the variation of popularity precision as we vary the others.

4.2 Communication/relevance bias
To isolate the effect of communication biases, we take a relevance-neutral rating behavior by setting p(rated | seen, relevant) = p(rated | seen, ¬relevant) = 1, that is, users always rate all items they discover. Then, we vary p(tell | seen, relevant), p(tell | seen, ¬relevant) and the prior p(tell | seen) as we explain next (see the sweep sketch below).
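The sweep for this section can be driven as in the following sketch (ours; `run_point` is a hypothetical stand-in for the framework described above, and `BehaviorParams` is the configuration type from the Section 2.1 sketch):

```python
import numpy as np

def run_point(params):
    """Hypothetical stand-in for the framework of Section 4: run the
    Section 2.2 dynamics until 500,000 ratings exist and return the
    P@10 of each recommender minus the P@10 of random recommendation."""
    ...  # stub: the full loop is sketched in Sections 2.1-2.2

# Rating behavior is relevance-neutral (both rating probabilities fixed
# to 1), while the two sharing probabilities sweep an 11x11 grid.
grid = np.round(np.arange(0.0, 1.01, 0.1), 1)
results = {}
for tell_rel in grid:
    for tell_not_rel in grid:
        if tell_rel == 0.0 and tell_not_rel == 0.0:
            continue  # the (0,0) cell: no communication ever takes place
        params = BehaviorParams(tell_rel, tell_not_rel, 1.0, 1.0)
        results[(tell_rel, tell_not_rel)] = run_point(params)
```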
[Figure 2. Discovery/relevance bias caused by communication/relevance biases. The color scale on the color map and the y axis on the graphic represent p(relevant | seen), blue being the maximum and red the minimum value. Note that the curve for p(tell | seen) = 0.9 has no values for p(tell | seen, relevant) < 0.5, as it is not possible to reach such a high prior with lower sharing probabilities on relevant items. Likewise, p(tell | seen) = 0.1 has no points for p(tell | seen, relevant) > 0.5.]

[Figure 3. Effect of the communication/relevance biases on popularity-based recommendation precision. We show both the observed (left block) and true (right block) precision for two item popularity variants: ranking by the total number of ratings (simple popularity, top), and ranking by the number of positive ratings (relevant popularity, bottom). The precision values are shown as the difference to the precision of random recommendation. For each recommender/metric combination, a color map shows the resulting precision values (blue being the maximum and red the minimum value) for the corresponding values of p(tell | seen, relevant) and p(tell | seen, ¬relevant). The (0,0) cell is left empty, as no communication takes place in such a setting. Next to each color map, a graphic shows a curve for each value of the prior p(tell | seen) from 0.1 to 0.9, with p(tell | seen, relevant) on the x axis and the difference to random precision on the y axis. As in Figure 2, and for the same reason, the curves for p(tell | seen) = 0.9 have no values for p(tell | seen, relevant) < 0.5, and p(tell | seen) = 0.1 has no points for p(tell | seen, relevant) > 0.5.]

4.2.1 Discovery/relevance bias
Before analyzing how the relevance-sharing biases affect popularity precision, we check a simple hypothesis: that sharing biases result in approximately equivalent discovery biases.
This is not as obvious as it might seem, as the fact that people speak about what they like (a relevance bias in communication) does not necessarily imply that those who listen like it as well (a relevance bias in discovery).

Figure 2 shows how the communication/relevance bias results in a discovery bias, in terms of the ratio of discovered items that are liked by the users who have discovered them, as expressed by p(relevant | seen). The figure presents the results in two ways: in the color map on the left, we vary p(tell | seen, relevant) and p(tell | seen, ¬relevant) from 0 to 1 in increments of 0.1, and we show the resulting p(relevant | seen) on a color scale, where blue is the maximum value and red is the minimum. We see that there is indeed an almost direct relation between the bias towards relevance in discovery and in network communication. The right graphic provides a complementary view of this trend, where each line corresponds to a value of p(tell | seen), the x axis is p(tell | seen, relevant), and the y axis is the resulting p(relevant | seen). Again, we see that the discovery/relevance bias grows with the sharing/relevance bias: the curves grow monotonically with p(tell | seen, relevant).

This correspondence between biases is not necessarily trivial, as we noted earlier. The explanation is that, intuitively, in a relevance-prone sharing situation, items that many users like find more paths with high traversal probability (through connected friends who all like the item), and will therefore travel farther than items fewer users like, whereby more-liked items become discovered by more users.
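This intuition admits a rough quantitative form. As a back-of-the-envelope, single-hop approximation (ours, not derived in the paper), suppose a share of item i reaches a random receiver, who likes i with probability p(relevant | i) independently of the sender. Writing p_r = p(tell | seen, relevant) and p_n = p(tell | seen, ¬relevant), the chance that a given exposure concerns item i is proportional to p_r p(relevant | i) + p_n (1 − p(relevant | i)), so

```latex
p(\mathit{relevant} \mid \mathit{seen}) \;\approx\;
\frac{\sum_i p(i)\, p(\mathit{relevant} \mid i)\,
      \bigl[ p_r\, p(\mathit{relevant} \mid i) + p_n \bigl(1 - p(\mathit{relevant} \mid i)\bigr) \bigr]}
     {\sum_i p(i)\,
      \bigl[ p_r\, p(\mathit{relevant} \mid i) + p_n \bigl(1 - p(\mathit{relevant} \mid i)\bigr) \bigr]}
```

When p_r = p_n the bracket cancels and the expression reduces to the prior p(relevant); as p_r grows relative to p_n, items with high p(relevant | i) receive more weight in both sums and the ratio increases, consistent with the monotone curves of Figure 2. The full network dynamics compound this effect over multiple hops.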
4.2.2 Effect on popularity
We now check the effect of communication biases on popularity precision, using the same parameter settings as in the previous subsection. Figure 3 shows the effects for the two variants of popularity-based recommendation described in Section 3 (simple popularity on top and relevant popularity at the bottom), in terms of both the observed (left block) and true (right block) precision; more specifically, the difference between the precision of popularity and that of random recommendation. Similarly to Figure 2, for each popularity/precision variant we display a) a color map where the color scale shows the evolution of precision with respect to the user propensity to share liked and non-liked items, and b) a graphic with a set of curves showing how precision evolves with the propensity to share liked items, for a fixed amount of global communication p(tell | seen) on each curve.

We see that popularity is in general better than random recommendation in most configurations, variants and metrics. However, we may not take this for granted in all cases, and popularity becomes a worse-than-random approach in some circumstances.

Comparing the graphics of true and observed precision, we see that the latter shows much lower values than the former. This is because observed precision only counts observed relevance in the form of ratings, which is a fraction of the total relevance that true precision takes into account; we are just reproducing the well-known fact that observed precision is a lower bound of true precision [7]. We may also observe that relevant popularity is generally a better option than simple popularity, as one would anticipate. We also see at first sight that the effects of the communication biases on the two popularity variants are almost the same in terms of observed precision, whereas they differ in terms of true precision, as we discuss next.

As a general trend, we see that popularity is more effective (in both variants and metrics) when users are prone to share items they like: all the rows of all the maps display a monotonic growth from left to right, and all the curves in the graphics also show a steadily growing trend. In terms of observed precision this is just natural: for a given number of total ratings (our simulation stopping condition), the number of positive ratings gets higher if discovery is biased towards liked items, and the difference to random precision is roughly proportional to the positive rating density, for statistical reasons.

In terms of true precision, the trend is explained because, with relevance-biased communication, the number of relevant and total ratings of each item will correlate with the number of users who like each item. Thereby liked items become statistically more popular, causing an increase in the resulting true popularity precision. In the opposite case, sharing items users do not like is counterproductive for popularity, because the generated ratings mislead the recommendation: the items with most ratings (predominantly negative) are not liked by many users, yet they get recommended by popularity. Simple popularity is particularly vulnerable to this, as it does not distinguish between positive and negative ratings. This popularity variant seems to be essentially sensitive just to whether p(tell | seen, relevant) > p(tell | seen, ¬relevant) or the opposite. Once the inequality leans clearly enough to either side, precision does not vary much further, as we can see by the rather uniform colors in the triangular sections above and below the diagonal in the color map. In fact (though we do not show this information in the figure), the red and blue cells mostly correspond to negative and positive values respectively in this particular color map (i.e. precision is below or above random on each side of the diagonal).
Relevant popularity, on the other hand, is more robust: it is almost insensitive to the relevance bias for p(tell | seen) < 0.5; the precision curves run almost constant and high with respect to p(tell | seen, relevant) for such low levels of global communication. This makes sense because once popularity can correctly identify the relevant items by correlation with positive ratings, it does not matter how much rating information the recommendation is using for the prediction; the ratings do not count in the true precision computation, only the full true relevance does. Even in some regions where p(tell | seen, relevant) < p(tell | seen, ¬relevant), true precision is good because negative ratings are ignored by this variant. However, sharing non-liked items beyond a certain degree does hurt relevant popularity: we see a strong decreasing trend down the color map columns. This happens because items which are not liked by that many users get enough positive ratings (corresponding to the few users who do like them) to surpass highly liked items whose relevance remains more unobserved.

The trends for negative communication display some nuances in observed precision, where at some points sharing negative experiences seems to improve the measured precision of popularity. We hypothesize that this is due to the fact that when an item is shared, it may be liked by the receiver even if the sharer did not like it. Once negative discovery is saturated, further negative sharing may cause a slight relative rise in positive discovery (and therefore positive ratings), as we see in the color maps for p(tell | seen, relevant) < 0.5 in observed precision. We can also see that more communication results in better observed precision in the curves of the graphics (darker curves run above lighter ones). The trade-off between negative communication and total communication explains the "mix of diagonal trends" in the color maps of observed precision.

4.3 Rating/relevance bias
In order to study the effect of the rating bias on popularity recommendation, we take fixed neutral values for the communication biases, p(tell | seen, relevant) = p(tell | seen, ¬relevant) = 1, that is, users share with their friends all the items they find. In this setting, we vary p(rated | seen, relevant) and analyze the resulting curves for fixed values of p(rated | seen).

[Figure 4. Effect of the rating/relevance bias on popularity precision. The relation between popularity precision and rating biases is shown for two different scenarios: a) intense communication (left) with p(tell | seen, relevant) = p(tell | seen, ¬relevant) = 1, and b) moderate communication (right) with p(tell | seen, relevant) = p(tell | seen, ¬relevant) = 0.36. Similarly to Figures 2 and 3, the curves at p(rated | seen) = 0.9 are truncated because it is not possible to have p(rated | seen, relevant) < 0.5 for such a high rating prior, and likewise for p(rated | seen, relevant) > 0.5 at p(rated | seen) = 0.1.]

Figure 4a shows the result. The first obvious trend is that observed popularity precision grows with the bias towards rating relevant items. This is because, for a fixed total number of generated ratings (the simulation stopping condition), the number of positive ratings increases with that bias. And as pointed out earlier, this statistically increases the difference to random precision, roughly by the positive rating density.
This effect is not observed in true precision. In fact, and quite paradoxically, the true precision of relevant popularity degrades with the positive rating bias (the curves display a decreasing trend). This is explained by a non-trivial interaction between a viral network effect and the recommendation protocol. The communication setting in these experiments is one of extreme diffusion, since users always decide to share items every time they get a turn. Thus the first items to be found, by chance of exogenous discovery by some user, spread quickly through the network and accumulate a comparatively high number of ratings. The recommendation protocol mandates that users not be recommended items they have already rated themselves. This means that early rated items will be excluded from the recommendations of more target users, as they got more ratings earlier. As p(rated | seen, relevant) grows, generated ratings lean towards items with high p(relevant | i). Thus the items excluded more often by the protocol will tend to be the most relevant ones, whereby true precision decreases for statistical reasons: a decrease of the effective relevance density in the set of candidate items.

4.4 Network effects
As we have seen, in addition to the effect of individual users' behavior, further network effects may emerge from social-level dynamics, which end up affecting popularity. In order to complete the observation of the viral effects just discussed in the previous section, we repeat the same experiments with lower communication rates, p(tell | seen, relevant) = p(tell | seen, ¬relevant) = 0.36, which produces a positive rating distribution similar to MovieLens 1M (we omit the details about this for the sake of space).

Figure 4b shows the results. We see that the paradoxical effect of the positive rating bias on the true precision of relevant popularity disappears. Now, in fact, there is no dependence on the bias, as one should expect, since relevant popularity ignores negative ratings, true precision ignores all ratings, and the correlation between relevance and positive ratings (which determines the effectiveness of relevant popularity) is achieved as soon as sufficiently many (a small number of) positive ratings have been generated. The exclusion of rated items is not that badly biased against relevant items because, discovery being more evenly distributed, the generated ratings are not extremely skewed towards relevant items as before with viral propagation, and these "good items" are thus excluded from fewer recommendations.

The true precision of simple popularity does depend on the ratio of positive vs. negative ratings, and this is clearly shown in the corresponding graphic, where in fact precision steps up as soon as p(rated | seen, relevant) > p(rated | seen, ¬relevant).

We also examine whether the network structure can be a factor in the observed dynamics. For this purpose, we run the same simulation on a Barabási-Albert (BA) graph [2] with the same number of users and friendship links as in the Facebook (FB) dataset. We take a simple configuration with p(tell | seen, relevant) = p(tell | seen, ¬relevant) = 1, p(rated | seen, relevant) = 1 and p(rated | seen, ¬relevant) = 0, that is, users share everything they discover, and rate only what they like.
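For reference, the two graphs can be instantiated as in the following sketch (ours, using networkx). The file name assumes the SNAP edge-list release of the dataset, and the attachment parameter m = 22 is our choice to approximate the Facebook edge count; the paper does not state how the sizes were matched.

```python
import networkx as nx

# Facebook ego-network data [9]: 4,039 users, 88,234 friendship links.
# fb_graph = nx.read_edgelist("facebook_combined.txt", nodetype=int)  # assumed SNAP file name

# Barabási-Albert graph [2] of (approximately) the same size: a BA graph
# with n nodes and m attachment edges per new node has (n - m) * m links,
# so m = 22 gives (4039 - 22) * 22 = 88,374 edges, close to Facebook's 88,234.
n_users = 4039
ba_graph = nx.barabasi_albert_graph(n=n_users, m=22, seed=0)
print(ba_graph.number_of_edges())  # 88374
```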
Figure 5 shows the difference in the distribution of discovery and positive ratings on each graph. Discovery is steeper in BA than in FB, with the best-known items being discovered by almost all users, and the least-known by almost none. Discovery is neutral with respect to relevance in this configuration (hence the green and red plots do not show correlation). The positive rating distribution correlates with discovery because the more an item is discovered, the more chances it gets to be rated. It also correlates with relevance, because of the rating/relevance bias configuration.

[Figure 5. Positive rating, discovery and relevance distributions generated for different network structures: a) Facebook data (left) and b) a Barabási-Albert network with the same size in terms of number of nodes and arcs (right). The plots show the situation reached when the simulation has been run until 500,000 ratings are generated. The x axis corresponds to the items, sorted by their number of positive ratings. Each dot shows the ratio of users who like (red), have discovered (green), and have rated positively (yellow) the corresponding item. We may observe a steeper green discovery front (viral effect) in the Barabási-Albert network.]

Figure 6 shows the effect of this on the resulting precision. The results do not differ significantly between the two types of graphs, except mainly for the true precision of relevant popularity, which is good on FB, but close to random on BA. This is because information travels faster on the preferential attachment model of BA and the viral effect hurts true precision, whereas information takes longer to move outside friendship clusters on FB, and popularity retains a better effectiveness. Note that in this setting, simple and relevant popularity are equivalent, since all the generated ratings are positive as per the setting p(rated | seen, ¬relevant) = 0. Popularity gets a very slight advantage in observed precision on BA. We attribute this to the effect of a steeper positive rating distribution on this graph, which results in the 10 most popular items accumulating a higher number of positive ratings, thereby yielding a higher observed P_obs@10.

[Figure 6. The effect of graph structure on popularity precision. The observed and true precision of relevant popularity vs. random recommendation on Facebook data (left) and a preferential attachment graph model (right) can be compared.]

4.5 Effectiveness disagreement
The biases and effects we have analyzed can even cause a disagreement between observed and true precision not just in quantitative terms, but also in terms of the comparison of two recommenders, in this case popularity and random. Figure 7 shows one such example on the FB graph. Simply with a maximum "anti-relevance" communication bias, p(tell | seen, relevant) = 0, p(tell | seen, ¬relevant) = 1, and neutral rating, p(rated | seen, relevant) = p(rated | seen, ¬relevant) = 1, we get a direct negative correlation between positive ratings and relevance: the items with the most positive ratings are the ones liked by the fewest users. As a consequence, the true precision of both simple and relevant popularity is worse than random. Oblivious to the full true relevance, the observed precision of popularity is still higher than that of random recommendation (whose observed precision is so low as to be barely visible in the figure).

[Figure 7. An example where observed and true precision disagree in the comparison of two recommenders (popularity variants vs. random recommendation).]
5. CONCLUSION
We have studied the effect of different aspects of user behavior in social networks on the effectiveness of popularity-based recommendation. We define a formal model to represent such factors, based on which we can simulate different situations and observe the resulting effects. The analysis yields findings such as: a) popularity is most effective when users have a bias towards items they like when they share and rate items; b) highly active inter-user communication and viral information propagation pump up observed precision; if communication is intense, even sharing non-liked items improves observed precision a bit further; c) viral propagation and a bias towards rating liked items can cause a decrease in true precision, due to the exclusion of rated items from recommendations; d) viral propagation of negative opinions can cause a disagreement between measured and true precision, even in terms of system comparisons; e) network structure is an additional factor in the effectiveness of popularity, as it determines to a significant extent the speed of information transmission and discovery, thus intensifying or moderating the viral effects and their consequences on popularity.

The possibilities for continuation of the presented work are manifold. We are currently aiming to back the reported empirical observations with a formal analysis of the explored model and effects. Beyond that, the simplifications assumed so far can be relaxed in many directions: we may consider different inter-user communication modalities (e.g. communicating with more than one friend per turn), introduce the distinction between item discovery and item consumption, non-uniform user behaviors, biases in extrinsic item discovery (e.g. a relevance bias in user searches), a discovery loop from recommendations, temporal dynamics (e.g. new items and users keep entering the system), further dependencies between events (such as user decisions and choices depending on the discovery source), dynamic networks, and non-static user preferences (including effects of social influence in preference formation and propagation). A user study to check what trends hold in practice in specific social environments would also be highly relevant to our research.

6. ACKNOWLEDGMENTS
This work was supported by the national Spanish Government (grant nr. TIN2013-47090-C3-2).
7. REFERENCES
[1] Bakshy, E., Rosenn, I., Marlow, C., and Adamic, L. The Role of Social Networks in Information Diffusion. WWW 2012, Lyon, France, April 2012, 519-528.
[2] Barabási, A.-L. and Albert, R. Emergence of scaling in random networks. Science 286(5439), October 1999, 509-512.
[3] Blattner, M. and Medo, M. Recommendation Systems in the Scope of Opinion Formation: a Model. Decisions Workshop at RecSys 2012, Dublin, Ireland, September 2012, 32-39.
[4] Celma, Ò. and Herrera, P. A new approach to evaluating novel recommendations. RecSys 2008, Lausanne, Switzerland, October 2008, 179-186.
[5] Cremonesi, P., Koren, Y., and Turrin, R. Performance of recommender algorithms on top-n recommendation tasks. RecSys 2010, Barcelona, Spain, September 2010, 39-46.
[6] Doerr, B., Fouz, M., and Friedrich, T. Why rumors spread so quickly in social networks. Communications of the ACM 55(6), June 2012, 70-75.
[7] Herlocker, J. L., Konstan, J. A., Terveen, L. G., and Riedl, J. T. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22(1), January 2004, 5-53.
[8] Marlin, B. M., Zemel, R. S., Roweis, S. T., and Slaney, M. Collaborative Filtering and the Missing at Random Assumption. UAI 2007, Vancouver, Canada, July 2007, 267-275.
[9] McAuley, J. J. and Leskovec, J. Learning to Discover Social Circles in Ego Networks. NIPS 2012, Lake Tahoe, NV, USA, December 2012, 548-556.
[10] Myers, S. A., Zhu, C., and Leskovec, J. Information Diffusion and External Influence in Networks. KDD 2012, Beijing, China, August 2012, 33-41.
[11] Pradel, B., Usunier, N., and Gallinari, P. Ranking with non-random missing ratings: influence of popularity and positivity on evaluation metrics. RecSys 2012, Dublin, Ireland, September 2012, 147-154.
[12] Steck, H. Training and testing of recommender systems on data missing not at random. KDD 2010, Washington, DC, USA, July 2010, 713-722.
[13] Steck, H. Item popularity and recommendation accuracy. RecSys 2011, Chicago, IL, USA, October 2011, 125-132.
[14] Steck, H. Evaluation of recommendations: rating-prediction and ranking. RecSys 2013, Hong Kong, China, October 2013, 213-220.