<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ACM RecSys</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring social network effects on popularity biases in recommender systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rocío Cañamares</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pablo Castells</string-name>
          <email>pablo.castells@uam.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>10</volume>
      <abstract>
        <p>Recommending items ranked by popularity has been found to be a fairly competitive approach in the top-N recommendation task. In this paper we explore whether popularity should always be expected to be an effective approach for recommendation, and what causes, factors and conditions determine and explain such effectiveness. We focus on two fundamental potential sources of biases in rating data which determine the answer to these questions: item discovery by users, and rating decision. We research the role of social communication as a major source of item discovery biases (and therefore rating biases). We undertake the study by defining a probabilistic model of such factors, and running simulations where we analyze the relationships between the effectiveness of popularity and different configurations of social behavior.</p>
      </abstract>
      <kwd-group>
        <kwd>Popularity</kwd>
        <kwd>social networks</kwd>
        <kwd>evaluation</kwd>
        <kwd>viral propagation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Some authors have analyzed this issue recently [
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15 ref9">8,11,12,13,14</xref>
        ], and
have proposed specific techniques to consider the biases in the
distribution of missing ratings, in both the recommendation
algorithms and the evaluation methodology and metrics. The question
has also been addressed from the perspective of the actual utility of
recommendation: recommending popular items has the obvious
shortcoming of a lack of surprise for the user, approximating (by
definition) the worst possible results in terms of the novelty
dimension [
        <xref ref-type="bibr" rid="ref5">4</xref>
        ]. Despite this shortcoming, popularity-based
recommendation appears to be reasonably effective in practice (e.g. as a fallback
option); item popularity is actually an (intentional or accidental)
ingredient of many state-of-the-art recommendation algorithms, and
commercial applications seem to use it among other signals in
recommendation functionalities. However, we barely find in the
literature a clear analysis of the causes and characteristics of the
popularity biases, and the relationship between the popularity
distribution and the potential consequences in the performance and
evaluation of recommendation algorithms.
      </p>
      <p>It is natural to wonder a) whether popularity should always be
expected to be an effective approach (or partial signal) for
recommendation, b) what causes, factors and conditions determine and
explain such effectiveness, and c) whether the apparent
effectiveness actually reflects true effectiveness, or is the result of a
distortion of some sort in the evaluation methodologies. We address
such questions in this paper.</p>
      <p>
        Popularity-based recommendation exploits biases in the
distribution of available observed ratings among items –or equivalently, of
the distribution of missing ratings. Thus studying the properties of
popularity is essentially the same as studying the characteristics of
rating distributions, and their biases. In unbiased situations (where
ratings are uniformly distributed), popularity is equivalent to
random recommendation and makes no particular sense as a
recommendation strategy. Popularity therefore makes sense when rating
data is biased or, in other words, missing not at random [
        <xref ref-type="bibr" rid="ref12 ref8">7,11</xref>
        ].
In this paper we focus on two fundamental potential sources of
biases in rating data: item discovery biases, and rating decision
biases. The latter refers to the factors that determine whether or not
a user decides to rate an item he has interacted with; for instance,
in many cases users may be typically more prone to rate items they
have liked than items they have not liked. The former refers to the
fact that in order to be rated by a user, the user needs first to
become aware that the item exists. Biases in item discovery
distribution then naturally result in biases in the items that ultimately get
more ratings or less.
      </p>
      <p>Discovery biases are determined by the sources by which users
discover items. People get to know items through a variety of
channels such as direct user searches, advertisement from
providers, random encounter, suggestions from a recommender system,
etc. Above all, our social environment is a key
source of information and discovery, one on which people place
particular reliance and trust compared to other channels. The
role that word of mouth plays in the distribution of ratings
connects the problem at hand to an issue of network propagation:
the items that propagate faster and farther in the social network
will tend to get more ratings.</p>
      <p>
        Propagation phenomena have been extensively studied in the area
of complex networks, and social networks in particular (for
diseases, rumors, viral effects, etc.) [
        <xref ref-type="bibr" rid="ref10 ref11 ref2 ref7">1,6,9,10</xref>
        ], but with scarce exceptions
[
        <xref ref-type="bibr" rid="ref4">3</xref>
        ] the connection to biases in user rating distributions has been
barely examined before. Yet we find that network effects can be a
major explanatory factor for recommendation data biases and
popularity effects.
      </p>
      <p>In this paper we address this perspective. We posit in particular the
following potential key factors in creating popularity biases,
determining whether or not popularity becomes a good strategy to
achieve recommendation effectiveness:
– User behavior in their communication with peers, in particular
the biases towards positive or negative experiences when
sharing one’s experiences with others, and the overall frequency
with which users intercommunicate in social networks.
– User behavior in rating decisions, in particular, biases towards
rating positive or negative experiences.
– Social network structure, in particular link density and
clustering structure.</p>
      <p>We undertake this study by representing the involved factors in a
probabilistic model defined by random variables subject to
interdependent distributions. Based on the model, the problem can be
approached, complementarily, by a formal analysis, or by
empirical observation through simulations. In this paper we pursue the
latter path. We identify the key variables, parameters and
dependencies describing the factors we aim to focus on and we explore,
through simulation based on the proposed model, the resulting
effects on the effectiveness of popularity, aiming to identify
different situations and uncover potential explanations thereof.</p>
    </sec>
    <sec id="sec-2">
      <title>2. A SOCIAL RATING GENERATION MODEL</title>
      <p>We start our analysis by formalizing the fundamental actions,
events and variables involved in the rating generation process,
upon which we will formally identify and formalize the key factors
for the phenomena we aim to observe (user behavior trends and
related network processes), and their relations to resulting effects
(data biases and effectiveness variations in popularity-based
recommendation), in the form of probabilistic dependencies and
model parameters.</p>
      <p>In order to generate input data for an item, a user needs to become
aware that the item exists, decide to interact with the item, and then
decide to rate it. Popularity biases in recommender system input
data can be therefore related to two main factors: a) biases in the
items that users discover: some items become known to many
more users than others; and b) biases in the items that users decide
to rate (or consume or interact with): once a user experiences an
item, there may be some systematic reason why users decide to
rate certain items and not others.</p>
      <p>The primary necessary steps by which a rating value is generated
can be thus identified as follows:
1. A user discovers an item, i.e. he becomes aware that the item
exists.
2. The user decides to interact with (consume, click, play, etc.)
the item.
3. The user decides to rate the item.</p>
      <p>Moreover, in a social environment, we consider an additional
relevant action by users on items:
4. The user shares with some of his friends his experience with
the item. This brings to step 1 (discovery) each person
informed by the user about the item.</p>
      <p>
        The distinction between steps 2 and 3 is not clear-cut, or is simply
nonexistent, in common applications, where users do not enter
explicit ratings, and user–item interaction data are used as input
instead by recommendation algorithms; for this reason, and for the sake
of simplicity, we shall ignore the difference in our model.
These steps thus create a cycle by which users become aware of
items or, from the item perspective, items progressively traverse
the social network of users, becoming known to the users they
come across, and becoming rated by some of them. How far and
what regions an item reaches in the network depends on the
intrinsic communication patterns of users in the network, the
dependence of the latter on characteristics of the items, and the shape and
connectivity of the network, which is known to affect the
development of network propagation phenomena [
        <xref ref-type="bibr" rid="ref7">6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.1 Random variables and parameters</title>
      <p>We formally model the described process in terms of a set of
binary random variables defined upon different sample spaces
combining the set 𝒰 of all users, the set ℐ of all items, and the set 𝒯 of all
time points we may consider in the model, as follows:
– Rating: rated: ℐ × 𝒰 × 𝒯 → {0,1} takes value 1 for a sampled
element (i, u, t) if user u ∈ 𝒰 has rated item i ∈ ℐ by or before
time t ∈ 𝒯, and 0 otherwise.
– Relevance: relevant: ℐ × 𝒰 → {0,1} takes value 1 if the
sampled user likes the sampled item, and 0 otherwise. Notice that
this variable is not observed unless it becomes visible to the
system when the user rates the item. Note also that we assume
as a simplification that relevance is a static condition and does
not change with time or context.
– Discovery: seen: ℐ × 𝒰 × 𝒯 → {0,1} is 1 if the user is aware
the item exists by or before the time point at hand, and 0
otherwise. The same as relevance, this variable is not observed
unless a user rates an item, in which case we know he must have
seen it to begin with.</p>
      <p>As mentioned before, we ignore the distinction between
knowing an item exists (e.g. we have seen a movie title on a
billboard), and actually experiencing the item (e.g. we actually
watch the movie). The difference can be worth considering,
as it involves a decision on the part of the user, but it is
not required for the focus of our present analysis.
– Communication: tell: 𝒰 × ℐ × 𝒰 × 𝒯 → {0,1} is 1 if a user
tells a given friend about a given item at a given time when
both friends talk to each other, and 0 otherwise.</p>
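      <p>In code, one convenient realization of these variables records the first time each event happens; the following sketch is ours (the variable names follow the model, the encoding does not):</p>

```python
# First-event times for the cumulative variables: rated(i, u, t) and
# seen(i, u, t) take value 1 at time t iff the recorded event time is <= t.
rated_at = {}     # (item, user) -> time at which the user rated the item
seen_at = {}      # (item, user) -> time at which the user discovered the item
relevant = set()  # (item, user) pairs the user likes; hidden until rated
tells = []        # (user, item, friend, time) communication events

def rated(i, u, t):
    """rated: I x U x T -> {0, 1}."""
    return int((i, u) in rated_at and rated_at[(i, u)] <= t)

def seen(i, u, t):
    """seen: I x U x T -> {0, 1}."""
    return int((i, u) in seen_at and seen_at[(i, u)] <= t)
```

      <p>Storing first-event times makes the “by or before time t” semantics of the rating and discovery variables immediate.</p>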
      <p>We consider that users only share experiences with people they
are connected with in a social network. This does not involve
any loss of generality, as we do not make any assumption on
the nature of the network at this point. The simplifying
restriction will be made in our experiments, where we will use or
simulate specific social network structures and assume (as a
simplification) they embody full knowledge of all connections
between users.</p>
      <p>The key distributions and dependencies which capture the relevant
factors in the behavior of the defined model can be expressed in
terms of conditional probabilities. We would mainly foresee two
such key dependencies: the propensity of users to rate items they
like vs. items they do not like, and their inclination to share
positive vs. negative experiences. This can be expressed by four
conditional distributions:
p(tell | seen, relevant, u, v, i, t)
p(tell | seen, ¬relevant, u, v, i, t)
p(rated | seen, relevant, u, i, t)
p(rated | seen, ¬relevant, u, i, t)
If we make the simplifying assumption that the decision to rate and
share mainly depends on the relevance of the item, and we ignore
for a moment the differences between users in this respect, as well
as the possible variations in user behavior over time, we may
consider the approximation p(tell | seen, relevant, u, v, i, t) ∼
p(tell | seen, relevant) –and the same for the other three
distributions– in such a way that we have four conditional probabilities
defining two behavioral dimensions, which may act as
configuration parameters of the model:
– Communication/relevance bias: p(tell | seen, relevant) and
p(tell | seen, ¬relevant).
– Rating/relevance bias: p(rated | seen, relevant) and
p(rated | seen, ¬relevant).</p>
      <p>We are interested in studying how these parameters affect the
effectiveness of popularity-based recommendation. For that purpose, we
shall simulate an environment whose dynamics are based on
the proposed model, run recommendations in that environment using
the generated ratings, and measure their effectiveness according to
the simulated data. The model defines a few rules that changes of
state in the environment should be governed by, but in order to
simulate the environment dynamics we need to define a set of
triggering actions and events, and the order in which they take place.</p>
    </sec>
    <sec id="sec-5">
      <title>2.2 Model dynamics</title>
      <p>We consider the following simplified scenario that represents how
items may become known to users and eventually rated. We have a
population of users and a set of items. Based on the model defined
in the previous subsection, user-item pairs undergo a sequence of
states, from unknown to discovered to rated, in this order, where
the two latter states may or may not be ever reached. Items are
discovered by users through friends: at certain points in time, users
choose a friend and an item they have discovered, and decide
whether or not to tell the friend about the item. If communication
takes place, the friend discovers the item the user talks about (if he
had not discovered the item already). Communication takes place
as a dialog, which means that the friend will in turn choose some
discovered item and (under the same relevance-based
communication probability pattern) talk back about it to the first user, who
will then discover this item. In our chosen configuration, users talk
about an item on their own initiative only once at most, but they
can talk about it any number of times when asked.</p>
      <p>Users thus discover items by communication through the social
network. However, initially all items are unknown to all users. In
order to bootstrap the system, we may either define an initial state
where an arbitrary set of user-item pairs are in the discovered state
(e.g. a fixed number of random users have discovered each item), or we include an
additional discovery source, extrinsic to the social network, through
which items may also become known. In our current implementation
we choose the second option. The source may represent e.g. catalog
browsing and searching, item advertisement, etc., and can be
implemented as random sampling (as slow and infrequent as we would
desire with respect to the overall simulation time flow) of user-item
pairs for discovery, or biased sampling by some arbitrary
distribution, or even a recommender system. In our case we choose random
sampling by a ratio of 0.1% of the simulation time step (that is, on
average every 1 out of 1,000 simulation steps, users discover an item
at random with replacement –i.e. we do not force the sampled item
to be unknown and it may have been discovered already).
The decision to rate items and to share the experience with friends
can be required of the simulated users in different ways and order.
As a simplification, the decision to rate an item or not is made at the
time when the user discovers the item. If the user does not rate it, the
decision is not reconsidered anymore. Regarding communication, in
our chosen configuration each user is given a chance to talk about an
item to a friend once every simulation time unit –or inversely, the
time unit is defined as an iteration where every user is given the
chance to speak to a friend. The item is chosen uniformly at random
(without replacement if the user took the initiative) among the ones
the user has discovered, and the friend is sampled uniformly at
random (with replacement) from all the user’s social contacts.
The rating and sharing decisions, when the user is faced with them as
explained in the previous paragraph, are taken based on the
probabilistic model described in the previous subsection. That is, when a
user discovers an item, he will rate it with probability
p(rated | seen, relevant) if the user likes the item, and with
probability p(rated | seen, ¬relevant) if he does not.
Analogously, the decision to talk or not to a friend about an item is taken
(once the friend and the item have been sampled) according to
p(tell | seen, relevant) if the user likes the item, and
p(tell | seen, ¬relevant) if he does not.</p>
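      <p>One simulation time unit under this configuration can be sketched as follows; this is a minimal sketch under our own data-structure choices, and it omits the talk-back dialog and the exogenous discovery source for brevity:</p>

```python
import random

def discover(state, user, item, p_rate, rng):
    """The user discovers the item; the rating decision is taken immediately
    and never reconsidered, as described above."""
    if item in state["seen"][user]:
        return
    state["seen"][user].add(item)
    liked = item in state["relevant"][user]  # relevance becomes visible to the user
    p = p_rate["rel"] if liked else p_rate["notrel"]
    if rng.random() < p:
        state["ratings"][(user, item)] = 1 if liked else 0  # positive / negative

def time_step(state, friends, p_tell, p_rate, rng):
    """One time unit: every user gets one chance to tell a sampled friend
    about one uniformly sampled discovered item (own initiative: once per item)."""
    for user in state["seen"]:
        candidates = sorted(state["seen"][user] - state["told"][user])
        if not candidates or not friends[user]:
            continue
        item = rng.choice(candidates)
        friend = rng.choice(sorted(friends[user]))  # with replacement across steps
        liked = item in state["relevant"][user]
        p = p_tell["rel"] if liked else p_tell["notrel"]
        if rng.random() < p:
            state["told"][user].add(item)
            discover(state, friend, item, p_rate, rng)  # the friend may then rate it
```

      <p>The "told" set enforces the at-most-once own-initiative rule of our chosen configuration.</p>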
      <p>In order to carry out the above simulated actions, it is apparent that
we should know whether a given user likes a given item at the time
when this determines the probabilities of the user’s decisions.
Relevance is in general an unobserved variable for the system
(until a user rates an item) and for the user himself (until he
discovers an item). We deal with this lack of observation by
simulating relevance knowledge as a certain user-item relevance
distribution. This knowledge will remain hidden to the system (in
particular to the recommender systems we will run in our experiments),
but will be made “visible” to a) the simulated users when they
discover an item, and b) the computation of recommendation
effectiveness metrics, as we will explain shortly.</p>
      <p>Our model does not make any assumption about the relevance
distribution, but in our experiments, we assume the number of users who
like an item (which is equal to p(relevant | i) multiplied by the
number of users) has a long-tail distribution shape. This is an
arbitrary decision in our work at this point, which could be contrasted by
means of a poll of some kind in a real setting. It does not seem to be
a critical aspect of the model in our simulations though. In order to
obtain the long tail shape we use an adjusted power law defined by
p(relevant | i_r) = c₁ + c (r + c₂)^(−γ) for the r-th most liked item.
The parameter γ ∈ [0, ∞) defines the steepness of the relevance
distribution, where γ = 0 gives a uniform distribution. We adjust the
remaining parameters c₁, c₂ and c in such a way that –we omit the
details– the distribution adheres to a given prior p(relevant), and
the extremes of the curve for the most and least liked items (i₁ and
i_|ℐ|) behave as one would expect, that is: lim_{γ→∞} p(relevant | i₁) = 1,
lim_{γ→∞} p(relevant | i_|ℐ|) = 0, and lim_{γ→0} p(relevant | i₁) =
lim_{γ→0} p(relevant | i_|ℐ|) = p(relevant). Figure 1 shows the shape of
the curve for γ = 1 and p(relevant) = 0.2 with 3,700 items.</p>
      <p>Given a distribution thus defined, we generate the sequence of the
number of users who like each item, and we assign this number
randomly to the items. Then for each item, we assign the
corresponding number of users liking the item by randomly sampling
the users. We thus create a scenario where each user likes a set of
items beforehand, although he does not know he likes an item until
he discovers it. If the user rates the item, the system will also know
whether or not the user likes it: if he does, the rating value will be
positive, and negative otherwise. We thus consider as a
simplification that relevance is an immutable condition which does not
change with time or context.</p>
    </sec>
    <sec id="sec-6">
      <title>3. ITEM POPULARITY RECOMMENDATION AND ITS EFFECTIVENESS</title>
      <p>
        Before we get into the empirical analysis of popularity effectiveness,
we cast precise definitions of popularity and the metrics to assess its
effectiveness in recommendation. In the usual definition of
popularity-based recommendation one finds in the literature, items are ranked
by their total number of ratings, regardless of whether the rating
values express a positive or negative preference [
        <xref ref-type="bibr" rid="ref6">5</xref>
        ]. Given the role
of relevance we are analyzing in our model, we find it worthy to also
consider a variant of popularity recommendation where only positive
ratings are considered. We therefore study both variants in our
experiments, the scoring functions of which are defined as:
Simple popularity: s(u, i) = |{v ∈ 𝒰 | r(v, i) ≠ ∅}|
      </p>
      <p>Relevant popularity: s(u, i) = |{v ∈ 𝒰 | r(v, i) ≥ τ}|
where r(v, i) is the rating assigned to i by v, and τ is the threshold
value at or above which the rating expresses a positive preference.
As a metric of recommendation effectiveness we shall take
precision P@n, which is a simple-to-compute and easy-to-analyze yet
representative metric. We shall distinguish between the observed
precision P̃@n, in which a recommended item is considered relevant if it
has been rated by the target user with a positive value, and the true
precision P@n, in which we use all the simulated relevance
knowledge, including that which has not become known to the
system in the form of ratings:</p>
      <p>P̃@n = avg_{u ∈ 𝒰} |{i ∈ top_n(u) | r(u, i) ≥ τ}| / n</p>
      <p>P@n = avg_{u ∈ 𝒰} |{i ∈ top_n(u) | u likes i}| / n</p>
      <p>where top_n(u) is the set of top n items recommended to u. Naturally P̃@n
is what offline experimental evaluations commonly report. This will
allow us to contrast precision as it is usually measured in offline
recommender system experiments, to the true precision defined by
the full underlying user preferences which have determined the
generation of ratings and their distribution in our model.</p>
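      <p>Both scoring functions and the two precision variants can be sketched as follows, under our assumption that ratings are stored as a dict mapping (user, item) pairs to values where ≥ τ means positive (all names are illustrative):</p>

```python
def simple_popularity(ratings, item):
    """pop(i): number of users who rated i, regardless of rating value."""
    return sum(1 for (_, i) in ratings if i == item)

def relevant_popularity(ratings, item, tau=1):
    """pop+(i): number of users whose rating of i reaches the positive threshold."""
    return sum(1 for (u, i), v in ratings.items() if i == item and v >= tau)

def precision_at_n(top_items, user, ratings, likes, n, tau=1, observed=True):
    """Observed P@n counts positively rated recommended items; true P@n
    counts all recommended items the user actually likes, rated or not."""
    if observed:
        hits = sum(1 for i in top_items
                   if (user, i) in ratings and ratings[(user, i)] >= tau)
    else:
        hits = sum(1 for i in top_items if i in likes[user])
    return hits / n
```

      <p>The gap between the two precision values illustrates why observed precision understates true precision: relevance that never surfaced as a rating contributes to the latter only.</p>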
    </sec>
    <sec id="sec-8">
      <title>4. SIMULATION-BASED EMPIRICAL OBSERVATIONS</title>
      <p>The proposed model allows representing different user behavior
patterns by different model configurations using different
parameter settings. In order to analyze the effect that such configurations
produce on the effectiveness of popularity-based recommendation,
we implement a simulation framework that runs the model
dynamics described in the previous section. The framework supports the
integration of an arbitrary set of recommendation algorithms by
just having them adhere to a simple abstract API. At each
simulation time step, the framework generates a temporal split of rating
data with a 0.5/0.5 ratio of training/test data, runs all the
recommendation algorithms, and evaluates the observed and true
precision for each of them. This way we can monitor how the
performance of recommenders evolves along the simulation. For the
focus of the present paper we only observe three recommenders:
popularity, relevant popularity, and random.</p>
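      <p>The per-step evaluation can be sketched as a temporal split of the rating events generated so far; the event-tuple layout below is our assumption, not the framework’s actual API:</p>

```python
def temporal_split(ratings_log, ratio=0.5):
    """Split a list of (time, user, item, value) rating events into training
    and test portions at the given ratio, oldest events first."""
    events = sorted(ratings_log)            # chronological order by timestamp
    cut = int(len(events) * ratio)
    return events[:cut], events[cut:]
```

      <p>With ratio 0.5 this reproduces the 0.5/0.5 training/test proportion used at each simulation time step.</p>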
      <p>The basic research questions we aim to shed light on in our
experiments are the following:
RQ1. How do the observed and true precision of
popularity-based recommendation depend on the users’ social
communication patterns, and in particular the bias towards sharing
positive vs. negative experiences?
RQ2. How do the observed and true precision of
popularity-based recommendation depend on the user bias towards
rating liked vs. non-liked items?
RQ3. Can certain social network characteristics and effects (such
as its topology and viral phenomena) alter the dependencies
between user behavior and popularity precision?
RQ4. May the observed and true precision disagree in terms of
how popularity compares to random recommendation, as a
consequence of particular social behavior patterns?</p>
    </sec>
    <sec id="sec-10">
      <title>4.1 Experimental setup</title>
      <p>
        For most of the observations we report next, we use the social
network data from Facebook made available by J. Leskovec [
        <xref ref-type="bibr" rid="ref10">9</xref>
        ],
containing 88,234 social connections among 4,039 users. Taking
inspiration from the scale of MovieLens 1M, we take a total set of
3,700 items. We simulate a relevance distribution with prior
p(relevant) = 0.2 and steepness γ = 1, which results in the
distribution shown earlier in Figure 1. For discovery bootstrapping, in
addition to word of mouth, users discover an item at random 1 out of
every 1,000 simulation steps. We run all simulations until 500,000
ratings have been generated, which is half the size of MovieLens
1M. The reason for not running the simulations longer is to avoid the
distorting saturation effects that eventually start to appear as a
consequence of the implicit closed world assumption involved in having
a fixed set of users and items all along. E.g. in the limit all users end
up consuming and/or sharing most items regardless of their
preferences and behavior patterns, just by reason of exhaustion of any
better remaining option. In the results we present here, we run each
simulation only once, in order to show how the observed patterns
can be perceived even without averaging random effects (having
informally checked that the variation is moderate when averaging).
In order to study the effects of the relevance biases in sharing and
rating user behavior, we shall fix certain parameters and observe
the variation of popularity precision as we vary the others.
      </p>
    </sec>
    <sec id="sec-11">
      <title>4.2 Communication/relevance bias</title>
      <p>To isolate the effect of communication biases, we take a relevance-neutral
rating behavior by setting p(rated | seen, relevant) =
p(rated | seen, ¬relevant) = 1, that is, users always rate all
items they discover. Then, we shall vary p(tell | seen, relevant),
p(tell | seen, ¬relevant) and the prior p(relevant) as we explain
next.</p>
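      <p>The parameter sweep behind these experiments can be sketched as follows, where run_simulation is a placeholder for the full model dynamics, assumed to return the measured (observed, true) precision pair:</p>

```python
import itertools

def sweep_communication_bias(run_simulation, grid):
    """Fix a relevance-neutral rating behavior and sweep the two
    communication parameters; one result cell per grid point."""
    results = {}
    for p_pos, p_neg in itertools.product(grid, repeat=2):
        results[(p_pos, p_neg)] = run_simulation(
            p_tell_rel=p_pos,     # p(tell | seen, relevant)
            p_tell_notrel=p_neg,  # p(tell | seen, ¬relevant)
            p_rate_rel=1.0,       # users always rate what they discover
            p_rate_notrel=1.0)
    return results
```

      <p>Each entry of the returned dict corresponds to one cell of the precision color maps.</p>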
      <p>[Figure: Observed and true precision of simple and relevant popularity, compared
to the precision of random recommendation. For each recommender/metric combination, a color map shows the resulting precision
values (blue being the maximum and red the minimum value) for the corresponding values of p(tell | seen, relevant) and
p(tell | seen, ¬relevant).]</p>
      <p>Items that more users like get shared more often (by the users who
like the item), and will therefore travel farther than items fewer users
like, whereby more-liked items become discovered by more users.
As a result, popularity is a better approach than random recommendation
in most configurations, variants and metrics. However, we see
that we may not take this for granted in all cases, and popularity
becomes a worse than random approach in some circumstances.</p>
      <p>
        Comparing the graphics of true and observed precision, we see that
the latter shows much lower values than the former. This is because
observed precision only counts observed relevance in the form of
ratings, which is a fraction of the total relevance that true precision
takes into account –we are just reproducing the well-known fact that
observed precision is a lower bound of true precision [
        <xref ref-type="bibr" rid="ref8">7</xref>
        ]. We may
also observe that relevant popularity is generally a better option than
simple popularity, as one would anticipate. We also see at first sight
that the effects of the communication biases on the two popularity
variants are almost the same in terms of observed precision, whereas
they differ in terms of true precision, as we shall discuss next.
      </p>
      <p>[Figure 4: Observed and true precision as a function of p(rated | seen, relevant),
shown for two different scenarios: a) maximum communication (left) with
p(tell | seen, relevant) = p(tell | seen, ¬relevant) = 1, and b) moderate
communication (right) with p(tell | seen, relevant) = p(tell | seen, ¬relevant) = 0.36.]</p>
      <p>Sharing liked items does not harm the precision of popularity, no
matter how much rating information the recommendation is using.
However, sharing non-liked items beyond a certain degree does
hurt relevant popularity: we see a strong decreasing trend in the
color map columns. This happens because items which are not liked
by that many users get enough positive ratings (corresponding to the
few users who do like the items) to surpass highly liked items for
which relevance remains more unobserved.</p>
      <p>The trends on negative communication display some nuances in
observed precision, where at some points sharing negative
experiences seems to improve the measured precision of popularity. We
hypothesize that this is due to the fact that when an item is shared,
it may be liked by the receiver even if the sharer did not like it.</p>
      <p>Once negative discovery is saturated, further negative sharing may
cause a slight relative rise in positive discovery (and therefore
positive ratings): given that the total number of ratings is fixed (the
simulation stopping condition), the proportion of positive ratings
increases with that bias. And as pointed out earlier, this statistically
increases the difference to random precision by roughly the
positive rating density.</p>
      <p>This effect is not observed in true precision. In fact, and quite
paradoxically, the true precision of relevant popularity degrades
with the positive rating bias (the curves display a decreasing
trend). This is explained by a non-trivial interaction between a
viral network effect and the recommendation protocol. The
communication setting in these experiments is of extreme
diffusion, since users always decide to share items every time they
get a turn. Thus the first items to be found, by chance of exogenous
discovery by some user, spread quickly through the network and
accumulate a comparatively high number of ratings. The
recommendation protocol mandates that users not be
recommended items they have already rated themselves. This
means that early rated items will be excluded from
recommendations of more target users, as they got more ratings
earlier. As p(rated | seen, relevant) grows, generated ratings lean
towards items with high p(relevant | i). Thus the items excluded
more often in the protocol will tend to be the most relevant ones,
whereby true relevance decreases for statistical reasons: a decrease
of the effective relevance density in the set of candidate items.</p>
    </sec>
    <sec id="sec-12">
      <title>4.4 Network effects</title>
      <p>
        As we have seen, in addition to the effect of individual users’
behavior, further network effects may emerge from social-level
dynamics, which end up affecting popularity. In order to complete
the observation of viral effects just discussed in the previous
section, we repeat the same experiments with lower communication
rates p(tell | seen, relevant) = p(tell | seen, ¬relevant) = 0.36,
which produces a positive rating distribution similar to MovieLens
1M –we omit the details about this for the sake of space.
Figure 4b shows the results. We see that the paradoxical effect of the
positive rating bias in the true precision of relevant popularity
disappears. Now in fact there is no dependence on the bias, as one
should expect: relevant popularity ignores negative ratings, true
precision ignores all ratings, and the correlation between relevance and
positive ratings (which determines the effectiveness of relevant
popularity) is achieved as soon as a sufficient (small) number of
positive ratings has been generated. The exclusion of rated items is
not as badly biased against relevant items because, with discovery
more evenly distributed, the generated ratings are not as extremely
skewed towards relevant items as under viral propagation, and
these “good items” are thus excluded from fewer recommendations.
The true precision of simple popularity does depend on the ratio of
positive vs. negative ratings, and this is clearly shown in the
corresponding graphic, where in fact precision steps up as soon as
p(rate | seen, rel) &gt; p(rate | seen, ¬rel).
      </p>
      <p>We also examine whether the network structure can be a factor in
the observed dynamics. For this purpose, we run the same
simulation on a Barabási-Albert (BA) graph [
        <xref ref-type="bibr" rid="ref3">2</xref>
        ] with the same number of
users and friendship links as in the Facebook (FB) dataset. We take
a simple configuration with p(share | seen, rel) =
p(share | seen, ¬rel) = 1, p(rate | seen, rel) = 1 and
p(rate | seen, ¬rel) = 0, that is, users share everything
they discover, and rate only what they like.
      </p>
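<p>A BA graph of the required size can be built by plain preferential attachment. The stdlib-only sketch below is our own illustration; the node and link counts are an approximation of the commonly reported size of the FB dataset (about 4,000 users and 88,000 friendship links), not figures taken from the paper.</p>

```python
# Sketch: generate a Barabási-Albert graph by preferential attachment,
# sized roughly like the FB friendship network (counts are assumptions).
import random

def barabasi_albert(n, m, seed=0):
    """Each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = set()
    targets = list(range(m))  # nodes the first newcomer links to
    repeated = []             # each node appears here once per unit of degree
    for source in range(m, n):
        for t in targets:
            edges.add((min(source, t), max(source, t)))
        repeated.extend(targets)
        repeated.extend([source] * m)
        # sample m distinct targets, weighted by degree via the repeated list
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges

n_users, m = 4039, 22
g = barabasi_albert(n_users, m)
print(len(g))  # (4039 - 22) * 22 = 88374 links, close to ~88k in FB
```

<p>The resulting degree distribution is heavy-tailed: the earliest nodes become hubs with degrees far above the attachment parameter m, which is what makes discovery steeper on BA than on the clustered FB graph.</p>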
      <p>Figure 5 shows the difference in the distribution of discovery and
positive ratings on each graph. Discovery is steeper in BA than in FB,
with the best-known items being discovered by almost all users, and
the least-known by almost none. Discovery is
neutral with respect to relevance in this configuration (hence the
green and red plots do not show correlation). The positive rating
distribution correlates with discovery because the more an item is
discovered the more chances it gets to be rated. It also correlates
with relevance, because of the rating-relevance bias configuration.
Figure 6 shows the effect of this in the resulting precision. The
results do not differ significantly between the two types of graphs,
except mainly for the true precision of relevant popularity, which
is good on FB but close to random on BA. This is because
information travels faster on the preferential-attachment model of BA,
where the viral effect hurts true precision, whereas information takes
longer to move outside friendship clusters on FB, so popularity
retains better effectiveness. Note that in this setting, simple and
relevant popularity are equivalent, since all the generated ratings
are positive as per the setting p(rate | seen, ¬rel) = 0.
Popularity gets a very slight advantage in observed precision on
BA. We attribute this to the steeper positive rating distribution on
this graph, which results in the 10 most popular items
accumulating a higher number of positive ratings, thereby
yielding a higher observed P@10.</p>
    </sec>
    <sec id="sec-13">
      <title>4.5 Effectiveness disagreement</title>
      <p>The biases and effects we have analyzed can even cause a
disagreement between observed and true relevance not just in quantitative
terms, but also in terms of the comparison of two recommenders, in
this case popularity and random. Figure 7 shows one such example
on the FB graph. Simply with a maximal “anti-relevance”
communication bias p(share | seen, rel) = 0, p(share | seen, ¬rel)
= 1, and neutral rating p(rate | seen, rel) =
p(rate | seen, ¬rel) = 1, we get a direct negative
correlation between positive ratings and relevance: the items with the
most positive ratings are the ones that the fewest users like. As a
consequence, the true precision of both simple and relevant popularity
is worse than random. Oblivious to the full true relevance, the
observed precision of popularity is still higher than that of random
recommendation (which is so low that it is barely visible in the figure).</p>
    </sec>
    <sec id="sec-14">
      <title>5. CONCLUSION</title>
      <p>We have studied the effect of different aspects of user behavior in
social networks on the effectiveness of popularity-based
recommendation. We define a formal model to represent such factors, based on
which we can simulate different situations and observe the resulting
effects. The analysis yields findings such as: a) popularity is most
effective when users have a bias towards items they like when they
share and rate items; b) highly active inter-user communication and
viral information propagation pump up observed precision (if
communication is intense, even sharing non-liked items improves
observed precision a bit further); c) viral propagation and a bias
towards rating liked items can cause a decrease in true precision due
to the exclusion of rated items from recommendations; d) viral
propagation of negative opinions can cause a disagreement between
measured and true precision even in terms of system comparisons; e)
network structure is an additional factor in the effectiveness of
popularity, as it determines to a significant extent the speed of
information transmission and discovery, thus intensifying or moderating
the viral effects and their consequences on popularity.</p>
      <p>The possibilities for continuation of the presented work are manifold.
We are currently aiming to back the reported empirical observations
with a formal analysis of the explored model and effects. Beyond
that, the simplifications assumed so far can be relaxed in many
directions: we may consider different inter-user communication
modalities (e.g. communicate with more than one friend per turn,
etc.), introduce the distinction between user discovery and item
consumption, non-uniform user behaviors, biases in extrinsic item
discovery (e.g. relevance bias in user searches), discovery loop from
recommendations, temporal dynamics (e.g. new items and users
keep entering the system), further dependencies between events
(such as user decisions and choices depending on discovery source),
dynamic networks, non-static user preferences (including effects of
social influence in preference formation and propagation), etc. A
user study to check which trends hold in practice in specific
social environments would also be highly relevant to our research.</p>
    </sec>
    <sec id="sec-15">
      <title>7. REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>This work was supported by the Spanish Government (grant nr. TIN2013-47090-C3-2).</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Bakshy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenn</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marlow</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Adamic</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>The Role of Social Networks in Information Diffusion</article-title>
          .
          <source>WWW</source>
          <year>2012</year>
          , Lyon, France,
          <year>April 2012</year>
          ,
          <fpage>519</fpage>
          -
          <lpage>528</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Barabási</surname>
            ,
            <given-names>A.-L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Albert</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Emergence of scaling in random networks</article-title>
          .
          <source>Science</source>
          <volume>286</volume>
          (
          <issue>5439</issue>
          ),
          <year>October 1999</year>
          ,
          <fpage>509</fpage>
          -
          <lpage>512</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Blattner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Medo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Recommendation Systems in the Scope of Opinion Formation: a Model</article-title>
          .
          <source>Decisions Workshop at RecSys</source>
          <year>2012</year>
          , Dublin, Ireland,
          <year>September 2012</year>
          ,
          <fpage>32</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Celma</surname>
          </string-name>
          , Ò. and
          <string-name>
            <surname>Herrera</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>A new approach to evaluating novel recommendations</article-title>
          .
          <source>RecSys</source>
          <year>2008</year>
          , Lausanne, Switzerland,
          <year>October 2008</year>
          ,
          <fpage>179</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Cremonesi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Turrin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Performance of recommender algorithms on top-n recommendation tasks</article-title>
          .
          <source>RecSys</source>
          <year>2010</year>
          , Barcelona, Spain,
          <year>September 2010</year>
          ,
          <fpage>39</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Doerr</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fouz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>Why rumors spread so quickly in social networks</article-title>
          .
          <source>Comm. ACM</source>
          <volume>55</volume>
          (
          <issue>6</issue>
          ),
          <year>June 2012</year>
          ,
          <fpage>70</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Herlocker</surname>
            ,
            <given-names>J. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konstan</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terveen</surname>
            ,
            <given-names>L. G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Riedl</surname>
            ,
            <given-names>J. T.</given-names>
          </string-name>
          <article-title>Evaluating collaborative filtering recommender systems</article-title>
          .
          <source>ACM Trans. Information Systems</source>
          <volume>22</volume>
          (
          <issue>1</issue>
          ),
          <year>January 2004</year>
          ,
          <fpage>5</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Marlin</surname>
            ,
            <given-names>B. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemel</surname>
            ,
            <given-names>R. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roweis</surname>
            ,
            <given-names>S. T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Slaney</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Collaborative Filtering and the Missing at Random Assumption</article-title>
          .
          <source>UAI</source>
          <year>2007</year>
          , Vancouver, Canada,
          <year>July 2007</year>
          ,
          <fpage>267</fpage>
          -
          <lpage>275</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <surname>McAuley</surname>
            ,
            <given-names>J. J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Leskovec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Learning to Discover Social Circles in Ego Networks</article-title>
          .
          <source>NIPS</source>
          <year>2012</year>
          , Lake Tahoe, NV, USA,
          <year>December 2012</year>
          ,
          <fpage>548</fpage>
          -
          <lpage>556</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Myers</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Leskovec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Information Diffusion and External Influence in Networks</article-title>
          .
          <source>KDD</source>
          <year>2012</year>
          , Beijing, China,
          <year>August 2012</year>
          ,
          <fpage>33</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Pradel</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Usunier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gallinari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Ranking with non-random missing ratings: influence of popularity and positivity on evaluation metrics</article-title>
          .
          <source>RecSys</source>
          <year>2012</year>
          , Dublin,
          <year>September 2012</year>
          ,
          <fpage>147</fpage>
          -
          <lpage>154</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Steck</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Training and testing of recommender systems on data missing not at random</article-title>
          .
          <source>KDD</source>
          <year>2010</year>
          , Washington, DC, USA,
          <year>July 2010</year>
          ,
          <fpage>713</fpage>
          -
          <lpage>722</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Steck</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Item popularity and Recommendation Accuracy</article-title>
          .
          <source>RecSys</source>
          <year>2011</year>
          , Chicago, IL, USA,
          <year>October 2011</year>
          ,
          <fpage>125</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Steck</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Evaluation of recommendations: rating-prediction and ranking</article-title>
          .
          <source>RecSys</source>
          <year>2013</year>
          , Hong Kong,
          <year>October 2013</year>
          ,
          <fpage>213</fpage>
          -
          <lpage>220</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>