Recommendation systems in the scope of opinion
                      formation: a model

                             Marcel Blattner                                                 Matus Medo
                    Laboratory for Web Science                                           Physics Department
                University of Applied Sciences FFHS                                      University of Fribourg
                      Regensdorf, Switzerland                                            Fribourg, Switzerland
                      marcel.blattner@ffhs.ch                                         matus.medo@unifr.ch

ABSTRACT                                                                   on its own in the 90s [42, 20, 28, 21, 11]. The interest in rec-
Aggregated data in real world recommender applications of-                 ommendation systems increased steadily in recent years, and
ten feature fat-tailed distributions of the number of times                attracted researchers from different fields [43]. The success
individual items have been rated or favored. We propose a                  of highly rated Internet sites as Amazon, Netflix, YouTube,
model to simulate such data. The model is mainly based on                  Yahoo, Last.fm and others is to a large extent based on
social interactions and opinion formation taking place on a                their recommender engines. Corresponding applications rec-
complex network with a given topology. A threshold mech-                   ommend everything from CD/DVD’s, movies, jokes, books,
anism is used to govern the decision making process that                   web sites to more complex items such as financial services.
determines whether a user is or is not interested in an item.                 The most popular techniques related to recommendation
We demonstrate the validity of the model by fitting atten-                 systems are collaborative filtering [8, 26, 11, 24, 28, 21, 41,
dance distributions from different real data sets. The model               45] and content-based filtering [14, 40, 35, 5, 30]. In ad-
is mathematically analyzed by investigating its master equa-               dition, researchers developed alternative methods inspired
tion. Our approach provides an attempt to understand rec-                  by fields as diverse as machine learning, graph theory, and
ommender system’s data as a social process. The model can                  physics [16, 17, 37, 52, 51, 10, 48, 50]. Furthermore, rec-
serve as a starting point to generate artificial data sets useful          ommendation systems have been investigated in connection
for testing and evaluating recommender systems.                            with trust [2, 39, 47, 32, 33] and personalized web search [9,
                                                                           12, 46], which constitutes the new research frontier in search
                                                                           engines.
Categories and Subject Descriptors                                            However, there are still many open challenges in the re-
H.1.m [Information Systems]: Miscellaneous                                 search field of recommendation systems [1, 22, 25, 18, 24, 43,
                                                                           15]. One key question is connected to the understanding of
                                                                           the user rating mechanism. We build on a well documented
General Terms                                                              influence of social interactions with peers on the decision to
Experimentation, Theory                                                    vote, favor, or even purchase an item [44, 27]. We propose
                                                                           a model inspired by opinion formation taking place on a
                                                                           complex network with a predefined topology. Our model is
Keywords                                                                   able to generate data observed in real world recommender
recommender systems, opinion formation, complex networks                   systems. Despite its simplicity, the model is flexible enough
                                                                           to generate a wide range of different patterns. We mathe-
                                                                           matically analyze the model using a mean field approach to
1.    INTRODUCTION                                                         the full Master Equation. Our approach provides an under-
   This is the information age. We are witnessing informa-                 standing of the data in recommender systems as a product
tion production and consumption in a speed never seen be-                  of social processes. The model can serve as a data genera-
fore. The WEB2.0 paradigm enables consumers and produc-                    tor which is valuable for testing and evaluation purposes for
ers to exchange data in a collaborative way benefiting both                recommender systems.
parties. However, one of the key challenges in our digitally-                 The rest of the paper is organized as follows. The model
driven society is information overload [7]. We have the ’pain              is outlined in Sec. (2). Methods, data set descriptions, and
of choice’. Recommendation systems represent a possible so-                validation procedures are in Sec. (3). Results are presented
lution to this problem. They have emerged as a research area               in Sec. (4). Discussion and an outlook for future research
                                                                           directions are in Sec. (5).

                                                                           2.    MODEL
                                                                           2.1    Motivation
Paper presented at the 2012 Decisions@RecSys workshop in conjunction         Our daily decisions are heavily influenced by various infor-
with the 6th ACM conference on Recommender Systems. Copyright 2012         mation channels: advertisement, broadcastings, social inter-
for the individual papers by the papers’ authors. Copying permitted for
private and academic purposes. This volume is published and copyrighted    actions, and many others. Social ties (word-of-mouth) play
by its editors.                                                            a pivotal role in consumers buying decisions [44, 27]. It was


                                                                      32
demonstrated by many researchers that personal communi-              the connections of the corresponding Influence-Network. From
cation and informal information exchange not only influence          our model’s point of view this means the following: an in-
purchase decisions and opinions, but shape our expectations          dividual’s IIA for a particular item i may be shifted due
of a product or service [49, 4, 3]. On the other hand, it was        to social interactions with directly connected peers (these
shown [23], that social benefits are a major motivation to           interactions thus take place on the corresponding IN), who
participate on opinion platforms. If somebody is influenced          already experienced the product or service in question. This
by recommendations on an opinion platform like MovieLens             process can shift the Intrinsic-Item-Anticipation of an indi-
or Amazon, social interactions and word-of-mouth in general          vidual who did not yet experience product/object i closer to
are additional forces governing the decision making process          or beyond the critical-anticipation-threshold.
to purchase or even to rate an object in a particular way               We now summarize the basic ingredients of our model.
[31].                                                                An individual user’s opinions on objects are assembled in
   Our model is formulated within an opinion formation frame-        two consecutive stages: i) opinion making based on different
work where social ties play a major role. We shall discuss           external sources, including suggestions by recommendation
the following main ingredients of our model:                         systems and ii) opinion making based on social interactions
                                                                     in the Influence-Network. The second process may shift the
   • Influence-Network (IN)
                                                                     opinions generated by the first process.
   • Intrinsic-Item-Anticipation (IIA)
                                                                     2.2    Mathematical formulation of the model
   • Influence-Dynamics (ID)
                                                                        In this section we firstly describe how individuals’ Intrinsic-
                                                                     Item-Anticipations may change due to social interactions
Influence Network.                                                   taking place on a particular Influence-Network. Secondly, we
   We call the network where context-relevant information            introduce dynamical processes governing the opinion prop-
exchange takes place an Influence-Network (IN). Nodes of             agation.
the IN are people and connections between nodes indicate
the influence among them. Note that we put no constraints            IIA shift.
on the nature of how these connections are realized. They               We model a possible shift in the IIA as:
may be purely virtual (over the Internet) or based on physi-                                           (1−γ)
cal meetings. We emphasize that INs are domain dependent,                                            Θj
                                                                                      fˆij = fij +             .                   (1)
i.e., for a given community of users, the Influence Network                                          kj
concerning books may differ greatly (in topology, number of
ties, tie strength, etc.) from that concerning another subject       where fˆij is the shifted Intrinsic-Item-Anticipation of indi-
such as food or movies. Indeed, one person’s opinion lead-           vidual j for object i, fij is the unbiased IIA, Θj is the num-
ers (relevant peers) concerning books may be very different          ber of j’s neighbors, who already experienced and liked item
from those for food or other subjects. In this scope, we see         i, kj denotes the total number of j’s neighbors in the cor-
the INs as domain-restricted views on social networks. It            responding IN, and γ ∈ (0, 1) quantifies trust of individuals
is thus reasonable to assume that Influence Networks are             to their peers. An individual j will consume, purchase, or
similar to social interaction networks which often exhibit a         positively rate an item i only if
scale-free topology [6]. However, our model is not restricted                                    fˆij ≥ ∆.                         (2)
to a particular network structure.
                                                                    We identify ∆ as the Critical-Anticipation-Threshold. Val-
Intrinsic-Item-Anticipation.                                        ues of fij are drawn from a probability distribution fi . Since
   Suppose a new product is launched on the market. Ad-             the IIA for each individual is an aggregate of many different
vertisement, marketing campaigns, and other efforts to at-          and largely independent contributions, we assume that fi
tract customers predate the launching process and continue          is normally distributed, fi ∈ N (µi , σ). (Unless stated oth-
after the product started to spread on the market. These            erwise.) To mimic different item anticipations for different
efforts influence product-dependent customer anticipation.          objects i, we draw the mean µi from a uniform distribution
It is clear that the resulting anticipation is a complex com-       U (−, ). We maintain µi , , and σ, so that fi is roughly
bination of many different components including intrinsic           bounded by (−1, 1), i.e., −1 ≤ µ − 3σ < µ + 3σ ≤ 1. Note
product quality and possibly also suggestions from recom-           that fˆij can exceed these boundaries after a shift of the corre-
mendation systems.                                                  sponding IIA occurs. The second term on the right hand side
   In our model we call the above-described anticipation Intrinsic- of Eq. (1) is the influence of j’s neighborhood weighted by
Item-Anticipation (IIA) and measure it by a single number.          trust γ. To better understand the interplay between γ and
It is based on many external sources, except for the influence      the density of attending users in the neighborhood of user
generated by social interactions. It is the opinion on some-        i, ρ := Θj /kj , we refer to Fig. 1. Trust γ ≈ 1 causes a big
thing taken by individuals, before they start to discuss the        shift on the IIA’s even for ρ ≈ 0. On the other hand, γ ≈ 0
subject with their peers. Furthermore, we assume that an            needs high ρ to yield a significant IAA shift. These prop-
individual will invest resources (time/money) into an object        erties are understood as follows: people trusting strongly in
only, if the Intrinsic-Item-Anticipation is above a particular      their peers need only few positive opinions to be convinced,
threshold, which we call Critical-Anticipation-Threshold.           whereas people trusting less in their social environment need
                                                                    considerable more signals to be influenced.
Influence-Dynamics.
   The Influence-Dynamics describes how individuals’ Intrinsic-      Influence-Dynamics.
Item-Anticipations are altered by information exchange via              The Influence-Dynamics proceeds as follows. Firstly, we


                                                               33
                                                                   Algorithm 1 RecSysMod algorithm. P contains the con-
                                                                   figuration parameter for the network. ∆ is the Anticipation
                                                                   Threshold and γ denotes the trust. O ∈ N is the number of
                                                                   objects to simulate. G(N, E) is the network. N is the set of
                                                                   nodes and E is the set of edges.
                                                                     1: procedure RecSysMod I(P, ∆, γ, O)
                                                                     2:    G(N, E) ← GenNetwork(P)
                                                                     3:    for all Objects in O do
                                                                     4:       generate distribution fi from N (µi , σ)
                                                                     5:       for each node j ∈ N in G do
                                                                     6:           draw fij from fi
                                                                     7:           if fij < ∆ then
                                                                     8:               jstate ← S
                                                                     9:           else
                                                                   10:                jstate ← A
                                                                   11:            end if
Figure 1: Contour plot for γ and ρ = Θj /kj . Num-                 12:        end for
bers inside the plot quantify the shift in the IAA as              13:        repeat
a function of γ and ρ.                                             14:            for all j with jstate = S AND Θj > 0 do
                                                                                                  h i(1−γ)
                                                                                                   Θ
                                                                   15:                fˆij ← fij + j  kj
draw an Influence-Network IN(P) with a fixed network topol-        16:              if fˆij < ∆ then
ogy (power-law, Erdős-Rényi, or another). P refers to a          17:                   jstate ← D
set of appropriate parameters for the Influence-Network in         18:              else
question (like network type, number of nodes, etc.). The           19:                   jstate ← A
network’s topology is not affected by the dynamical pro-           20:              end if
cesses (opinion propagation) taking place on it. We justify        21:          end for
this static scenario by assuming that the time scale of the        22:       until |{j|jstate = S AND Θj > 0}| = 0
topology change is much longer then the time scale 1 of opin-      23:    end for
ion spreading in the network. Each node in the Influence-          24: end procedure
Network corresponds to an individual. For each individual
j we draw an unbiased Intrinsic-Item-Anticipation fij from
the predefined probability distribution fi . At each time step,
every individual is in one of the following states: {S, A, D}.     tion for the dynamics. As already said before, two things can
S refers to a susceptible state and corresponds to the initial     happen when a non-attender is connected to an attender:
state for all nodes at t = 0. A refers to an attender state        a) he/she becomes an attender too, or b) he/she becomes
and corresponds to an individual with the property fˆij ≥ ∆.       a denier who will not attend/favor the item. For these two
                                                                   interaction types we formally write:
D refers to a denier state with the property fˆij < ∆ after
an information exchange with his/her peers in the Influence-                                      λ
                                                                                        S + A −→ 2A
Network happened. An individual in state D or A can not                                        α
change his/her state anymore. It is clear that an individual                            S + A −→ D + A                          (3)
in state A cannot back transform to the susceptible state          Here λ denotes the probability that a susceptible node con-
S, since he/she did consume or favor item i and we do not          nected to an attender becomes an attender too, and α is
account for multiple attendances in our model. An indi-            the probability that a susceptible node attached to an at-
vidual in state D was influenced but not convinced by his          tender becomes a denier. To take into account the underly-
opinion leaders (directed connected peers). We make the            ing network topology of the Influence Network it is com-
following assumption here: if individual j’s opinion leaders       mon to introduce compartments k [19]. Let NkA be the
are not able to convince individual j, meaning that individ-       number of nodes in state A with k connections, NkS the
ual’s j Intrinsic Item Anticipation fˆij stays below the criti-    number of nodes in state S with k connections, and NkD
cal threshold ∆ after the influence process, then we assume        the number of nodes in state D with k connections, respec-
that j’s opinion not to attend object i remains unchanged          tively. Furthermore we define the corresponding densities:
in the future. Therefore we have the following possible tran-      ak (t) = NkA /Nk , sk (t) = NkS /Nk and dk (t) = NkD /Nk . Nk is
sitions for each node in the influence network: jS → jA or         the total number of nodes with k connections in the network.
jS → jD . Node states are updated asynchronously which             Since every node from Nk must be in one of the three states,
is more realistic than synchronous updating, especially in         ∀t : ak (t) + sk (t) + dk (t) = 1. A weighted sum over all k
social interaction models [13]. The Influence-Dynamics is          compartments    gives the total fraction of attenders at time t,
summarized in Algorithm 1.                                                 P
                                                                   a(t) = k P (k)ak (t) where P (k) is the degree distribution
                                                                   of the network (it also holds that a(t) = N A (t)/N ). The
Master Equation.                                                   time dependence of our state variables ak (t), dk (t), sk (t) is
    We are now in the position to formulate the Master Equa-                                                    
                                                                                      ȧk (t) = λksk (t)Ω       
1
 The term time scale denotes a dimensionless quantity and
                                                                                                                
                                                                                       ˙
                                                                                      dk (t) = αksk (t)Ω                         (4)
specifies the devisions of time. A shorter time scale means                                                     
a faster spreading of opinions in the network.                                        ṡk (t) = −(α + λ)ksk (t)Ω
                                                                                                                


                                                              34
where Ω is the density of attenders in the neighborhood of         17632 artists, and 92834 user-listended artists relations in
susceptible node with k connections averaged over k                total. In addition, the data set contains 12717 bi-directional
                    X                                              user friendship relations. These data sets are chosen because
                Ω=      P (k)(k − 1)ak / hki           (5)         they exhibit very different attendance distributions and thus
                       k                                           provide an excellent playground to validate our model in dif-
where hki denotes the mean degree of the network. As               ferent settings.
outlined above, λ is the probability that a node in state
S transforms to state A if it is connected to a node in            Experiments.
state A. This happens when fˆij > ∆. Therefore, we have               Data topologies. We firstly investigate the simulated
∆− < fij < ∆ where ∆− = ∆ − (1/k)1−γ . From this we                attendance distributions as a function of trust γ, the an-
           R∆
have λ = ∆ f (x)dx, where f (x) is the expectation dis-            ticipation threshold ∆, and the network topology. For this
              −
                                          R∆                       purpose we simulate the dynamics on a toy network with 500
tribution. Similarly we write for α = l − f (x)dx, where           nodes and record the final attendance number of 300 objects.
l denotes the lower bound of the expectation distribution          The simulation is conducted for ER and PL networks and
f (x). A crude mean field approximation can be obtained by         performed as outlined in the simulations paragraph above.
multiplying the right hand sides of Eq. (4) with P (k) and         In Fig.(2) and Fig.(3) we investigate the skewness [53] of the
summing over k, which yields a set of differential equations       attendance distributions and the maximal attendance ob-
                                                                   tained for the corresponding parameter settings. The skew-
                                               
                ȧ(t) = λ hki s(t)a(t),        
                 ˙
                                                                  ness of a distribution is a measure for the asymmetry around
                d(t) = α hki s(t)a(t),                   (6)
                                                                  its mean value. A positive skewness value means that there
                ṡ(t) = −(α + λ) hki s(t)a(t).
                                               
                                                                   is more weight to the left from the mean, whereas a negative
                                                                   value indicates more weight in the right from the mean.
which is later used to obtain analytical results for the atten-
                                                                      Fitting real data. We explore the model’s ability to fit
dance fraction a(t).
                                                                   real world recommendation attendance distributions found
                                                                   in the described data sets. For this purpose we fix for the
3.   METHODS                                                       Netflix data set a network with 480189 nodes and perform a
  We describe here our simulation procedures, datasets, ex-        simulation for 17770 objects. In the MovieLens case we do
periments, and analytical methods.                                 the same for 943 nodes and 1682 objects and for the Lastfm
                                                                   data set we simulate on a network with 1892 nodes and 17632
Simulations.                                                       objects. In the case of Lastfm we have the social network
  Our simulations employ Alg. (1). As outlined in the model        data as well. We validate our model on that data set by two
section, we do not change the network topology during the          experiments: a) we use the provided user friendship network
dynamical processes. We experiment with two different net-         as simulation input and fit the attendance distribution and
work types, Erdős-Rényi (ER), and power law (PL) which           b) we fit the attendance distribution like in the MovieLens
are both generated by a so-called configuration model [34].        and Netflix case with an artificially generated network.
ER and PL represent two fundamentally different classes               Mathematical analysis. We investigate the Master Equa-
of networks. The former is characterized by a typical de-          tions Eq. (4) and Eq. (6). We provide a full analytical solu-
gree scale (mean degree of the network), whereas the latter        tion for Eq. (6) and an analytical approximation for Eq. (4)
exhibits a fat-tailed degree distribution which is scale free.     in the early spreading stage.
The networks are random and have no degree correlations
and no particular community structure. To obtain repre-            4.   RESULTS
sentative results we stick to the following approach: we fix          Data topologies. The landscape of attendance distribu-
the network type, number of nodes, number of objects, and          tions of our model is demonstrated in Fig. (2) and Fig. (3).
network type relevant parameters to draw an ER or PL net-          To obtain these results, simulations were performed as de-
work. We call this a configuration P. In addition, we fix the      scribed in Sec. (3). The item anticipation fi was drawn from
variance σ of the anticipation distributions fi . We perform       a normal distribution with mean values µi ∈ U (−0.1, 0.1)
each simulation on 50 different networks belonging to the          and variance σ = 0.25 fixed for all items. Both networks
same configuration P and on each network we simulate the           have 500 nodes. In the Erdős-Rényi case, we used a wiring
dynamics 50 times. Then we average the obtained atten-             probability p = 0.03 between nodes. The Power Law net-
dance distributions over all 2500 simulations.                     work was drawn with an exponent δ = 2.25. The simulated
                                                                   attendance distributions in Fig.(2) and Fig.(3) show a wide
Datasets.                                                          range of different patterns for both ER and PL Influence-
   To show the validity of our model we use real world rec-        Networks. In particular, both network types can serve as
ommender datasets. MovieLens (movielens.umn.edu), a                a basis for attendance distributions with both positive and
web service from GroupLens (grouplens.org) where ratings           negative skewness. Therefore, the observed fat-tailed distri-
are recorded on a five stars scale. The data set contains          butions are not a result of the heterogeneity of a scale free
1682 movies and 943 users. Only 6, 5% of possible votes are        network but they are emergent properties of the dynamics
expressed. Netflix data set (netflix.com). We use the Net-         produced by our model. The parameter region for highly
flix grand prize data set which contains 480189 users and          positively-skewed distributions is the same for both network
17770 movies and also uses a five stars scale. Lastfm data         types. The parameters γ and ∆ can be tuned so that all
set (Lastfm.com). This data set contains social networking,        items are attended by everybody or all items are attended
tagging, and music artist listening information from users         by nobody. While not relevant for simulating realistic atten-
of the Last.fm online music system. There are 1892 users,          dance distributions, these extreme cases help to understand


                                                              35
the model’s flexibility.


                                                                Figure 4: Fit of the MovieLens attendance dis-
                                                                tribution with trust γ = 0.50, critical anticipation
Figure 2: Skewness of the attendance distributions              threshold ∆ = 0.6, anticipation distribution variance
as a function of trust γ and the critical anticipation          σ = 0.25, and power law network with exponent
threshold ∆ for Erdős-Rény networks with 500 nodes            δ = 2.25, 943 nodes, and 1682 simulated objects.
and 300 simulated items.


                                                                Figure 5: Fit of the Netflix attendance distribution
                                                                with trust γ = 0.52, critical anticipation threshold
Figure 3: Skewness of the attendance distributions              ∆ = 0.72, anticipation distribution variance σ = 0.27,
as a function of trust γ and the critical anticipation          and power law network with exponent δ = 2.2, 480189
threshold ∆ for power-law networks with 500 nodes               nodes, and 17770 simulated objects.
and 300 simulated items.
                                                                                                 Ru
   Fitting real data We fit real world recommender data         ditions for the first movers a0 = ∆ f (x)dx, s(0) = 1 − a(0),
from MovieLens, Netflix and Lastfm with results reported        and d(0) = 0. In the following we use the bra-ket nota-
in Fig. (4), Fig. (5), Fig. (6), Fig. (7), and Tab. (1), re-    tion hxi to represent the average of a quantity x. Standard
spectively. The real and simulated distributions are com-       methods can now be used to arrive at2
pared using Kullback-Leibler (KL) divergence [29]. We re-
port the mean, median, maximum, and minimum of the                                         (τ hki)−1 exp(t/τ )
                                                                        a(t) =                                           .   (7)
simulated and real attendance distributions. Trust γ, antic-                     (α + λ) [exp(t/τ ) − 1] + (τ hki a0 )−1
ipation threshold ∆, and anticipation distribution variance     Here τ is the time scale of the propagation which is defined
σ are reported in figure captions. We also compare the aver-    as
aged mean degree, maximum degree, minimum degree, and
clustering coefficient of the real Lastfm social network and                       τ = (a0 α hki + λ hki)−1 .                (8)
networks obtained to fit the data. Results are reported in                                                        −1
                                                                This is similar to the time scale τ = (λ hki)       in the well
Tab. (2) and Fig. (8). Note that thus obtained parameter
                                                                known SI Model [38, 6]. Eq.(7) can be very useful in predict-
values can be useful also in real applications where, assum-
                                                                ing the average behavior of users in a recommender system.
ing that our social opinion formation model is valid, one
                                                                  Since Eq. (4) is not accessible to a full analytical solution,
could detect decline of the overall trust value in an online
                                                                we investigate it for the early stage of the dynamics. As-
community, for example.
   Mathematical analysis. Eq. (6) can be solved analyti-        2
                                                                  We give here only the solution for a(t) because we are
cally. We have ∀t : a(t)+s(t)+d(t) = 1 with the initial con-    mainly interested in the attendance dynamics.


                                                           36
                                                                     D        KL        Med        Mean            Max          Min
                                                                     ML       0.046    27/26       59/60         583/485         1/1
                                                                     NF       0.030   561/561    5654/5837    232944/193424     3/16
                                                                     LFM1     0.05      1/1       5.3/5.2        611/503         1/1
                                                                     LFM2     0.028     1/1       5.3/5.8        611/547         1/1

                                                                    Table 1: Simulation results. ML: Movielens, NF:
                                                                    Netflix, LFM1: Lastfm with real network, LFM2:
                                                                    Lastfm with simulated network, KL: Kullback-
                                                                    Leibler divergence, Med: Median, Mean, Max:
                                                                    maximal attendance (data/simulated), Min: mini-
                                                                    mal attendance (data/simulated).

                                                                     D        hki     kmin   kmax     δ      C
                                                                     LFM1     13.4     1      119   2.3    0.186
                                                                     LFM2     12.0     1      118   2.25    0.06
Figure 6: Fit of the Lastfm attendance distribution
with trust γ = 0.4, critical anticipation threshold ∆ =             Table 2: Mean, minimum, maximum degree, clus-
0.8, anticipation distribution variance σ = 0.24, and               tering coefficient C, and estimated exponent δ of the
real Lastfm user friendship network with 1892 nodes                 real (LFM1) and simulated (LFM2) social network
and 17632 simulated objects.                                        for the Lastfm data set.


Figure 7: Fit of the Lastfm attendance distribution                 Figure 8: Log-log plot of real (red) and simulated
with trust γ = 0.6, critical anticipation threshold                 (blue) social network degree distribution P (k) for
∆ = 0.8, anticipation distribution variance σ = 0.24,               the Lastfm data set. Inset: plot of the cumulative
and power law network with exponent δ = 2.25, 1892                  degree distribution.
nodes and 17632 simulated objects.

                                                                    We emphasize that Eq.(10) is valuable in predicting users’
suming a(0) = a0  0, we can neglect the dynamics of d(t)
                                                                    behavior of a recommender system in an early stage.
to obtain
                                 !
                          k2
                Ω̇(t) =
                         hki
                             − 1 Ω(t).                              5.   DISCUSSION
                                                                       Social influence and our peers are known to form and in-
In addition, Eq. (4) yields                                         fluence many of our opinions and, ultimately, decisions. We
                                                )                   propose here a simple model which is based on heteroge-
             ȧk (t) = λk(1 − ak (t))Ω(t)
                                                            (9)     neous agent expectations, a social network, and a formalized
             ṡk (t) = −(α + λ)k(1 − ak (t))Ω(t)                    social influence mechanism. We analyze the model by nu-
                                                                    merical simulations and by master equation approach which
Neglecting terms of order a2k (t) and summing the solution          is particularly suitable to describe the initial phase of the
of ak (t) over P (k), we get a result for the early spreading       social “contagion”. The proposed model is able to generate
stage                                                               a wide range of different attendance distributions, includ-
                                               
              a(t) = a(0) 1 + τ λ exp(t/τ ) − 1 ,          (10)     ing those observed in popular real systems (Netflix, Lastfm,
                                                                    and Movielens). In addition, we showed that these patterns
with the timescale τ = k2 / λ( k2 − hki) . The obtained
                                              
                                                                    are emergent properties of the dynamics and not imposed
time scale τ valid in the early stage of the opinion spreading      by topology of the underlying social network. Of particular
is clearly dominated by the network heterogeneity. This re-         interest is the case of Lastfm where the underlying social
sult is in line with known disease models, e.g., SI,SIR [38, 6].    network is known. Calibrating the observed attendance dis-


                                                               37
tribution against the model then leads not only to social           [9] A. Birukov, E. Blanzieri, and P. Giorgini. Implicit: An
influence parameters but also to the degree distribution of             agent-based recommendation system for web search.
the social network which agrees with that of the true social            In Proceedings of the fourth international joint
network.                                                                conference on Autonomous agents and multiagent
   The Kullback-Leibler distances (KL) for the simulated                systems, pages 618–624. ACM, 2005.
and real attendance distributions are below 0.05 in all cases,     [10] M. Blattner. B-rank: A top N recommendation
thus demonstrating a good fit. However, the maximum at-                 algorithm. In Proceedings of The International
tendances could not be reproduced exactly by the model.                 Multi-Conference on Complexity, Informatics and
One reason may be missing degree correlations in the sim-               Cybernetics (IMCIC 2010), volume 1, pages 337–341,
ulated networks in contrast to real networks where positive             Orlando, USA, 2010.
degree correlations (so-called degree assortativity) are com-      [11] J. Breese, D. Heckerman, and C. Kadie. Empirical
mon. For the Lastfm user friendship network we observe a                analysis of predictive algorithms for collaborative
higher clustering coefficient C ≈ 0.18 compared to the clus-            filtering. In Proceedings of the 14th Annual Conference
tering coefficient C ≈ 0.06 in the simulated network. To                on Uncertainty in Artificial Intelligence (UAI-98),
compensate for this, a higher trust parameter γ is needed to            pages 43–52, San Francisco, CA, 1998. Morgan
fit the real Lastfm attendance distribution with simulated              Kaufmann.
networks.                                                          [12] P. Brusilovsky, A. Kobsa, and W. Nejdl. The adaptive
   We are aware that our statistics to validate the model               web: methods and strategies of web personalization.
is not complete. But we are confident, that our approach                Springer-Verlag New York Inc, 2007.
points to a fruitful research direction to understand recom-       [13] G. Caron-Lormier, R. Humphry, D. Bohan, C. Hawes,
mender systems’ data as a social driven process.                        and P. Thorbek. Asynchronous and synchronous
   The proposed model can be a first step towards a data                updating in individual-based models. ecological
generator to simulate bipartite user-object data with real-             modelling, 212(3-4):522–527, 2008.
world data properties. This could be used to test and val-
                                                                   [14] M. Claypool, A. Gokhale, T. Miranda, P. Murnikov,
idate new recommender algorithms and methods. Future
                                                                        D. Netes, and M. Sartin. Combining content-based
research directions may expand the proposed model to gen-
                                                                        and collaborative filters in an online newspaper. In
erate ratings within a predefined scale. Moreover, it could
                                                                        Proc. ACM SIGIR 99, Workshop Recommender
be very interesting to investigate the model in the scope of
                                                                        Systems: Algorithms and Evaluation, 1999.
social imitation [36].
                                                                   [15] H. Drachsler, T. Bogers, R. Vuorikari, K. Verbert,
                                                                        E. Duval, N. Manouselis, G. Beham, S. Lindstaedt,
6.   REFERENCES                                                         H. Stern, M. Friedrich, et al. Issues and considerations
 [1] G. Adomavicius and A. Tuzhilin. Toward the next                    regarding sharable data sets for recommender systems
     generation of recommender systems: A survey of the                 in technology enhanced learning. Procedia Computer
     state-of-the-art and possible extensions. IEEE                     Science, 1(2):2849–2858, 2010.
     transactions on knowledge and data engineering, pages         [16] F. Fouss, A. Pirotte, J. Renders, and M. Saerens.
     734–749, 2005.                                                     Random-walk computation of similarities between
 [2] R. Andersen, C. Borgs, J. Chayes, U. Feige,                        nodes of a graph with application to collaborative
     A. Flaxman, A. Kalai, V. Mirrokni, and                             recommendation. Knowledge and Data Engineering,
     M. Tennenholtz. Trust-based recommendation                         IEEE Transactions on, 19(3):355–369, 2007.
     systems: an axiomatic approach. In Proceeding of the          [17] F. Fouss, L. Yen, A. Pirotte, and M. Saerens. An
     17th international conference on World Wide Web,                   experimental investigation of graph kernels on a
     pages 199–208. ACM, 2008.                                          collaborative recommendation task. In Data Mining,
 [3] E. Anderson and L. Salisbury. The formation of                     2006. ICDM’06. Sixth International Conference on,
     market-level expectations and its covariates. Journal              pages 863–868. IEEE, 2007.
     of Consumer Research, 30(1):115–124, 2003.                    [18] W. Geyer, J. Freyne, B. Mobasher, S. Anand, and
 [4] J. Arndt. Role of product-related conversations in the             C. Dugan. 2nd workshop on recommender systems
     diffusion of a new product. Journal of Marketing                   and the social web. In Proceedings of the fourth ACM
     Research, 4(3):291–295, 1967.                                      conference on Recommender systems, pages 379–380.
 [5] M. Balabanović and Y. Shoham. Fab: content-based,                 ACM, 2010.
     collaborative recommendation. Communications of the           [19] J. Gleeson. High-accuracy approximation of
     ACM, 40(3):66–72, 1997.                                            binary-state dynamics on networks. Physical Review
 [6] A. Barrat, M. Barthlemy, and A. Vespignani.                        Letters, 107(6):68701, 2011.
     Dynamical Processes on Complex Networks.                      [20] D. Goldberg, B. D. Nichols, and D. Terry. Using
     Cambridge University Press New York, NY, USA,                      collaborative filtering to weave an information
     2008.                                                              tapestry. Commun. ACM, 35(12):61–70, 1992.
 [7] S. Bergamaschi, F. Guerra, and B. Leiba. Information          [21] N. Good, J. Schafer, J. Konstan, A. Brochers,
     overload. Internet Computing, IEEE, 14(6):10–13,                   B. Sarwar, J. Herlocker, and J. Riedl. Combining
     2010.                                                              collaborative filtering with personal agents for better
 [8] D. Billsus and M. Pazzani. Learning collaborative                  recommendations. In Proc. Conf. Am. Assoc. Artificial
     information filters. In Proceedings of the Fifteenth               Intelligence (AAAI-99), pages 439–446, USA, 1999.
     International Conference on Machine Learning,                 [22] I. Guy, A. Jaimes, P. Agulló, P. Moore, P. Nandy,
     volume 54, page 48, 1998.                                          C. Nastar, and H. Schinzel. Will recommenders kill


                                                              38
     search?: recommender systems-an industry                   [39] J. O’Donovan and B. Smyth. Trust in recommender
     perspective. In Proceedings of the fourth ACM                   systems. In Proceedings of the 10th international
     conference on Recommender systems, pages 7–12.                  conference on Intelligent user interfaces, pages
     ACM, 2010.                                                      167–174. ACM, 2005.
[23] T. Hennig-Thurau, K. Gwinner, G. Walsh, and                [40] M. Pazzani and D. Billsus. Content-based
     D. Gremler. Electronic word-of-mouth via                        recommendation systems. Lecture Notes Computer
     consumer-opinion platforms: What motivates                      Science, 4321:325–341, 2007.
     consumers to articulate themselves on the Internet?        [41] P. Resnick, N. Iakovou, M. Sushak, P. Bergstrom, and
     Journal of Interactive Marketing, 18(1):38–52, 2004.            J. Riedl. Grouplens: An open architecture for
[24] J. Herlocker, J. Konstan, L. Terveen, and J. Riedl.             collaborative filtering of netwews. In Proc. Computer
     Evaluating collaborative filtering recommender                  Supported Cooberative Work Conf., 1994.
     systems. ACM Trans. Inf. Syst., 22(1):5–53, 2004.          [42] P. Resnick and H. Varian. Recommender systems.
[25] D. Jannach, W. Geyer, J. Freyne, S. Anand,                      Commun. ACM, 40:56–58, March 1997.
     C. Dugan, B. Mobasher, and A. Kobsa. Recommender           [43] F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor,
     Systems & the Social Web. In Proceedings of the 2009            editors. Recommender Systems Handbook. Springer,
     ACM Conference on Recommender Systems, RecSys                   2011.
     2009, New York, NY, USA, October 23-25, 2009.              [44] M. Richins and T. Root-Shaffer. THE ROLE OF
     ACM, 2009.                                                      EVOLVEMENT AND OPINION LEADERSHIP IN
[26] K.Goldberg, T. Roeder, D. Gupta, and C. Perkins.                CONSUMER WORD-OF-MOUTH: AN IMPLICIT
     Eigentaste: a constant time collaborative filtering             MODEL MADE EXPLICIT. Advances in consumer
     algorithm. Information Retrieval, (4):133–151, 2001.            research, 15:32–36, 1988.
[27] Y. Kim and J. Srivastava. Impact of social influence in    [45] B. Sarwar, G. Karypis, J. Konstan, and J. Reidl.
     e-commerce decision making. Proceedings of the ninth            Item-based collaborative filtering recommendation
     international conference on Electronic commerce                 algorithms. In WWW ’01: Proceedings of the 10th
     (ICEC), pages 293–302, 2007.                                    international conference on World Wide Web, pages
[28] J. Konstan, B. Miller, D. Maltz, J. Herlocker, and              285–295, New York, NY, USA, 2001. ACM Press.
     L. Gordon. Grouplens: Applying collaborative filtering     [46] K. Sugiyama, K. Hatano, and M. Yoshikawa. Adaptive
     to usenet news. Comm. ACM, 40(3):77–87, 1997.                   web search based on user profile constructed without
[29] S. Kullback and R. Leibler. On information and                  any effort from users. In Proceedings of the 13th
     sufficiency. The Annals of Mathematical Statistics,             international conference on World Wide Web, pages
     22(1):79–86, 1951.                                              675–684. ACM, 2004.
[30] G. Linden, B. Smith, and J. York. Amazon. com              [47] F. Walter, S. Battiston, and F. Schweitzer. A model of
     recommendations: Item-to-item collaborative filtering.          a trust-based recommendation system on a social
     IEEE Internet computing, 7(1):76–80, 2003.                      network. Autonomous Agents and Multi-Agent
[31] M. Mason, R. Dyer, and M. Norton. Neural                        Systems, 16(1):57–74, 2008.
     mechanisms of social influence. Organizational             [48] G. Webb, M. Pazzani, and D. Billsus. Machine
     Behavior and Human Decision Processes,                          learning for user modeling. User Modeling and
     110(2):152–159, 2009.                                           User-Adapted Interaction, 11(1):19–29, 2001.
[32] P. Massa and P. Avesani. Trust-aware collaborative         [49] W. Whyte Jr. The web of word of mouth, 1954.
     filtering for recommender systems. On the Move to          [50] T. Zhang and V. Iyengar. Recommender systems
     Meaningful Internet Systems 2004: CoopIS, DOA, and              using linear classifiers. The Journal of Machine
     ODBASE, pages 492–508, 2004.                                    Learning Research, 2:334, 2002.
[33] P. Massa and B. Bhattacharjee. Using trust in              [51] Y. Zhang, M. Blattner, and Y. Yu. Heat conduction
     recommender systems: an experimental analysis. Trust            process on community networks as a recommendation
     Management, pages 221–235, 2004.                                model. Physical review letters, 99(15):154301, 2007.
[34] M.E.J.Newman. Structure and function of complex            [52] T. Zhou, J. Ren, M. Medo, and Y. Zhang. Bipartite
     networks. SIAM Review, 45:167–256, 2003.                        network projection and personal recommendation.
[35] P. Melville, R. Mooney, and R. Nagarajan.                       Physical Review E, 76(4):46115, 2007.
     Content-boosted collaborative filtering for improved       [53] D. Zwillinger and S. Kokoska. CRC standard
     recommendations. In Proc. 18h Nat’l Conf. ARtificial            probability and statistics tables and formulae. CRC,
     Intelligence, 2002.                                             2000.
[36] Q. Michard and J. Bouchaud. Theory of collective
     opinion shifts: from smooth trends to abrupt swings.
     The European Physical Journal B-Condensed Matter
     and Complex Systems, 47(1):151–159, 2005.
[37] B. Mirza, B. Keller, and N. Ramakrishnan. Studying
     recommendation algorithms by graph analysis.
     Journal of Intelligent Information Systems,
     20(2):131–160, 2003.
[38] M. Newman. Spread of epidemic disease on networks.
     Physical Review E, 66(1):016128, 2002.


                                                           39