<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Empirical Comparison of Collaborative Filtering Approaches on Netflix Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicola Barbieri</string-name>
          <email>barbieri@icar.cnr.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massimo Guarascio</string-name>
          <email>guarascio@icar.cnr.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ettore Ritacco</string-name>
          <email>ritacco@icar.cnr.it</email>
        </contrib>
        <aff>ICAR-CNR, Via Pietro Bucci, Rende, Italy</aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>Recommender systems are widely used in E-Commerce to automatically suggest new items that could meet the interest of a given user. Collaborative Filtering approaches compute recommendations by assuming that users who have shown similar behavior in the past will share a common behavior in the future. According to this assumption, the most effective collaborative filtering techniques try to discover groups of similar users in order to infer the preferences of the group members. The purpose of this work is to present an empirical comparison of the main collaborative filtering approaches, namely Baseline, Nearest Neighbors, Latent Factor and Probabilistic models, focusing on their strengths and weaknesses. The data used for the analysis are a sample of the well-known Netflix Prize database.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Collaborative Filtering</kwd>
        <kwd>Netflix</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The exponential growth of products, services and information makes it essential to adopt intelligent systems that guide users' navigation on the Web. The goal of Recommender Systems is to profile a user in order to suggest contents and products of interest. Such systems are adopted by the major E-commerce companies, for example Amazon.com, to provide each user with a customized view of the system. Usually, a recommendation is a list of the items that the system considers most attractive to the customer. User profiling is performed by analyzing a set of users' evaluations of purchased or viewed items, typically a numerical score called a rating. Most recommender systems are based on Collaborative Filtering (CF) techniques [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which analyze the past behavior of the users, in terms of previously given ratings, in order to foresee their future choices and discover their preferences. The main advantage of CF techniques is their simplicity: only users' past ratings are used in the learning process, and no further information, such as demographic data or item descriptions, is needed (techniques that use this knowledge are called Content Based [
        <xref ref-type="bibr" rid="ref10 ref14">10, 14</xref>
        ]). Four different families of techniques have been studied: Baseline, Neighborhood based, Latent Factor analysis and Probabilistic models. This work presents an empirical comparison of a set of well-known CF approaches, in terms of prediction quality, over a real (non-synthetic) dataset. Several works have focused on the analysis and performance evaluation of single techniques (i.e. excluding ensemble approaches), but to the best of our knowledge no previous work has compared the different approaches in such depth.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. BACKGROUND</title>
      <p>The following notation is used: u is a user, m is a movie, <inline-formula><tex-math>\hat{r}^{u}_{m}</tex-math></inline-formula> is the rating (stored in the dataset) given by the user u to the movie m (zero if missing) and, given a CF model, <inline-formula><tex-math>r^{u}_{m}</tex-math></inline-formula> is the predicted rating of the user u for the movie m. In October 2006, Netflix<sup>2</sup>, the leader in the American movie-rental market, released a dataset containing more than 100 million ratings and promoted a competition, the Netflix Prize<sup>3</sup>, whose goal was to produce a 10% improvement on the prediction quality achieved by its own recommender system, Cinematch. The competition lasted three years and was attended by several research groups from all over the world. The dataset is a set of tuples <inline-formula><tex-math>(u, m, \hat{r}^{u}_{m})</tex-math></inline-formula> and the model comparison is performed over a portion of the entire Netflix data<sup>4</sup>. This portion is a random sample of the data, divided into two sets: a training set D and a test set T. D contains 5,714,427 ratings of 435,659 users on 2,961 movies; T consists of 3,773,781 ratings (independent from the training set) of a subset of the training users (389,305) on the same set of movies. The evaluation criterion chosen is the Root Mean Squared Error (RMSE):</p>
      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,m) \in T} \left(r^{u}_{m} - \hat{r}^{u}_{m}\right)^{2}}{|T|}}</tex-math>
        </disp-formula>
      </p>
      <p>Cinematch achieves (over the entire Netflix test set) an RMSE of 0.9525, while the team BellKor's Pragmatic Chaos, which won the prize, achieved an RMSE of 0.8567. This score was produced using an ensemble of several predictors.</p>
      <p><sup>2</sup> http://www.netflix.com/</p>
      <p><sup>3</sup> http://www.netflixprize.com/</p>
      <p><sup>4</sup> http://repository.icar.cnr.it/sample netflix/</p>
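      <p>As an illustration of the evaluation criterion in Eq. 1, the following minimal Python sketch computes the RMSE over a handful of paired ratings. The function name <monospace>rmse</monospace> and the toy values are assumptions made for this example only and are not taken from the Netflix sample; the last line recomputes the relative gain of the prize-winning score over Cinematch from the two RMSE values quoted above.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def rmse(predicted, observed):
    """Root Mean Squared Error between paired predicted and stored ratings."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_error = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared_error / len(predicted))


# Hypothetical toy test set of four (u, m) pairs (not from the Netflix sample).
toy_predicted = [3.5, 4.0, 2.0, 5.0]
toy_observed = [4.0, 4.0, 1.0, 3.0]
toy_rmse = rmse(toy_predicted, toy_observed)

# Relative improvement of the winning RMSE over Cinematch (values from the text).
relative_improvement = (0.9525 - 0.8567) / 0.9525
```
]]></preformat>
      <p>With the two scores quoted in the text, <monospace>relative_improvement</monospace> is close to 0.10, which matches the 10% target of the competition.</p>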
    </sec>
    <sec id="sec-3">
      <title>3. COLLABORATIVE FILTERING MODELS</title>
      <p>The studied models belong to four algorithm families: Baseline, Nearest Neighbor, Latent Factor and Probabilistic models. A detailed description of all the analyzed techniques follows.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1 Baseline Models</title>
      <p>Baseline algorithms are the simplest approaches to rating prediction. This section focuses on the following algorithms: OverallMean, MovieAvg, UserAvg and DoubleCentering. OverallMean computes the mean of all ratings in the training set and returns this value as the prediction for every pair (u, m). MovieAvg predicts the rating of a pair (u, m) as the mean of all ratings received by m in the training set. Similarly, UserAvg predicts the rating of a pair (u, m) as the mean of all ratings given by u. Given a pair (u, m), DoubleCentering separately computes the mean of the ratings of the movie, <inline-formula><tex-math>\bar{r}_{m}</tex-math></inline-formula>, and the mean of all the ratings given by the user, <inline-formula><tex-math>\bar{r}^{u}</tex-math></inline-formula>. The prediction is a linear combination of these means:
        <disp-formula>
          <label>(2)</label>
          <tex-math>r^{u}_{m} = \alpha\, \bar{r}_{m} + (1 - \alpha)\, \bar{r}^{u}</tex-math>
        </disp-formula>
        where <inline-formula><tex-math>0 \le \alpha \le 1</tex-math></inline-formula>. Experiments on T have shown that the best value for <inline-formula><tex-math>\alpha</tex-math></inline-formula> is 0.6 (see Fig. 1).</p>
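      <p>The baseline predictors can be sketched in a few lines of Python. The sketch below is only illustrative: the function names, the toy ratings and the name <monospace>alpha</monospace> for the mixing weight of Eq. 2 are assumptions of this example, as is the fallback to the overall mean for users or movies that do not appear in the training data.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from collections import defaultdict


def fit_means(ratings):
    """Return the overall mean and the per-movie and per-user means.

    ratings is an iterable of (user, movie, rating) tuples.
    """
    total, count = 0.0, 0
    movie_sum, movie_cnt = defaultdict(float), defaultdict(int)
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for user, movie, rating in ratings:
        total += rating
        count += 1
        movie_sum[movie] += rating
        movie_cnt[movie] += 1
        user_sum[user] += rating
        user_cnt[user] += 1
    overall = total / count                      # OverallMean
    movie_avg = {m: movie_sum[m] / movie_cnt[m] for m in movie_sum}  # MovieAvg
    user_avg = {u: user_sum[u] / user_cnt[u] for u in user_sum}      # UserAvg
    return overall, movie_avg, user_avg


def double_centering(user, movie, overall, movie_avg, user_avg, alpha=0.6):
    """Eq. 2: alpha * movie mean + (1 - alpha) * user mean."""
    # Assumption: unseen users or movies fall back to the overall mean.
    r_m = movie_avg.get(movie, overall)
    r_u = user_avg.get(user, overall)
    return alpha * r_m + (1.0 - alpha) * r_u


# Hypothetical toy training set of (user, movie, rating) tuples.
toy = [("u1", "m1", 5), ("u1", "m2", 3), ("u2", "m1", 4),
       ("u2", "m2", 2), ("u3", "m1", 3)]
overall, movie_avg, user_avg = fit_means(toy)
prediction = double_centering("u3", "m2", overall, movie_avg, user_avg)
```
]]></preformat>
      <p>On the toy data the movie mean of m2 is 2.5 and the user mean of u3 is 3.0, so the DoubleCentering prediction with weight 0.6 is 0.6 · 2.5 + 0.4 · 3.0 = 2.7.</p>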
    </sec>
    <sec id="sec-6">
      <title>3.2 Nearest Neighbor Models</title>
      <p>
        Neighborhood-based approaches compute each prediction from a selected portion of the data. The most common formulation of the neighborhood approach is K-Nearest-Neighbors (K-NN), in which <inline-formula><tex-math>r^{u}_{m}</tex-math></inline-formula> is computed in a few simple steps. A similarity function associates a numerical coefficient with each pair of users; K-NN then finds the neighborhood of u by selecting the K users most similar to u, called neighbors. The rating prediction is computed as the average of the ratings in the neighborhood, weighted by the similarity coefficients. The user-based K-NN algorithm is intuitive but does not scale, because it requires the computation of a similarity coefficient for each pair of users. A more scalable formulation can be obtained by considering an item-based approach [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]: the predicted rating for the pair (u, m) is computed by aggregating the ratings given by u to the K movies most similar to m, <inline-formula><tex-math>\{m_1, \ldots, m_K\}</tex-math></inline-formula>. The underlying assumption is that the user is likely to prefer movies similar to the ones liked before, because they share similar features. In this approach the number of similarity coefficients (respectively <inline-formula><tex-math>\{s_1, \ldots, s_K\}</tex-math></inline-formula>) depends on the number of movies, which is much smaller than the number of users. The prediction is computed as:
        <disp-formula>
          <label>(3)</label>
          <tex-math>r^{u}_{m} = \frac{\sum_{i=1}^{K} s_i\, \hat{r}^{u}_{m_i}}{\sum_{i=1}^{K} s_i}</tex-math>
        </disp-formula>
      </p>
      <p>
        In the rest of the paper, only item-based K-NN algorithms are considered. The similarity function plays a central role: its coefficients are needed to identify the neighbors and they act as weights in the prediction. Two functions commonly used for CF are the Pearson Correlation and Adjusted Cosine [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] coefficients: preliminary studies showed that Pearson Correlation is more effective than Adjusted Cosine in detecting similarities. Moreover, as discussed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], similarity coefficients based on a larger support are more reliable than the ones computed from few rating values, so it is common practice to weight the similarity coefficients by the support size, a technique often called shrinkage. Shrinkage is performed as follows. Let <inline-formula><tex-math>U(m_i, m_j)</tex-math></inline-formula> be the set of users that rated both movies <inline-formula><tex-math>m_i</tex-math></inline-formula> and <inline-formula><tex-math>m_j</tex-math></inline-formula>, and let <inline-formula><tex-math>s_{m_i, m_j}</tex-math></inline-formula> be the similarity coefficient between these two movies:
        <disp-formula>
          <label>(4)</label>
          <tex-math>s'_{m_i, m_j} = \frac{s_{m_i, m_j}\, |U(m_i, m_j)|}{|U(m_i, m_j)| + \beta}</tex-math>
        </disp-formula>
      </p>
      <p>where <inline-formula><tex-math>\beta</tex-math></inline-formula> is an empirical value. Experiments showed that the best value for <inline-formula><tex-math>\beta</tex-math></inline-formula> is 100, so in the following only K-NN algorithms with Pearson Correlation and shrinkage with <inline-formula><tex-math>\beta = 100</tex-math></inline-formula> are considered. This first model is called SimpleK-NN. An improved version can be obtained by considering the difference in preference of u with respect to the movies in the neighborhood <inline-formula><tex-math>\{m_1, \ldots, m_K\}</tex-math></inline-formula> of m. Formally:
        <disp-formula>
          <label>(5)</label>
          <tex-math>r^{u}_{m} = b^{u}_{m} + \frac{\sum_{i=1}^{K} s_i \left(\hat{r}^{u}_{m_i} - b^{u}_{m_i}\right)}{\sum_{i=1}^{K} s_i}</tex-math>
        </disp-formula>
      </p>
      <p>
        where <inline-formula><tex-math>\{s_1, \ldots, s_K\}</tex-math></inline-formula> are the similarity coefficients between m and its neighbors, and <inline-formula><tex-math>b^{u}_{m}</tex-math></inline-formula> and <inline-formula><tex-math>b^{u}_{m_i}</tex-math></inline-formula> are baseline values. If the baseline values are computed using Eq. 2, the model is named BaselineK-NN; if they are computed according to the so-called User Effect Model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the model is called K-NN (user effect). An alternative way to estimate item-to-item interpolation weights is to solve a least squares problem that minimizes the error of the prediction rule. This strategy, proposed in [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ], defines the Neighborhood Relationship Model, one of the most effective approaches applied during the Netflix Prize. <inline-formula><tex-math>r^{u}_{m}</tex-math></inline-formula> is computed as:
        <disp-formula>
          <label>(6)</label>
          <tex-math>r^{u}_{m} = \sum_{i=1}^{K} w_{m m_i}\, \hat{r}^{u}_{m_i}</tex-math>
        </disp-formula>
      </p>
      <p>where <inline-formula><tex-math>m_i</tex-math></inline-formula> is a generic movie in the neighborhood of m, and <inline-formula><tex-math>w_{m m_i}</tex-math></inline-formula> are weights representing the similarity between m and <inline-formula><tex-math>m_i</tex-math></inline-formula>, computed as the solution of the following optimization problem:
        <disp-formula>
          <label>(7)</label>
          <tex-math>\min_{w} \sum_{v \neq u} \left( \hat{r}^{v}_{m} - \sum_{j=1}^{K} w_{m m_j}\, \hat{r}^{v}_{m_j} \right)^{2}</tex-math>
        </disp-formula>
      </p>
      <p>Fig. 2 shows the behavior of the K-NN models for different values of K. The best performance is achieved by the Neighborhood Relationship Model.</p>
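      <p>The item-based prediction with Pearson Correlation and shrinkage (the SimpleK-NN model) can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names, the name <monospace>beta</monospace> for the shrinkage constant and the toy ratings are assumptions, and so is the choice of keeping only neighbors rated by u that have a positive shrunk similarity.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two movies over their common users.

    Each argument maps user -> rating. Returns (similarity, support size).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0, n
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0, n
    return cov / math.sqrt(var_x * var_y), n


def shrink(similarity, support, beta=100.0):
    """Shrinkage: damp similarities that rest on few common users."""
    return similarity * support / (support + beta)


def predict_simple_knn(user, movie, by_movie, k=20, beta=100.0):
    """Similarity-weighted average of the user's ratings on the K nearest movies."""
    target = by_movie[movie]
    candidates = []
    for other, other_ratings in by_movie.items():
        if other == movie or user not in other_ratings:
            continue
        sim, support = pearson(target, other_ratings)
        s = shrink(sim, support, beta)
        if s > 0:  # assumption: negative or zero similarities are ignored
            candidates.append((s, other_ratings[user]))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    top = candidates[:k]
    if not top:
        return None
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


# Hypothetical toy data: movie -> {user: rating}.
by_movie = {
    "m1": {"a": 5, "b": 3, "c": 1},
    "m2": {"a": 4, "b": 3, "c": 2, "d": 4},
    "m3": {"a": 1, "b": 3, "c": 5, "d": 2},
}
prediction = predict_simple_knn("d", "m1", by_movie, k=2)
```
]]></preformat>
      <p>In the toy data m2 is perfectly correlated with m1 and m3 is perfectly anti-correlated, so only m2 enters the neighborhood and the prediction for user d equals the rating that d gave to m2.</p>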
    </sec>
    <sec id="sec-7">
      <title>3.3 Latent Factor Models via Singular Value Decomposition (SVD)</title>
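      <p>The assumption behind Latent Factor models is that the rating value can be expressed as a set of contributions that represent the interaction between the user and the target item over a set of features. Let A be a <inline-formula><tex-math>|users| \times |movies|</tex-math></inline-formula> matrix in which <inline-formula><tex-math>A_{u,m}</tex-math></inline-formula> is the rating given by the user u to the movie m. A can be approximated as the product of two matrices, <inline-formula><tex-math>A \approx U \cdot M</tex-math></inline-formula>, where U is a <inline-formula><tex-math>|users| \times K</tex-math></inline-formula> matrix and M is a <inline-formula><tex-math>K \times |movies|</tex-math></inline-formula> matrix; K is an input parameter of the model and represents the number of features to be considered. Intuitively, A is generated by a combination of users (U) and movies (M) with respect to a certain number of features. Once the number of features K is fixed, SVD algorithms estimate the values within U and M and give the prediction of <inline-formula><tex-math>r^{u}_{m}</tex-math></inline-formula> as:
        <disp-formula>
          <label>(8)</label>
          <tex-math>r^{u}_{m} = \sum_{i=1}^{K} U_{u,i}\, M_{i,m}</tex-math>
        </disp-formula>
      </p>
      <p>where <inline-formula><tex-math>U_{u,i}</tex-math></inline-formula> is the response of the user u to the feature i, and <inline-formula><tex-math>M_{i,m}</tex-math></inline-formula> is the response of the movie m to i. Several approaches have been proposed to overcome the sparsity of the original rating matrix A and to determine a good approximation by solving the following optimization problem:
        <disp-formula>
          <label>(9)</label>
          <tex-math>(U, M) = \arg\min_{U, M} \sum_{(u,m) \in D} \left[ \hat{r}^{u}_{m} - \sum_{i=1}^{K} U_{u,i}\, M_{i,m} \right]^{2}</tex-math>
        </disp-formula>
      </p>
      <p>
        Funk [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed an incremental procedure, based on gradient descent, to minimize the error of the model on the observed ratings. User and movie feature values are randomly initialized and updated as follows:
        <disp-formula>
          <label>(10)</label>
          <tex-math>U'_{u,i} = U_{u,i} + \eta \left( 2\, e_{u,m}\, M_{i,m} \right)</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(11)</label>
          <tex-math>M'_{i,m} = M_{i,m} + \eta \left( 2\, e_{u,m}\, U_{u,i} \right)</tex-math>
        </disp-formula>
      </p>
      <p>where <inline-formula><tex-math>e_{u,m} = \hat{r}^{u}_{m} - r^{u}_{m}</tex-math></inline-formula> is the prediction error on the pair (u, m) and <inline-formula><tex-math>\eta</tex-math></inline-formula> is the learning rate. The initial model can be further improved by considering a regularization coefficient <inline-formula><tex-math>\lambda</tex-math></inline-formula>. The updating rules become:
        <disp-formula>
          <label>(12)</label>
          <tex-math>U'_{u,i} = U_{u,i} + \eta \left( 2\, e_{u,m}\, M_{i,m} - \lambda\, U_{u,i} \right)</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(13)</label>
          <tex-math>M'_{i,m} = M_{i,m} + \eta \left( 2\, e_{u,m}\, U_{u,i} - \lambda\, M_{i,m} \right)</tex-math>
        </disp-formula>
      </p>
      <p>An extension of this model can be obtained by considering user and movie bias vectors, which define a parameter for each user and each movie:
        <disp-formula>
          <label>(14)</label>
          <tex-math>r^{u}_{m} = c_{u} + d_{m} + \sum_{i=1}^{K} U_{u,i}\, M_{i,m}</tex-math>
        </disp-formula>
      </p>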
      <p>
        where c is the user bias vector and d is the movie bias vector. An interesting version of the SVD model was proposed in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. According to this formulation, known as Asymmetric SVD, each user is modeled through the items she rated: the user-feature vector is replaced by a combination of feature vectors associated with the movies in M(u), where M(u) is the set of all the movies rated by the user u. A slightly different version, called SVD++ and proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], models each user by using both a user-feature vector and the corresponding implicit feedback component (the movies rated by each user in the training set and the ones for which a prediction is requested in the test set).
      </p>
      <p>Latent factor models based on SVD differ in the number of considered features and in the structure of the model, characterized by the presence of bias and baseline contributions. The optimization procedure used in the learning phase plays an important role: learning can be incremental (one feature at a time) or batch (all features are updated during the same iteration over the data). Incremental learning usually achieves better performance at the cost of a longer learning time. Several versions of the SVD model have been tested, using batch learning with learning rate 0.001. Feature values have been initialized with the value <inline-formula><tex-math>\sqrt{\mu / K} + \mathrm{rand}(-0.005, 0.005)</tex-math></inline-formula>, where <inline-formula><tex-math>\mu</tex-math></inline-formula> is the overall rating average and K is the number of considered features. The regularization coefficient, where needed, has been set to 0.02. To avoid overfitting, the training set has been partitioned into two parts: the first one is used as the actual training set, while the second one, called validation set, is used to evaluate the model. The learning procedure is stopped as soon as the error on the validation set increases. The performance of the different SVD models is summarized in Tab. 1, while Fig. 3 shows the accuracy of the main SVD approaches. An interesting property of the analyzed models is that they reach convergence after almost the same number of iterations, no matter how many features are considered. Better performance is achieved if the model includes bias or baseline components; regularization slows down learning but yields higher accuracy. In the worst case, the learning time for the regularized versions is about 60 min. The SVD++ model with 20 features obtains the best performance, with a relative improvement on the Cinematch score of about 5%.</p>
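      <p>The regularized gradient-descent procedure described in this section can be sketched as follows. The sketch uses the learning rate (0.001), the regularization coefficient (0.02) and the initialization around the square root of the overall mean divided by K reported above; the function names, the toy ratings, the fixed number of epochs (used here in place of early stopping on a validation set) and the random seed are assumptions of this example.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math
import random


def predict(p_u, q_m):
    """Predicted rating: dot product of user and movie feature vectors."""
    return sum(a * b for a, b in zip(p_u, q_m))


def train_svd(ratings, n_features=2, learning_rate=0.001, reg=0.02,
              epochs=2000, seed=0):
    """Regularized SVD by gradient descent, all features updated per rating."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    base = math.sqrt(mu / n_features)  # initial predictions start close to mu
    U, M = {}, {}
    for u, m, _ in ratings:
        if u not in U:
            U[u] = [base + rng.uniform(-0.005, 0.005) for _ in range(n_features)]
        if m not in M:
            M[m] = [base + rng.uniform(-0.005, 0.005) for _ in range(n_features)]
    for _ in range(epochs):
        for u, m, r in ratings:
            p_u, q_m = U[u], M[m]
            err = r - predict(p_u, q_m)  # stored rating minus prediction
            for i in range(n_features):
                u_old, m_old = p_u[i], q_m[i]
                # Regularized update rules for the user and movie features.
                p_u[i] = u_old + learning_rate * (2 * err * m_old - reg * u_old)
                q_m[i] = m_old + learning_rate * (2 * err * u_old - reg * m_old)
    return U, M


def training_rmse(ratings, U, M):
    """RMSE of the factor model on the ratings it was trained on."""
    se = sum((r - predict(U[u], M[m])) ** 2 for u, m, r in ratings)
    return math.sqrt(se / len(ratings))


# Hypothetical toy training set of (user, movie, rating) tuples.
toy = [("u1", "m1", 5), ("u1", "m2", 4), ("u2", "m1", 4),
       ("u2", "m2", 3), ("u3", "m1", 2), ("u3", "m2", 1)]
U0, M0 = train_svd(toy, epochs=0)   # initialization only
U1, M1 = train_svd(toy)             # after training
rmse_before = training_rmse(toy, U0, M0)
rmse_after = training_rmse(toy, U1, M1)
```
]]></preformat>
      <p>Before training every prediction is close to the overall mean, so <monospace>rmse_before</monospace> is roughly the standard deviation of the toy ratings; <monospace>rmse_after</monospace> measures how much of that error the learned features remove on the training data. A faithful reproduction of the experiments would instead monitor a held-out validation set and stop when its error starts to increase.</p>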
    </sec>
    <sec id="sec-12">
      <title>3.4 Probabilistic Models</title>
      <p>
        Several probabilistic methods have been proposed for CF; they try to estimate the relations between users or products through probabilistic clustering techniques. The Aspect Model [
        <xref ref-type="bibr" rid="ref7 ref8">8, 7</xref>
        ], also called pLSA, is the main probabilistic model used in CF and belongs to the class of Multinomial Mixture Models. Such models assume that the data were independently generated and introduce a latent (or hidden) variable Z that can take K values. For a fixed value of Z, u and m are conditionally independent. The hidden variable is able to detect the hidden structure within the data in terms of user communities, assuming that Z, associated with the observation <inline-formula><tex-math>(u, m, \hat{r}^{u}_{m})</tex-math></inline-formula>, models the reason why the user u rated the movie m with rating <inline-formula><tex-math>\hat{r}^{u}_{m}</tex-math></inline-formula>. Formally, assuming the user community version, the posterior probability of <inline-formula><tex-math>\hat{r}^{u}_{m} = v</tex-math></inline-formula> is:
        <disp-formula>
          <label>(16)</label>
          <tex-math>P(\hat{r}^{u}_{m} = v \mid u, m) = \sum_{z=1}^{K} P(\hat{r}^{u}_{m} = v \mid m, z)\, P(Z = z \mid u)</tex-math>
        </disp-formula>
      </p>
      <p>
        where <inline-formula><tex-math>P(Z = z \mid u)</tex-math></inline-formula> represents the participation of u in the pattern of interest z, and <inline-formula><tex-math>P(\hat{r}^{u}_{m} = v \mid m, z)</tex-math></inline-formula> is the probability that a user belonging to pattern z gives rating v to the movie m. A simplified version of the Aspect Model is the Multinomial Mixture Model, which assumes that there is only one type of user [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]:
        <disp-formula>
          <label>(17)</label>
          <tex-math>P(\hat{r}^{u}_{m} = v \mid u, m) = \sum_{z=1}^{K} P(\hat{r}^{u}_{m} = v \mid m, z)\, P(Z = z)</tex-math>
        </disp-formula>
      </p>
      <p>
        The standard learning procedure for the Multinomial Mixture Model is the Expectation Maximization algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Fig. 4 shows the RMSE achieved by the Multinomial Mixture Model with different numbers of latent classes. The model was initialized randomly and the learning phase required about 40 iterations over the training set, although the model reaches 90% of its potential within the first 10 iterations. The best result (0.9662) is obtained with 10 latent states for Z. The pLSA model was tested assuming a Gaussian distribution for the rating probability given the state of the hidden variable and the considered movie m, in the user-community version. The model was tested for different numbers of user communities, as shown in Fig. 5. To avoid overfitting, the early stopping strategy described in the previous section was adopted. The best pLSA model produces an improvement of around 1% over Cinematch. The drawback of the model is its learning process: a few iterations (3 to 5) over the data are sufficient to overfit the model.
      </p>
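      <p>Once the parameters of a mixture model have been estimated, the rating distribution of Eq. 17 is a weighted sum over the latent states. The Python sketch below illustrates this step only (not the Expectation Maximization training). The function names and the toy probabilities are hypothetical, and the use of the expected value of the distribution as the point prediction is an assumption of this example, since the text does not state how a single rating is derived from the distribution.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def rating_distribution(movie, mixing, cond):
    """P(rating = v | movie) as a mixture over the latent states.

    mixing[z] is P(Z = z); cond[z][movie][v] is P(rating = v | movie, z).
    """
    values = sorted(cond[0][movie])
    return {
        v: sum(mixing[z] * cond[z][movie][v] for z in range(len(mixing)))
        for v in values
    }


def expected_rating(distribution):
    """Assumption: use the mean of the rating distribution as the prediction."""
    return sum(v * p for v, p in distribution.items())


# Hypothetical parameters for K = 2 latent states and a single movie "m1".
mixing = [0.75, 0.25]
cond = [
    {"m1": {1: 0.0, 2: 0.0, 3: 0.2, 4: 0.3, 5: 0.5}},  # state 0 likes m1
    {"m1": {1: 0.5, 2: 0.3, 3: 0.2, 4: 0.0, 5: 0.0}},  # state 1 dislikes m1
]
dist = rating_distribution("m1", mixing, cond)
point_prediction = expected_rating(dist)
```
]]></preformat>
      <p>Passing a user-specific vector P(Z = z | u) as <monospace>mixing</monospace> turns the same computation into the Aspect Model of Eq. 16.</p>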
    </sec>
    <sec id="sec-9">
      <title>4. MODEL COMPARISON</title>
      <p>This section presents a comparative analysis of the models described above. Each model is tuned with its best parameter settings. As stated before, Cinematch, Netflix's recommender system, achieves an RMSE of 0.9525. Figure 6 shows the RMSE of all the Baseline models mentioned. The best model is DoubleCentering, but none of them outperforms Cinematch. Figure 7 shows the performance of the K-NN models. Their performance is considerably better than that of the baselines: except for SimpleK-NN, all approaches improve on Cinematch's accuracy, especially the Neighborhood Relationship Model. The quality of the SVD models is shown in Figure 8. SVD models show the best performance, notably SVD++. Figure 9 shows the behavior of the two probabilistic models; only pLSA outperforms Cinematch. Finally, Figure 10 compares the best models of each algorithm family. In these experiments SVD++ proves to be the best model among all those considered.</p>
    </sec>
    <sec id="sec-10">
      <title>5. CONCLUSIONS AND FUTURE WORK</title>
      <p>
        This work has presented an empirical comparison of some of the most effective individual CF approaches applied to the Netflix dataset, each with its best settings. The best performance is achieved by the Neighborhood Relationship and SVD++ models. Moreover, combining standard approaches with simple baseline or bias models improved performance, yielding a considerable gain with respect to Cinematch. From a theoretical point of view, probabilistic models should be the most promising, since the underlying generative process should in principle combine the benefits of latent modeling and neighborhood influence. However, these approaches seem to suffer from overfitting: experiments showed that their RMSE is not competitive with the one achieved by SVD or K-NN models. Future work will focus on the study of Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which extends the pLSA model and reduces the risk of overfitting, and on the integration of baseline and bias contributions in probabilistic approaches.
      </p>
    </sec>
    <sec id="sec-11">
      <title>REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Volinsky</surname>
          </string-name>
          .
          <article-title>Modeling relationships at multiple scales to improve accuracy of large recommender systems</article-title>
          .
          <source>In KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <fpage>95</fpage>
          –
          <lpage>104</lpage>
          , New York, NY, USA,
          <year>2007</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Bell</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          .
          <article-title>Improved neighborhood-based collaborative filtering</article-title>
          .
          <source>In Proc. of KDD-Cup and Workshop at the 13th ACM SIGKDD International Conference of Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>7</fpage>
          –
          <lpage>14</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Bell</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          .
          <article-title>Scalable collaborative filtering with jointly derived neighborhood interpolation weights</article-title>
          .
          <source>In ICDM '07: Proceedings of the 2007 Seventh IEEE International Conference on Data Mining</source>
          , pages
          <fpage>43</fpage>
          –
          <lpage>52</lpage>
          , Washington, DC, USA,
          <year>2007</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Latent Dirichlet allocation</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>3</volume>
          :
          <fpage>993</fpage>
          –
          <lpage>1022</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Funk</surname>
          </string-name>
          .
          <article-title>Netflix update: Try this at home</article-title>
          . URL: http://sifter.org/~simon/Journal/20061211.html.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nichols</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Oki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Terry</surname>
          </string-name>
          .
          <article-title>Using collaborative filtering to weave an information tapestry</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>35</volume>
          :
          <fpage>61</fpage>
          –
          <lpage>70</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          .
          <article-title>Collaborative filtering via Gaussian probabilistic latent semantic analysis</article-title>
          .
          <source>In SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>259</fpage>
          –
          <lpage>266</lpage>
          , New York, NY, USA,
          <year>2003</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          .
          <article-title>Latent semantic models for collaborative filtering</article-title>
          .
          <source>ACM Trans. Inf. Syst.</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>89</fpage>
          –
          <lpage>115</lpage>
          ,
          <year>January 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          .
          <article-title>Factorization meets the neighborhood: a multifaceted collaborative filtering model</article-title>
          .
          <source>In KDD '08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <fpage>426</fpage>
          –
          <lpage>434</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lieberman</surname>
          </string-name>
          .
          <article-title>Letizia: An Agent that Assists Web Browsing</article-title>
          .
          <source>In Proc. of Int. Joint Conf. on Artificial Intelligence</source>
          , pages
          <fpage>924</fpage>
          –
          <lpage>929</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Marlin</surname>
          </string-name>
          .
          <article-title>Modeling user rating profiles for collaborative filtering</article-title>
          .
          <source>In NIPS*17</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Moon</surname>
          </string-name>
          .
          <article-title>The expectation-maximization algorithm</article-title>
          .
          <source>Signal Processing Magazine</source>
          , IEEE,
          <volume>13</volume>
          (
          <issue>6</issue>
          ):
          <fpage>47</fpage>
          –
          <lpage>60</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paterek</surname>
          </string-name>
          .
          <article-title>Improving regularized singular value decomposition for collaborative filtering</article-title>
          .
          <source>Proceedings of KDD Cup and Workshop</source>
          , pages
          <fpage>39</fpage>
          –
          <lpage>42</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Billsus</surname>
          </string-name>
          .
          <article-title>Content-based recommendation systems</article-title>
          . In P. Brusilovsky,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kobsa</surname>
          </string-name>
          , and W. Nejdl, editors,
          <source>The Adaptive Web</source>
          , volume
          <volume>4321</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>325</fpage>
          –
          <lpage>341</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sarwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karypis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Konstan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Item-based collaborative filtering recommendation algorithms</article-title>
          .
          <source>In WWW '01: Proceedings of the 10th international conference on World Wide Web</source>
          , pages
          <fpage>285</fpage>
          –
          <lpage>295</lpage>
          . ACM,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>