<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Online Selection of Mediated and Domain-Specific Predictions for Improved Recommender Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stephanie Rosenthal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuela Veloso</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anind Dey</string-name>
          <email>anindg@cs.cmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science Carnegie Mellon University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2009</year>
      </pub-date>
      <abstract>
<p>Recommender systems use a set of reviewers and advice givers with the goal of providing accurate user-dependent product predictions. In general, these systems assign weights to different reviewers as a function of their similarity to each user. As products are known to be from different domains, a recommender system also considers product domain information in its predictions. As there are few reviews compared to the number of products, it is often hard to set the similarity-based weights because there is not a large enough subset of reviewers who reviewed the same products. It has recently been suggested that not considering domains will increase the amount of reviewer data and the overall prediction accuracy in a mediated way. However, clearly, if different reviewers are similar to a user in each product domain, then domain-specific predictions could be superior to mediated ones. In this paper, we consider two advice giver algorithms to provide domain-specific and mediated predictions. We analyze both advice giver algorithms using large real data sets to characterize when each is more accurate for users. We find that for a considerable number of users, the domain-specific predictions are possible and more accurate. We then contribute an improved general recommender system algorithm that autonomously selects the most accurate mediated or domain-specific advice giver for each user. We validate our analysis and algorithm using real data sets and show the improved predictions for different users.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We model a product recommender system as a set of reviews
defined by reviewers, product domains (e.g., DVDs, books,
clothes), and advice givers. Users request that the
recommender system provide predictions of whether they will like
a set of products of their choosing. Users then have the
option of providing their own reviews of those products. As
the advice giver that makes product predictions receives the
user’s actual reviews, it assigns domain-specific weights to
the reviewers as a function of the similarity between their
reviews and the user’s. The reviewers whose reviews are most
similar to the user’s receive higher weight. The advice giver
uses the weights of the reviewers and their reviews to guide
its predictions with the goal of providing the most accurate
predictions.</p>
      <p>
        Because reviewers review a relatively small number of
products, it is difficult to find enough reviewers with similar
reviews to make accurate predictions for every product each
user requests. The problem is exacerbated as products are
divided into domains so there are fewer product reviews to
train the domain-specific weights with. Resolving the data
sparsity problem has been the focus of much recommender
system work. Although it is widely accepted that
domain-specific reviewers result in accurate predictions, it has
recently been suggested that a mediated advice giver that
combines multiple domains of products and holds only a single
set of weights for each user would help alleviate the data
sparsity problem
        <xref ref-type="bibr" rid="ref10 ref2 ref3">(Berkovsky, Kuflik, and Ricci 2007a)</xref>
        .
      </p>
      <p>While learning only one set of weights will increase the
amount of data to train with, there is an underlying
assumption that the reviewers with similar reviews to a user in one
product domain (e.g., DVDs) will also have similar reviews
to that user in other domains (e.g., books and clothes). While
a domain-specific advice giver captures these differences,
the mediated advice giver does not. Intuitively, it seems
unlikely that for all users there is a set of reviewers with similar
reviews in all domains of products. The focus of this work is
to understand in real recommender data sets how data
sparsity and the user’s reviews affect the weights of the reviewers
and the accuracy of the advice givers’ predictions.</p>
      <p>We first present an overview of how advice givers weigh
reviewers and make predictions for users and give examples
of weights that affect the two advice givers’ accuracy. We
show using data from two large recommender systems that
potentially half of the users benefitted from the mediated
advice giver while the other half required domain-specific
weights. Additionally we find in a third and more sparse
recommender data set that both advice givers have equal
accuracies when reviewers do not provide reviews for more
than one category. We then show how accuracy changes for
each advice giver as reviewers review products in more
categories. Assuming that reviewers do provide reviews in more
categories (as found in the first two data sets), and because
different users benefit from the two advice givers in roughly
equal numbers, we contribute two online user-dependent
selection algorithms for the recommender system to choose
which advice giver makes the highest-accuracy predictions for each user.
Finally, we validate both our initial findings and the selection
algorithms with a fourth recommender system data set and
conclude that each user benefits from the user-dependent
selection rather than a recommender system that uses one type
of advice giver.</p>
    </sec>
    <sec id="sec-2">
      <title>Advice Givers</title>
<p>A recommender system is comprised of a set of products
with corresponding domain information, reviews R, and an
advice giver. The set of reviews R is an M × N matrix of M
reviewers and N products. The review R_ij is a discrete value
v ∈ V that reviewer r_i provides for product p_j. Possible
values V may be binary {0, 1} or may range over a subset of the
integers (e.g., {1, 2, 3, 4, 5}). Each product p_j is assigned a
domain d ∈ D.</p>
      <sec id="sec-2-1">
        <title>Domain-Specific Advice Giver</title>
        <p>
          An advice giver’s task is to provide personalized predictions
v ∈ V for products p that a user u requests. In order to
provide personalized predictions for each user, the
Domain-Specific Advice Giver (DSAG) assigns a weight w_i^{u,d} to each
reviewer r_i for each user u and domain d (see Algorithm 1). The
weight of a reviewer reflects how often the reviewer
gave review values similar to the user’s reviews; the weights are
modeled after experts algorithms
          <xref ref-type="bibr" rid="ref1 ref8">(Auer et al. 1995; Littlestone
and Warmuth 1994)</xref>
          , which have been used widely in
predicting reviews (e.g.,
          <xref ref-type="bibr" rid="ref9">(Nakamura and Abe 1998)</xref>
          ). These
weights are initially uniform across the reviewers for each
user (Line 1):
        </p>
        <p>∀u, d, i : w_i^{u,d} = 1/M (1)</p>
        <p>The reviewers are the “experts,” and the advice giver makes
a prediction by polling the reviewers as a function of the
weights assigned to them. The advice giver uses weighted
majority to make a prediction for a product in domain d_k
that a user requests, by summing the weights of reviewers
that provide each value v, and predicting the value with the
most weight:</p>
        <p>argmax_v Σ_i w_i^{u,k} · I(R_ij == v) (2)</p>
        <p>
          where I is the indicator function that returns I(true) = 1
and I(false) = 0 (Lines 2-4)
          <xref ref-type="bibr" rid="ref8">(Littlestone and Warmuth
1994)</xref>
          . The advice giver updates the weights as a function
of the distance between the reviewer’s reviews and the user
u’s later actual review a_j for the product p_j (Lines 5-6):
        </p>
        <p>w_i^{u,k} = exp(ln(w_i^{u,k}) − ℓ1(R_ij, a_j)) / Σ_h exp(ln(w_h^{u,k}) − ℓ1(R_hj, a_j)) (3)</p>
        <p>
          Because the advice giver makes predictions about the user’s
review but does not know the actual review ahead of time,
the weights for the domain d_k are recalculated online as the
user provides reviews for products in that domain. We
implement the sleeping experts algorithm to reweigh only the
reviewers that provided a review for the product
          <xref ref-type="bibr" rid="ref4 ref6">(Freund
et al. 1997; Blum and Mansour 2005)</xref>
          . If a reviewer does not provide a review for the product p_j, its weight does not
change. The goal of the advice giver is to weigh the
reviewers for each user such that the resulting predictions are as
accurate as possible compared to the hindsight knowledge
of the users’ reviews.
        </p>
        <p>Algorithm 1 Domain-Specific Advice Giver (DSAG)
1: For a new user u, initialize w_i^{u,d} according to (1)
2: for all products p_j do
3:   k ← domain(p_j)
4:   Predict according to (2)
5:   if user u gives review a_j then
6:     Update w_i^{u,k} according to (3)
7:   end if
8: end for</p>
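<p>Algorithm 1 can be sketched in code. The following is a minimal sketch of the uniform initialization, weighted-majority prediction, and exponential update described above; the class layout, the dictionary encoding of R, and all names are our own illustration, not the authors' implementation.</p>

```python
from collections import defaultdict
from math import exp, log

class DSAG:
    """Sketch of a Domain-Specific Advice Giver: one weight vector per domain."""

    def __init__(self, reviews, domains, num_reviewers):
        self.R = reviews        # {(reviewer index, product): review value}
        self.domain = domains   # {product: domain label}
        self.M = num_reviewers
        # equation (1): weights start uniform, one vector per domain
        self.w = defaultdict(lambda: [1.0 / self.M] * self.M)

    def awake(self, product):
        # sleeping experts: only reviewers who reviewed the product take part
        return [i for i in range(self.M) if (i, product) in self.R]

    def predict(self, product):
        # equation (2): weighted majority over the awake reviewers
        k = self.domain[product]
        votes = defaultdict(float)
        for i in self.awake(product):
            votes[self.R[(i, product)]] += self.w[k][i]
        return max(votes, key=votes.get) if votes else None

    def update(self, product, actual):
        # equation (3): exponential update with the l1 distance |R_ij - a_j|
        k = self.domain[product]
        w = self.w[k]
        scores = {i: exp(log(w[i]) - abs(self.R[(i, product)] - actual))
                  for i in self.awake(product)}
        z = sum(scores.values())
        for i, s in scores.items():
            w[i] = s / z
```

<p>For example, with three reviewers of whom only reviewer 0 matches a user who reviews 5, a single update on the first product shifts almost all of that domain's weight onto reviewer 0.</p>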
      </sec>
      <sec id="sec-2-2">
        <title>Mediated Advice Giver</title>
        <p>
          The DSAG can provide precise predictions for users in
each domain, but it requires enough reviews in R to cover
all products with enough reviewers and requires the user
provide enough reviews in each domain to reweigh the
reviewers enough times for the weights to converge. In
typical recommender systems, however, the review matrix R is
very sparse in both the number of reviews provided for a
particular product and the number of reviews provided by a
particular reviewer. Because of this sparsity, the number of
reviewers that get reweighed for any given product that the
user requests is far fewer than the total number of reviewers.
As a result, the DSAG requires the user to review a lot of
products before it can provide accurate enough predictions.
This problem is exacerbated because the products are often
split into domains and the algorithms require the user to
review the same number of products in each domain to predict
accurately in each. The focus of much recommender
system research has centered around resolving this data sparsity
problem
          <xref ref-type="bibr" rid="ref4">(Adomavicius and Tuzhilin 2005)</xref>
          .
        </p>
        <p>
          While most work has focused on hybrid recommender
systems to increase accuracy by combining different
weighing techniques (e.g.,
          <xref ref-type="bibr" rid="ref10 ref5">(Burke 2002; Umyarov and Tuzhilin
2007)</xref>
          ), one recent idea is to combine domains to increase
the number of products that affect the reviewers’ weights.
One idea is to keep domain-specific weights, but to allow
the DSAG to reference all of a reviewer’s weights to
determine whether that reviewer is similar to the user in any
domain
          <xref ref-type="bibr" rid="ref10 ref2 ref3">(Berkovsky, Kuflik, and Ricci 2007b)</xref>
          . If there is not
enough information about a particular domain to make
suggestions about a product, the system could take advantage
of the user’s similar reviewers in other domains to make
predictions. Mediation, on the other hand, combines the
weights from multiple domains together
          <xref ref-type="bibr" rid="ref10 ref2 ref3">(Berkovsky, Kuflik,
and Ricci 2007a)</xref>
          .
        </p>
        <p>Algorithm 2 Mediated Advice Giver (MAG)
1: For a new user u, initialize ∀i w_i^u = 1/M
2: for all products p_j do
3:   Predict v = argmax_v Σ_i w_i^u · I(R_ij == v)
4:   if user reports review a_j then
5:     Update w_i^u = exp(ln(w_i^u) − ℓ1(R_ij, a_j)) / Σ_h exp(ln(w_h^u) − ℓ1(R_hj, a_j))
6:   end if
7: end for</p>
<p>A Mediated Advice Giver (MAG) makes
domain-independent predictions with the expectation that the advice
giver will identify the most similar reviewers sooner, and
provide more accurate predictions with sparse matrices,
because all products affect the same weights. The Mediated
Advice Giver (Algorithm 2) is the same as the DSAG, except
that a single set of weights is maintained which is updated
for every product in every domain. The MAG is accurate
when there is a consistent set of similar reviewers to a user
for every domain (reviewers have proportional weights in
all domains). However, when different reviewers have high
weights in different domains, the MAG weighs all equally,
which can result in poor predictions. Intuitively, if a user
had reviews similar to one set of reviewers about DVDs and
very different reviews about books, the DSAG, which holds
different weights for DVDs and books, would provide more
accurate predictions.</p>
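<p>The MAG can be sketched as the same weighted-majority prediction and exponential update, but with a single weight vector shared across all domains. This is our own minimal sketch, not the paper's implementation:</p>

```python
from collections import defaultdict
from math import exp, log

class MAG:
    """Sketch of a Mediated Advice Giver: one domain-independent weight vector."""

    def __init__(self, reviews, num_reviewers):
        self.R = reviews                    # {(reviewer index, product): value}
        self.M = num_reviewers
        self.w = [1.0 / num_reviewers] * num_reviewers  # single shared vector

    def predict(self, product):
        # weighted majority over the reviewers who reviewed the product
        votes = defaultdict(float)
        for i in range(self.M):
            if (i, product) in self.R:
                votes[self.R[(i, product)]] += self.w[i]
        return max(votes, key=votes.get) if votes else None

    def update(self, product, actual):
        # sleeping-experts exponential update; every domain shares self.w
        awake = [i for i in range(self.M) if (i, product) in self.R]
        scores = {i: exp(log(self.w[i]) - abs(self.R[(i, product)] - actual))
                  for i in awake}
        z = sum(scores.values())
        for i in awake:
            self.w[i] = scores[i] / z
```

<p>If a reviewer matches the user in one domain but disagrees in another, the shared vector carries the first domain's weight into the second, which is exactly the failure mode described above.</p>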
<p>Example Suppose a new user joins a recommender
system that uses a Domain-Specific Advice Giver with three
reviewers (M = 3) r_1, r_2, r_3, that review four products
(N = 4) p_1, p_2, p_3, p_4 with reviews presented in Table
1(a). The values are shown for later use in the example.
D = {d_1, d_2} are assigned to the products, and the
possible values that the reviewers can give and the advice giver can
predict are V = {1, 2, 3, 4, 5}. The DSAG initializes the
weights w_i^{u,d} to 1/M = 1/3 for both domains (Table 1(b),
column t_0). The user requests a prediction for product p_1
in domain d_1. The advice giver calculates which value to
predict using the initial weights and chooses to predict 3
because it has the maximum weight associated with it. The
user notifies the advice giver that their actual review a_1 of
p_1 is 5 (last row of Table 1(a)), and the advice giver uses that
information to reweigh the reviewers for domain d_1 (Table
1(b), column t_1). Then the user requests a prediction for p_2,
and the advice giver uses the newly initialized weights for
domain d_2 to predict 2. The user responds with value 5 and the
advice giver recalculates the weights for domain d_2
(column t_2). This continues for all 4 products.</p>
        <p>The weights (shown in Table 1(b)) do not change on the
time steps when the advice giver predicts a value for a
product that is from a different domain. At the end, we can see
that the user has the same reviews as r_1 for domain d_1 and
the same reviews as r_2 for domain d_2, and the DSAG correctly
identifies them as most similar by assigning them the highest weight.</p>
        <p>Now suppose that R is the same, but the recommender
system uses a MAG to predict for the user. Because
reviewers 1 and 2 are each correct 50% of the time, their weights
change over the four products (See Table 1(c)). The MAG
assigns higher weight to the wrong reviewer and predicts 2
or 3 when it should predict 5 for all products.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Approach</title>
        <p>The focus of this work is to determine whether data sparsity
consistently affects the accuracy of the DSAG for all users
and choose the most accurate advice giver for each user. We
will first show, using synthetic data, different weight
distributions for users in the MAG and DSAG and analyze their
prediction accuracies. We then show, using three real
recommender system data sets, that both the MAG and DSAG give
more accurate predictions for some users. Additionally, for
very sparse data sets, we find that both advice givers can give
identical results. We analyze this phenomenon and provide
accuracy results on synthetic data sets with these properties.
Because neither advice giver can be excluded as less
accurate, we provide two algorithms to dynamically select which
advice giver to use for each user and validate our results and
algorithms on a fourth recommender data set.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Advice Giver Evaluation - Synthetic Data</title>
      <p>In this section, we evaluate the worst-case and more
realistic review matrices and user preferences to better
understand the performance of the MAG and DSAG under
different amounts of data sparsity and different reviewers. We
show that the MAG converges faster on the weight
distribution, assuming those best reviewers are the same across
categories. Then, we will show a more extreme case of the
example above where a user agrees strongly with one
reviewer in each category and disagrees strongly with the rest,
causing the DSAG to perform better than the MAG. When
we relax the constraint of a different “best” reviewer in each
category, the MAG and DSAG perform equally well.
Finally, we relax the assumption that all reviewers review all
products and explore very sparse review matrices. We show
that in cases where reviewers only review products in a
particular category, the two advice givers perform equally well.
We will use these results later to analyze our results from
three real recommender system data sets.</p>
      <sec id="sec-3-1">
        <title>Sparse Data</title>
        <p>For the following examples, we will assume that users have
polar preferences - either strongly disliking (1) or strongly
liking (5) each product. For simplicity, the user will always
strongly like the product and give it a 5. Also for simplicity,
we will only have m reviewers, m categories, and n ≫ m
products that are evenly distributed across the categories.</p>
        <p>The MAG can converge on a single weight distribution
using all of the products while the DSAG instead uses m
weight distributions - one for each category. Assuming that
each of the m categorical weight distributions is similar,
the DSAG will converge on each distribution separately
although it turns out they should all be the same. If each
weight distribution takes the same amount of time to
converge, the DSAG will take longer to come to the same
conclusion as the MAG. If the products are not distributed
evenly across categories, it could take the DSAG much
longer to converge on the rare categories. As an example,
we define the review matrix in the following way:</p>
        <p>R_ij = 5 with probability p if i = 1; 1 with probability (1 − p) if i = 1; 1 if i ≠ 1</p>
        <p>For all categories, reviewer 1 is correct p percent of the
time and gives the same review as the rest of the
reviewers (1-p) percent of the time. The DSAG will have to find
this pattern for each weight distribution while the MAG only
finds it once. Because this is a simple distribution and all
reviewers give reviews for all products, it takes relatively few
products to find the pattern. The MAG finds the right weight
distribution after 1=p products while the DSAG takes m=p
products to converge all distributions.</p>
        <p>In general, it takes 1=p product reviews to find the weight
of each reviewer for each weight distribution. If not all
reviewers provide reviews for each product, it could take much
longer to converge. The MAG also assumes that the weight
distributions for each category are similar. If they are not,
combining them together into the single distribution may
cause prediction errors.</p>
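<p>This convergence behavior can be checked with a short simulation. The rule below generates the review matrix just described (reviewer 1 is correct with probability p); the values of m, n, p, and the random seed are illustrative choices of ours:</p>

```python
import random
from math import exp, log

def make_reviews(m, n, p, rng):
    """Reviewer 1 (index 0) gives the user's value 5 with probability p;
    otherwise it gives 1, the value all other reviewers always give."""
    rows = []
    for _ in range(n):
        correct = rng.random() < p
        rows.append([5 if (correct and i == 0) else 1 for i in range(m)])
    return rows

def run_mag(reviews, m, actual=5):
    """Run the mediated exponential update over all products."""
    w = [1.0 / m] * m
    for row in reviews:
        scores = [exp(log(w[i]) - abs(row[i] - actual)) for i in range(m)]
        z = sum(scores)
        w = [s / z for s in scores]
    return w

m, n, p = 5, 50, 0.5
w = run_mag(make_reviews(m, n, p, random.Random(0)), m)
# reviewer 1 ends up carrying almost all of the mediated weight
```

<p>Each product where reviewer 1 is correct multiplies its weight by e^4 relative to the others, while products where it is wrong leave the ratios unchanged, matching the 1/p convergence argument above.</p>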
      </sec>
      <sec id="sec-3-2">
        <title>Categorical Weight Distributions</title>
        <p>The DSAG can perform well compared to the MAG when
the reviewers have very different weights in each category as
shown in the example above. As a more extreme example,
we define the review matrix in the following way:</p>
        <p>R_ij = 5 if product j is in category i; 1 otherwise</p>
<p>If the user always says 5, then reviewer i always has the
highest weight in category i and the rest of the reviewers
have almost 0 weight, so the weight distribution for each
category is very different. The DSAG converges on the
correct weights very quickly because of the high degree of
similarity between reviewers and the user. The MAG,
alternatively, sees an equal number of products in each category
and converges on a uniform weight distribution across
reviewers. Because more reviewers recommend the value 1 for
each product, the MAG predicts incorrectly every time. We
tested this hypothesis by generating data with the above rule,
running recommender systems with the weighted majority
algorithm, calculating their error rates, and examining the
similarity distributions at the end of the trials against the
expected distribution.</p>
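<p>This experiment can be reproduced with a small deterministic simulation (our own sketch; the paper does not give its exact trial setup). With the categorical matrix and a user who always reviews 5, the DSAG misses only the first product of each category, while the MAG's weights cycle back to uniform and it predicts 1 every time:</p>

```python
from math import exp, log

M = 3                          # reviewers, one per category
N = 30                         # products, cycling through the categories
category = [j % M for j in range(N)]
# R[i][j] = 5 if product j is in reviewer i's category, 1 otherwise
R = [[5 if category[j] == i else 1 for j in range(N)] for i in range(M)]
ACTUAL = 5                     # the user always likes the product

def predict(w, col):
    # weighted majority over one product's column of reviews
    votes = {}
    for i in range(M):
        votes[col[i]] = votes.get(col[i], 0.0) + w[i]
    return max(votes, key=votes.get)

def update(w, col):
    # exponential update with the l1 distance to the user's review
    scores = [exp(log(w[i]) - abs(col[i] - ACTUAL)) for i in range(M)]
    z = sum(scores)
    return [s / z for s in scores]

dsag_w = {d: [1.0 / M] * M for d in range(M)}   # one vector per category
mag_w = [1.0 / M] * M                           # one shared vector
dsag_hits = mag_hits = 0
for j in range(N):
    col = [R[i][j] for i in range(M)]
    d = category[j]
    dsag_hits += predict(dsag_w[d], col) == ACTUAL
    mag_hits += predict(mag_w, col) == ACTUAL
    dsag_w[d] = update(dsag_w[d], col)
    mag_w = update(mag_w, col)
```

<p>After the one unavoidable first miss per category, the DSAG is perfect; the MAG never predicts 5.</p>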
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption><p>Prediction accuracy by advice giver on the categorical review matrix.</p></caption>
          <table>
            <thead><tr><th>Advice Giver</th><th>Accuracy</th></tr></thead>
            <tbody>
              <tr><td>DSAG</td><td>100%</td></tr>
              <tr><td>MAG</td><td>0%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Table 2 shows the prediction accuracy by advice giver. As
expected, there is a significant drop in accuracy when
combining the domain-specific advice givers. The MAG
converges on a uniform weight distribution and gives the wrong
advice to the user for every product; the DSAG does not.
Next, we relax the requirement that each category have one
extremely accurate reviewer to understand how the two
advice givers predict as the “best” reviewer becomes less
obvious.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Changing Weight Distributions</title>
        <p>The greater the difference in weight distributions across
different categories, the worse the MAG predicts. We have
shown that the MAG can perform significantly worse than
the DSAG in this situation. We will now show how the
DSAG and MAG predict equally as the reviewers’ weights
in each category converge to the same distribution. In other
words, in the previous example, it is 100% likely that
reviewer i will predict correctly for products in category i
and there is a 0% chance that any other reviewer will be correct
for that product. Now, we create a review matrix based on a
probability p that reviewer i is the “best” reviewer for
category i in the following way:</p>
<p>R_ij = 5 with probability p if product j is in category i; 1 with probability (1 − p) if product j is in category i; 5 with probability (1 − p) if product j is not in category i; 1 with probability p if product j is not in category i</p>
        <p>With probability p, the reviewer i gives review 5 and the
rest of the reviewers give review 1. Otherwise, some other
reviewer gives review 5 and the rest give review 1. As it
becomes more likely that the reviewer that gives 5 is random,
the weight distribution for each category becomes more
uniform. The MAG’s weight distribution is uniform in all cases
as before. It is important to note that if the “best” reviewer
is chosen any other way than uniform random, both
algorithms would perform better than random because there is a
reviewer with higher weight.</p>
<p>Figures 1 and 2 show the error rates of the MAG and
DSAG respectively as we vary 0 ≤ p ≤ 1 in increments
of .1 and the number of categories from 3 to 10. As the
number of categories increases, there are more weight
distributions that must each converge, increasing the error rate
overall until they do. When p = 0, the review matrix is
the same as the DSAG example above and the DSAG again
has perfect accuracy after converging while the MAG is
almost always wrong. As p approaches 1, the domain-specific
weights become more uniformly random. However, as we
varied p, the mediated weight distribution over the
reviewers was consistently uniformly random and the error rate was
consistently high. Because it is not holding category
information, the MAG doesn’t find a difference between truly
random (p = 1) and domain-correlated (p = 0) reviews
and predicts the same way each time. The DSAG also has
a high error rate as the weight distributions become more
uniform. When the MAG and DSAG weight distributions
become more similar (to uniform or otherwise), the DSAG
and MAG predict the same. The DSAG performs better than
the MAG until the distributions are more than 50% similar.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Very Sparse Review Matrices</title>
        <p>The previous tests assumed that reviewers gave reviews for
all products and all categories. However, this is a strong
assumption. Now we relax this assumption and look at much
sparser review matrices. For simplicity, we assume that
only a single reviewer provides reviews in each category,
although this can easily be scaled up to include disjoint sets
of reviewers. Here, we will also assume that the single
reviewer has reviews similar to the user’s and both give reviews
of 5, although we could easily assume that a few of many
reviewers are similar. We define the entries of the review
matrix in the following way:</p>
<p>R_ij = 5 if product j is in category i; no review if product j is not in category i</p>
        <p>The DSAG in this case only has a single reviewer to
assign a weight for each category and gives it full weight. The
MAG, however, must combine these reviewers together into
a single weight distribution. Because the products are
uniformly distributed in the categories, the reviewers are also
uniformly weighted. However, unlike the previous examples
where the MAG had a high error rate, the MAG performs
equally as well as the DSAG here. Because the MAG only
has one review in the product column to compare against, it
always picks that review to follow and is 100% accurate,
the same as the DSAG.</p>
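<p>The coincidence of the two advice givers here follows directly from the sleeping-experts poll: only reviewers who reviewed the product vote, and with one reviewer per category the poll has exactly one voter, so the weights never matter. A minimal sketch of this (our own illustration, with hypothetical weight vectors):</p>

```python
M = 4                                    # one reviewer per category
N = 20
category = [j % M for j in range(N)]
# sparse matrix: each product is reviewed only by its category's reviewer
R = {(category[j], j): 5 for j in range(N)}

def weighted_majority(w, product):
    votes = {}
    for i in range(M):
        if (i, product) in R:            # absent ("sleeping") reviewers skip
            value = R[(i, product)]
            votes[value] = votes.get(value, 0.0) + w[i]
    return max(votes, key=votes.get)

uniform = [1.0 / M] * M                  # a MAG-style mediated vector
skewed = [0.97, 0.01, 0.01, 0.01]        # an extreme DSAG-style vector
preds_uniform = [weighted_majority(uniform, j) for j in range(N)]
preds_skewed = [weighted_majority(skewed, j) for j in range(N)]
# both weightings return the lone reviewer's value 5 for every product
```
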
        <p>We have shown in synthetic, simplified review matrices
that the MAG can converge on the weight distribution faster
than the DSAG with less data and that the DSAG is more
accurate when the categorical weight distributions are
different. However, we have also shown that the two advice givers
perform equally when the weight distributions are the same
across categories or when the review matrix is sparse and
there are disjoint sets of reviewers for each category. Next,
we analyze the properties of actual recommender systems to
understand how the two advice givers perform in these real
conditions with real data sparsity and real reviewers. It is
unclear whether the advice givers even perform the same for
two different users of the same recommender system. We
use these results to determine how the tradeoff between data
and weight distributions affects the accuracy of predictions
for different users.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Advice Giver Evaluation - Real Data</title>
      <p>To understand how often data sparsity affects the accuracy of
the DSAG for users, we built two recommender systems
using real data sets from popular recommender websites. We
compare the accuracy of the MAG to the DSAG across all
users and for each user, and evaluate how the weights of the
reviewers affect the success of the MAG. First, we describe
the recommender systems’ review matrices R.</p>
      <sec id="sec-4-1">
        <title>Experimental Method</title>
<p>The reviewers and advice giver provide possible values V =
{1, 2, 3, 4, 5} to the user. For each recommender system, the
MAG and DSAG provide predictions for the same randomly
chosen users. We use the users’ reviews of products as the
truth for accuracy calculations and remove those reviews
from R. In order to provide both advice givers with the
highest chance of convergence, both were given the users’
reviews for every product immediately after the prediction.
We now present the individual characteristics of three
recommender system data sets and the users in those systems.</p>
        <p>The 2007 Netflix data set contains over 100 million movie
ratings for over 480 thousand Netflix customers from over
17000 movie titles (Netflix 2007). The DSAG used movie
genres as domains. Because Netflix did not include genre
information in the data set, we cross-referenced movie titles
with movie genres and obtained the set the DSAG used:
Science Fiction, Fantasy, Special Interest, Kids/Family,
Comedy, Western, Action/Adventure, Musical/Performing Arts,
Drama, Documentary, and Foreign. Only the movies with
a single genre were used, which resulted in a smaller data
set of N = 400 movies, over 100,000 reviewers, and over 4
million reviews. Approximately 1% of the review matrix R
was filled.</p>
        <p>
          The Yahoo! Research Webscope Music review data set
contains over 717 million reviews by M = 1.8 million
Yahoo! Music reviewers for N = 136,000 songs collected
from 2002 to 2006
          <xref ref-type="bibr" rid="ref11 ref12">(Webscope 2008b)</xref>
          . Each reviewer was
guaranteed to have reviewed at least 30 songs and each song
was reviewed by at least 20 reviewers, sparse in comparison
to the total number of songs and reviewers. The DSAG uses
20 main genres of music provided in the data set. For each
new user, there were an average of 100,000 reviewers giving
reviews for their songs.
        </p>
        <p>
          The final General Products data set was collected from
a popular online shopping website
          <xref ref-type="bibr" rid="ref7">(Leskovec, Adamic,
and Huberman 2006)</xref>
          . Over two years of data collection,
3,943,084 reviewers made 15,646,121 reviews for 548,523
products. 5813 products were discontinued and were thus
removed from the set. The DSAG used the ten product
categories defined in the dataset (e.g., Book, DVD, Electronics).
For each recommendation, the dataset provides the ID of the
reviewer, the numerical recommendation as defined above,
the date that the recommendation was given, and the product
that was reviewed. On average, users made
recommendations for about 100 products (one user recommended
over 12,000), and each product had between 3 and 2500
recommendations. Approximately .007% of the review matrix
is filled.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Results</title>
        <p>To evaluate the accuracy of each advice giver, we compare
the mean squared error (MSE) of the advice givers’
predictions to the users’ reviews to capture the distance between
the values. We test whether there is a difference between the
MSEs of the MAG and DSAG using an ANOVA (Analysis
of Variance). 50 users each from the Netflix and General
Products data sets and 20 users from Yahoo! Music were randomly
chosen to test. We divide this analysis up by the
sparsity of the dataset - General Products are sparse, while
Netflix and Yahoo! are not as sparse.</p>
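<p>The comparison itself is straightforward to sketch. The helpers below compute a per-user MSE and a one-way ANOVA F statistic by hand; the paper does not name its statistics package, and the listed per-user MSE values are made-up placeholders for illustration:</p>

```python
def mse(predictions, actuals):
    """Mean squared error between predicted and actual review values."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

# hypothetical per-user MSEs for the two advice givers
mag_mses = [0.4, 0.5, 0.45, 0.55]
dsag_mses = [0.5, 0.6, 0.55, 0.5]
F = one_way_f([mag_mses, dsag_mses])
```
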
      </sec>
      <sec id="sec-4-3">
<title>Results for Very Sparse Dataset</title>
        <p>For the General Product data set, our experimental results show there was no
significant difference (MAG MSE = .47, DSAG MSE =
.53; F = 0.59, df = 1, p &gt; .05) between the DSAG and
MAG. In other words, there is no increase in accuracy as we
combine the domain-specific advice givers for each product
genre together into a single mediated advice giver. While it
is expected that users of multiple systems make reviews for
all of them so combining the systems results in more data,
we found that users tended to focus their reviews on a
specific category of products instead of reviewing products in
all categories.</p>
        <p>Although the entire user set was shared for all of the
domains in the General Product dataset, the set of users that
gave reviews for products in one category most likely did
not give reviews in any other category. If the most
similar users (with the highest weight) in each domain are
disjoint, then both DSAG and MAG give the same predictions
because the relative weights of the reviewers in each
category are the same for both advice givers. We found that
the overall performance does not change because given the
sparse data, each reviewer only has one accurate weight in
one category - the one they provide reviews in. When the
weight distributions of the DSAG are combined into the
single MAG distribution, the single category weight is carried
into the new distribution (and normalized) without the need
to average multiple weights together because the weight in
the rest of the categories have not changed from the
initialized value. The MAG weight distribution is the same as, and
not better than, each DSAG.</p>
      </sec>
      <sec id="sec-4-4">
<title>Results for Other Datasets</title>
        <p>For the Yahoo! and Netflix data sets, our experimental
results show that there is no statistically significant
difference between the MSEs of the MAG (MSE = 2, std.
dev. = .5) and DSAG (MSE = 1.8, std. dev. = .3) for all
users combined (F = 2.75, df = 1, p &gt; .05). We do find
significant differences in the models for individual users.
For half of the users in both data sets, the DSAG had a
statistically significant lower MSE (ΔMSE = .5, std.
dev. = .4) than the MAG (F = 7.14, df = 1, p &lt; .01),
but for the other half of the users, the MAG was more
accurate (ΔMSE = .25, std. dev. = .2), although this
difference was not statistically significant.</p>
        <p>We found (as expected) that the MAG has a higher
accuracy than the DSAG when there were not enough
products reviewed by the user for the DSAG’s domain-specific
weights to converge. The DSAG provides more accurate
predictions when different reviewers have the highest weight
in different domains (See Example in Section 2).
Additionally, users for which the DSAG made better predictions
typically focused their product requests and reviews in a few
domains and did not require all domains, allowing those
domain weights to converge quickly. Because half of the
users benefitted from each advice giver, neither advice giver
should be used to make predictions for all users. We
propose an online selection algorithm to autonomously
determine which advice giver should make predictions for each
user.</p>
<p>We have seen in both synthetic and real recommender
system data that when review matrices are very sparse, both
advice givers perform the same because the data is sparse
and the weight distributions are disjoint. Next, we will
analyze the number of categories a reviewer must provide
reviews in, in order to see a difference in accuracy between
the DSAG and MAG advice givers. Then, assuming
reviewers provide enough reviews so that the data is not as sparse,
we propose two online selection algorithms to autonomously
determine which advice giver should make predictions for
each user.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Very Sparse Data</title>
      <p>We have shown that in very sparse matrices, there may
not be any overlap in reviewers across categories. In other
words, reviewers only review products in a single category.
The MAG and DSAG produce different weight distributions
but the same predictions with the same accuracy. Now, we
will show how the MAG is affected as reviewers review
products in more categories and as we can collect more data
to overcome the extreme sparseness of the review data. We
will use the same assumptions as before. The user always
gives review 5, so the reviewer that reviews a product with
value 5 should have the highest weight.</p>
      <sec id="sec-5-1">
        <title>Different Weight Distributions</title>
        <p>First, instead of completely disjoint sets of reviewers for
each category, our reviewers review products in a set of
categories. Only a single reviewer is correct in each
category but several other reviewers also give incorrect reviews.
More concretely, we define the review matrix in the
following way:</p>
        <p>R_ij = 5 if product j is in category i; R_ij = 1 if product
j is in categories (i+1)%m through (i+k)%m; and reviewer i
gives no review otherwise.</p>
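<p>As a sketch of this construction (function and variable names are ours, and we use one product per category for simplicity), the overlapping review matrix can be generated as:</p>

```python
def build_review_matrix(m, k):
    """Reviewer i reviews 5 (matching the user) for category i and 1
    (far from the user's preference) for categories (i+1)%m .. (i+k)%m;
    0 marks 'no review'. Rows are reviewers, columns are categories."""
    R = [[0] * m for _ in range(m)]
    for i in range(m):
        R[i][i] = 5                      # correct in reviewer i's own category
        for step in range(1, k + 1):
            R[i][(i + step) % m] = 1     # incorrect in k other categories
    return R

R = build_review_matrix(4, 2)            # each product now has k + 1 = 3 reviews
```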
        <p>Reviewer i is correct for category i and gives reviews
far from the user’s preference for categories (i+1)%m to
(i+k)%m. The value k is the number of reviewers that
provide reviews for each product. By varying k, we can
understand how the MAG is affected as the amount of overlap
increases in the categories that reviewers review for. We
notice that k = 0 means that only a single reviewer reviews
products in a single category. Also, k = m means that all
reviewers provide reviews for all products. We are interested
in how the MAG behaves for values between 0 and m. We
know that although it may take the DSAG longer to
converge on the categorical weight distributions, it will always
have 100% accuracy.</p>
<p>Figure 3 shows the error rate (y-axis) for the MAG
and DSAG with m = {1, ..., 10} (x-axis) and with k =
{1, ..., 10}. Each line represents a different k. Note that
there cannot be more than m reviewers overlapping at a time,
so the line for each k starts at m = k. The line for 3 overlapping
reviewers starts at 100% error rate at m = 3 and drops a little
before converging at a high error rate. Because all reviewers
are “best” for one category, the MAG converges to a
uniform distribution. A majority of the reviewers give reviews
that conflict with the user, but the MAG follows the
majority so it predicts incorrectly nearly all the time depending on
the order of products. The DSAG always finds the correct
weight distribution and maintains a 0% error over all k.</p>
<p>There is no statistically significant difference among the
error rates for the different values of k. As we increase k, the
error is nearly constant. However, this error is expected
because the weight distributions are so different for each
category. Next, we evaluate
the MAG performance on categorical weight distributions
that are similar but have sparse reviewer data.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Similar Weight Distributions</title>
        <p>We know that the MAG converges faster than the DSAG
when the categorical weight distributions are similar and all
reviewers always give reviews. Now we create a new review
matrix such that there are a few (e.g., 2) “best” reviewers for
a user and the rest give opposing reviews:</p>
<p>R_ij = 5 with probability p when i = 1; R_ij = 5 with
probability (1 - p) when i = 2; R_ij = 1 with probability q
when i ≠ 1; and no review otherwise.</p>
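<p>One plausible reading of this probabilistic matrix, sketched in code (the sampling function and the default probabilities are our own assumptions, not the paper’s):</p>

```python
import random

def sample_review(i, p=0.7, q=0.2):
    """Sample reviewer i's review for one product: reviewer 1 gives the
    best review (5) with probability p, reviewer 2 gives it with
    probability 1 - p, and any reviewer other than 1 may instead give an
    opposing review (1) with probability q. None means no review."""
    if i == 1:
        return 5 if random.random() < p else None
    if i == 2 and random.random() < 1 - p:
        return 5
    return 1 if random.random() < q else None
```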
        <p>With some probability p, reviewer 1 gives the best
response and otherwise reviewer 2 gives the best response.
Then the rest of the reviewers each give a response with
some other probability q and otherwise give no response.
These distributions do not depend on the category, so the
DSAG would have to find this same distribution for each
category. The MAG finds it just once.</p>
        <p>If either reviewer 1 or reviewer 2 always provides a
review for a product, the MAG can find these best reviewers
after getting reviews from each of them only a few times.
Because these reviewers are always correct with respect to
our user, a MAG that always trusts them will also always be
correct. Although these reviewers never appear at the same
time, the MAG still has one reviewer that has high weight to
use. The rest of the reviewers have low weight and are not
included in the prediction. If there are products for which
neither of the “best” reviewers provide reviews, there is no
way for the MAG to produce a good answer. The MAG
can be as accurate and converge on the reviewer weights
faster than the DSAG when the similarity distributions are
the same and reviewers do provide reviews for products in
multiple categories.</p>
        <p>We have shown that the MAG and DSAG are each more
accurate for some users, depending on the products they
choose from the review matrix. In very sparse review
matrices where reviewers provide reviews in only a single
category, both the MAG and DSAG have the same accuracy.
If we can increase the amount of data to get reviewers to
review products in multiple categories, both the DSAG and
MAG can perform well under some conditions. Because it is
not known which advice giver will be more accurate when a
user joins a recommender system, the recommender system
must calculate predictions of both advice givers as the user
requests reviews and provides reviews to determine which is
more accurate. We present two online selection algorithms
for the recommender system to use in determining which
advice giver (DSAG or MAG) makes the best predictions for
each user.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>User-Dependent Online Selection Algorithm</title>
      <p>
        First, we show a user-dependent online selection algorithm
(UdOS) which picks the single best advice giver for each
user. The UdOS Algorithm (Algorithm 3) first initializes a
new DSAG and MAG for the new user and sets their
accuracies to 0 (Steps 1-2). Because the MAG provides better
predictions when there are fewer data points available, the
MAG’s predictions are provided to each user to start (Step
3). Additionally, the algorithm requires that the
domain-specific weights converge before making a decision. It has
been shown that the weighted majority algorithm requires
approximately ⌈log(M)⌉ user reviews to converge
        <xref ref-type="bibr" rid="ref8">(Littlestone and Warmuth 1994)</xref>
        . However, the advice givers
actually require at least one review from each reviewer to find
the most accurate weights. Thus, the algorithm takes linear
time, O(M), in the number of reviewers to converge. The
count num_usr_reviews of the current number of reviews
is initialized to 0 (Step 4).
      </p>
      <p>Until the recommender system has received enough user
reviews, the UdOS algorithm makes predictions for the user
with the current most accurate advice giver (Steps 5-17).
The user requests a product pj to be predicted by the
recommender system (Step 6). The algorithm gets both the MAG
and DSAG predictions (Steps 7-8), but gives the user only
the prediction from the current best advice giver (Step 9).
If the recommender system receives the user’s actual review
aj , the algorithm increases the count of the user reviews,
and recalculates the best advice giver using the new
accuracy (Steps 10-16). After enough reviews have been
collected from the user, UdOS maintains only the best advice
giver, which continues to reweigh reviewers (Steps 18-21).</p>
      <p>Algorithm 3 User-Dependent Selection Algorithm
1: dsag ← new DSAG(u), mag ← new MAG(u)
2: accMAG ← 0, accDSAG ← 0
3: BestAG ← mag
4: num_usr_reviews ← 0
5: while num_usr_reviews &lt; M do
6:   pj ← getProductRequest()
7:   predMAG ← mag.predict(pj)
8:   predDSAG ← dsag.predict(pj)
9:   print predBestAG
10:  if receive aj then
11:    num_usr_reviews++
12:    accMAG ← calcAcc(mag)
13:    accDSAG ← calcAcc(dsag)
14:    mag.update(aj), dsag.update(aj)
15:    BestAG ← (accMAG &lt; accDSAG) ? mag : dsag
16:  end if
17: end while
18: loop
19:   p ← getProductRequest()
20:   print BestAG.predict(p)
21: end loop</p>
      <p>Using the UdOS algorithm for each user, the
recommender system can decide whether the data sparsity or the
reviewers’ weights affect the prediction accuracy more, and
choose the best advice giver. Although it is less
computationally efficient to calculate the predictions from both the
DSAG and MAG, because the algorithm picks the single
best advice giver, the recommender system provides better
predictions for all users. After making the selection, it saves
memory and computation by maintaining only the best one.</p>
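<p>A minimal Python sketch of Algorithm 3 follows (not the authors’ code: the MAG/DSAG objects are stand-ins exposing the predict/update interface assumed by the pseudocode, calcAcc is passed in, and we assume higher calcAcc values mean better accuracy, so the more accurate advice giver is kept):</p>

```python
class UdOS:
    """User-dependent online selection between a MAG and a DSAG."""

    def __init__(self, mag, dsag, num_reviewers, calc_acc):
        self.mag, self.dsag = mag, dsag
        self.best = mag                   # Step 3: start with the MAG
        self.M = num_reviewers            # reviews to wait for (Step 5)
        self.calc_acc = calc_acc
        self.num_usr_reviews = 0          # Step 4

    def predict(self, product):
        self.mag.predict(product)         # Steps 7-8: both predict...
        self.dsag.predict(product)
        return self.best.predict(product) # Step 9: ...only best is shown

    def receive_review(self, review):
        if self.num_usr_reviews < self.M:         # still selecting
            self.num_usr_reviews += 1             # Step 11
            self.mag.update(review)               # Step 14
            self.dsag.update(review)
            # Step 15: keep whichever advice giver is currently more accurate
            if self.calc_acc(self.dsag) > self.calc_acc(self.mag):
                self.best = self.dsag
            else:
                self.best = self.mag
        else:
            self.best.update(review)      # Steps 18-21: only best remains
```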
    </sec>
    <sec id="sec-7">
      <title>User- and Category-Dependent Online</title>
    </sec>
    <sec id="sec-8">
      <title>Selection Algorithm</title>
      <p>Our second selection algorithm picks the best advice giver
for each category (UCdOS). If the DSAG converges for one
or a few categories because the user focuses on those
categories, this algorithm can pick the DSAG for those that have
converged and continue to use the MAG for all other
categories. The UCdOS Algorithm (Algorithm 4) first initializes
a new DSAG and MAG for the new user and sets their
accuracies to 0 (Steps 1-5) as in the UdOS algorithm.</p>
<p>Until the recommender system has received enough user
reviews, the UCdOS algorithm makes predictions for the user
with the current most accurate advice giver for each category
(Steps 5-18). The user requests a product pj in category ck
to be predicted by the recommender system (Steps 6-7). The
algorithm gets both the MAG and DSAG predictions (Steps
8-9), but gives the user only the prediction from the
current best advice giver for category ck (Step 10). If the
recommender system receives the user’s actual review aj , the
algorithm increases the count of the user reviews, and
recalculates the best advice giver using the new accuracy (Steps
11-17). After enough reviews have been collected from the
user, UCdOS maintains only the best advice giver for each
category, which continues to reweigh reviewers (Steps
19-23).</p>
      <p>Using the UCdOS algorithm (Algorithm 4, the User- and
Category-Dependent Selection Algorithm) for each user, the
recommender system can decide for each category whether the
data sparsity or reviewers’ weights affect the prediction
accuracy more and choose the best advice giver on a
category-specific basis. This allows for more flexibility in the
products the user picks. The UdOS algorithm requires that the
user pick products from all categories uniformly in order to
determine whether the DSAG or MAG is better overall for
all categories. In the UCdOS algorithm, even if users do
not pick products uniformly across categories, the most
appropriate advice giver is chosen. If a user does not request
products in some categories, there is less data to train from
and the MAG would produce better results. When a weight
distribution for a category has converged, the DSAG is more
appropriate to use and the UCdOS algorithm uses that.</p>
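<p>The category-level decision can be sketched as follows (our naming; the accuracies are assumed to be estimated as in the UdOS case, with higher values better):</p>

```python
def choose_best_per_category(acc_mag, acc_dsag_by_cat):
    """For each category, use the DSAG once its per-category accuracy
    beats the single mediated (MAG) accuracy; otherwise keep the MAG."""
    return {cat: ("dsag" if acc > acc_mag else "mag")
            for cat, acc in acc_dsag_by_cat.items()}

choice = choose_best_per_category(0.6, {"books": 0.8, "dvds": 0.4})
```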
<p>The UCdOS is less computationally efficient than the
UdOS, because it must maintain both the DSAG and MAG
until all of the DSAG weight distributions converge, but this
cost may be worthwhile if it yields better predictions than
either advice giver alone.</p>
    </sec>
    <sec id="sec-9">
      <title>Validation</title>
      <p>The UdOS and UCdOS algorithms will choose the advice
giver with the highest accuracy after enough products are
reviewed by the user. Ideally, the accuracy of the
predictions from the algorithm will be as good as the best advice
giver even through the selection process in order to
maintain user satisfaction with the recommender system. In
order to validate that the UdOS and UCdOS algorithms both
pick the better advice giver and maintain high accuracy, we
tested them against the MAG and DSAG advice givers on a
fourth recommender system data set. We calculate accuracy
over time to understand how both selection algorithms
perform and show that they find the best advice giver for each
user while minimizing errors.</p>
      <sec id="sec-9-1">
        <title>Experimental Method</title>
        <p>
          The Yahoo! Movies data set contains 7,642 users, 11,915
movies and 211,231 reviews
          <xref ref-type="bibr" rid="ref11 ref12">(Webscope 2008a)</xref>
          . We use the
same movie genres for domains as the Netflix movies data
set. Only the movies with a single genre were used, which
resulted in a smaller data set of 9,000 reviewers for 1000
movies. Only 0.23% of the review matrix R was filled. Each
product was reviewed by at least two reviewers.
        </p>
        <p>We use the same recommender system setup as the
experimental results above. The advice givers and selection
algorithms provide possible values V = {1, 2, 3, 4, 5} for the
same randomly chosen users. The advice givers received the
users’ reviews after every product prediction and use the
reviews, which were removed from R, to reweigh reviewers.
We report the MSE instead of accuracy because it better
represents the error distance between the prediction and users’
reviews. Additionally, because we did not have enough data
to wait for all of the M reviews, we used log M as the
parameter to wait before converging on the best advice giver,
so that we could report a best advice giver for each user. In
general, it became obvious which advice giver to choose
after waiting for only log M reviews for this data set, although
for some users this was not sufficient.</p>
<p>We found that, when determining the best advice giver,
the calcAcc function should place more weight on later
predictions and less weight on earlier predictions, because the
advice giver was more likely to be incorrect when it had
fewer user reviews. We used the following step function,
although we found that any linear function, or a step function
with a different span, produced the same results:
calcAcc(ag) = avg(accuracy of the 10 most recent predictions)</p>
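<p>A minimal version of this windowed accuracy (our naming, assuming each advice giver records a per-prediction accuracy history) might be:</p>

```python
def calc_acc(history, window=10):
    """Average accuracy over the `window` most recent predictions, so
    early predictions (made before the weights converged) are ignored."""
    recent = history[-window:]
    return sum(recent) / len(recent) if recent else 0.0
```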
        <p>If the DSAG makes poor predictions for new domains at
the beginning of the product set, the UdOS and UCdOS will
always determine that it has a lower accuracy. We use both
the cumulative (unweighted) and weighted MSEs in the
results. Twenty users were chosen at random as test users.
Each user requested predictions for between 20 and 130
products. The MAG, DSAG, and UdOS and UCdOS
algorithms’ predictions were recorded for each user and used to
calculate each unweighted MSE (see Tables 3 and 4).</p>
      </sec>
      <sec id="sec-9-2">
        <title>UdOS Results</title>
<p>Overall, thirteen users received better predictions from the
DSAG and six received better predictions from the MAG. One
user received equivalent predictions from both advice givers.
Our results show that for all 20 users the UdOS algorithm
picked the best advice giver. Furthermore, the UdOS
algorithm’s error was no worse than the worst advice giver and at
times can be better than the best advice giver. We will now
describe in more detail how the UdOS algorithm chose the
advice givers for some users.</p>
        <p>For User 1, data sparsity affected the predictions from the
DSAG and he received better predictions from the MAG, so
the UdOS algorithm continued using the MAG for the
duration of the product set. (Figure 5 shows (a) the cumulative
MSE and (b) the weighted MSE for User 8 over time.)
User 19 received better predictions
from the DSAG than the MAG, because the user’s similar
reviewers were different for each product domain. The UdOS
algorithm recognized that the DSAG had a lower MSE (See
Figure 4) and switched to using the DSAG’s predictions at
product 18. After the switch, the UdOS cumulative MSE
converged towards that of the DSAG.</p>
<p>The algorithm picked the DSAG for other users, like User
10, but the switch occurred late enough that the cumulative
MSE did not start converging towards the DSAG’s MSE. Because
the cumulative MSE was the same for both the DSAG and
MAG for User 16, it did not matter which advice giver the
UdOS picked (marked with ** in Table 3). As a result, it
continually used the MAG that it started with.</p>
        <p>The UdOS algorithm picked the advice giver with higher
overall MSE for three users (marked with * in Table 3).
User 8, for example, received better predictions from the
UdOS algorithm than either the MAG or DSAG could
provide alone because the UdOS switched back and forth
between the two advice givers before deciding which to use.
The MSE of the DSAG is lower than the MAG’s because the
MAG poorly predicted some products in the middle of the
set (specifically products 20-30) (See Figure 5(a)). However,
once the MAG’s weight distributions identified the most
similar reviewers (product 32), its predictions were better
than the DSAG’s and the UdOS algorithm picked the MAG
(See Figure 5(b)).</p>
      </sec>
      <sec id="sec-9-3">
        <title>UCdOS Results</title>
<p>Overall, nine users received the same predictions as from the
UdOS, picking either one advice giver or the other for all
of the product categories the user requested (See Table 4, no
*). Note that this does not mean that the DSAG or MAG
was chosen for all categories, just that the same advice giver
was picked for every category for which there was data. If a new product from a new
category was queried, while the UdOS would have to use
the same advice giver as it picked for the other categories,
the UCdOS could use the MAG until enough products in
that category were reviewed. This feature means that the
UCdOS could be more accurate using the MAG while the
DSAG’s weight distribution for the new category converges.
However it could be less accurate if the MAG distribution is
very different than the category weight distribution.</p>
        <p>Six of the users received worse predictions from the
UCdOS compared to the UdOS, but still at least as good as the
worse advice giver (See Table 4, *). In these cases, the
MAG’s single distribution was different from the DSAG’s
category distributions. The UCdOS used the MAG for each
category until the DSAG converged. The MAG produced
worse predictions in the UCdOS than if the DSAG was
chosen starting from a uniform distribution in the UdOS. This
is a particularly clear instance of our initial example, in which
the MAG produces poor results when the DSAG’s categorical
distributions are very different. The UdOS chooses the DSAG as
soon as there is any indication that the weight distributions
are different for different categories. Because it takes time
for each of the DSAG’s distributions to converge, it is worse for the
UCdOS to use the MAG before convergence, even though it
has an indication that the DSAG’s weights are different from
the MAG’s. In this case, waiting to ensure the selection
algorithm picks the right advice giver for each category actually
makes the prediction accuracy worse.</p>
<p>Five users received better predictions than from the UdOS,
including several who received better predictions than either
the DSAG or MAG could provide alone (See Table 4, **).
These users benefited most from the data sparsity
versus weight distribution tradeoff. The UCdOS was able to
use the DSAG for the categories that the user queried often
and the MAG for the other categories. The DSAG’s category
weight distributions were similar to the MAG’s distribution
for some categories and different in others. The UCdOS did
not have to pick a single advice giver and could take
advantage of the MAG instead of waiting for the category weight
distribution to converge to the same values. This sped up the
selection process, allowing the UCdOS to be more accurate
for longer than either of the two advice givers alone.</p>
      </sec>
      <sec id="sec-9-4">
        <title>Discussion</title>
        <p>Both of the user selection algorithms pick the best advice
givers with the knowledge they have. The UdOS algorithm
picks a single advice giver to use for the rest of the
predictions. Once it finds that two different categories have
different weight distributions and the DSAG is performing better,
it switches to using it. The algorithm makes the
assumption that if two weight distributions are
different, then all will be different. More importantly, it makes a
strong tradeoff, treating different distributions as more
important than the data sparsity problem. If there are different
distributions, the algorithm must find each of the categorical
weight distributions and cannot take advantage of the MAG
while they converge.</p>
        <p>The UCdOS algorithm picks an advice giver for each
category. It does not make the assumption that all weight
distributions are different, but does assume that the distributions
are either like the MAG distribution or not. It can take
advantage of the MAG while each of the DSAG distributions
are converging, which greatly improved the accuracy of the
predictions for five users compared to the UdOS. However,
we found that sometimes the MAG is a detriment and
using it makes the selection algorithm predict worse than the
UdOS when the MAG distribution is different from the
category weight distribution. Overall, both selection algorithms
do well for all of our test users and there is no clear better
algorithm to choose.</p>
        <p>These selection algorithms do pick the best advice giver,
but still require that the weight distributions either be
completely different from each other or the same as the MAG.
It is possible that high weight reviewers in DVDs have the
same high weight in books because the user likes mysteries
in both categories but has very different taste in clothes.
One other selection algorithm that could be tested in future
work would be one that tests whether any of the converged
category weight distributions perform well for a new
category. This algorithm would take advantage of all the
previous work done to converge the distribution and speed up
the convergence of a new distribution making the algorithm
more accurate earlier in the product queries. It would be
more costly, however, in the space required to hold all of the
possible distributions.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Conclusion</title>
      <p>In this work, we first presented an overview of how
advice givers weigh reviewers and make predictions for users.
We presented the Mediated (MAG) and Domain-Specific
(DSAG) advice givers that weigh the tradeoffs between data
sparsity and reviewers’ weights differently and affect
prediction accuracy in recommender systems. Although the DSAG
provides more accurate predictions when there are sufficient
reviews, using a MAG was recently suggested when review
data is too sparse. We used three real recommender data sets
to show that both advice givers provide accurate predictions
to some users. The DSAG provides more accurate
predictions to users when different reviewers were weighed highly
in different domains while the MAG is more accurate when
the review matrix and user reviews are sparse.</p>
      <p>We presented our User-Dependent Online Selection
Algorithm and User- and Category-Dependent Online Selection
algorithm as two online methods for recommender systems
to decide which advice giver is more accurate for each user.
The algorithms wait for the reviewers’ weights to converge
before picking the best advice giver. We validated our
findings on a fourth recommender data set and demonstrated
that the UdOS algorithm successfully picks the better
advice giver to provide the most accurate predictions possible
for each user. We conclude that a recommender system that
uses the UdOS or UCdOS algorithms makes better
predictions for all users than one that uses only one of the two
advice givers.</p>
      <sec id="sec-10-1">
        <title>Netflix. 2007. The netflix dataset and prize. Umyarov, A., and Tuzhilin, A. 2007. Leveraging aggregate ratings for better recommendations. In Recommender Systems, 161–164.</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cesa-Bianchi</surname>
          </string-name>
          , N.;
          <string-name>
            <surname>Freund</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Schapire</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>1995</year>
          .
          <article-title>Gambling in a rigged casino: The adversarial multiarmed bandit problem</article-title>
          .
          <source>In FOCS</source>
          ,
          <fpage>322</fpage>
          -
          <lpage>331</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Berkovsky</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kuflik</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ricci</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2007a</year>
          .
          <article-title>Crossdomain mediation in collaborative filtering</article-title>
          .
          <source>In User Modeling</source>
          ,
          <fpage>355</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Berkovsky</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kuflik</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ricci</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2007b</year>
          .
          <article-title>Distributed collaborative filtering with domain specialization</article-title>
          .
          <source>In Recommender Systems</source>
          ,
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Blum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mansour</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>From external to internal regret</article-title>
          .
          <source>In COLT</source>
          ,
          <fpage>621</fpage>
          -
          <lpage>636</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Burke</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2002</year>
          .
<article-title>Hybrid recommender systems: Survey and experiments</article-title>
          .
          <source>User Modeling and User-Adapted Interaction</source>
          <volume>12</volume>
          (
          <issue>4</issue>
          ):
          <fpage>331</fpage>
          -
          <lpage>370</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Freund</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schapire</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Warmuth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>1997</year>
          .
          <article-title>Using and combining predictors that specialize</article-title>
          .
          <source>In STOC</source>
          ,
          <fpage>334</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Leskovec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Adamic</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Huberman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>The dynamics of viral marketing</article-title>
          .
          <source>In ACM Conf. on Electronic Commerce</source>
          ,
          <fpage>228</fpage>
          -
          <lpage>237</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Littlestone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Warmuth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>1994</year>
          .
          <article-title>The weighted majority algorithm</article-title>
          .
          <source>Information and Computation</source>
          <volume>108</volume>
          (
          <issue>2</issue>
          ):
          <fpage>212</fpage>
          -
          <lpage>261</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Nakamura</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Abe</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>1998</year>
          .
          <article-title>Collaborative filtering using weighted majority prediction algorithms</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          ,
          <fpage>395</fpage>
          -
          <lpage>403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Umyarov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tuzhilin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Leveraging aggregate ratings for better recommendations</article-title>
          .
          <source>In Recommender Systems</source>
          ,
          <fpage>161</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <collab>Yahoo! Research Webscope</collab>
          <year>2008a</year>
          .
          <article-title>Movie user ratings of movies and descriptive content information v1.0</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <collab>Yahoo! Research Webscope</collab>
          <year>2008b</year>
          .
          <article-title>Music user ratings of songs with song attributes v1.0</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>