<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Iterative Smoothing Technique for Improving Stability of Recommender Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gediminas Adomavicius</string-name>
          <email>gedas@umn.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jingjing Zhang</string-name>
          <email>jjzhang@indiana.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indiana University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Minnesota</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>3</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>We focus on the measure of recommendation stability, which reflects the consistency of recommender system predictions. Stability is a desired property of recommendation algorithms and has important implications on users' trust and acceptance of recommendations. Prior research has reported that some popular recommendation algorithms can suffer from a high degree of instability. In this study we propose a scalable, general-purpose iterative smoothing approach that can be used in conjunction with different traditional recommendation algorithms to improve their stability. Our experimental results on real-world rating data demonstrate that the proposed approach can achieve substantially higher stability as compared to the original recommendation algorithms. Importantly, the proposed approach not only does not sacrifice the predictive accuracy in order to improve recommendation stability, but is actually able to provide additional accuracy improvements at the same time.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        In the recommender systems literature, evaluating performance of
recommendation algorithms has always been a key issue, and
recommendation accuracy has been the major focus in developing
evaluation metrics [
        <xref ref-type="bibr" rid="ref11 ref23">11,23</xref>
        ]. As a result, much of the research in
the recommender systems area has focused on proposing new
techniques to enhance the accuracy of recommendation
algorithms in predicting what users will like, as exemplified by
the recent $1M Netflix prize competition. Prediction accuracy
metrics typically compare the rating values estimated by a
recommendation algorithm against the actual rating values and
reflect the closeness of the system’s predictions to users’ true
ratings. In addition to recommendation accuracy, researchers
have proposed a number of alternative types of measures,
including recommendation coverage, diversity, novelty,
serendipity, and several others, to evaluate the performance of
recommender systems [
        <xref ref-type="bibr" rid="ref11 ref23">11,23</xref>
        ]. Of special interest to us is the
recently introduced measure of recommendation stability [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
which reflects the level of consistency among the predictions
Copyright is held by the author/owner(s). Workshop on Recommendation Utility
Evaluation: Beyond RMSE (RUE 2012), held in conjunction with ACM RecSys
2012. September 9, 2012, Dublin, Ireland.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        Based on how unknown ratings are predicted, recommendation
techniques can be classified into three general categories:
contentbased, collaborative filtering, and hybrid [
        <xref ref-type="bibr" rid="ref1 ref3">1,3</xref>
        ]. Among different
recommendation approaches, collaborative filtering techniques
have been most widely used, largely because they are domain
independent, require minimal, if any, information about user and
item features, yet can still achieve accurate predictions [
        <xref ref-type="bibr" rid="ref13 ref19">13,19</xref>
        ].
In a typical setting of collaborative filtering recommender
systems, users’ preferences for items are modeled via numeric
ratings. Thus, the recommendation problem is reduced to the
problem of estimating ratings for the items that have not been
seen by a user, and this estimation is usually based on the other
available ratings given by this and/or other users. More formally,
given a set of users U and a set of items I, the entire user-item
space is denoted as S = U × I. Let Rui represent rating that user u
gave to item i, where Rui is typically known only for a limited
subset of all possible (u,i) pairs. Let D be the set of known
ratings, and S\D be the set of unknown ratings. Therefore, the
recommendation task is to estimate unknown Rui values for all
(u,i)S\D pairs, given U, I, and known R(u,i) values for (u,i)  D.
As mentioned earlier, predictive accuracy has been a major focus
in recommender systems literature. One of the most widely used
predictive accuracy metrics for recommender systems is root
mean squared error (RMSE), which we will use in this paper to
report the recommendation accuracy results. However, there is a
growing understanding that good recommendation accuracy alone
may not give users a satisfying experience using the recommender
systems, and that some other (complementary) measures are also
important in evaluating the effectiveness of the system [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Our
focus in this paper is one of these important complementary
measures, recommendation stability [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Recommendation stability measures the inherent consistency
among the different predictions made by the system. Consider an
example where the system makes predictions for two movies, i1
and i2, for user u. Let’s denote the two rating predictions as
R*(u,i1) and R*(u,i2). Let’s assume that prediction R*(u,i1) is
precisely accurate and, after user u consumes item i1, it gets added
to the system as part of the known ratings, i.e., R(u,i1) = R*(u,i1).
The recommendation algorithm re-computes all other predictions
in light of the new data. Would this change the value of the other
prediction, R*(u,i2), and to what extent, even though the newly
incoming rating data was exactly as predicted by the system? In
other words, two predictions R*(u,i1) and R*(u,i2) of the same
recommender system can be viewed as “inconsistent” with each
other, if adding one of them to the training data for the system
changes the other prediction. As discussed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the degree of
the change in predictions reflects the (in)stability of the
recommendation algorithm.
      </p>
      <p>
        Specifically, the stability of a recommender system is defined as
the extent to which the system’s predictions for the same items
stay the same/similar (i.e., are stable), when any and all new
incoming ratings submitted to the system are in complete
agreement with system’s prior predictions. Based on this idea, a
two-phase approach (illustrated in Figure 1) for computing
stability of a recommendation algorithm has been introduced in
prior literature [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In Phase 1, given a set of known ratings, a
predictive model is built, and predictions for all unknown ratings
are made. Then, a random subset of system’s predictions is added
to the original set of known ratings as hypothetical new incoming
ratings. In Phase 2, based on expanded training data, a second
predictive model is built using the same recommendation
technique, and predictions on remaining unknown ratings are
made. Stability is then measured by comparing the two sets of
predictions to compute their root mean squared difference, which
is called root mean squared shift (RMSS), and is computed in a
similar fashion as RMSE:
      </p>
      <p>
        RMSS
, ∈ ∩
,
,
|
∩
|
where P1 and P2 are the predictions made in Phase 1 and 2,
respectively, i.e., RMSS captures the shift in predictions made by
the same recommendation algorithm with new ratings that are in
complete agreement with algorithm’s own prior estimations.
Providing consistent predictions has important implications on
users’ trust and acceptance of recommendations. In the consumer
psychology literature, it has been shown that online advice-giving
agents with greater fluctuations in past opinions are considered
less informative than those with a more uniform pattern of
opinions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and advice inconsistent with past recommendations
is considered less helpful than consistent advice [
        <xref ref-type="bibr" rid="ref24 ref7">7,24</xref>
        ].
Inconsistent recommendations may be discredited by users and, as
a result, the decrease in users’ trust will further reduce their
perceptions of the recommender system’s competence [
        <xref ref-type="bibr" rid="ref12 ref17">12,17</xref>
        ]. In
contexts where consistency is vital, the instability of a
recommender system is likely to have a negative impact on users’
acceptance and, thus, harm the success of the system.
Prior research has provided a comprehensive investigation into the
stability of popular recommendation algorithms [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and showed
that some widely used algorithms can be highly unstable. For
example, it has been shown that RMSS for user-based
collaborative filtering approach can be as high as 0.47 on the
Netflix movie rating data, meaning that on average every
predicted rating will shift by 0.47 stars (i.e., by about a half-star)
after adding to the training data some new ratings that are
identical to the system’s current predictions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This is a very
significant shift in prediction, considering the length of the rating
scale for the dataset is only 4 stars (i.e., ratings go from 1 to 5).
Thus, techniques for stability improvement are needed. In this
paper, we propose a general-purpose meta-algorithmic approach
that can be used to improve stability of traditional
recommendation algorithms.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. ITERATIVE SMOOTHING APPROACH</title>
    </sec>
    <sec id="sec-4">
      <title>3.1 General Idea</title>
      <p>High instability results from predictions that are inconsistent with
each other. We propose an iterative smoothing approach, which
involves multiple iterations for repeatedly and collectively
adjusting the rating predictions of a recommendation algorithm
based on its other predictions and, thus, is explicitly aimed at
improving consistency of predicted ratings. The key idea of
iterative smoothing is that the rating predictions computed during
current iteration will be fed back into the data to predict other
ratings in subsequent iterations.</p>
      <p>Figure 2 provides an overview of the proposed iterative
smoothing algorithm, and Figure 3 gives a high-level illustration
of the overall process. Given rating space S and training set D
where ratings are known, predictions on unknown ratings S\D are
first estimated using some standard recommendation algorithm T.
These predictions are denoted as P0. The procedure of iterative
smoothing then starts to iteratively adjust estimations for each
rating in S\D based on all other ratings in the rating space S (i.e.,
both known as well as predicted) in order to proactively improve
consistency between different predicted ratings.</p>
      <p>More specifically, during the k-th iteration (for any k = 1, 2, …),
for each unknown user-item pair (u,i)S\D, a model fk,u,i is built
based on training dataset Dk,u,i. This dataset is constructed to
contain all known ratings combined with all predictions on
useritem pairs in S\D computed in the previous (k-1) iteration, except
for the prediction on (u,i) as indicated in Figure 2. As the result of
the k-th iteration, each model fk,u,i produces a new prediction for
(u,i) that is stored as Pk(u,i), i.e.:
,
,
,
,
,
∈
∈ \
,,
Here R(u,i) represents the known rating that user u gave item i,
and fk,u,i is the prediction model built based on Dk,u,i. Figure 4
illustrates the process within each iteration of the iterative
smoothing approach. The set of predictions made in the current
iteration is compared with predictions made in the previous
iteration to compute the deviation between the two sets (measured
in root mean squared difference). The iterative smoothing process
stops either after a fixed, pre-determined number of iterations or
when predictions on unknown ratings do not change.</p>
      <p>By definition, in each iteration, a separate model is built for every
user-item pair with unknown rating. Thus, in total, |S\D|K models
need to be constructed over the course of K iterations, each using
|S|–1 ratings as a training dataset. If t(x) is the time needed to
build a predictive model on data sample of size x using
recommendation algorithm T, then the time complexity of the
iterative smoothing approach is O(|S\D|Kt(|S|)).</p>
      <sec id="sec-4-1">
        <title>Iterative Smoothing Algorithm:</title>
        <p>Inputs: known ratings data D, # of iterations K, algorithm T</p>
      </sec>
      <sec id="sec-4-2">
        <title>Process:</title>
        <p>1. Build model f0 on known ratings D using some standard
recommendation algorithm T, i.e., f0T(D)
2. Apply model f0 to compute predictions P0 for unknown
ratings S\D, i.e., P0(u,i) = f0(u,i) for (u,i)S\D
3. For each iteration k  {1, …, K}</p>
        <p>For each unknown rating pair (u, i)S\D
a. Construct dataset Dk,u,i by including all known ratings D
and all predicted ratings Pk-1 from the previous iteration,
except for rating Pk-1(u,i), i.e.,</p>
        <p>Dk,u,i = D ∪ Pk-1\ {Pk-1(u,i)}
b. Build model fk,u,i on dataset Dk,u,i using T, i.e.,</p>
        <p>fk,u,i  T (Dk,u,i)
c. Make prediction on (u, i) and store in P k, i.e.,</p>
        <p>Pk(u, i) = fk,u,i(u, i)
4. Output predictions made in the final iteration PK</p>
      </sec>
      <sec id="sec-4-3">
        <title>Output: PK</title>
        <p>
          In other words, the complexity of iterative smoothing algorithm is
proportional to the size of unknown ratings in the rating space.
This computational requirement is likely to be prohibitively
expensive for many real-world scenarios, i.e., anytime the
useritem rating space is large and rating data is sparse. For example,
the Movielens 100K dataset is a publicly available movie rating
dataset [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], which is considered to be relatively small. This
dataset contains 100,000 movie ratings from 943 users on 1682
movies. Thus, the total number of possible ratings is about 1.6M
(i.e., 1682 x 943), of which 1.5M is unknown. If we apply
iterative smoothing algorithm on the Movielens 100K dataset, in
total 1.5 million predictive models (each built on 1.6 million
ratings) would need to be constructed within each iteration.
Therefore, applying the full iterative smoothing approach may not
be feasible even for datasets that are not very large. In order to
overcome this complexity issue, in the next section we propose a
variation of the iterative smoothing algorithm that offers
significant scalability improvements while retaining most of the
stability benefits (as will be shown in the Results section).
 
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3.2 Scalable Iterative Smoothing</title>
      <p>We propose a variation of iterative smoothing approach that
substantially simplifies the original approach and significantly
reduces its computational requirements.</p>
      <p>Figure 5 provides an overview of the proposed scalable iterative
smoothing algorithm. Similarly to the original iterative
smoothing approach, the proposed scalable version also involves
multiple iterations for repeatedly adjusting estimations for each
unknown rating. However, in each iteration, instead of building
one model for each unknown rating, only one single model is built
upon all known ratings and predicted ratings in previous iteration.
Specifically, during k-th iteration (for any k = 1, 2, …), model fk is
built based on training dataset Dk which contains all known
ratings D combined with all predictions Pk-1 on user-item pairs in
S\D computed in the previous iteration. Pk(u,i) denotes the
predicted rating for (u,i) in the k iteration by the collective
inference approach, defined as follows:
, ∈
, ∈ \
,</p>
      <p>, ,
Here R(u,i) represents the known rating that user u gave item i.
As the result of the k-th iteration, for each (u,i)S\D, the model fk
produces a new prediction for (u,i) that is stored as Pk(u,i).
Similarly to the original approach, new predictions are compared
with predictions made in the previous iteration and the procedure
ends either after a fixed number of iterations or when predictions
on unknown ratings converge (i.e., do not change).</p>
      <p>The key difference between this simplified variation and the
original iterative smoothing algorithm is the number of predictive
models built within each iteration k. The original algorithm builds
a separate model for every unknown rating in the rating space, in
order to properly adjust the predicted rating Pk(u,i) using all other
ratings from previous iteration, i.e., all ratings from D as well all
ratings Pk-1(u',i') where (u',i')  (u,i).</p>
      <sec id="sec-5-1">
        <title>Scalable Iterative Smoothing Algorithm:</title>
        <p>Inputs: known ratings data D, # of iterations K, algorithm T
Process:
1. Build model f0 on known ratings D using some standard
recommendation algorithm T, i.e., f0T(D)
2. Apply model f0 to compute predictions P0 for unknown
ratings S\D, i.e., P0(u,i) = f0(u,i) for (u,i)S\D
3. For each iteration k  {1, …, K}
a. Construct dataset Dk by including all known ratings D
and all predicted ratings Pk-1 from the previous iteration,
i.e.,</p>
        <p>Dk = D ∪ Pk-1
b. Build model fk on dataset Dk using T, i.e.,</p>
        <p>fk  T(Dk)
c. For each unknown rating pair (u, i)S\D , make
prediction on (u, i) and store in Pk, i.e.,</p>
        <p>Pk(u, i) = fk(u, i)
4. Output predictions made in the final iteration PK</p>
      </sec>
      <sec id="sec-5-2">
        <title>Output: PK</title>
        <p>Figure 5. Overview of scalable iterative smoothing approach.
In contrast, the simplified algorithm builds only one predictive
model in each iteration, based on the entire rating matrix. In other
words, predicted rating Pk(u,i) is adjusted using all ratings from
previous iteration, i.e., all ratings from D as well as from Pk-1,
including Pk-1(u,i). Thus, in the simplified algorithm, for any
given rating prediction P1(u,i) in the first iteration, the predictive
model is built on a rating data (i.e., D1) that only differs from the
rating data used in the original algorithm by one additional rating
(i.e., D1\{P0(u,i)}). Because the influence of one additional rating
is often subtle, especially when entire rating space is large (i.e., in
settings with large numbers of users and items), the single overall
model build in the simplified algorithm should produce outcomes
similar to the ones produced by individual models built in the
original algorithm, especially in the first iteration. While the
difference between the original and simplified versions of the
iterative smoothing may slowly increase as the number of
iterations grows, the simplified approach still provides significant
performance improvements (both in stability and accuracy), as
demonstrated by the experimental results later in the paper.
Moreover, the runtime complexity of the simplified algorithm is
much lower, making it much more practical from the scalability
perspective. In particular, as only one overall model is built on all
available ratings (i.e., dataset of size |S|) within each iteration, in
total K models are constructed over the course of K iterations.
Thus, the time complexity of the simplified variation is
O(Kt(|S|)). Comparing this to the complexity of the original
algorithm, O(|S\D|Kt(|S|)), the scalable heuristic offers huge
computational improvements (i.e., by roughly |S\D| times). In this
paper, we use scalable iterative smoothing in our experiments.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. EXPERIMENTAL RESULTS</title>
    </sec>
    <sec id="sec-7">
      <title>4.1 Overall Process</title>
      <p>Our experiments test the stability improvements achieved by the
proposed meta-algorithmic approach in conjunction with several
popular collaborative filtering techniques.</p>
      <p>
        The experiments follow the two-phase stability computation from
prior literature [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], discussed in Section 2. We used the standard
train-test data splitting approach and divided known ratings data
D into two sets: training data DT (80%) and validation data DV
(20%), where D = DT  DV and DT  DV = . Training set DT
was used for building rating prediction models, while validation
set DV was reserved exclusively for evaluating the predictive
accuracy of the final predictions. Similarly, a randomly chosen
half of the unknown rating space ET was dedicated for the stability
evaluation of rating prediction models during the training phase,
and the other half of the unknown rating space EV was reserved
exclusively for proper evaluation of the stability of the final
predictions. Here, S\D = ET  EV and ET  EV = .
      </p>
      <p>Step 0: Create training and test datasets, i.e., DT, DV, ET, EV.
Step 1: Find the best model parameters:
i. Use a portion (e.g., 75%) of training set DT to build rating
prediction models using iterative smoothing.
ii. Compute model accuracy on other portion (25%) of DT.
iii. Compute model stability on predictions made on ET.
iv. Repeat steps i-iii with various parameter settings for
iterative smoothing (i.e., number of iterations).
v. Find the best parameters for iterative smoothing, i.e., the
specifications that result in best stability and accuracy.
Step 2: Evaluate accuracy and stability of the final predictions
i. Use the best parameter settings for iterative smoothing.
ii. Train the system on the entire training set DT using
iterative smoothing.
iii. Evaluate predictive accuracy on the reserved validation
rating set DV.
iv. Evaluate recommendation stability on the reserved
unknown rating space EV.</p>
      <p>Step 3: Report parameter setting(s) as well as the accuracy and
stability of the final predictions.</p>
      <p>Figure 6. Overall experimental process.</p>
      <p>Additionally, the process of iterative smoothing involves multiple
iterations to adjust the predictions of unknown ratings. One of the
goals in this study is to find whether predictions converge during
the process of iterative smoothing and, if so, when. In addition,
there is possibility that iterative smoothing models can
“overadjust” rating predictions after a number of iterations, in their
attempt to maximize the performance on training data.
Overfitting is a well-known phenomenon which occurs when a
predictive model is fine-tuned to fit the training data (including
the random errors, outliers, and noise in the data) too well, which
typically leads to diminished predictive performance on test data.
Therefore, for a given algorithm, it is necessary to find the best
number of iterations to use on a given dataset in the final iterative
smoothing procedure. In order to find the optimal parameter
settings, we further used the standard train-test data splitting
approach internally within the training data to identify the best
parameter values for the proposed approach, i.e., that result in best
accuracy and stability. The overall experiment process is
summarized in Figure 6.</p>
    </sec>
    <sec id="sec-8">
      <title>4.2 Recommendation Algorithms</title>
      <p>
        In our experiments, we test the proposed approach in conjunction
with four popular recommendation algorithms: the simple
baseline method, classic user- and item-based variations of
neighborhood-based CF approaches, and the matrix factorization
technique. A brief overview of each technique is provided below.
Baseline. In real-world settings, some users may systematically
tend to give higher ratings than others, and some universally liked
items might receive higher ratings than others. Without
normalization, such user and item effects could bias system’s
predictions. Hence, recommender systems often involve a
preprocessing step to remove these “global effects”. One common
practice is to estimate and remove three effects: the overall mean,
the main effect of an item, and the main effect of a user [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Such
“global effects” can serve as a baseline estimate for unknown
rating of corresponding user and item, i.e.,
      </p>
      <p>
        bui = µ + bu + bi ,
where µ is the overall average rating, bu is the average observed
deviation from µ on ratings provided by user u, and bi is the
average observed deviation from µ on ratings given to item i.
Note that, in all of our experiments (i.e., with all other
recommendation algorithms), the ratings data were normalized by
removing these global effects. Moreover, this estimate is often
used as a baseline recommendation technique for comparison with
other recommendation algorithms, i.e., R*(u,i) = bui, and we
investigate its performance in our experiments as well.
User-Based Collaborative Filtering (CF_User). The user-based
nearest-neighbor collaborative filtering approach is a heuristic that
makes predictions of unknown ratings for a user based on the
ratings previously rated by this user’s “nearest neighbors”, i.e.,
other users who have similar rating patterns [
        <xref ref-type="bibr" rid="ref20 ref6">6,20</xref>
        ]. That is, the
value of the unknown rating for user u and item i is usually
computed as an aggregate of the neighbors’ ratings for the same
item i. The most common aggregation approach is the weighted
sum of the neighbors’ ratings, where the similarity of two users is
used as a weight. I.e., the more similar user u' and target user u
are, the more weight will be carried by the rating provided by user
u' on item i in the weighted sum when computing the prediction.
Predicted rating for user u on item i is computed as:
∗,
∑∈
,
      </p>
      <p>∗ ,
∑∈
, |
|
where N(u,i) is a set of “neighbors” with similar rating patterns to
user u and that have provided ratings for item i, simuv is the
similarity between users u and v, and bui is the baseline estimate
for user u on item i. In our implementation, two users must have
rated at least 3 items in common to allow computation of
similarity between them. The similarity between two users is
calculated as Pearson correlation between rating vectors (based on
the commonly rated items) of the two users. Prediction of each
unknown rating is formulated by combining the preferences of 20
most similar users who have rated the same item.</p>
      <p>
        Item-Based Collaborative Filtering (CF_Item). The user-based
collaborative filtering technique also has an analogous item-based
version, where the ratings of the nearest-neighbor items are used
to predict unknown ratings for a given item. Several studies have
presented empirical evidence that item-based algorithms often
provide better predictive accuracy than user-based methods (e.g.,
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]). Thus, our experiments also test the standard item-based
collaborative filtering in conjunction with the proposed approach.
Similarly to the settings employed in user-based CF, in our
experiments, two items are required to have been rated by 3
common users to allow similarity evaluation between them, and
20 nearest-neighbor items are used to formulate a prediction.
Matrix factorization (SVD). Matrix factorization technique is a
model-based (as opposed to heuristic-based) collaborative
filtering approach that characterizes items and users via a number
of latent factors inferred from known ratings [
        <xref ref-type="bibr" rid="ref15 ref8">8,15</xref>
        ]. This
technique models the U×I rating space as a product of two
submatrices: user preference matrix (U×L) and item feature matrix
(L×I). Each user and item is described by a vector of L latent
variables. In our experiments L is set to be 20. The user vector
indicates the preference of the user for several latent features, and
the item vector represents an item’s importance weights for the
same latent features. Singular value decomposition (SVD)
techniques are used to decompose original rating matrix into the
two sub-matrices in an optimal way that minimizes the resulting
approximation error. After the two sub-matrices are learned using
known ratings, each unknown rating is estimated as a dot-product
of the corresponding user- and item-factors vectors. Many
variations of matrix factorization techniques have been developed
during the recent Netflix Prize competition (e.g., [
        <xref ref-type="bibr" rid="ref14 ref15 ref18 ref21">14,15,18,21</xref>
        ]).
Our experiments focus on the basic underlying version of the
matrix factorization [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]; however, the proposed meta-algorithmic
approach can be applied with any variation of this technique.
      </p>
    </sec>
    <sec id="sec-9">
      <title>4.3 Results: Comparing Iterative Smoothing with Standard Recommendation Techniques</title>
      <p>
        The objective of the experiment is to compare the performance of
the proposed iterative smoothing approach with standard
singlemodel recommendation techniques on several real world datasets.
The first dataset we used is the Movielens 100K dataset [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
which contains 100,000 known ratings on 1682 movies from 943
users (6.3% data density). Our second dataset is a sample
extracted from the Movielens 1M dataset. The original Movielens
1M dataset consists of 1,000,000 ratings for 6040 movies by 3952
users (4.2% data density) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. From this dataset we extracted a
random sample of 3000 users and 3000 movies. Resulted dataset
contains 400,627 known ratings (i.e., 4.45% data density). Our
third dataset is sampled from the Netflix 100M dataset used in the
recent Netflix Prize competition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Similarly to the second
dataset, we sub-sampled 3000 random users and 3000 random
movies from the original data file. The result data sample consists
of 105,256 known ratings (i.e., 1.17% data density). The three
datasets used in our experiments come from different sources and
have different data characteristics (i.e., size and sparsity). All
movie ratings in the Movielens and Netflix datasets are integer
values between 1 and 5, where 1 represents the least liked movies,
and 5 represents the most liked movies. The datasets used in this
experiment are summarized in Table 1.
      </p>
      <p>Experimental results are consistent across different datasets. On
all three datasets, our proposed meta-algorithmic iterative
smoothing approach outperformed the original recommendation
techniques in both stability and accuracy in the vast majority of
cases. In particular, on average (i.e., across all datasets), iterative
smoothing provided a dramatic 55% improvement over the
original recommendation algorithms in stability (as measured by
RMSS) for CF_User and CF_Item algorithms. Even for the fairly
stable SVD and baseline techniques, on average, iterative
smoothing was able to further improve RMSS by 14%. In terms
of predictive accuracy, on average, iterative smoothing provided
1.4% improvements in RMSE across different algorithms.
CF_Item
Baseline
SVD
CF_User
CF_Item
Baseline
SVD
CF_User
CF_Item
Baseline</p>
    </sec>
    <sec id="sec-10">
      <title>5. CONCLUSIONS AND FUTURE WORK</title>
      <p>This paper introduces a general-purpose, practical
metaalgorithmic approach – based on iterative smoothing – for
improving stability of a variety of traditional recommendation
algorithms. The iterative smoothing approach uses multiple
iterations to repeatedly and explicitly adjust predictions of a
recommendation algorithm based on its other predictions in order
to make them more consistent with each other. We examined the
performance of iterative smoothing approach on several real
world datasets. Our experiments show that the proposed approach
demonstrates effectiveness in their ability to improve stability for
several widely used recommendation algorithms. Perhaps as
importantly, the proposed approach not only does not sacrifice the
predictive accuracy to obtain these stability improvements, but
actually is able to provide some additional accuracy
improvements at the same time.</p>
      <p>This work provides several interesting directions for future
research. This study shows that iterative smoothing can improve
stability for different recommendation algorithms, providing
larger improvements for some algorithms and smaller
improvements for some others. Providing some additional
theoretical understanding of what algorithmic and data
characteristics can lead to larger vs. smaller improvements in
recommendation stability for the proposed approach is an
important direction for future work. Another interesting direction
would be to perform user behavior studies to investigate the value
of stable (i.e., as opposed to unstable) recommendations on users’
usage patterns and acceptance of recommender systems.</p>
    </sec>
    <sec id="sec-11">
      <title>ACKNOWLEDGMENTS</title>
      <p>This research was supported in part by the National Science
Foundation grant IIS-0546443.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Adomavicius</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tuzhilin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>"Toward the Next Generation of Recommendation Systems: A Survey of the State-ofthe-Art and Possible Extensions,"</article-title>
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>17</volume>
          , (
          <issue>6</issue>
          ),
          <fpage>734</fpage>
          -
          <lpage>749</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Adomavicius</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and Zhang,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2010</year>
          .
          <article-title>"On the Stability of Recommendation Algorithms,"</article-title>
          <source>ACM Conference on Recommender systems</source>
          , Barcelona, Spain: ACM New York, NY, USA 47-
          <fpage>54</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Balabanovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shoham</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>1997</year>
          .
          <article-title>"Fab: Content-Based, Collaborative Recommendation,"</article-title>
          <source>Comm. of ACM</source>
          ,
          <volume>40</volume>
          , (
          <issue>3</issue>
          ),
          <fpage>66</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Bell</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>"Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights "</article-title>
          <source>Seventh IEEE International Conference on Data Mining</source>
          , Omaha,
          <string-name>
            <surname>NE</surname>
          </string-name>
          , USA.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Bennett</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lanning</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>"The Netflix Prize,"</article-title>
          <source>KDD-Cup and Workshop</source>
          , San Jose, CA.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Breese</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heckerman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kadie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>1998</year>
          .
          <article-title>"Empirical Analysis of Predictive Algorithms for Collaborative Filtering,"</article-title>
          <source>14th Conf. on Uncertainty in Artificial Intelligence</source>
          , Madison, WI.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D</given-names>
            <surname>'astous</surname>
          </string-name>
          , A., and
          <string-name>
            <surname>Touil</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>"Consumer Evaluations of Movies on the Basis of Critics' Judgments,"</article-title>
          <source>Psychology &amp; Marketing</source>
          ,
          <volume>16</volume>
          ,
          <fpage>677</fpage>
          -
          <lpage>694</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Funk</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>"</article-title>
          <source>Netflix Update: Try This at Home."</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Gershoff</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukherjee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mukhopadhyay</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>"Consumer Acceptance of Online Agent Advice: Extremity</article-title>
          and
          <string-name>
            <given-names>Positivity</given-names>
            <surname>Effects</surname>
          </string-name>
          ,
          <article-title>"</article-title>
          <source>J. of Consumer Psychology</source>
          ,
          <volume>13</volume>
          , (
          <issue>1</issue>
          &amp;2),
          <fpage>161</fpage>
          -
          <lpage>170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>[10] Grouplens</source>
          <year>2011</year>
          .
          <article-title>"Movielens Data Sets."</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Herlocker</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konstan</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Terveen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Riedl</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>"Evaluating Collaborative Filtering Recommender Systems,"</article-title>
          <source>ACM Transactions on Information Systems</source>
          ,
          <volume>22</volume>
          , (
          <issue>1</issue>
          ), Jan,
          <fpage>5</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Komiak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Benbasat</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>"The Effects of Personalization and Familiarity on Trust and Adoption of Recommendation Agents,"</article-title>
          <source>MIS Quarterly</source>
          ,
          <volume>30</volume>
          , (
          <issue>4</issue>
          ),
          <fpage>941</fpage>
          -
          <lpage>960</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>"The Bellkor Solution to the Netflix Grand Prize." from http://www</article-title>
          .netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>"Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model,"</article-title>
          <source>ACM Intl. Conf. on Knowledge Discovery and Data Mining (KDD'08)</source>
          ,
          <fpage>426</fpage>
          -
          <lpage>434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Koren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bell</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Volinsky</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>"Matrix Factorization Techniques for Recommender Systems,"</article-title>
          <source>IEEE Computer</source>
          ,
          <volume>42</volume>
          ,
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Macskassy</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Provost</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>"Classification in Networked Data: A Toolkit and a Univariate Case Study,"</article-title>
          <source>Journal of Machine Learning R</source>
          ,
          <volume>8</volume>
          ,
          <fpage>935</fpage>
          -
          <lpage>983</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>O</given-names>
            <surname>'Donovan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            , and
            <surname>Smyth</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          <year>2005</year>
          .
          <article-title>"Trust in Recommender Systems," 10th Intl. Conf. on Intelligent User Interfaces</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Paterek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>"Improving Regularized Singular Value Decomposition for Collaborative Filtering,"</article-title>
          <source>KDDCup</source>
          ,
          <fpage>39</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Pilaszy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tikk</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>"Recommending New Movies: Even a Few Ratings Are More Valuable Than Metadata,"</article-title>
          <source>Third ACM Conf. on Recommender systems (RecSys '09)</source>
          ,
          <fpage>93</fpage>
          -
          <lpage>100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Resnick</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iacovou</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suchak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergstrom</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>1994</year>
          .
          <article-title>"Grouplens: An Open Architecture for Collaborative Filtering of Netnews</article-title>
          .,
          <source>" Conf. on Comp. Supported Cooperative Work</source>
          ,
          <fpage>175</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mnih</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>"Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo,"</article-title>
          <source>25th Intl. Conf. on Machine Learning</source>
          , Helsinki, Finland: ACM,
          <fpage>880</fpage>
          -
          <lpage>887</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Sarwar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karypis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konstan</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Riedl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>"ItemBased Collaborative Filtering Recommendation Algorithms,"</article-title>
          10th International WWW Conference, Hong Kong,
          <fpage>285</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Shani</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gunawardana</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>"Evaluating Recommendation Systems," in Recommender Systems Handbook</article-title>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rokach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shapira</surname>
          </string-name>
          and P.B.
          <source>Kantor (eds.)</source>
          .
          <fpage>257</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Van Swol</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sniezek</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>"Factors Affecting the Acceptance of Expert Advice,"</article-title>
          <source>British Journal of Social Psychology</source>
          ,
          <volume>44</volume>
          , (
          <issue>3</issue>
          ),
          <fpage>443</fpage>
          -
          <lpage>461</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>