<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Experiences from Implementing Collaborative Filtering in a Web 2.0 Application</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wolfgang Woerndl</string-name>
          <email>woerndl@in.tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Helminger</string-name>
          <email>helminger@in.tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vivian Prinz</string-name>
          <email>prinzv@in.tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TU Muenchen, Chair for Applied Informatics - Cooperative Systems Boltzmannstr.</institution>
          <addr-line>3, 85748 Garching</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2009</year>
      </pub-date>
      <fpage>22</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>The goal of this paper is to report our experiences from integrating item-based collaborative filtering into the Web 2.0 site linkfun.net. We discuss the necessary steps to implement the selected Slope One algorithm in our real world application. It was necessary to conduct performance optimization to allow for recommendations without any delays in page generation on our site. Firstly, we significantly reduced the data model by including only items similarities for pairs of items where both items been rated by at least k users. Secondly, we precomputed recommended items for users. By analyzing the empirical results, we found out that user activity increased on the site after introducing the recommender. In addition, users rated recommended videos higher on average than others which indicates that the recommender allowed users to find preferred videos more effectively.</p>
      </abstract>
      <kwd-group>
        <kwd>recommender systems</kwd>
        <kwd>collaborative filtering</kwd>
        <kwd>slope one</kwd>
        <kwd>performance optimization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Web 2.0 applications such as MySpace, YouTube or FlickR have gained much
interest in industry and academia in recent years. Users can easily upload content such
as videos, photos or links in order to share them with others. However, most current
sites lack structured intelligence and finding meaningful information can be difficult
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Recommender systems and collaborative filtering are techniques to deal with this
problem as they filter information items according to a user’s needs and taste.
      </p>
      <p>In this project, we are investigating the integration of a collaborative
recommendation system in the real world Web 2.0 site www.linkfun.net (Fig. 1)1. This
site allows users to share links to funny content such as videos. While other sites like
YouTube feature all kind of videos, linkfun.net focuses on humorous content. In
addition, linkfun.net does not host the videos itself, but users provide links to content
on various other sites. The overall goal of the work presented in this paper was to
provide good recommendations to users and thus increase user interest in the site and
expanding the community and value of linkfun.net. Notable requirements included the</p>
      <sec id="sec-1-1">
        <title>1 The user interface of linkfun.net is currently in German only</title>
        <p>integration of the recommendation without decreasing the performance of the site,
extra hardware needs or additional effort for the users.</p>
        <p>The rest of this paper is organized as follows. The next section gives some
background on recommender systems and discusses which type of system appears
well suited for our scenario in principle. In section 3, we explain the necessary steps
to integrate the filtering functionality in linkfun.net. In section 4, we present the
empirical results from the analysis of log files. Finally, we conclude the paper with a
short summary and an outlook.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Recommender Systems and Collaborative Filtering</title>
      <p>The basic idea of recommender systems is to recommend products like books and
CDs and other items such as restaurants or videos to an active user. To do so, the
system computes the chance that a user likes an item. This is based on information
about the user the items and possibly other data such as contextual information.
Characteristics of recommender algorithms include the quality of recommendations,
storage and runtime complexities, anonymity and extensibility of the model.</p>
      <sec id="sec-2-1">
        <title>2.1 Individual Recommender Systems</title>
        <p>In general, we distinguish between individual and collaborative recommender
systems. Individual recommenders determine fitting items based on the profile of the
active user. Thereby, the system matches explicitly entered or implicitly observed
user preferences and interests with items meta data. Hence, this type of recommender
system is often called content-based recommender system. One way of implementing
this kind of recommender is to use a rule system. However, the content-based
approach is not well suited for Web 2.0 content because additional information about
the items and/or users is required. Moreover, an individual recommender does not fit
the social and collaborative nature of Web 2.0 applications.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Collaborative Recommender Systems</title>
        <p>The second category of recommender systems are based on collaborative filtering
(CF). CF utilizes the ratings of other users for items, for example a rating on a scale of
1 to 5. The vector of all ratings of a user for various items is called a user’s rating
vector. CF seems appropriate for Web 2.0 because it needs no information about
items and implements the “word of mouth” idea that is also prevalent in Web 2.0.
Users like to express their opinions on content and basic rating schemes already exist
in some sites.</p>
        <p>
          We differentiate between two variants of CF, user- and item-based collaborative
filtering. The recommendation process of user-based CF basically consists of two
steps. First, neighborhood creation: Determine a set of k users that have rated
similarly to the active user in the past. Second, recommendation of new items for the
active user. For neighborhood creation, the active user’s rating vector is compared to
the vectors of all other users. To do so, different metrics have been proposed in the
literature, for example Euclidean distance, cosine similarity or Pearson-Spearman
correlation [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Thus, user-based CF analyzes the available raw data, namely the
useritem matrix of ratings. In the second step, the algorithm selects items, which the
active user has not rated yet, but which have been rated positively in the
neighborhood of the active user. User-based CF has proven very useful and accurate
in applications such as Web shops. However, this type of collaborative recommender
has several drawbacks. First of all, there is the new user problem. User-based CF
cannot generate a suggestive recommendation if the active user has not rated any
items. In a Web 2.0 site such as linkfun.net, new or occasional users may represent a
high share of the user base. A second relevant problem of user-based CF is that the
approach is computationally costly. The approach operates on the raw data of ratings,
which have to be analyzed each time a prediction is computed.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3 Item-based Collaborative Filtering</title>
        <p>
          The alternative approach is item-based CF. Item-based CF does not consider the
similarity of users, but of items [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Thus, the user-item matrix is not analyzed line by
line, but column by column. One significant difference to user-based CF is the
independence from who the active user is: the item similarities can be precomputed to
build an item-item matrix. The item-item matrix is the model of the algorithm.
Therefore, this type of recommendation algorithm is also called model-based
collaborative filtering. One element Si;j of the item-item matrix expresses the
similarity between items i and j, determined from the users’ ratings. The ratings
vector of the active user is then used to recommend items that are similar to items the
active user has rated positively in the past.
        </p>
        <p>It is important to note that item-based CF has little in common with individual,
content-based filtering. This is because the users’ ratings are solely used for
computing the item similarity. Meta data of items is irrelevant. Item-based CF has an
advantage over user-based CF with regard to the complexity of the computation. The
item-item matrix can be calculated as an intermediate result, independently from the
active user. The main advantage of item-based CF is performance, because generating
recommendations from the model instead of the raw data of ratings is much more
efficient.</p>
        <p>
          Slope One [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is an example of an item-based CF algorithm. The main idea of the
approach is to use differentials of ratings and store them in the item-item matrix.
Slope One uses predictors of the form f(x) = x + b, which precompute the average
difference between the ratings of two items for users who rated both items [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
Consequently, the model can be updated on the fly, without the need to recalculate the
model when a rating is made. In addition, it is not demanding as much information
from new users. One rating is enough to be able to generate recommendation for a
user. Finally, [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] describes a straightforward implementation in PHP using a SQL
database. Linkfun.net was also implemented in PHP and SQL, so we decided to base
our recommender on Slope One. More details on Slope One and our implementation
are given in the next section.
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Data Model, Recommendations and Optimization</title>
      <p>In this third section of the paper we explain the design decisions when integrating
Slope One into linkfun.net and implementing the collaborative filtering method.</p>
      <sec id="sec-3-1">
        <title>3.1 Data Model</title>
        <p>The initial situation in linkfun.net was that users were able to give ratings on a scale
from 1 to 5 with 5 being the best grade. However, the application did only save the
aggregated average value for each item. Information about the individual ratings of
users was not kept. Slope One and all other CF algorithms do need the detailed
useritem matrix though.</p>
        <p>Hence, the existing ratings had to be discarded and a new data model had to be
designed. This new model consists of three database tables to store the necessary
information:
- Table rating to store the user ratings for items, with one rating
corresponding to one row in this table.</p>
        <p>Every time a user u rates an item i, our system updates rating and calculates the
impact on dev. To do so, the algorithm first determines all items u has rated,
computes their rating differentials to i and updates the affected entries in dev.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Generating Recommendations</title>
        <p>
          The model can then be used to generate and display the top five recommended items –
primarily videos – when a user accesses linkfun.net. Slope One distinguishes between
non-personalized and personalized recommendations [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Non-personalized
recommendations are not based on collaborative filtering in the strict sense. After a
user rates just one item i, the algorithm searches for items, which have the highest
rating differential to i, i.e. were rated best in comparison to i.
        </p>
        <p>
          To predict the rating for a particular item i for an active user u, the algorithm first
selects the set of items rs, which were rated by u and also by at least one other user. In
the second step, the differentials between the ratings of i and the items in rs are
determined using the item-item matrix, respectively our database table dev. Finally,
the differentials are summed up and divided by the number of items in rs. This results
in a predicted rating for i. Note that the predicted rating is possibly higher than the
highest grade, for instance 5 on a 1-5 scale. This is because Slope One works on
rating differences [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. However, it is only important to compare the predicted values
of items to each other to be able to generate a ranked list of items for the personalized
recommendations.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Optimization</title>
        <p>The number of elements in the item-item matrix – i.e. the table dev – soon grew to
over 3.4 million entries with about 2000 items (Table 1). This is due to the fact that
with more ratings of users, more and more items receive at least one rating and more
and more similarities between item pairs can be calculated. This led to delays in
handling ratings and computing recommendations. It hurt the overall user experience
on linkfun.net because page generation was noticeably slowed. This necessitated
performance optimization.</p>
        <sec id="sec-3-3-1">
          <title>We implemented two solutions to improve performance:</title>
          <p>1. Reducing the data model and the number of entries of the table dev
2. Precomputing recommended items for users</p>
          <p>
            The basic idea to minimize the size of the item-item matrix is to only compute and
store item similarities for pairs of items where both items been rated by at least k
users [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. The obvious drawback is that some items, which have received few ratings,
are not represented in the item-item matrix anymore. Thus, these items are no longer
available for recommendation. Table 1 shows the number of entries in dev and the
number of available videos for certain values of the threshold k of necessary ratings.
At that point of time, a reasonable value for k was k=4. This value reduces the
numbers of entries in dev significantly from 3404049 to 605561 while keeping about
80% of items available for recommendations.
          </p>
          <p>Despite this reduction, generating recommendations when a user hits the
corresponding link took still too much time. Hence, the recommendation had to be
precomputed. To do so, we created a new database table precomp_recom that
stores five recommended items for every user. This table is updated according to the
following schedule:
- Once per day, the recommended items for all users are recalculated. This
procedure takes about 15 minutes. It is performed during the night when less
users access the site.
- The recommendations are updated every 5 minutes for users that have rated
items since the last update.</p>
          <p>The second condition ensures that recommendations are updated promptly and
regularly for active users and reflect their latest ratings. In any case, the model, i.e. the
item-item matrix as basis for the recommendations, may be slightly out of date. Yet
this fact has to be accepted to allow for instant recommendations by precomputing
them.</p>
          <p>In general, CF algorithms may suffer from cold start problems with new users or
new items. For example, new items cannot be recommended until they receive at least
one rating. This was not a problem in our case. Most of our items were rated within
hours and thus were potentially considered for recommendations reasonably soon. As
far as new users are concerned, newly registered users to linkfun.net were shown a
message that they need to rate at least one video to obtain recommendations.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Empirical Results</title>
      <p>In this section, we analyze the log files to evaluate the effect of the CF function on
linkfun.net. We were in particular interested in two questions:
- User activity: what were the effects of the recommender system on user
activity?
- Quality of recommendations: were recommended videos rated higher on
average than other items?</p>
      <sec id="sec-4-1">
        <title>4.1 User Activity</title>
        <p>At time of research, there were 490 registered users of linkfun.net, although some of
the registered users frequented the site rather seldom. 150 users rated at least one item
and 50 users actively used the recommendation function. Overall, there were 10500
ratings and 100000 times a video was played by users. About 60% of the video
playbacks occurred after using the recommendation function.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Quality of Recommendations</title>
        <p>As far as the quality of recommendations is concerned, we investigated the ratings the
users gave to the videos. Overall, the videos were rated high on average with many
videos receiving the best grade. This may be due to the fact that linkfun.net is a
specialized community where users only provide links to funny content that may cater
to a similar taste. We noticed a trend that this high rating average increased even
further after the introduction of the recommender function (Fig. 3). Our assumption is
that the rating and recommendation scheme allowed users to find preferred videos
more effectively.</p>
        <p>To evaluate the quality of recommendations in more detail, we compared the
ratings of videos that were recommended to a user with the ratings of videos that were
not in the list of recommendations. The latter category consists of videos accessed
from the homepage of the site or from a “newest videos” section. We found out that
recommended videos received higher grades: the average rating was 4.4 in
comparison to 4.0 for non-recommended videos.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>In this paper, we have described our experiences from integrating the item-based
collaborative filtering algorithm Slope One into the Web 2.0 site linkfun.net. We
explained the necessary steps including the data model. After we have done some
performance optimization, the recommender function ran smoothly on the site,
without any delays in user experience or additional hardware requirements. The
performance optimization included reducing the data model of item similarities and
precomputing recommended items for users. Overall, the Slope One algorithm proved
to be very practicable and fitting for the examined site. By evaluating our
implementation, we found out that user activity increased on the site after introducing
the recommender. In addition, users rated recommended videos higher on average
than others.</p>
      <p>
        As far as related work is concerned, there are several applications which also use
SlopeOne in practice. For example, InDiscover (http://www.indiscover.net) is a site
for promoting independent musician and recommending new music to interested
customers. Their system uses RACOFI which is a framework for rule-based
collaborative filtering partly based on Slope One [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, there is no published
information about performance optimization or empirical results. There is plenty of
research in improving recommender systems, mostly with a focus on prediction
quality [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], but there are few reports on experiences from applying recommender
algorithms in practical Web 2.0 applications. Leimstoll and Stormer discuss in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
how collaborative filtering can be integrated in online shops in principle. Their
proposal is somewhat similar to our approach, although no experiences from real
world applications are reported.
      </p>
      <p>One goal of our planned future activities is to integrate implicit ratings that can be
observed from user behavior. So far, all recommendations are based on explicit
ratings users have made. We are currently investigating methods to measure the exact
time a user spends with a video. When a user is watching a video until completion,
one can assume that she liked the video, which relates to a good rating. If the user
cancels the playback soon after the start, we would assign a low rating. Subsequently,
we want to measure whether recommendation based on these implicit ratings derived
from user observation performs as well as the explicit user ratings.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          -J.:
          <source>Building Web 2.0. IEEE Computer</source>
          , vol.
          <volume>40</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>101</fpage>
          -
          <lpage>102</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Adomavicius</surname>
          </string-name>
          , G.;
          <string-name>
            <surname>Tuzhilin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions</article-title>
          .
          <source>IEEE Transactions on Knowledge, and Data Engineering</source>
          , vol.
          <volume>17</volume>
          , no.
          <issue>6</issue>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Sarwar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Karypis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Konstan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Riedl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Item-based Collaborative Filtering Recommendation Algorithms</article-title>
          .
          <source>Proc. 10th International Conference on World Wide Web (WWW10)</source>
          , Hong Kong,
          <string-name>
            <surname>China</surname>
          </string-name>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lemire</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>MacLachlan</surname>
          </string-name>
          , A.:
          <article-title>Slope One Predictors for Online Rating-based Collaborative Filtering</article-title>
          .
          <source>SIAM Data Mining</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lemire</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGrath</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Implementing a Rating-Based Item-to-Item Recommender System in PHP/SQL</article-title>
          . Available at: http://www.daniel-lemire.com/fr/documents/publications/webpaper.pdf (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ball</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boley</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greene</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Howse</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemire</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGrath</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          : RACOFI:
          <article-title>Rule-Applying Collaborative Filtering Systems</article-title>
          . IEEE/WIC COLA'
          <volume>03</volume>
          ,
          <string-name>
            <surname>Halifax</surname>
          </string-name>
          , Canada, (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pu</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bridge</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mobasher</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ricci</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <source>Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys '08)</source>
          . ACM (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Leimstoll</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stormer</surname>
          </string-name>
          , H.:
          <article-title>Collaborative Recommender Systems for Online Shops</article-title>
          .
          <source>AMCIS</source>
          <year>2007</year>
          ,
          <article-title>Keystone</article-title>
          , CO (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>