Improving FolkRank With Item-Based Collaborative Filtering

Jonathan Gemmell, Thomas Schimoler, Maryam Ramezani, Laura Christiansen, Bamshad Mobasher
Center for Web Intelligence
School of Computing, DePaul University
Chicago, Illinois, USA
{jgemmell, tschimo1, mramezani, lchris10, mobasher}@cdm.depaul.edu

ABSTRACT
Collaborative tagging applications allow users to annotate online resources. The result is a complex tapestry of interrelated users, resources and tags often called a folksonomy. Folksonomies present an attractive target for data mining applications such as tag recommenders. A challenge of tag recommendation remains the adaptation of traditional recommendation techniques originally designed to work with two-dimensional data. To date the most successful recommenders have been graph-based approaches which explicitly connect all three components of the folksonomy.
    In this paper we speculate that graph-based tag recommendation can be improved by coupling it with item-based collaborative filtering. We motivate this hypothesis with a discussion of informational channels in folksonomies and provide a theoretical explanation of the additive potential for item-based collaborative filtering. We then provide experimental results on hybrid tag recommenders built from graph models and other techniques based on popularity, user-based collaborative filtering and item-based collaborative filtering.
    We demonstrate that a hybrid recommender built from a graph-based model and item-based collaborative filtering outperforms its constituent recommenders. Furthermore, the inability of the other recommenders to improve upon the graph-based approach suggests that they offer information already included in the graph-based model. These results confirm our conjecture. We provide extensive evaluation of the hybrids using data collected from three real-world collaborative tagging applications.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
RecSys’09, October 22–25, 2008, New York City, New York.
Copyright 2009 ACM 978-1-60558-093-7/08/10 ...$5.00.

1.     INTRODUCTION
   Collaborative tagging has emerged as a popular method for organizing and sharing online content with user-defined keywords. Delicious¹, Flickr² and Last.fm³ are among the most popular destinations on the Web, allowing users to annotate bookmarks, digital photographs and music respectively. Other less popular tagging applications serve niche communities, enabling users to tag blogs, business documents or scholarly articles.
   At the heart of collaborative tagging is the post: a user describes a resource with a set of tags. A collection of posts results in a complex network of interrelated users, resources and tags commonly referred to as a folksonomy [16]. Users are able to navigate this network free from a rigid conceptual hierarchy.
   Despite the freedom users enjoy, the size of a folksonomy often hampers the user’s exploration. Data mining applications such as recommenders can assist the user by reducing a burdensome number of items to a smaller collection related to the user’s interests. In this work we focus on tag recommendation, the suggestion of tags during the annotation process.
   Tag recommendation reduces the cognitive effort from generation to recognition. Users are therefore encouraged to tag more frequently, apply more tags to a resource, reuse common tags and use tags they had not previously considered. User error is reduced by eliminating capitalization inconsistencies, punctuation errors, misspellings and other discrepancies. The final result is a cleaner, denser dataset that is useful in its own right or for further data mining applications.
   Despite the richness offered by folksonomies, they also present unique challenges for tag recommenders. Traditional recommendation strategies, often developed to work with two-dimensional data, must be adapted to work with the three-dimensional nature of folksonomies. Otherwise they risk disregarding potentially useful information. To date the most successful tag recommenders are graph-based models, which exploit the user-defined links between the users, resources and tags.
   In this work we propose augmenting the graph-based approach with item-based collaborative filtering. We offer a discussion of information channels in folksonomies to motivate this proposal. The graph-based model covers the user-resource, user-tag, and resource-tag channels. Item-based collaborative filtering, on the other hand, focuses on tags previously applied by the user to resources similar to the query resource. It therefore includes resource-resource information not explicitly contained in the graph model. Additionally, the user-tag information utilized by item-based collaborative filtering is more oriented to the query resource.
   We construct hybrid tag recommenders composed of the graph models and other techniques including popularity models, user-based collaborative filtering and item-based collaborative filtering. The graph-based recommender coupled with item-based collaborative filtering produces better results than either produces alone, strengthening our theory that item-based collaborative filtering contains information that is absent in the graph-based model. Moreover, the other hybrids do not improve upon the graph-based model, suggesting that the information they contain is already adequately represented by the graph-based approach.
   The rest of this paper is organized as follows. In Section 2 we describe related works. A brief survey of the tag recommenders we employ in our experiments is given in Section 3. The use of hybrid recommenders is motivated in Section 4 where we discuss informational channels in folksonomies. Section 5 details how tag recommenders may be compounded to produce hybrid recommenders. Our experimental evaluation is presented in Section 6, including a description of our datasets, our methodology and a discussion of our findings. Finally in Section 7 we present our conclusions and lay a foundation for future work.

¹ delicious.com
² www.flickr.com
³ www.last.fm

2.    BACKGROUND AND RELATED WORK
   The term folksonomy was coined by [28], a play on folk and taxonomy. While the term is new, [29] argues that collaborative tagging is merely a renaissance of manual indexing. However, the scope and connectivity of the Internet permit tagging to rise to a level heretofore unrealized.
   In [16] the attractiveness of tagging is outlined: serendipitous browsing, a low entry cost, utilizing the wisdom of the crowd, and a sense of community. Moreover, the author argues that tagging allows objects to be categorized under multiple tags, unfettered from traditional taxonomies. He also discusses two obstacles: tag ambiguity, in which a tag has several meanings, and tag redundancy, in which several tags have the same meaning.
   As collaborative tagging applications have gained in popularity, researchers have explored and characterized the tagging phenomenon. In [15] and [10] the authors studied the information dynamics of Delicious, one of the most popular folksonomies. The authors discussed how tags have been used by individual users over time and how tags for an individual resource stabilize over time. In [15] the authors provide an overview of the phenomenon and offer reasons why both folksonomies and taxonomies will have a place in the future of information access.
   There have been many recent research investigations into recommendation within folksonomies. Unlike traditional recommender systems which have a two-dimensional relation between users and items, tagging systems have a three-dimensional relation between users, tags and resources. Recommender systems can be used to recommend each of the dimensions based on one or two of the other dimensions. In [26] the authors apply user-based and item-based collaborative filtering to recommend resources in a tagging system and use tags as an extension to the user-item matrices. Tags are used as context information to recommend resources in [19] and [18].
   In [13] user-based collaborative filtering is compared to a graph-based recommender based on the PageRank algorithm for tag recommendation. The authors in [11] use association rules to recommend tags and introduce an entropy-based metric to define how predictable a tag is. In [14] the title of a resource, the posts of a resource and the user’s vocabulary are used to recommend tags.
   User-defined tags and co-occurrence are employed by [24] to recommend tags to users on Flickr. The assumption is that the user has already assigned a set of tags to a photo and the recommender uses those tags to recommend more tags. The authors in [6] have completed a similar study and introduce a classification for tag recommendation. Probabilistic models have been used in recommendation in folksonomies in [20] and [30]. Moreover, [20] uses Probabilistic Latent Semantic Analysis for resource discovery and [30] uses single aspect PLSA for tag recommendation.
   Previously, in [8, 9], we demonstrated how tag clusters serving as coherent topics can aid in the personalization of search and navigation. Further support for the utility of clustering is offered in [4] where improvement in search through clustering is theorized. In [7] we adapted K-Nearest Neighbor for tag recommendation and showed that incorporating user tagging habits into recommendation can improve K-Nearest Neighbor.
   General criteria for a good tagging system, including high coverage of multiple channels, high popularity and least-effort, are presented in [31]. The authors categorize tags as content-based tags, context-based tags, attribute tags, subjective tags, and organizational tags and use a probabilistic method to recommend tags. In [2] the authors propose a classification algorithm for tag recommendation. Semantic tag recommendation systems in the context of a semantic desktop are explored in [1]. Clustering to make real-time tag recommendation is developed in [25].

3.    TAG RECOMMENDATION
   Here we first provide a model of folksonomies, then review several common recommendation techniques which we employ in our evaluation. A folksonomy can be described as a four-tuple:

$$D = \langle U, R, T, A \rangle \qquad (1)$$

where U is a set of users; R is a set of resources; T is a set of tags; and A is a set of annotations, represented as user-tag-resource triples:

$$A \subseteq \{\langle u, r, t \rangle : u \in U, r \in R, t \in T\} \qquad (2)$$

   A folksonomy can, therefore, be viewed as a tripartite hyper-graph [17] with users, tags, and resources represented as nodes and the annotations represented as hyper-edges connecting a user, a tag and a resource.
   Aggregate projections of the data can be constructed, reducing the dimensionality but sacrificing information [22]. The relation between resources and tags, RT, can be formulated such that each entry, RT(r, t), is the weight associated with the resource, r, and the tag, t. This weight may be binary, merely showing that one or more users have applied that tag to the resource. In this work we assume RT(r, t) to be the number of users that have applied t to r:

$$RT_{tf}(r, t) = |\{a = \langle u, r, t \rangle \in A : u \in U\}| \qquad (3)$$

   Analogous two-dimensional projections can be constructed for UT, in which the weights correspond to users and tags, and UR, in which the weights correspond to users and resources.
   Many authors have attempted to exploit the data model for recommendation in folksonomies. In traditional recommendation algorithms the input is often a user, u, and the output is a set of items, I. Tag recommendation differs in that the input is both a user and a resource. The output remains a set of items, in this case a set of recommended tags, T_r. Given a user-resource pair, the recommendation set is constructed by calculating a weight for each tag, w(u, r, t), and recommending the top n tags.

3.1     Popularity Based Approaches
   We consider two popularity-based models which rely on the frequency with which a tag is used. PopRes ignores the user and relies on the popularity of a tag within the context of a particular resource. We define the resource-based popularity measure as:

$$w(u, r, t) = \frac{|\{a = \langle u, r, t \rangle \in A : u \in U\}|}{|\{a = \langle u, r, t \rangle \in A : u \in U, t \in T\}|} \qquad (4)$$
   PopUser, on the other hand, ignores the resource and focuses on the frequency of a tag within the user profile. We define the user-based popularity measure as:

$$w(u, r, t) = \frac{|\{a = \langle u, r, t \rangle \in A : r \in R\}|}{|\{a = \langle u, r, t \rangle \in A : r \in R, t \in T\}|} \qquad (5)$$

   Popularity-based recommenders require little online computation. Models are built offline and can be incrementally updated. However, both these models focus on a single channel of the folksonomy and may not incorporate otherwise relevant information into the recommendation.
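
As an illustration of the two popularity measures, the following minimal sketch aggregates a toy annotation set into per-resource and per-user tag counts and evaluates the PopRes and PopUser weights. The data and the helper names (`build_counts`, `pop_res`, `pop_user`, `top_n`) are hypothetical and chosen only for this example; they are not taken from the experiments reported here.

```python
from collections import Counter, defaultdict

# Toy annotation set A of (user, resource, tag) triples; all values are invented.
A = {
    ("alice", "r1", "python"),
    ("alice", "r1", "code"),
    ("alice", "r2", "python"),
    ("bob", "r1", "python"),
    ("bob", "r1", "tutorial"),
    ("carol", "r1", "code"),
}


def build_counts(annotations):
    """Aggregate the resource-tag and user-tag projections as counters."""
    rt = defaultdict(Counter)  # rt[r][t]: number of users who applied t to r
    ut = defaultdict(Counter)  # ut[u][t]: number of resources u tagged with t
    for u, r, t in annotations:
        rt[r][t] += 1
        ut[u][t] += 1
    return rt, ut


def pop_res(rt, r, t):
    """Share of the annotations on resource r that use tag t (user ignored)."""
    total = sum(rt[r].values())
    return rt[r][t] / total if total else 0.0


def pop_user(ut, u, t):
    """Share of the annotations by user u that use tag t (resource ignored)."""
    total = sum(ut[u].values())
    return ut[u][t] / total if total else 0.0


def top_n(weights, n):
    """Return the n highest-weighted tags, breaking ties alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n]]


rt, ut = build_counts(A)
tags = {t for _, _, t in A}
res_weights = {t: pop_res(rt, "r1", t) for t in tags}
user_weights = {t: pop_user(ut, "alice", t) for t in tags}
```

Because both counters can be updated one annotation at a time, the sketch also reflects why such models are cheap to maintain incrementally.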

3.2    User-Based Collaborative Filtering
   User-based K-nearest neighbor is a commonly used recommendation algorithm in Information Retrieval that can be modified for use in folksonomies. Applications may model users by recency, authority, linkage or vector space models. In this work we focus on the vector space model [21] and describe the user as a vector over either the tag space or the resource space.
   KNN_UT models the user, u, as a vector over the set of tags where the weight in each dimension corresponds to the occurrence of the tag in the user profile as it is defined by the two-dimensional projection UT(u, t). Other methods may be used to model the user, such as a vector over the set of resources or a combination of tags and resources. Several techniques may be used to calculate the similarity between vectors such as Jaccard similarity or cosine similarity [27]. In this work we rely on cosine similarity.
   Using the similarity measure, a neighborhood, N, of the k most similar users is constructed such that they have all previously annotated the query resource, r. A weight for each tag is calculated as:

$$w(u, r, t) = \frac{\sum_{n}^{N} sim(u, n) * d(n, r, t)}{k} \qquad (6)$$

where d(n, r, t) is 1 if the neighbor, n, has annotated the query resource, r, with the tag t. Otherwise it is 0.
   Traditional user-based collaborative filtering requires a comparison between the query user and every other user. However, since the adapted algorithm considers only those users that have annotated the query resource, the number of similarities to calculate is drastically reduced. The popularity of resources in folksonomies follows the power law and the great majority of resources will benefit from this reduction in computation, while a few will require additional computational effort. As a result the algorithm scales well with large datasets.
   However, since the algorithm relies on the collaboration of other users it may be the case that a tag cannot be recommended because it does not appear in a neighbor’s profile. While the personalization offered by user-based filtering is an important component for the recommender, it lacks the ability to reflect the habits and patterns of the larger crowd.

3.3    Item-Based Collaborative Filtering
   KNN_RT models resources as a vector over the tag space. Given a resource and a tag, we define the weight as the entry of the two-dimensional projection, RT(r, t), the number of times r has been tagged with t. When a user selects a resource to annotate, the cosine similarity between it and every resource in the user profile is calculated. A neighborhood of the k most similar resources, S, is then constructed. We then define the item-based collaborative filtering measure as:

$$w(u, r, t) = \frac{\sum_{s}^{S} sim(s, r) * d(u, s, t)}{k} \qquad (7)$$

where d(u, s, t) will equal 1 if the user has applied t to s and 0 otherwise. This recommender focuses entirely on the user’s tagging habits. Unlike the user-based filtering methods, it may be able to identify tags that are common to the user but rarely used by others. However, it lacks the ability to discover relevant tags from other users. Depending on the size of the user profile, this recommender will also scale well to larger datasets, particularly if the resource-resource similarity matrix is calculated offline.

3.4    FolkRank
   FolkRank was proposed in [12]. It computes a PageRank vector from the tripartite graph of the folksonomy. This graph is generated by regarding U ∪ R ∪ T as the set of vertices. Edges are defined by the three two-dimensional projections of the hyper-graph, RT, UR and UT.
   If we regard the adjacency matrix of this graph, W (normalized to be column-stochastic), a damping factor, d, and a preference vector, p, then we iteratively compute the PageRank vector, w, in the usual manner: w = dWw + (1 − d)p.
   However, due to the symmetry inherent in the graph, this basic PageRank may focus too heavily on the most popular elements. The FolkRank vector is taken as a difference between two computations of PageRank: one with and one without a preference vector. Tag recommendations are generated by biasing the preference vector towards the query user and resource [13]. These elements are given a substantial weight while all other elements have uniformly small weights.
   PageRank has proven to be one of the top performing tag recommenders. However, it imposes steep computational costs.

4.    INFORMATIONAL CHANNELS OF FOLKSONOMIES
   The model of a folksonomy suggests several informational channels which may be exploited by data mining applications such as tag recommenders. The relation between users, resources and tags generates a complex network of interrelated items as shown in Figure 1.

Figure 1: Informational channels of a folksonomy.

   The channel between resources and tags reveals a highly descriptive model of the resources. The accumulation of many users’ opinions (often numbered in the thousands or millions) results in a richness which taxonomies are unable to approximate. Conversely, the tags themselves are characterized by the resources to which they have been assigned.
   As users annotate resources with tags they define their interests inasmuch as they describe a resource. The user-tag channel therefore reveals the users’ interests and provides opportunities for data mining algorithms to offer a high degree of personalization. Likewise a user may be defined by the resources which he has annotated, as in the user-resource channel.
   These primary channels can be used to produce secondary informational channels. The user-user channel can be constructed by modeling users as a vector of tags or as a vector of resources and applying a similarity measure such as cosine similarity. Many variations exist. However, the result reveals a network of users that can be explored directly or incorporated into further data mining approaches. The resource-resource and tag-tag channels provide similar utility, presenting navigational opportunities for users to explore similar resources or neighborhoods of tags.
   The success of tag recommenders hinges on their ability to incorporate all of these informational channels. A simple recommender such as PopRes focuses only on the tag-resource channel, whereas PopUser includes only the information between tags and users.
   Collaborative filtering techniques include additional channels but increase the computational overhead. KNN_UT discovers a set of neighbors, thereby covering the user-user channel. It then focuses on tags those neighbors applied to the query resource, covering the user-resource and resource-tag channels. FolkRank, on the other hand, explicitly defines the relation between users, resources and tags in its adjacency matrix. While FolkRank has proven to be among the most effective tag recommenders, augmenting it with algorithms that incorporate complementary informational channels may improve its performance.

Figure 2: The effect of k in KNN_UT on recall and precision for a recommendation set of 5 tags. Users are modeled as a vector over the tag space.

5.    HYBRID RECOMMENDERS
   The multiple informational channels of folksonomies present an attractive target for hybrid recommenders. Hybrids combine several recommenders together to produce a new recommender. The constituent recommenders are freed from the burden of covering all the available informational channels and may instead focus on only a few. The hybrid then ties these recommenders together. A successful hybrid creates a synergistic blend of its constituent parts, producing superior results that they could not achieve alone.
   In this paper we focus on weighted hybrid recommenders [5] which combine pairs of recommenders in a linear model. Each model is trained separately. Given a user, u, and a resource, r, the hybrid queries both components for each tag in the folksonomy. The result is W(u, r, t), which contains the weights for all tags. In order to ensure that weight assignments for each recommendation approach are on the same scale, we normalize the weights in W(u, r, t) to 1, producing W′(u, r, t).
   Originally, these weights were used to select the top n items for the recommendation set. In this case, however, the weights are combined in a linear model as:

$$w(u, r, t) = \beta * w'_a(u, r, t) + \alpha * w'_b(u, r, t) \qquad (8)$$

where β = 1 − α. These coefficients are used to control the contribution of the two recommenders. When α is set to 0, recommender a acts alone. In the case that α is set to 0.5, each recommender contributes equally to the final weight. For each hybrid, α must be empirically tuned to achieve the maximum synergy between the components. The tags are then re-sorted by the new weight, and the top n tags are recommended for the annotation.

6.    EXPERIMENTAL EVALUATION
   In this section we describe the methods used to gather and pre-process our datasets. Our testing methodology is outlined. We provide a discussion of how we tuned variables for each algorithm and describe the experiments on the weighted hybrid recommenders. Finally, we discuss our observations.

6.1     Datasets

     Folksonomy     Delicious (5%)     Citeulike     Bibsonomy
     Users                   7,665         2,051           357
     Resources              15,612         5,376         1,738
     Tags                    5,746         3,343         1,573
     Posts                 720,788        42,278        19,909
     Annotations         2,762,235       105,873        54,848

                         Table 1: Datasets

   We provide an extensive evaluation of the hybrid recommenders using data from three real collaborative tagging applications: Delicious, Citeulike, and Bibsonomy.

6.1.1     P-Core Processing
   By P-core processing, users, resources and tags are removed from the dataset in order to produce a residual dataset that guarantees each user, resource and tag occurs in at least p posts [3]. Here we define a post to include a user, a resource, and every tag the user has applied to the resource.
   By removing infrequent users, resources and tags, noise in the data is reduced. Uncommon items, whether they be tags used by only a few users, unpopular resources, or inactive users, are eliminated from consideration. Because of their scarcity these are the very items likely to confound recommenders. Moreover, by eliminating infrequent items the size of the dataset is dramatically reduced, allowing the application of data mining techniques that might otherwise be computationally impractical.

6.1.2     Delicious
   Delicious is a popular collaborative tagging application in which users annotate URLs. On 10/19/2008, 198 of the most popular tags were taken from the user interface. For each of these tags the 2,000 most recent annotations, including the contributors of the annotations, were collected. The social network for these contributors was explored recursively, collecting 524,790 usernames.
   From 10/20/2008 to 12/15/2008 the complete profiles of the users were collected. Each user profile consisted of a collection of annotations including the resource, tags and date of the original bookmark. The top 100 most prolific users were visually inspected; twelve were removed from the data because their annotation counts were many orders of magnitude larger than those of other users and they were therefore suspected to be Web-bots.
   Due to memory and time constraints, 5% of the user profiles were randomly selected. Still, this dataset remains far larger than either the Bibsonomy or Citeulike datasets described below. Experiments on larger samplings reveal near identical trends for several of the tag recommendation strategies. Some tag recommendation techniques such as FolkRank are so computationally intensive that larger samplings of the data are not feasible. In order to best compare the recommenders, the 5% sampling was used on all reported experiments. A P-core of 20 was taken from the sample and is reported in Table 1.

Figure 3: The effect of alpha on the hybrid recommenders on the Delicious, Citeulike and Bibsonomy datasets. Results are shown using recall and precision on a recommendation set of five tags.

6.1.3     Citeulike
   Citeulike is a popular online tool used by researchers to manage and discover scholarly references. They make their dataset freely available to download⁴. On 2/17/2009 the most recent snapshot was downloaded. The data contains anonymous user ids and posts for each user including resources, the date and time of the posting and the tags applied to the resource. A P-core of 5 was taken. The characteristics of the dataset are described in Table 1.

⁴ www.citeulike.org/faq

6.1.4      Bibsonomy
   This dataset was provided by Bibsonomy5 for use in the Euro-
pean Conference on Machine Learning and Principles and Prac-
tice of Knowledge Discovery in Databases (ECML-PKDD) 2009
Challenge. Bibsonomy was originally launched as a collaborative
tagging application allowing users to organize and share scholarly
references. It has since expanded its scope allowing users to anno-
tate URLs.
Figure 4: A comparison of tag recommender techniques in Delicious.

   The data includes all public bookmarks and publication posts
of Bibsonomy until 2009-01-01. The data was cleaned by removing
all characters which are neither numbers nor letters from tags.
Additionally the system tags imported, public, systemimported, nn
and systemunfiled were removed. A P-core of 5 was used. Table 1
relates the features of the dataset.
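   As an illustration of the cleaning step described above, the
following is a minimal sketch and not the code used to prepare the
dataset. The function names are hypothetical, and three details are
assumptions: system tags are filtered after character removal, case is
preserved, and tags that become identical after cleaning are merged.

```python
# Minimal sketch of the tag cleaning described above (assumed details are
# marked in comments; this is not the original preprocessing code).

# System tags named in the text.
SYSTEM_TAGS = {"imported", "public", "systemimported", "nn", "systemunfiled"}


def clean_tag(tag):
    """Drop every character that is neither a letter nor a digit."""
    # str.isalnum() also accepts non-ASCII letters and digits.
    return "".join(ch for ch in tag if ch.isalnum())


def clean_post_tags(tags):
    """Clean each tag of a post, then discard empty and system tags."""
    cleaned = []
    for tag in tags:
        t = clean_tag(tag)
        # Assumption: system tags are removed after cleaning, and tags that
        # collapse to the same string are kept only once.
        if t and t not in SYSTEM_TAGS and t not in cleaned:
            cleaned.append(t)
    return cleaned
```

For example, under these assumptions the tag "web-2.0" becomes "web20",
while "system:imported" is reduced to "systemimported" and then dropped.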

6.2      Experimental Methodology
   We have adopted the test methodology as described in [13]. In
this approach, called LeavePostOut, a single post is randomly re-
moved from each user’s profile. The training set is then comprised
of the remaining posts, while the test set contains one post per user.
Each test case consists of a user, u, a resource, r, and all the tags
the user has applied to that resource. These tags, T_h, are analogous
to the holdout set commonly used in Information Retrieval. The
tag recommendation algorithms accept the user-resource pair and
return an ordered set of recommended tags, T_r.
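   The split described above can be sketched as follows. This is a
minimal illustration, not the implementation of [13]; the function name
leave_post_out, the (user, resource, tags) triple layout and the seed
parameter are assumptions made for the example.

```python
import random
from collections import defaultdict


def leave_post_out(posts, seed=0):
    """Hold out one randomly chosen post per user.

    posts: list of (user, resource, tags) triples, one per post
           (an assumed layout for this sketch).
    Returns (training_posts, test_posts).
    """
    rng = random.Random(seed)

    # Group the posts by user.
    by_user = defaultdict(list)
    for post in posts:
        by_user[post[0]].append(post)

    train, test = [], []
    for user_posts in by_user.values():
        held_out = rng.randrange(len(user_posts))
        for i, post in enumerate(user_posts):
            (test if i == held_out else train).append(post)
    return train, test
```

Note that a user with a single post would contribute nothing to the
training set; in a P-core every user appears in at least P posts, so
this case does not arise for the datasets used here.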
Figure 5: A comparison of tag recommender techniques in Citeulike.

   For evaluation we adopt the recall and precision measures common
in Information Retrieval. Recall measures the percentage of items in
the holdout set that appear in the recommendation set. It is a measure
of completeness and is defined as:

                        r = |T_h ∩ T_r| / |T_h|                   (9)
   Precision measures the percentage of items in the recommenda-
tion set that appear in the holdout set. It measures the exactness of
the recommendation algorithm and is defined as:

                        p = |T_h ∩ T_r| / |T_r|                  (10)
   For each evaluation metric the average value is calculated across
all test cases.
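   Equations (9) and (10) and the averaging over test cases can be
sketched as follows. This is a minimal illustration under stated
assumptions: the names recall, precision and evaluate, the
(user, resource, holdout tags) layout of a test case and the
recommend callable are hypothetical and stand in for any of the
recommenders.

```python
def recall(holdout, recommended):
    """Equation (9): share of holdout tags found in the recommendation set."""
    holdout, recommended = set(holdout), set(recommended)
    return len(holdout & recommended) / len(holdout) if holdout else 0.0


def precision(holdout, recommended):
    """Equation (10): share of recommended tags found in the holdout set."""
    holdout, recommended = set(holdout), set(recommended)
    return len(holdout & recommended) / len(recommended) if recommended else 0.0


def evaluate(test_cases, recommend, n=5):
    """Average recall and precision over all test cases.

    test_cases: list of (user, resource, holdout_tags) triples.
    recommend:  callable (user, resource) -> ordered list of tags
                (a placeholder for any recommender).
    n:          size of the recommendation set.
    """
    if not test_cases:
        return 0.0, 0.0
    recalls, precisions = [], []
    for user, resource, holdout in test_cases:
        top_n = recommend(user, resource)[:n]
        recalls.append(recall(holdout, top_n))
        precisions.append(precision(holdout, top_n))
    return sum(recalls) / len(recalls), sum(precisions) / len(precisions)
```

With a holdout set of four tags and a recommendation set of five tags
of which two are correct, recall is 2/4 and precision is 2/5.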

6.3      Experimental Results
   Here we present our experimental results beginning with the tun-
ing of variables. The experiments with user-based collaborative fil-
tering require the tuning of k, the number of neighbors.
   Figure 2 shows the relation between k and the evaluation metrics
recall and precision for a recommendation set of size 5. The Delicious
dataset was used for this experiment. As k increases so do recall and
precision. However, this improvement suffers from diminishing returns,
to the point that a k of 50 offers little more benefit than a k of 20.
This trend was observed for K-Nearest Neighbor experiments in the
other two datasets as well. As such, all KNN_UT experiments were
completed using a k of 20.
   Item-based collaborative filtering also requires the tuning of k,
in this case the number of similar resources in the user profile to
include in the neighborhood. After empirical analysis we found a k of
15 to produce the best performance on all datasets.

Figure 6: A comparison of tag recommender techniques in Bibsonomy.

5 www.bibsonomy.org

   Figure 3 shows the tuning of α for the hybrid recommenders. Each
hybrid is a linear combination of FolkRank and one of the other four
recommenders. The left hand side of each graph shows the hybrid
recommenders when α is set to 0, in which case FolkRank dominates the
hybrid. As α increases more weight is given to the other recommenders
until finally, when α reaches 1, FolkRank plays no part in the
recommendation.
   For all datasets, item-based collaborative filtering contributes to
the recall and precision of its hybrid. For example, in the Delicious
experiment when α is set to 0.4, recall for a recommendation set of
five tags is 6% higher than FolkRank achieves alone and 13% higher
than KNN_RT achieves alone.
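   The role of α in a weighted hybrid can be sketched as follows. This
is a minimal illustration rather than the implementation evaluated
here. The function names are hypothetical, and normalizing each
recommender's scores to sum to one before combining them is an
assumption made so that the two score scales are comparable.

```python
def normalize(scores):
    """Scale a tag -> score mapping so the scores sum to 1 (assumption)."""
    total = sum(scores.values())
    if total <= 0:
        return dict(scores)
    return {tag: s / total for tag, s in scores.items()}


def hybrid_scores(folkrank, other, alpha):
    """Linear combination: alpha = 0 is FolkRank alone, alpha = 1 the other."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fr, ot = normalize(folkrank), normalize(other)
    tags = set(fr) | set(ot)
    return {t: (1 - alpha) * fr.get(t, 0.0) + alpha * ot.get(t, 0.0)
            for t in tags}


def top_tags(scores, n=5):
    """Return the n highest scoring tags; ties are broken by tag name."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n]]


# Usage with made-up scores for a single user-resource query.
folkrank = {"web": 3.0, "design": 1.0}
knn_rt = {"design": 1.0, "css": 1.0}
print(top_tags(hybrid_scores(folkrank, knn_rt, alpha=0.4)))
```

Sweeping alpha from 0 to 1 in such a function and recording recall and
precision at each step corresponds to the kind of tuning curve that
Figure 3 reports.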
   In the Delicious experiments, a hybrid built with PopUser offers
a slight improvement, while it has a more dramatic improvement on
Citeulike. These observations reveal that the personalization of the
user-tag channel, strongly incorporated into KNN_RT and PopUser,
offers information lacking in FolkRank. While PopUser boosts all of
the user's tags, KNN_RT focuses on tags related to the resource being
annotated, which accounts for its increased performance. On the other
hand, PopRes does not appear to provide any additional benefit to
FolkRank. Indeed, FolkRank contains this information in the
utilization of the RT matrix.
   These two results reveal that the weights given to the query
resource and query user in the FolkRank algorithm achieve different
results. The weight applied to the resource immediately activates tags
strongly associated with the resource. The result is similar to that
achieved in PopRes, hence PopRes offers little assistance to its
hybrid. However, the weight applied to the query user disperses
through the graph, activating all of the user's tags whether relevant
or irrelevant to the user's present context. KNN_RT, on the other
hand, focuses on tags applied to resources similar to the query.
Hence, it includes the resource-resource channel missing in FolkRank.
The hybrid is able to be personalized but also be more context
specific.
   KNN_UT does not appear to offer any additional information that
FolkRank did not already contain, even though it includes
user-resource information in the neighborhood selection, user-resource
information in the cosine similarity and resource-tag information in
the recommendation step. This reveals that the way in which the
informational channels are used is equally important. Additionally,
KNN_UT selects neighbors that are similar to the query user, utilizing
the user-user channel. However, this channel does not appear to be
beneficial to tag recommendation.
   After analysis of the effect of α on the hybrids we selected the
best α for the FolkRank-KNN_RT hybrid. For Delicious we used an α of
0.4. For Citeulike and Bibsonomy we used an α of 0.5. Figures 4
through 6 compare tag recommenders along with the hybrid. Recall and
precision are plotted for recommendation sets of size one through ten.
For all datasets the hybrid outperforms its constituent parts.
   We also observe a difference in the effect that constituent
recommenders have across the datasets. Delicious users tag Web pages,
and these pages cover a wide array of topics. Citeulike users tag
scholarly articles and often focus on their area of expertise. In fact
we can see in Figures 4 and 5 the dramatic difference between PopRes
and PopUser.
   In Delicious PopRes outperforms PopUser, whereas in Citeulike the
opposite is true. The user's focus on a narrow subject area in
Citeulike makes the user-tag channel an informative predictor, whereas
the topic variety in the profiles of Delicious users makes the
resource-tag channel more reliable.
   This analysis is underscored by the success the KNN_RT hybrid has
on the Delicious dataset, where the PopUser hybrid fares poorly.
Because KNN_RT focuses on those tags applied to resources similar to
the query resource, it offers context appropriate tags. In Citeulike,
where users have a narrow focus, this context provides little
additional benefit and the PopUser hybrid performs nearly as well as
the KNN_RT hybrid. Bibsonomy users tag both citations and web pages;
its results fall between those of the other two datasets.

7.    CONCLUSIONS
   We have demonstrated that tag recommenders may be combined to form
weighted hybrids that perform better than either performs alone.
Moreover, FolkRank, one of the most successful tag recommenders to
date, can be augmented with item-based collaborative filtering to
produce superior results. The resource-resource and personalized
user-resource channels covered by item-based collaborative filtering
complement the channels utilized by FolkRank. The inability of other
recommenders to improve upon FolkRank provides evidence that FolkRank
sufficiently incorporates the informational channels covered by those
recommenders.
   Future work will involve investigating alternative hybrid tag
recommenders. New recommenders that cover other informational channels
will be considered. Finally, alternative methods for hybridizing
recommenders will be explored.

8.    ACKNOWLEDGMENTS
   This work was supported in part by the National Science Foundation
Cyber Trust program under Grant IIS-0430303 and a grant from the
Department of Education, Graduate Assistance in the Area of National
Need, P200A070536.

9.    REFERENCES
 [1] B. Adrian, L. Sauermann, and T. Roth-Berghofer. Contag: A
     semantic tag recommendation system. In T. Pellegrini and
     S. Schaffert, editors, Proceedings of I-Semantics '07, pages
     297–304. JUCS, 2007.
 [2] P. Basile, D. Gendarmi, F. Lanubile, and G. Semeraro.
     Recommending smart tags in a social bookmarking system.
     In Bridging the Gap between Semantic Web and Web 2.0
     (SemNet 2007), pages 22–29, 2007.
 [3] V. Batagelj and M. Zaveršnik. Generalized cores. Arxiv
     preprint cs/0202039, 2002.
 [4] G. Begelman, P. Keller, and F. Smadja. Automated Tag
     Clustering: Improving search and exploration in the tag
     space. Proceedings of the Collaborative Web Tagging
     Workshop at WWW, Volume 6, 2006.
 [5] R. Burke. Hybrid recommender systems: Survey and
     experiments. User Modeling and User Adapted Interaction,
     12(4):331–370, 2002.
 [6] N. Garg and I. Weber. Personalized, interactive tag
     recommendation for flickr. In RecSys '08: Proceedings of the
     2008 ACM conference on Recommender systems, pages
     67–74, New York, NY, USA, 2008. ACM.
 [7] J. Gemmell, T. Schimoler, M. Ramezani, and B. Mobasher.
     Adapting k-nearest neighbor for tag recommendation in
     folksonomies. Intelligent Techniques for Web Personalization
     & Recommender Systems, 2009.
 [8] J. Gemmell, A. Shepitsen, B. Mobasher, and R. Burke.
     Personalization in Folksonomies Based on Tag Clustering.
     Intelligent Techniques for Web Personalization &
     Recommender Systems, 2008.
 [9] J. Gemmell, A. Shepitsen, B. Mobasher, and R. Burke.
     Personalizing navigation in folksonomies using hierarchical
     tag clustering. In Proceedings of the 10th international
     conference on Data Warehousing and Knowledge Discovery.
     Springer, 2008.
[10] S. A. Golder and B. A. Huberman. Usage patterns of
     collaborative tagging systems. Journal of Information
     Science, 32(2):198, 2006.
[11] P. Heymann, D. Ramage, and H. Garcia-Molina. Social tag
     prediction. In SIGIR '08: Proceedings of the 31st annual
     international ACM SIGIR conference on Research and
     development in information retrieval, pages 531–538, New
     York, NY, USA, 2008. ACM.
[12] A. Hotho, R. Jaschke, C. Schmitz, and G. Stumme.
     Information retrieval in folksonomies: Search and ranking.
     Lecture Notes in Computer Science, 4011:411, 2006.
[13] R. Jaschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and
     G. Stumme. Tag Recommendations in Folksonomies.
     Lecture Notes in Computer Science, 4702:506, 2007.
[14] M. Lipczak. Tag recommendation for folksonomies oriented
     towards individual users. In Proceedings of the
     ECML/PKDD 2008 Discovery Challenge Workshop, part of
     the European Conference on Machine Learning and
     Principles and Practice of Knowledge Discovery in
     Databases, 2008.
[15] G. Macgregor and E. McCulloch. Collaborative tagging as a
     knowledge organisation and resource discovery tool. Library
     Review, 55(5):291–300, 2006.
[16] A. Mathes. Folksonomies-Cooperative Classification and
     Communication Through Shared Metadata. Computer
     Mediated Communication, (Doctoral Seminar), Graduate
     School of Library and Information Science, University of
     Illinois Urbana-Champaign, December, 2004.
[17] P. Mika. Ontologies are us: A unified model of social
     networks and semantics. Web Semantics: Science, Services
     and Agents on the World Wide Web, 5(1):5–15, 2007.
[18] R. Y. Nakamoto, S. Nakajima, J. Miyazaki, S. Uemura, and
     H. Kato. Investigation of the effectiveness of tag-based
     contextual collaborative filtering in website recommendation.
     In Advances in Communication Systems and Electrical
     Engineering, pages 309–318. Springerlink, 2008.
[19] R. Y. Nakamoto, S. Nakajima, J. Miyazaki, S. Uemura,
     H. Kato, and Y. Inagaki. Reasonable tag-based collaborative
     filtering for social tagging systems. In WICOW '08:
     Proceeding of the 2nd ACM workshop on Information
     credibility on the web, pages 11–18, New York, NY, USA,
     2008. ACM.
[20] A. Plangprasopchok and K. Lerman. Exploiting social
     annotation for automatic resource discovery. CoRR,
     abs/0704.1675, 2007.
[21] G. Salton, A. Wong, and C. Yang. A vector space model for
     automatic indexing. Communications of the ACM,
     18(11):613–620, 1975.
[22] C. Schmitz, A. Hotho, R. Jaschke, and G. Stumme. Mining
     association rules in folksonomies. In Proc. IFCS 2006
     Conference, pages 261–270. Springer, 2006.
[23] B. Sigurbjörnsson and R. van Zwol. Flickr tag
     recommendation based on collective knowledge. pages
     327–336, 2008.
[24] B. Sigurbjörnsson and R. van Zwol. Flickr tag
     recommendation based on collective knowledge. In WWW
     '08: Proceeding of the 17th international conference on
     World Wide Web, pages 327–336, New York, NY, USA,
     2008. ACM.
[25] Y. Song, Z. Zhuang, H. Li, Q. Zhao, J. Li, W.-C. Lee, and
     C. L. Giles. Real-time automatic tag recommendation. In
     SIGIR '08: Proceedings of the 31st annual international
     ACM SIGIR conference on Research and development in
     information retrieval, pages 515–522, New York, NY, USA,
     2008. ACM.
[26] K. H. L. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme.
     Tag-aware recommender systems by fusion of collaborative
     filtering algorithms. In SAC '08: Proceedings of the 2008
     ACM symposium on Applied computing, pages 1995–1999,
     New York, NY, USA, 2008. ACM.
[27] C. Van Rijsbergen. Information Retrieval.
     Butterworth-Heinemann Newton, MA, USA, 1979.
[28] T. Vander Wal. Folksonomy definition and wikipedia.
     vanderwal.net, 2005.
[29] J. Voss. Tagging, Folksonomy & Co - Renaissance of Manual
     Indexing? Arxiv preprint cs/0701072, 2007.
[30] R. Wetzker, W. Umbrath, and A. Said. A hybrid approach to
     item recommendation in folksonomies. In ESAIR '09:
     Proceedings of the WSDM '09 Workshop on Exploiting
     Semantic Annotations in Information Retrieval, pages 25–29,
     New York, NY, USA, 2009. ACM.
[31] Z. Xu, Y. Fu, J. Mao, and D. Su. Towards the semantic web:
     Collaborative tag suggestions. Collaborative Web Tagging
     Workshop at WWW2006, Edinburgh, Scotland, May, 2006.