Improving FolkRank With Item-Based Collaborative Filtering

Jonathan Gemmell, Thomas Schimoler, Maryam Ramezani, Laura Christiansen, Bamshad Mobasher
Center for Web Intelligence
School of Computing, DePaul University
Chicago, Illinois, USA
{jgemmell, tschimo1, mramezani, lchris10, mobasher}@cdm.depaul.edu

ABSTRACT

Collaborative tagging applications allow users to annotate online resources. The result is a complex tapestry of interrelated users, resources and tags often called a folksonomy. Folksonomies present an attractive target for data mining applications such as tag recommenders. A challenge of tag recommendation remains the adaptation of traditional recommendation techniques originally designed to work with two dimensional data. To date the most successful recommenders have been graph based approaches which explicitly connect all three components of the folksonomy.

In this paper we speculate that graph based tag recommendation can be improved by coupling it with item-based collaborative filtering. We motivate this hypothesis with a discussion of informational channels in folksonomies and provide a theoretical explanation of the additive potential for item-based collaborative filtering. We then provide experimental results on hybrid tag recommenders built from graph models and other techniques based on popularity, user-based collaborative filtering and item-based collaborative filtering.

We demonstrate that a hybrid recommender built from a graph based model and item-based collaborative filtering outperforms its constituent recommenders. Furthermore, the inability of the other recommenders to improve upon the graph based approach suggests that they offer information already included in the graph based model. These results confirm our conjecture. We provide extensive evaluation of the hybrids using data collected from three real world collaborative tagging applications.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
RecSys’09, October 22–25, 2008, New York City, New York.
Copyright 2009 ACM 978-1-60558-093-7/08/10 ...$5.00.

1. INTRODUCTION

Collaborative tagging has emerged as a popular method for organizing and sharing online content with user-defined keywords. Delicious¹, Flickr² and Last.fm³ are among the most popular destinations on the Web, allowing users to annotate bookmarks, digital photographs and music respectively. Other less popular tagging applications serve niche communities, enabling users to tag blogs, business documents or scholarly articles.

¹ delicious.com
² www.flickr.com
³ www.last.fm

At the heart of collaborative tagging is the post; a user describes a resource with a set of tags. A collection of posts results in a complex network of interrelated users, resources and tags commonly referred to as a folksonomy [16]. Users are able to navigate this network free from a rigid conceptual hierarchy.

Despite the freedom users enjoy, the size of a folksonomy often hampers the user's exploration. Data mining applications such as recommenders can assist the user by reducing a burdensome number of items to a smaller collection related to the user's interests. In this work we focus on tag recommendation, the suggestion of tags during the annotation process.

Tag recommendation reduces the cognitive effort from generation to recognition. Users are therefore encouraged to tag more frequently, apply more tags to a resource, reuse common tags and use tags the user had not previously considered. User error is reduced by eliminating capitalization inconsistencies, punctuation errors, misspellings and other discrepancies. The final result is a cleaner, denser dataset that is useful in its own right or for further data mining applications.

Despite the richness offered by folksonomies, they also present unique challenges for tag recommenders. Traditional recommendation strategies, often developed to work with two dimensional data, must be adapted to work with the three dimensional nature of folksonomies. Otherwise they risk disregarding potentially useful information. To date the most successful tag recommenders are graph based models, which exploit the user-defined links between the users, resources and tags.

In this work we propose augmenting the graph based approach with item-based collaborative filtering. We offer a discussion of informational channels in folksonomies to motivate this proposal. The graph based model covers the user-resource, user-tag, and resource-tag channels. Item-based collaborative filtering, on the other hand, focuses on tags previously applied by the user to resources similar to the query resource. It therefore includes resource-resource information not explicitly contained in the graph model. Additionally, the user-tag information utilized by item-based collaborative filtering is more oriented to the query resource.

We construct hybrid tag recommenders composed of the graph models and other techniques including popularity models, user-based collaborative filtering and item-based collaborative filtering. The graph based recommender coupled with item-based collaborative filtering produces better results than either produces alone, strengthening our theory that item-based collaborative filtering contains information that is absent in the graph based model. Moreover, the other hybrids do not improve upon the graph based model, suggesting that the information they contain is already adequately represented by the graph based approach.

The rest of this paper is organized as follows. In Section 2 we describe related works. A brief survey of the tag recommenders we employ in our experiments is given in Section 3. The use of hybrid recommenders is motivated in Section 4 where we discuss informational channels in folksonomies. Section 5 details how tag recommenders may be compounded to produce hybrid recommenders. Our experimental evaluation is presented in Section 6, including a description of our datasets, our methodology and a discussion of our findings. Finally in Section 7 we present our conclusions and lay a foundation for future work.

2. BACKGROUND AND RELATED WORK

The term folksonomy was coined by [28], a play on folk and taxonomy. While the term is new, [29] argues that collaborative tagging is merely a renaissance of manual indexing. However, the scope and connectivity of the Internet permit tagging to rise to a level heretofore unrealized.

In [16] the attractiveness of tagging is outlined: serendipitous browsing, a low entry cost, utilizing the wisdom of the crowd, and a sense of community. Moreover, the author argues that tagging allows objects to be categorized under multiple tags, unfettered from traditional taxonomies. The author also discusses two obstacles: tag ambiguity, in which a tag has several meanings, and tag redundancy, in which several tags have the same meaning.

As collaborative tagging applications have gained in popularity, researchers have explored and characterized the tagging phenomenon. In [15] and [10] the authors studied the information dynamics of Delicious, one of the most popular folksonomies. The authors discussed how tags have been used by individual users over time and how tags for an individual resource stabilize over time. In [15] the authors provide an overview of the phenomenon and offer reasons why both folksonomies and taxonomies will have a place in the future of information access.

There have been many recent research investigations into recommendation within folksonomies. Unlike traditional recommender systems, which have a two-dimensional relation between users and items, tagging systems have a three dimensional relation between users, tags and resources. Recommender systems can be used to recommend each of the dimensions based on one or two of the other dimensions. In [26] the authors apply user-based and item-based collaborative filtering to recommend resources in a tagging system and use tags as an extension to the user-item matrices. Tags are used as context information to recommend resources in [19] and [18].

In [13] user-based collaborative filtering is compared to a graph based recommender based on the PageRank algorithm for tag recommendation. The authors in [11] use association rules to recommend tags and introduce an entropy-based metric to define how predictable a tag is. In [14] the title of a resource, the posts of a resource and the user's vocabulary are used to recommend tags.

User-defined tags and co-occurrence are employed by [24] to recommend tags to users on Flickr. The assumption is that the user has already assigned a set of tags to a photo and the recommender uses those tags to recommend more tags. The authors in [6] have completed a similar study and introduce a classification for tag recommendation. Probabilistic models have been used in recommendation in folksonomies in [20] and [30]. Moreover, [20] uses Probabilistic Latent Semantic Analysis for resource discovery and [30] uses single aspect PLSA for tag recommendation.

Previously, in [8, 9], we demonstrated how tag clusters serving as coherent topics can aid in the personalization of search and navigation. Further support for the utility of clustering is offered in [4] where improvement in search through clustering is theorized. In [7] we adapted K-Nearest Neighbor for tag recommendation and showed that incorporating user tagging habits into recommendation can improve K-Nearest Neighbor.

General criteria for a good tagging system including high coverage of multiple channels, high popularity and least-effort are presented in [31]. The authors categorize tags as content-based tags, context-based tags, attribute tags, subjective tags, and organizational tags and use a probabilistic method to recommend tags. In [2] the authors propose a classification algorithm for tag recommendation. Semantic tag recommendation systems in the context of a semantic desktop are explored in [1]. Clustering to make real-time tag recommendation is developed in [25].

3. TAG RECOMMENDATION

Here we first provide a model of folksonomies, then review several common recommendation techniques which we employ in our evaluation. A folksonomy can be described as a four-tuple:

$$D = \langle U, R, T, A \rangle \tag{1}$$

where U is a set of users; R is a set of resources; T is a set of tags; and A is a set of annotations, represented as user-resource-tag triples:

$$A \subseteq \{\langle u, r, t \rangle : u \in U, r \in R, t \in T\} \tag{2}$$

A folksonomy can, therefore, be viewed as a tripartite hyper-graph [17] with users, tags, and resources represented as nodes and the annotations represented as hyper-edges connecting a user, a tag and a resource.

Aggregate projections of the data can be constructed, reducing the dimensionality but sacrificing information [22]. The relation between resources and tags, RT, can be formulated such that each entry, RT(r, t), is the weight associated with the resource, r, and the tag, t. This weight may be binary, merely showing that one or more users have applied that tag to the resource. In this work we assume RT(r, t) to be the number of users that have applied t to r:

$$RT_{tf}(r, t) = |\{a = \langle u, r, t \rangle \in A : u \in U\}| \tag{3}$$

Analogous two-dimensional projections can be constructed for UT, in which the weights correspond to users and tags, and UR, in which the weights correspond to users and resources.

Many authors have attempted to exploit the data model for recommendation in folksonomies. In traditional recommendation algorithms the input is often a user, u, and the output is a set of items, I. Tag recommendation differs in that the input is both a user and a resource. The output remains a set of items, in this case a set of recommended tags, T_r. Given a user-resource pair, the recommendation set is constructed by calculating a weight for each tag, w(u, r, t), and recommending the top n tags.

3.1 Popularity Based Approaches

We consider two popularity based models which rely on the frequency with which a tag is used. PopRes ignores the user and relies on the popularity of a tag within the context of a particular resource. We define the resource based popularity measure as:

$$w(u, r, t) = \frac{|\{a = \langle u, r, t \rangle \in A : u \in U\}|}{|\{a = \langle u, r, t \rangle \in A : u \in U, t \in T\}|} \tag{4}$$

PopUser, on the other hand, ignores the resource and focuses on the frequency of a tag within the user profile. We define the user based popularity measure as:

$$w(u, r, t) = \frac{|\{a = \langle u, r, t \rangle \in A : r \in R\}|}{|\{a = \langle u, r, t \rangle \in A : r \in R, t \in T\}|} \tag{5}$$

Popularity based recommenders require little online computation. Models are built offline and can be incrementally updated. However, both these models focus on a single channel of the folksonomy and may not incorporate otherwise relevant information into the recommendation.

3.2 User-Based Collaborative Filtering

User-based K-nearest neighbor is a commonly used recommendation algorithm in Information Retrieval that can be modified for use in folksonomies. Applications may model users by recency, authority, linkage or vector space models. In this work we focus on the vector space model [21] and describe the user as a vector over either the tag space or the resource space.

KNN_UT models the user, u, as a vector over the set of tags where the weight in each dimension corresponds to the occurrence of the tag in the user profile as it is defined by the two dimensional projection UT(u, t). Other methods may be used to model the user, such as a vector over the set of resources or a combination of tags and resources. Several techniques may be used to calculate the similarity between vectors such as Jaccard similarity or cosine similarity [27]. In this work we rely on cosine similarity.

Using the similarity measure, a neighborhood, N, of the k most similar users is constructed such that they have all previously annotated the query resource, r. A weight for each tag is calculated as:

$$w(u, r, t) = \frac{\sum_{n \in N} sim(u, n) \cdot d(n, r, t)}{k} \tag{6}$$

where d(n, r, t) is 1 if the neighbor, n, has annotated the query resource, r, with the tag t. Otherwise it is 0.

Traditional user-based collaborative filtering requires a comparison between the query user and every other user. However, since the adapted algorithm considers only those users that have annotated the query resource, the number of similarities to calculate is drastically reduced. The popularity of resources in folksonomies follows the power law and the great majority of resources will benefit from this reduction in computation, while a few will require additional computational effort. As a result the algorithm scales well with large datasets.

However, since the algorithm relies on the collaboration of other users, it may be the case that a tag cannot be recommended because it does not appear in a neighbor's profile. While the personalization offered by user-based filtering is an important component for the recommender, it lacks the ability to reflect the habits and patterns of the larger crowd.

3.3 Item-Based Collaborative Filtering

KNN_RT models resources as a vector over the tag space. Given a resource and a tag, we define the weight as the entry of the two dimensional projection, RT(r, t), the number of times r has been tagged with t. When a user selects a resource to annotate, the cosine similarity between it and every resource in the user profile is calculated. A neighborhood of the k most similar resources, S, is then constructed. We then define the item-based collaborative filtering measure as:

$$w(u, r, t) = \frac{\sum_{s \in S} sim(s, r) \cdot d(u, s, t)}{k} \tag{7}$$

where d(u, s, t) will equal 1 if the user has applied t to s and 0 otherwise. This recommender focuses entirely on the user's tagging habits. Unlike the user-based filtering methods, it may be able to identify tags that are common to the user but rarely used by others. However, it lacks the ability to discover relevant tags from other users. Depending on the size of the user profile, this recommender will also scale well to larger datasets, particularly if the resource-resource similarity matrix is calculated offline.

3.4 FolkRank

FolkRank was proposed in [12]. It computes a PageRank vector from the tripartite graph of the folksonomy. This graph is generated by regarding U ∪ R ∪ T as the set of vertices. Edges are defined by the three two-dimensional projections of the hyper-graph, RT, UR and UT.

If we regard the adjacency matrix of this graph, W (normalized to be column-stochastic), a damping factor, d, and a preference vector, p, then we iteratively compute the PageRank vector, w, in the usual manner: w = dWw + (1 − d)p.

However, due to the symmetry inherent in the graph, this basic PageRank may focus too heavily on the most popular elements. The FolkRank vector is taken as a difference between two computations of PageRank: one with and one without a preference vector. Tag recommendations are generated by biasing the preference vector towards the query user and resource [13]. These elements are given a substantial weight while all other elements have uniformly small weights.

FolkRank has proven to be one of the top performing tag recommenders. However, it imposes steep computational costs.

4. INFORMATIONAL CHANNELS OF FOLKSONOMIES

The model of a folksonomy suggests several informational channels which may be exploited by data mining applications such as tag recommenders. The relation between users, resources and tags generates a complex network of interrelated items as shown in Figure 1.

Figure 1: Informational channels of a folksonomy.

The channel between resources and tags reveals a highly descriptive model of the resources. The accumulation of many users' opinions (often numbered in the thousands or millions) results in a richness which taxonomies are unable to approximate. Conversely, the tags themselves are characterized by the resources to which they have been assigned.

As users annotate resources with tags they define their interests inasmuch as they describe a resource. The user-tag channel therefore reveals the users' interests and provides opportunities for data mining algorithms to offer a high degree of personalization. Likewise, a user may be defined by the resources which he has annotated, as in the user-resource channel.

These primary channels can be used to produce secondary informational channels. The user-user channel can be constructed by modeling users as a vector of tags or as a vector of resources and applying a similarity measure such as cosine similarity. Many variations exist. However, the result reveals a network of users that can be explored directly or incorporated into further data mining approaches. The resource-resource and tag-tag channels provide similar utility, presenting navigational opportunities for users to explore similar resources or neighborhoods of tags.

The success of tag recommenders hinges on their ability to incorporate all of these informational channels. A simple recommender such as PopRes focuses only on the tag-resource channel, whereas PopUser includes only the information between tags and users. Collaborative filtering techniques include additional channels but increase the computational overhead. KNN_UT discovers a set of neighbors, thereby covering the user-user channel. It then focuses on tags those neighbors applied to the query resource, covering the user-resource and resource-tag channels. FolkRank, on the other hand, explicitly defines the relation between users, resources and tags in its adjacency matrix. While FolkRank has proven to be among the most effective tag recommenders, augmenting it with algorithms that incorporate complementary informational channels may improve its performance.

5. HYBRID RECOMMENDERS

The multiple informational channels of folksonomies present an attractive target for hybrid recommenders. Hybrids combine several recommenders together to produce a new recommender. The constituent recommenders are freed from the burden of covering all the available informational channels and may instead focus on only a few. The hybrid then ties these recommenders together. A successful hybrid creates a synergistic blend of its constituent parts, producing superior results that they could not achieve alone.

In this paper we focus on weighted hybrid recommenders [5] which combine pairs of recommenders in a linear model. Each model is trained separately. Given a user, u, and a resource, r, the hybrid queries both components for each tag in the folksonomy. The result is W(u, r, t), which contains the weights for all tags. In order to ensure that weight assignments for each recommendation approach are on the same scale, we normalize the weights in W(u, r, t) to 1, producing W'(u, r, t). Originally, these weights were used to select the top n items for the recommendation set. In this case, however, the weights are combined in a linear model as:

$$w(u, r, t) = \beta \cdot w'_a(u, r, t) + \alpha \cdot w'_b(u, r, t) \tag{8}$$

where β = 1 − α. These coefficients are used to control the contribution of the two recommenders. When α is set to 0, recommender a acts alone. In the case that α is set to 0.5, each recommender contributes equally to the final weight. For each hybrid, α must be empirically tuned to achieve the maximum synergy between the components. The tags are then re-sorted by the new weight, and the top n tags are recommended for the annotation.

6. EXPERIMENTAL EVALUATION

In this section we describe the methods used to gather and preprocess our datasets. Our testing methodology is outlined. We provide a discussion of how we tuned variables for each algorithm and describe the experiments on the weighted hybrid recommenders. Finally, we discuss our observations.

6.1 Datasets

We provide an extensive evaluation of the hybrid recommenders using data from three real collaborative tagging applications: Delicious, Citeulike, and Bibsonomy.

Folksonomy     Delicious (5%)   Citeulike   Bibsonomy
Users                   7,665       2,051         357
Resources              15,612       5,376       1,738
Tags                    5,746       3,343       1,573
Posts                 720,788      42,278      19,909
Annotations         2,762,235     105,873      54,848

Table 1: Datasets

6.1.1 P-Core Processing

By P-core processing, users, resources and tags are removed from the dataset in order to produce a residual dataset that guarantees each user, resource and tag occurs in at least p posts [3]. Here we define a post to include a user, a resource, and every tag the user has applied to the resource.

By removing infrequent users, resources and tags, noise in the data is reduced. Uncommon items, whether they be tags used by only a few users, unpopular resources, or inactive users, are eliminated from consideration. Because of their scarcity these are the very items likely to confound recommenders. Moreover, by eliminating infrequent items the size of the dataset is dramatically reduced, allowing the application of data mining techniques that might otherwise be computationally impractical.

6.1.2 Delicious

Delicious is a popular collaborative tagging application in which users annotate URLs. On 10/19/2008, 198 of the most popular tags were taken from the user interface. For each of these tags the 2,000 most recent annotations including the contributors of the annotations were collected. The social network for these contributors was explored recursively, collecting 524,790 usernames.

From 10/20/2008 to 12/15/2008 the complete profiles of the users were collected. Each user profile consisted of a collection of annotations including the resource, tags and date of the original bookmark. The top 100 most prolific users were visually inspected; twelve were removed from the data because their annotation count was many orders of magnitude larger than that of other users and they were therefore suspected to be Web-bots.

Due to memory and time constraints, 5% of the user profiles were randomly selected. Still, this dataset remains far larger than either the Bibsonomy or Citeulike datasets described below. Experiments on larger samplings reveal near identical trends for several of the tag recommendation strategies. Some tag recommendation techniques such as FolkRank are so computationally intensive that larger samplings of the data are not feasible. In order to best compare the recommenders, the 5% sampling was used on all reported experiments. A P-core of 20 was taken from the sample and is reported in Table 1.

6.1.3 Citeulike

Citeulike is a popular online tool used by researchers to manage and discover scholarly references. They make their dataset freely available to download⁴. On 2/17/2009 the most recent snapshot was downloaded. The data contains anonymous user ids and posts for each user including resources, the date and time of the posting and the tags applied to the resource. A P-core of 5 was taken. The characteristics of the dataset are described in Table 1.

⁴ www.citeulike.org/faq

6.1.4 Bibsonomy

This dataset was provided by Bibsonomy⁵ for use in the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) 2009 Challenge. Bibsonomy was originally launched as a collaborative tagging application allowing users to organize and share scholarly references. It has since expanded its scope, allowing users to annotate URLs.

⁵ www.bibsonomy.org

The data includes all public bookmarks and publication posts of Bibsonomy until 2009-01-01. The data was cleaned by removing all characters which are neither numbers nor letters from tags. Additionally, the system tags imported, public, systemimported, nn and systemunfiled were removed. A P-core of 5 was used. Table 1 relates the features of the dataset.

6.2 Experimental Methodology

We have adopted the test methodology as described in [13]. In this approach, called LeavePostOut, a single post is randomly removed from each user's profile. The training set is then comprised of the remaining posts, while the test set contains one post per user. Each test case consists of a user, u, a resource, r, and all the tags the user has applied to that resource. These tags, T_h, are analogous to the holdout set commonly used in Information Retrieval. The tag recommendation algorithms accept the user-resource pair and return an ordered set of recommended tags, T_r.

For evaluation we adopt the recall and precision measures common in Information Retrieval. Recall measures the percentage of items in the holdout set that appear in the recommendation set. It is a measure of completeness and is defined as:

$$r = |T_h \cap T_r| / |T_h| \tag{9}$$

Precision measures the percentage of items in the recommendation set that appear in the holdout set. It measures the exactness of the recommendation algorithm and is defined as:

$$p = |T_h \cap T_r| / |T_r| \tag{10}$$

For each evaluation metric the average value is calculated across all test cases.

6.3 Experimental Results

Here we present our experimental results beginning with the tuning of variables. The experiments with user-based collaborative filtering require the tuning of k, the number of neighbors.

Figure 2 shows the relation between k and the evaluation metrics recall and precision for a recommendation set of size 5. The Delicious dataset was used for this experiment. As k increases so do recall and precision. However, this improvement suffers from diminishing returns until a k of 50 offers little more benefit than a k of 20. This trend was observed for K-Nearest Neighbor experiments in the other two datasets as well. As such, all KNN_UT experiments were completed using a k of 20.

Figure 2: The effect of k in KNN_UT on recall and precision for a recommendation set of 5 tags. Users are modeled as a vector over the tag space.

Item-based collaborative filtering also requires the tuning of k, in this case the number of similar resources in the user profile to include in the neighborhood. After empirical analysis we found a k of 15 to produce the best performance on all datasets.

Figure 3 shows the tuning of α for the hybrid recommenders. Each hybrid is a linear combination of FolkRank and one of the other four recommenders. The left hand side of each graph shows the hybrid recommenders when α is set to 0, in which case FolkRank acts alone. As α increases, more weight is given to the other recommenders until finally, when α reaches 1, FolkRank plays no part in the recommendation.

Figure 3: The effect of alpha on the hybrid recommenders on the Delicious, Citeulike and Bibsonomy datasets. Results are shown using recall and precision on a recommendation set of five tags.

For all datasets, item-based collaborative filtering contributes to the recall and precision of its hybrid. For example, in the Delicious experiment when α is set to 0.4, recall for a recommendation set of five tags is 6% higher than FolkRank achieves alone and 13% higher than KNN_RT achieves alone.

In the Delicious experiments, a hybrid built with PopUser offers a slight improvement, while it has a more dramatic improvement on Citeulike. These observations reveal that the personalization of the user-tag channel, strongly incorporated into KNN_RT and PopUser, offers information lacking in FolkRank. While PopUser boosts all of the user's tags, KNN_RT focuses on tags related to the resource being annotated, accounting for its increased performance. On the other hand, PopRes does not appear to provide any additional benefit to FolkRank. Indeed, FolkRank contains this information in the utilization of the RT matrix.

These two results reveal that the weights given to the query resource and query user in the FolkRank algorithm achieve different results. The weight applied to the resource immediately activates tags strongly associated with the resource. The result is similar to that achieved in PopRes, hence PopRes offers little assistance to its hybrid. However, the weight applied to the query user disperses through the graph, activating all of the user's tags, relevant or irrelevant to the user's present context. KNN_RT, on the other hand, focuses on tags applied to resources similar to the query. Hence, it includes the resource-resource channel missing in FolkRank. The hybrid is able to be personalized but also be more context specific.

KNN_UT does not appear to offer any additional information that FolkRank did not already contain, even though it includes user-resource information in the neighborhood selection, user-tag information in the cosine similarity and resource-tag information in the recommendation step. This reveals that the way in which the informational channels are used is equally important. Additionally, KNN_UT selects neighbors that are similar to the query user, utilizing the user-user channel. However, this channel does not appear to be beneficial to tag recommendation.

After analysis of the effect of α on the hybrids we selected the best α for the FolkRank-KNN_RT hybrid. For Delicious we used an α of 0.4. For Citeulike and Bibsonomy we used an α of 0.5. Figures 4 through 6 compare tag recommenders along with the hybrid. Recall and precision are plotted for recommendation sets of size one through ten. For all datasets the hybrid outperforms its constituent parts.

Figure 4: A comparison of tag recommender techniques in Delicious.

Figure 5: A comparison of tag recommender techniques in Citeulike.

Figure 6: A comparison of tag recommender techniques in Bibsonomy.

We also observe a difference in the effect that constituent recommenders have across the datasets. Delicious users tag Web pages and their interests cover a wide array of topics. Citeulike users tag scholarly articles and often focus on their area of expertise. In fact we can see in Figures 4 and 5 the dramatic difference between PopRes and PopUser.

In Delicious PopRes outperforms PopUser, whereas in Citeulike the opposite is true. The user's focus on a narrow subject area in Citeulike makes the user-tag channel an informative predictor, whereas the topic variety in the profiles of Delicious users makes the resource-tag channel more reliable.

This analysis is underscored by the success the KNN_RT hybrid has on the Delicious dataset, where the PopUser hybrid fares poorly. Because KNN_RT focuses on those tags applied to resources similar to the query resource, it offers context appropriate tags. In Citeulike, where users have a narrow focus, this context provides little additional benefit and the PopUser hybrid performs nearly as well as the KNN_RT hybrid. Bibsonomy users tag both citations and web pages; its results fall between those of the other two datasets.

7. CONCLUSIONS

We have demonstrated that tag recommenders may be combined to form weighted hybrids that perform better than either performs alone. Moreover, FolkRank, one of the most successful tag recommenders to date, can be augmented with item-based collaborative filtering to produce superior results. The resource-resource and personalized user-resource channels covered by item-based collaborative filtering complement the channels utilized by FolkRank. The inability of other recommenders to improve upon FolkRank provides evidence that FolkRank sufficiently incorporates the informational channels covered by those recommenders.

Future work will involve investigating alternative hybrid tag recommenders. New recommenders that cover other informational channels will be considered. Finally, alternative methods for hybridizing recommenders will be explored.

8. ACKNOWLEDGMENTS

This work was supported in part by the National Science Foundation Cyber Trust program under Grant IIS-0430303 and a grant from the Department of Education, Graduate Assistance in the Area of National Need, P200A070536.

9. REFERENCES

[1] B. Adrian, L. Sauermann, and T. Roth-Berghofer. Contag: A semantic tag recommendation system. In T. Pellegrini and S. Schaffert, editors, Proceedings of I-Semantics '07, pages 297–304. JUCS, 2007.
[2] P. Basile, D. Gendarmi, F. Lanubile, and G. Semeraro. Recommending smart tags in a social bookmarking system. In Bridging the Gap between Semantic Web and Web 2.0 (SemNet 2007), pages 22–29, 2007.
[3] V. Batagelj and M. Zaveršnik. Generalized cores. Arxiv preprint cs/0202039, 2002.
[4] G. Begelman, P. Keller, and F. Smadja. Automated Tag Clustering: Improving search and exploration in the tag space. Proceedings of the Collaborative Web Tagging Workshop at WWW, Volume 6, 2006.
[5] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User Adapted Interaction, 12(4):331–370, 2002.
[6] N. Garg and I. Weber. Personalized, interactive tag recommendation for flickr. In RecSys '08: Proceedings of the 2008 ACM conference on Recommender systems, pages 67–74, New York, NY, USA, 2008. ACM.
[7] J. Gemmell, T. Schimoler, M. Ramezani, and B. Mobasher. Adapting k-nearest neighbor for tag recommendation in folksonomies. Intelligent Techniques for Web Personalization & Recommender Systems, 2009.
[8] J. Gemmell, A. Shepitsen, B. Mobasher, and R. Burke. Personalization in Folksonomies Based on Tag Clustering. Intelligent Techniques for Web Personalization & Recommender Systems, 2008.
[9] J. Gemmell, A. Shepitsen, B. Mobasher, and R. Burke. Personalizing navigation in folksonomies using hierarchical tag clustering. In Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery. Springer, 2008.
[10] S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198, 2006.
[11] P. Heymann, D. Ramage, and H. Garcia-Molina. Social tag prediction. In SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 531–538, New York, NY, USA, 2008. ACM.
[12] A. Hotho, R. Jaschke, C. Schmitz, and G. Stumme. Information retrieval in folksonomies: Search and ranking. Lecture Notes in Computer Science, 4011:411, 2006.
[13] R. Jaschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag Recommendations in Folksonomies. Lecture Notes in Computer Science, 4702:506, 2007.
[14] M. Lipczak. Tag recommendation for folksonomies oriented towards individual users. In Proceedings of the ECML/PKDD 2008 Discovery Challenge Workshop, part of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2008.
[15] G. Macgregor and E. McCulloch. Collaborative tagging as a knowledge organisation and resource discovery tool.
[28] T. Vander Wal. Folksonomy definition and wikipedia. vanderwal.net, 2005.
[29] J. Voss. Tagging, Folksonomy & Co - Renaissance of Manual Indexing? Arxiv preprint cs/0701072, 2007.
[30] R. Wetzker, W. Umbrath, and A. Said. A hybrid approach to item recommendation in folksonomies. In ESAIR '09: Proceedings of the WSDM '09 Workshop on Exploiting Semantic Annotations in Information Retrieval, pages 25–29, New York, NY, USA, 2009. ACM.
[31] Z. Xu, Y. Fu, J. Mao, and D. Su. Towards the semantic web: Collaborative tag suggestions. Collaborative Web Tagging
Library Workshop at WWW2006, Edinburgh, Scotland, May, 2006. Review, 55(5):291–300, 2006. [16] A. Mathes. Folksonomies-Cooperative Classification and Communication Through Shared Metadata. Computer Mediated Communication, (Doctoral Seminar), Graduate School of Library and Information Science, University of Illinois Urbana-Champaign, December, 2004. [17] P. Mika. Ontologies are us: A unified model of social networks and semantics. Web Semantics: Science, Services and Agents on the World Wide Web, 5(1):5–15, 2007. [18] R. Y. Nakamoto, S. Nakajima, J. Miyazaki, S. Uemura, and H. Kato. Investigation of the effectiveness of tag-based contextual collaborative filtering in website recommendation. In Advances in Communication Systems and Electrical Engineering, pages 309–318. Springerlink, 2008. [19] R. Y. Nakamoto, S. Nakajima, J. Miyazaki, S. Uemura, H. Kato, and Y. Inagaki. Reasonable tag-based collaborative filtering for social tagging systems. In WICOW ’08: Proceeding of the 2nd ACM workshop on Information credibility on the web, pages 11–18, New York, NY, USA, 2008. ACM. [20] A. Plangprasopchok and K. Lerman. Exploiting social annotation for automatic resource discovery. CoRR, abs/0704.1675, 2007. [21] G. Salton, A. Wong, and C. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, 1975. [22] C. Schmitz, A. Hotho, R. Jaschke, and G. Stumme. Mining association rules in folksonomies. In Proc. IFCS 2006 Conference, pages 261–270. Springer, 2006. [23] B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. pages 327–336, 2008. [24] B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. In WWW ’08: Proceeding of the 17th international conference on World Wide Web, pages 327–336, New York, NY, USA, 2008. ACM. [25] Y. Song, Z. Zhuang, H. Li, Q. Zhao, J. Li, W.-C. Lee, and C. L. Giles. Real-time automatic tag recommendation. 
In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 515–522, New York, NY, USA, 2008. ACM. [26] K. H. L. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In SAC ’08: Proceedings of the 2008 ACM symposium on Applied computing, pages 1995–1999, New York, NY, USA, 2008. ACM. [27] C. Van Rijsbergen. Information Retrieval. Butterworth-Heinemann Newton, MA, USA, 1979.