Adapting K-Nearest Neighbor for Tag Recommendation in Folksonomies

Jonathan Gemmell, Thomas Schimoler, Maryam Ramezani, Bamshad Mobasher
Center for Web Intelligence, School of Computing, DePaul University, Chicago, Illinois, USA
{jgemmell, tschimoler, mramezani, mobasher}@cdm.depaul.edu

Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Folksonomies, otherwise known as Collaborative Tagging Systems, enable Internet users to share, annotate and search for online resources with user-selected labels called tags. Tag recommendation, the suggestion of an ordered set of tags during the annotation process, reduces the user effort from a keyboard entry to a mouse click. By simplifying the annotation process, tagging is promoted, noise in the data is reduced through the elimination of discrepancies that result in redundant tags, and ambiguous tags may be avoided. Tag recommenders can suggest tags that maximize utility, offer tags the user may not have previously considered, or steer users toward adopting a core vocabulary. In sum, tag recommendation promotes a denser dataset that is useful in its own right or can be exploited by a myriad of data mining techniques for additional functionality.

While there exists a long history of recommendation algorithms, the data structure of a Folksonomy is distinct from those found in traditional recommendation problems. We first explore two data reduction techniques, p-core processing and Hebbian deflation, then demonstrate how to adapt K-Nearest Neighbor for use with Folksonomies by incorporating user, resource and tag information into the algorithm. We further investigate multiple techniques for the user modeling required to compute the similarity among users. Additionally, we demonstrate that tag boosting, the promotion of tags previously applied by a user to a resource, improves the coverage and accuracy of K-Nearest Neighbor. These techniques are evaluated through extensive experimentation using data collected from two real Collaborative Tagging Web sites. Finally, the modified K-Nearest Neighbor algorithm is compared with alternative techniques based on popularity and link analysis. We find that K-Nearest Neighbor modified for use with Folksonomies generates excellent recommendations, scales well with large datasets, and is applicable to both narrowly and broadly focused Folksonomies.

Introduction

Folksonomies, also known as Collaborative Tagging Systems, have emerged as a powerful trend allowing Internet users to share, annotate and explore online resources through personalized labels. Several Collaborative Tagging Systems have recently gained popularity, attracting millions of users. Delicious (delicious.com) supports users as they bookmark URLs. Flickr (www.flickr.com) allows users to upload, share and manage online photographs. Citeulike (www.citeulike.org) enables researchers to manage and discover scholarly references. Still other Collaborative Tagging Systems specialize in music, blogs and business documents.

At the core of Collaborative Tagging Systems is the post: a user describes a resource with a set of tags. These tags may be descriptive ("folksonomies"), subjective ("awesome"), organizational ("toread") or completely idiosyncratic ("jfgwh"). Taken in isolation, an individual annotation allows a user to organize Web resources for later use: resources can be easily sorted, aggregated and retrieved. Resources may be annotated with multiple tags, allowing a resource to be identified with several topic areas rather than pigeonholed in a single directory. Moreover, users need not conform to a predefined vocabulary or rigid hierarchy but may annotate a resource with any tag they wish, thereby reducing user effort and limiting the entry cost.

However, the utility of tags extends beyond their immediate use. Taken as a whole, the aggregation of many annotations results in a complex network of interrelated users, resources and tags often referred to as a Folksonomy (Mathes 2004). The opportunity to explore the Folksonomy unburdened by a preconceived navigational or conceptual hierarchy is crucial to the utility and popularity of Folksonomies.
Users are able to share or discover resources through the collaborative network and connect to other users with related interests. Collaborative Tagging Systems can identify groups of like-minded users, catering not only to mainstream but also to non-conventional users who are often under-served by traditional Web tools. Furthermore, users may enjoy the social aspects of collaborative tagging, attracted by a sense of community not offered by either ontologies or search engines.

A distinct advantage of Folksonomies is the richness of the user profiles. If a user is interested enough in a resource to annotate it, the tag describes the user as much as it describes the resource. As users annotate resources, the system is able to track their interests. These profiles are a powerful tool for data mining algorithms.

Even though tags offer many benefits in both the short and long term, they also present unique challenges for recommendation algorithms. Most Collaborative Tagging Applications permit unsupervised tagging; users are free to use any tag they wish to describe a resource. This is often done to reduce the entry cost of using the application and make collaborative tagging more user-friendly. Unsupervised tagging can result in tag redundancy, in which several tags have the same meaning, or tag ambiguity, in which a single tag has many different meanings. Such inconsistencies can confound users as they attempt to utilize the Folksonomy.

Tag recommendation provides a means to overcome these problems. It reduces the user effort to a mouse click rather than a keyboard entry. By reducing the effort, users are encouraged to tag more frequently, apply more tags to an individual resource, reuse common tags, and perhaps use tags they had not previously considered. Moreover, user error is reduced by eliminating redundant tags caused by capitalization inconsistencies, punctuation errors, misspellings and other discrepancies. The tag recommender can further promote a core tag vocabulary, steering the user toward adopting certain tags while not imposing any strict rules. The tag recommender may even avoid ambiguous tags in favor of tags that offer greater information value. The final result is a cleaner, denser dataset that is useful in its own right or for further data mining techniques.
However, the data generated through Collaborative Tagging differs from that common in recommendation algorithms. The introduction of tags manifests a third dimension which must be integrated into recommenders that traditionally incorporate only two dimensions. In this paper we demonstrate how K-Nearest Neighbor may be adapted to recommend tags in Folksonomies. We describe how both user and resource information can be directly applied in the algorithm, improving both coverage and accuracy while reducing computational costs. In addition, several user models are explored, including vectors over the set of tags, vectors over the set of resources, combinations of these two, and features derived through Hebbian deflation.

The rest of this paper is organized as follows. We begin by presenting related work involving recommendation in Folksonomies. We explore the data structure of Folksonomies and discuss two feature reduction techniques: p-core processing and Hebbian deflation. We then outline the basic approach used for recommending tags in Folksonomies and propose modifications to K-Nearest Neighbor. After a discussion of the datasets and a description of the experimental methodology, we evaluate the proposed modifications using data collected from two real world Folksonomies. Finally, we compare the modified algorithm with alternative strategies based on popularity and link analysis.

Related Work

As Collaborative Tagging Applications have gained in popularity, researchers have started to explore and characterize the tagging phenomenon. In (Macgregor and McCulloch 2006) and (Golder and Huberman 2006) the authors studied the information dynamics of Delicious, one of the most popular Folksonomies. The authors discussed how tags have been used by individual users over time and how the tags for an individual resource stabilize over time. They also discussed two semantic difficulties: tag redundancy, when multiple tags have the same meaning, and tag ambiguity, when a single tag has multiple meanings. (Macgregor and McCulloch 2006) provide an overview of the phenomenon and explore reasons why both Folksonomies and Ontologies will have a place in the future of information access.

There have been several recent research investigations into recommendation within Folksonomies. Unlike traditional recommender systems, which have a two-dimensional relation between users and items, tagging systems have a three-dimensional relation between users, tags and resources. Recommender systems can be used to recommend items in each of the dimensions based on one or two of the other dimensions. (Tso-Sutter, Marinho, and Schmidt-Thieme 2008) applies user-based and item-based collaborative filtering to recommend resources in a tagging system and uses tags as an extension of the user-item matrices. (Nakamoto et al. 2008a) and (Nakamoto et al. 2008b) use tags as context information to recommend resources.

Other researchers have studied tag recommendation in Folksonomies. (Jaschke et al. 2007) compares user-based collaborative filtering and a graph-based recommender built on the PageRank algorithm to recommend personalized tags. (Heymann, Ramage, and Garcia-Molina 2008) use association rules to recommend tags and introduce an entropy-based metric to measure how predictable a tag is. (Lipczak 2008) uses the title of a resource, the posts of a resource and the user's vocabulary to recommend tags. The results show that tags retrieved from the user's vocabulary outperform recommendations driven by resource information. However, the experiment was performed on data from Bibsonomy, a Folksonomy focused on scientific publications, and thus might not be applicable to data that cover multiple domains.

(Xu et al. 2006) presents general criteria for a good tagging system, including high coverage of multiple facets, high popularity and least-effort. They categorize tags into content-based tags, context-based tags, attribute tags, subjective tags, and organizational tags, and use a probabilistic method to recommend tags. (Basile et al. 2007) proposes a classification algorithm for tag recommendation. (Sigurbjörnsson and van Zwol 2008) uses a co-occurrence-based technique to recommend tags for photos in Flickr. The assumption is that the user has already assigned a set of tags to a photo and the recommender uses those tags to recommend more tags. (Adrian, Sauermann, and Roth-Berghofer 2007) suggests a semantic tag recommendation system in the context of a semantic desktop. (Song et al. 2008) uses clustering to make real-time tag recommendations.
Data Structures of Folksonomies

In this section we define the three-dimensional data structure of a Folksonomy and describe how to construct two-dimensional projections. We further explore two data reduction techniques for use with Folksonomies. The first is a data selection technique, P-core processing, that reduces the size of the data by removing users, resources and tags that occur less than a predefined number of times. The second is a data extraction technique, Hebbian deflation, which produces a new set of features from a two-dimensional projection of the Folksonomy.

Data Models

The data structure of a Folksonomy differs from that common to most traditional recommendation algorithms. A Folksonomy can be described as a four-tuple D:

    D = ⟨U, R, T, A⟩,                                            (1)

where U is a set of users; R is a set of resources; T is a set of tags; and A is a set of annotations, represented as user-tag-resource triples:

    A ⊆ {⟨u, r, t⟩ : u ∈ U, r ∈ R, t ∈ T}                        (2)

A Folksonomy can, therefore, be viewed as a tripartite hyper-graph (Mika 2007) with users, tags, and resources represented as nodes and the annotations represented as hyper-edges connecting a user, a tag and a resource.

The tripartite nature of Folksonomies makes them ill suited for many traditional data mining techniques. User-based K-Nearest Neighbor, for example, requires a means to measure the similarity between users; the introduction of a third dimension confounds this task. Aggregate projections of the data can be constructed, reducing the dimensionality by sacrificing information (Schmitz et al. 2006). The relation between resources and tags can be formulated as a two-dimensional projection, RT, such that each entry, RT(r, t), is the weight associated with the resource, r, and the tag, t. This weight may be binary, merely showing that one or more users have applied that tag to the resource, or it may be finer grained, using the number of users that have applied that tag to the resource:

    RT_tf(r, t) = |{a = ⟨u, r, t⟩ ∈ A : u ∈ U}|                  (3)

Such a measure is equivalent to term frequency, or tf, common in Information Retrieval. Similarly, term frequency * inverse document frequency, or tf*idf (Salton and Buckley 1988), can be adapted for use with two-dimensional projections:

    RT_tf*idf(r, t) = RT_tf(r, t) * log(|R| / n_r)               (4)

The tf*idf weight multiplies the tag frequency by the relative distinctiveness of the tag. The distinctiveness is measured by the log of the total number of resources, |R|, divided by the number of resources to which that tag was applied, n_r. Similar two-dimensional projections can be constructed for UT, in which the weights correspond to users and tags, and UR, in which the weights correspond to users and resources. While a portion of the information is lost through this process, the result is a two-dimensional matrix which can be readily applied to existing data mining algorithms such as K-Nearest Neighbor.
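To make the projections concrete, the following is a minimal sketch of building RT_tf and RT_tf*idf in Python. It assumes the annotation set A is available as an iterable of unique (user, resource, tag) triples; the function and variable names are illustrative rather than taken from the paper.

    from collections import defaultdict
    from math import log

    def build_rt_projections(annotations, num_resources):
        """Build the RT projection from unique (user, resource, tag) triples.

        rt_tf[(r, t)] counts the users who applied tag t to resource r
        (Equation 3); rt_tfidf scales each count by the distinctiveness
        of the tag (Equation 4).
        """
        rt_tf = defaultdict(int)
        tag_resources = defaultdict(set)  # n_r: resources each tag was applied to
        for user, resource, tag in annotations:
            rt_tf[(resource, tag)] += 1
            tag_resources[tag].add(resource)
        rt_tfidf = {(r, t): count * log(num_resources / len(tag_resources[t]))
                    for (r, t), count in rt_tf.items()}
        return rt_tf, rt_tfidf

The UT and UR projections follow the same pattern with the roles of the dimensions swapped.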
P-Core Processing

Through P-core processing, users, resources and tags are removed from the dataset in order to produce a residual dataset that guarantees each user, resource and tag occurs in at least p posts (Batagelj and Zaveršnik 2002; Jaschke et al. 2007). Here we define a post to include a user, a resource, and every tag the user has applied to that resource.

The algorithm iterates through the posts, counting the occurrences of users, resources and tags. If the occurrence of one of these items does not meet the requisite value, p, all occurrences of the item and the posts in which it appears are removed from the dataset. Since removing a post based on the occurrence of one item reduces the count for the other items in the post, several passes through the dataset may be required. The result is a smaller, denser dataset.

Several reasons exist to use P-core processing. By removing infrequent users, resources and tags, noise in the data is reduced; uncommon items, whether they be tags used by only a few users, unpopular resources, or unproductive users, are eliminated from consideration. Because of their scarcity, these are the very items likely to confound recommenders. Moreover, by eliminating infrequent items, the size of the data may be dramatically reduced, allowing the application of data mining techniques that would otherwise be computationally impractical. In short, P-core processing offers a means to reduce noise and focus on the dense regions of the data.
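A minimal sketch of this pruning loop in Python, under the reading that an item falling below the threshold removes every post containing it, as described above; the representation of a post as a (user, resource, tags) tuple is an illustrative assumption.

    from collections import Counter

    def p_core(posts, p):
        """Iteratively prune posts until every user, resource and tag in
        the remaining data occurs in at least p posts.

        posts: list of (user, resource, tags) tuples, where tags is the
        set of tags the user applied to the resource.
        """
        while True:
            user_count = Counter(u for u, _, _ in posts)
            res_count = Counter(r for _, r, _ in posts)
            tag_count = Counter(t for _, _, ts in posts for t in ts)
            # Drop any post containing an item that falls below the threshold.
            kept = [(u, r, ts) for (u, r, ts) in posts
                    if user_count[u] >= p and res_count[r] >= p
                    and all(tag_count[t] >= p for t in ts)]
            if len(kept) == len(posts):  # fixed point: nothing left to remove
                return posts
            posts = kept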
Hebbian Deflation

Whereas P-core processing offers a means to reduce the size of a dataset through feature selection, feature extraction techniques generate entirely new features. Many feature extraction techniques exist, such as Principal Component Analysis and Singular Value Decomposition. However, despite the utility these techniques provide, their computational cost and memory requirements make them impractical for use on the extremely large datasets common in Folksonomies. Hebbian deflation (Oja and Karhunen 1985) offers a means to approximate these feature extraction techniques with reduced computational and memory needs.

Given a two-dimensional projection of the Folksonomy and a preselected number of features, F, Hebbian deflation produces two smaller matrices that approximate the original projection. Consider for example the two-dimensional projection RT; Hebbian deflation will produce two new matrices, RX and TX, such that:

    RT(r, t) ≈ Σ_{i=1..F} RX(r, i) * TX(t, i)                    (5)

Since F is often far smaller than either the number of resources or the number of tags, the resultant matrices require many orders of magnitude less space than the original projection. Analogous features can be extracted from UR and UT.

The algorithm requires three inputs: F, the number of features to be extracted; epochs, the number of epochs used to train each feature; and lr, the learning rate at which the features are adjusted. Features are extracted one at a time. To begin, the features are set to random weights. Then, for each resource-tag pair in RT, the actual weight is compared with the current approximation. The features are adjusted based upon the degree of the error, the learning rate and the corresponding feature in the opposing matrix. These adjustments are repeated for the predetermined number of epochs before the feature is finally adopted and the next feature is trained.

    Input: RT, a projection of a Folksonomy; F, the number of features to
           derive; epochs, the number of epochs to train each feature;
           lr, the learning rate
    Output: RX and TX, the Hebbian features

    for f = 1 to F do
        for epoch = 1 to epochs do
            foreach (r, t) ∈ RT do
                approx = Σ_{i=1..f} RX(r, i) * TX(t, i)
                error = RT(r, t) - approx
                RX(r, f) += lr * error * TX(t, f)
                TX(t, f) += lr * error * RX(r, f)
            end
        end
    end
    return RX and TX

    Algorithm 1: Hebbian Deflation

Like many learning algorithms, Hebbian deflation may result in overfitting. Overfitting occurs when the features over-specialize in the training data at the expense of generalization. In order to combat this, a percentage of the training data may be held out. At each epoch, the Root Mean Square Error (RMSE) is calculated both for the ability of the features to approximate the training data and for their ability to approximate the holdout set. If the RMSE is observed to be rising for the holdout data even as it is dropping for the training data, overfitting can be assumed. The training of the current feature may then be halted and the training of the next feature can begin.

Feature extraction through Hebbian deflation offers many benefits. Not only does it offer a means to represent the data in a smaller space, but the features themselves may offer insights into the data. Domain experts can analyze the features for their characteristics. Users may navigate over the reduced feature space rather than the larger Folksonomy. Users, resources and tags may be modeled as vectors over the set of features, allowing reduced computation. Finally, Hebbian deflation may reveal similarities among items in a Folksonomy that remain hidden when the data is expressed as a projection using tf or tf*idf.
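The following is a direct Python transcription of Algorithm 1, kept deliberately simple: it recomputes the running approximation on every visit and omits the RMSE-based early stopping described above. The dense-list representation of the factors and the initialization range are assumptions for illustration.

    import random

    def hebbian_deflation(rt, num_resources, num_tags, F=50, epochs=2000, lr=0.001):
        """Approximate the projection RT with low-rank factors RX and TX,
        training one feature at a time as in Algorithm 1.

        rt: dict mapping (resource_index, tag_index) -> weight; only the
        nonzero entries of the projection are visited.
        """
        rx = [[random.uniform(-0.01, 0.01) for _ in range(F)]
              for _ in range(num_resources)]
        tx = [[random.uniform(-0.01, 0.01) for _ in range(F)]
              for _ in range(num_tags)]
        for f in range(F):
            for _ in range(epochs):
                for (r, t), actual in rt.items():
                    # The approximation uses the features trained so far (0..f).
                    approx = sum(rx[r][i] * tx[t][i] for i in range(f + 1))
                    err = actual - approx
                    rx[r][f] += lr * err * tx[t][f]
                    tx[t][f] += lr * err * rx[r][f]
        return rx, tx

A production version would cache the contribution of the already-trained features so each update costs O(1) rather than O(f), and would hold out a slice of RT to detect overfitting as described above.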
Tag Recommendation in Folksonomies

Recommendation algorithms serve a vital role in Web applications, allowing users to focus on a few relevant items rather than being overwhelmed by a large unordered set of mostly inappropriate options. Tag recommendation in Folksonomies reduces the user effort to a mouse click rather than a keyboard entry. This reduction in effort encourages tagging more resources, promotes the application of multiple tags to a resource, and may present the user with useful tags the user had not considered. Moreover, by eliminating the keyboard entry, tag recommendation reduces capitalization inconsistencies, punctuation errors, misspellings, and other discrepancies that add noise to the data.

The recommendation of high quality tags benefits the user beyond the annotation process itself. Resources are often tagged for future reference; by providing the most relevant tags, retrieval of the resources is made easier. Moreover, when resources are tagged in order to characterize content, the recommendation of clear descriptive tags can improve the online experience for other users navigating the site. This is particularly relevant for resources that are not easily evaluated by computers, such as photos, videos, and music.

Tag recommendation may also be used to assert control over tag usage without encumbering the user with a strict vocabulary. Ambiguous tags can be avoided in preference of less ambiguous tags, and the recommender can offer tags with a higher level of detail. The overuse of redundant tags can also be thwarted by consistently recommending highly used tags while eschewing their less used counterparts. The result is a cleaner, denser dataset that is useful in its own right for navigation, or for further data mining techniques.

Basic Recommendation

In traditional recommendation algorithms the input is often a user, u, and the output is a set of items, I. The user experience is improved if this set of items is relevant to the user's needs. Tag recommendation in Folksonomies, however, differs in that the input is both a user, u, and a resource, r. The output remains a set of items, in this case a recommended set of tags, Tr. One of the difficulties presented by tag recommendation is finding a means to incorporate both user and resource information into the recommendation algorithm.

Perhaps the simplest recommendation strategy is merely to recommend the most commonly used tags in the Folksonomy. However, such a strategy ignores both user and resource information. Alternatively, given a user-resource pair, a recommender may ignore the user and recommend the most popular tags for that particular resource. This strategy is strictly resource dependent and ignores the tagging habits of the user. In a similar fashion, a recommender may ignore the resource and recommend the most popular tags for that particular user. While such an algorithm would include tags frequently applied by the user, it ignores the resource information and may recommend tags irrelevant to the current resource.

An algorithm for tag recommendation in Folksonomies therefore requires a means to include both user and resource information in the process, so that the recommendation set includes tags that are relevant to the resource and also represent the user's tagging practice.
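For reference, the three popularity baselines are only a few lines each. A minimal sketch in Python, assuming the same iterable of (user, resource, tag) triples as before:

    from collections import Counter

    def most_popular(annotations, n):
        """Recommend the n most frequently used tags in the Folksonomy."""
        counts = Counter(t for _, _, t in annotations)
        return [t for t, _ in counts.most_common(n)]

    def most_popular_by_resource(annotations, resource, n):
        """Recommend the n tags most often applied to the query resource."""
        counts = Counter(t for _, r, t in annotations if r == resource)
        return [t for t, _ in counts.most_common(n)]

    def most_popular_by_user(annotations, user, n):
        """Recommend the n tags the query user applies most often."""
        counts = Counter(t for u, _, t in annotations if u == user)
        return [t for t, _ in counts.most_common(n)]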
K-Nearest Neighbor

User-based K-Nearest Neighbor is a commonly used recommendation algorithm in Information Retrieval that can be modified to include both user and resource information. Traditionally, it finds a set of users similar to a query user; from these neighbors a set of recommended items is constructed. We can modify this approach by ignoring users that have not tagged the query resource. Once a neighborhood of similar users has been discovered, the algorithm considers only those tags that have been applied to the query resource and calculates a weight for each tag, wt: the average similarity of the neighbors that have applied the tag to the query resource. Thus the algorithm is resource driven through both the selection of neighbors and the selection of tags. Still, it remains user driven in that neighbors are determined through a user model.

    Input: uq, a query user; rq, a query resource; k, the number of
           neighbors to consider; n, the number of tags to recommend
    Output: Tr, a set of recommended tags

    foreach u ∈ U that has annotated rq do
        su = similarity(u, uq)
    end
    Let N be the k nearest neighbors to uq;
    foreach u ∈ N do
        foreach t that u applied to rq do
            wt += su / k
        end
    end
    Sort tags by wt;
    Let Tr be the top n tags;
    return Tr

    Algorithm 2: K-Nearest Neighbor Modified for Folksonomies

K-Nearest Neighbor is considered a lazy algorithm; the bulk of its computation takes place after the query. Traditional approaches would require a comparison between the query user and every other user. However, since the adapted K-Nearest Neighbor considers only those users that have annotated the query resource, the number of similarities to calculate is drastically reduced. The popularity of resources in Folksonomies follows a power law, so the great majority of resources will benefit from this reduction in computation, while a few will require additional computational effort. As a result, the adapted K-Nearest Neighbor scales well with large datasets, a trait not shared by many other recommendation algorithms.

User Models

Applications vary in the way they model users. Possible methods include recency, authority, linkage or vector space models. In this work we focus on the vector space model (Salton, Wong, and Yang 1975), adapted from the Information Retrieval discipline to work with Folksonomies. Each user, u, can be modeled as a vector over the set of tags, where each weight, w(ti), corresponds to the importance of a particular tag, ti:

    u_t = ⟨w(t1), w(t2) ... w(t|T|)⟩                             (6)

In calculating the vector weights a variety of measures can be used: binary, term frequency, or term frequency*inverse document frequency. In this work we focus on term frequency. Similarly, a user can be modeled as a vector over the set of resources, where each weight, w(ri), corresponds to the importance of a particular resource, ri:

    u_r = ⟨w(r1), w(r2) ... w(r|R|)⟩                             (7)

Both of these models, however, ignore a portion of the user profile. A user model consisting merely of tags does not consider to which resources those tags have been applied, and a user model consisting only of resources does not include the tags applied to them. The user model may be extended to include both tags and resources; a new vector can be obtained by concatenating the two previously mentioned vectors:

    u_t+r = ⟨w(t1) ... w(t|T|), w(r1) ... w(r|R|)⟩               (8)

While this model does include both tags and resources, it does not specify which tags were applied to which resources. However, the tags and resources may be tightly coupled in a vector over all tag-resource pairs, where each weight, w(t r), is one if the user has applied the tag, t, to the resource, r, and zero otherwise:

    u_tr = ⟨w(t1 r1), w(t1 r2) ... w(t|T| r|R|)⟩                 (9)

These user models, however, risk becoming exceedingly large and extremely sparse. Hebbian features may be used to combat this sparsity. For example, features extracted from either UR or UT may be used to construct the user model:

    u_Hebbian = ⟨f1, f2 ... f|F|⟩                                (10)

These extracted features can greatly reduce the computational cost of calculating similarities, since the number of features is far smaller than the size of the original matrix. Moreover, feature extraction may discover hidden relationships and identify similarities among users that the previously described models would not capture.

Several techniques exist to calculate the similarity between vectors, such as Jaccard similarity or cosine similarity (Van Rijsbergen 1979). In this work we focus on cosine similarity.
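A minimal Python sketch of Algorithm 2 over sparse tag-frequency user models, with the optional boosting step of the next subsection folded in as a final adjustment. The index structures (resource_taggers, user_resource_tags) and the exclusion of the query user from the neighborhood are illustrative assumptions, not details specified by the paper.

    import math
    from collections import defaultdict

    def cosine(v1, v2):
        """Cosine similarity between two sparse vectors stored as dicts."""
        dot = sum(w * v2.get(key, 0.0) for key, w in v1.items())
        n1 = math.sqrt(sum(w * w for w in v1.values()))
        n2 = math.sqrt(sum(w * w for w in v2.values()))
        return dot / (n1 * n2) if n1 and n2 else 0.0

    def knn_recommend(uq, rq, user_models, resource_taggers,
                      user_resource_tags, k=50, n=5, b=0.05):
        """K-Nearest Neighbor modified for Folksonomies (Algorithm 2),
        with the optional boost of Algorithm 3.

        user_models: user -> sparse tag vector (e.g. tag frequencies)
        resource_taggers: resource -> set of users who annotated it
        user_resource_tags: (user, resource) -> set of tags applied
        """
        # Only users who have annotated the query resource are candidates.
        candidates = resource_taggers.get(rq, set()) - {uq}
        sims = {u: cosine(user_models[u], user_models[uq]) for u in candidates}
        neighbors = sorted(sims, key=sims.get, reverse=True)[:k]

        weights = defaultdict(float)
        for u in neighbors:
            for t in user_resource_tags.get((u, rq), set()):
                weights[t] += sims[u] / k  # average neighbor similarity
        # Optional boost for tags the query user has applied before.
        for t in weights:
            if t in user_models[uq]:
                weights[t] += b
        return sorted(weights, key=weights.get, reverse=True)[:n]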
As an additional step to the u~t = hw(t1 ), w(t2 )...w(t|T |)i (6) modified K-Nearest Neighbor recommender, b is added to In calculating the vector weights a variety of measures can the weight of the tag if the user has previously applied that be used: binary, term frequency or term frequency*inverse tag to another resource. foreach t ∈ T that any u ∈ N has applied to rq do the annotations were collected. This resulted in 99,864 dis- if (uq has applied t) then tinct usernames. wt = wt + b For each user, the “Network” and “Fans” were explored end recursively collecting additional usernames. A user’s Net- end work consists of the other users that the user has explicitly Algorithm 3: Optional Step Including Boost K-Nearest chosen to watch. Conversely a Fan is another user that has Neighbor explicitly chosen to watch the user. This resulted in a total of 524,790 usernames. From 10/20/2008 to 12/15/2008 the complete profiles of FolkRank all 524,790 users were collected. Each user profile con- In (Hotho et al. 2006) the authors proposed an adaptation of sisted of a collection of annotations including the resource, link analysis to the Folksonomy data structure. They have tags and date of the original bookmark. The top 100 most called this technique Folkrank since it computes a Pagerank prolific users were visually inspected; twelve were removed vector from the tripartite graph induced by the Folksonomy. from the data because their annotation count was many or- This graph is generated by regarding U ∪ R ∪ T as the set of ders of magnitude larger than other users and were therefore vertices. Edges are defined by the two-dimensional projec- suspected to be Web-bots. tions, UT, UR and RT. Due to memory and time constraints, 10% of the user pro- If we regard the adjacency matrix of this graph, W , (nor- files was randomly selected. A P -core of 20 was derived malized to be column-stochastic), a damping factor, d, and a such that each user, resource and tag appear in at least 20 preference vector, p, then we compute the Pagerank vector, posts where a post is defined as a user, resource and all tags w, in the usual manner: that user applied to the resource. The result was a dataset with 18,105 users, 42,646 re- w = dAw + (1 − d)p (11) sources and 13,053 tags. There are 2,309,426 annotations and 8,815,545 triples. The average number of tags in a post However due to the symmetry inherent in the tripartite is 3.82. graph, this basic Pagerank can too easily focus on the most Citeulike is a popular online tool used by researchers to popular elements in the Folksonomy. The Folkrank vector manage and discover scholarly references. They make their is taken as a difference between two computations of Pager- dataset freely available to download4. On 2/17/2009 the ank: one with a preference vector and one without the pref- most recent snapshot was downloaded with data extending erence vector. back to 5/30/2007. The data contains anonymous user ids In order to generate tag recommendations Folkrank uti- and posts for each user including resources, the date and lizes the preference vector to bias the algorithm towards the time of the posting and the tags applied to the resource. The query user and resource(Jaschke et al. 2007). These ele- original dataset contains 41,689 users, 1,370,729 resource ments are given a substantial weight in the preference vector and 284,389 tags. where all other elements have uniformly small weights. 
Experimental Evaluation

Here we describe the methods used to gather data for the experiments and provide details of our datasets. We then discuss modifications of N-fold cross-validation for Folksonomies and describe our experimental methodology. We briefly discuss the common metrics recall and precision and then detail the results of our experiments.

Data Sets

We validate our approach through extensive evaluation of the proposed modifications using data from two real Collaborative Tagging Applications: Delicious and Citeulike.

Delicious is a popular Website in which users annotate URLs. On 10/19/2008, 198 of the most popular tags were taken from the user interface. For each of these tags, the 2,000 most recent annotations, including the contributors of the annotations, were collected. This resulted in 99,864 distinct usernames. For each user, the "Network" and "Fans" were explored, recursively collecting additional usernames. A user's Network consists of the other users that the user has explicitly chosen to watch. Conversely, a Fan is another user that has explicitly chosen to watch the user. This resulted in a total of 524,790 usernames.

From 10/20/2008 to 12/15/2008 the complete profiles of all 524,790 users were collected. Each user profile consisted of a collection of annotations including the resource, the tags and the date of the original bookmark. The top 100 most prolific users were visually inspected; twelve were removed from the data because their annotation counts were many orders of magnitude larger than those of other users, and they were therefore suspected to be Web-bots.

Due to memory and time constraints, 10% of the user profiles were randomly selected. A P-core of 20 was derived such that each user, resource and tag appears in at least 20 posts, where a post is defined as a user, a resource and all tags that user applied to the resource. The result was a dataset with 18,105 users, 42,646 resources and 13,053 tags, containing 2,309,427 posts and 8,815,545 annotations. The average number of tags in a post is 3.82.

Citeulike is a popular online tool used by researchers to manage and discover scholarly references. They make their dataset freely available to download (http://www.citeulike.org/faq/data.adp). On 2/17/2009 the most recent snapshot was downloaded, with data extending back to 5/30/2007. The data contains anonymous user ids and posts for each user, including the resources, the date and time of the posting and the tags applied to the resource. The original dataset contains 41,689 users, 1,370,729 resources and 284,389 tags.

Because of its relatively small size and sparse data, the Citeulike data cannot support a P-core of 20. Instead a P-core of 5 was derived. This reduced the size of the data to 2,051 users, 5,376 resources and 3,343 tags, containing 42,277 posts and 105,873 annotations. The average number of tags in a post is 2.50.

    Folksonomy     Delicious    Citeulike
    Users          18,105       2,051
    Resources      42,646       5,376
    Tags           13,053       3,343
    Posts          2,309,427    42,277
    Annotations    8,815,545    105,873

    Table 1: Datasets

An important distinction between the two datasets is their focus. Users in Delicious are able to tag any URL available on the Web; as such, an individual's interests are often varied, encompassing many topics. In Citeulike, however, researchers tag scholarly publications, and their tagging is often focused in their area of expertise.

The data available for Delicious is also far more abundant. Using only a fraction of the data scraped from the Website, the Delicious dataset still has more than fifty times the annotations of the Citeulike dataset. Moreover, the Delicious data is far denser, supporting a P-core of 20 rather than a P-core of 5.
However other tech- Each test case consists of a user, u, a resource, r, and all niques such as stemming and thesaurus tables could provide the tags the user has applied to that resource. These tags, Th , the same utility. are analogous to the holdout set commonly used in Informa- The example “toread” demonstrates that this method tion Retrieval. The tag recommendation algorithms accept goes beyond what other techniques might provide. Where the user-resource pair and return an ordered set of recom- “photo” had been a descriptive tag, “toread” is an organiza- mended tags, Tr . From the holdout set and recommendation tional tag. The ability of the Hebbian features to match it set utility metrics were calculated. with things that will in fact be read and other organization For each evaluation metric the average value was calcu- tags illustrated the effectiveness of Hebbian Deflation. lated across all test cases of an individual fold. The average Moreover in the example of “folksonomies” show how was then calculated across all folds. Experiments completed highly related words can be discovered through the Heb- on Delicious consisted of 10 folds, while experiments on Ci- bian features. “Tags” are of course a crucial part of “folk- teulike had 5 folds. sonomies” that provide “classification” by providing “meta- The exception to this methodology are the experiments data.” completed for Folkrank. Due to the steep computational re- Domain specific words like “mac” or “osx” are difficult quired for this approach only one post from each user was to relate, but again Hebbian features are able to highlight placed in the testing set. Experiments were then run on this their similarity. Perhaps the most intriguing example is the single testing set as described in (Hotho et al. 2006). subjective tag “cool.” Hebbian features are able to find two other subjective tags with high similarity, “interesting” and Experimental Metrics “fun”. Recall is a common metric for evaluating the utility of rec- Tags can also be modeled from features extracted from ommendation algorithms. It measures the percentage of UT; Similarly users and resources may be modeled from items in the holdout set that appear in the recommendation features extracted from the projections. In all cases similar set. Recall is a measure of completeness and is defined as: trends are observed. Table 3 shows the same experiment completed with data recall = |Th ∩ Tr |/|Th| (12) from Citeulike. Again RT matrix was built from count data. The learning rate is set to .001. As before, 2000 epochs were Precision is another common metric for measuring the selected, but RMSE testing on a holdout set halted the train- usefulness of recommendation algorithms. It measures the ing of features when overfitting was detected. 100 features percentage of items in the recommendation set that appear were extracted, but only the first 20 showed progress in re- in the holdout set. Precision measures the exactness of the ducing RMSE. recommendation algorithm and is defined as: The related tags in Citeulike show similar trends to those found in Delicious. However because of its sparsity and rel- precision = |Th ∩ Tr |/|Tr | (13) atively small size, it appears to be more difficult to extract features. Nevertheless visual inspection of the tags reaffirms Hebbian Features the assumption that Hebbian Deflation and cosine similarity A fundamental assumption of user models based upon Heb- offer a method to discover meaningful features. 
Hebbian Features

A fundamental assumption of user models based upon Hebbian deflation is that meaningful features can be extracted from the two-dimensional projections. In order to affirm this assumption we have taken the two-dimensional projection of the Delicious dataset, RT, using the tag counts as the weights, and performed Hebbian deflation to generate RX and TX. A learning rate of 0.001 was chosen, as this value allows the features to converge quickly and smoothly. Initially 100 features were built, but examination of the RMSE on a 2% overtraining-holdout set showed little improvement beyond 50 features. Additionally, 2000 epochs were selected to train each feature, but in all cases the algorithm halted the training when it detected overfitting and proceeded to the next feature.

Table 2 shows selected tags along with their six most similar neighbors. Similar results have been demonstrated using co-occurrence, FolkRank and context metrics (Salton, Wong, and Yang 1975). Similarities are calculated by representing each tag as a vector of weights over the Hebbian features and calculating the cosine similarity between two tags.

    photo                     shopping
    0.789  photos             0.803  Shopping
    0.737  photography        0.780  shop
    0.652  pictures           0.560  buy
    0.640  foto               0.530  google
    0.637  Photo              0.519  store
    0.627  fotos              0.469  handmade

    toread                    folksonomy
    0.886  article            0.807  tagging
    0.816  articles           0.786  tags
    0.811  advice             0.691  tag
    0.810  Bookmarks          0.635  classification
    0.808  to read            0.574  metadata
    0.805  interesting        0.523  folksonomies

    mac                       cool
    0.849  osx                0.711  interesting
    0.840  Mac                0.673  fun
    0.778  apple              0.621  imported
    0.768  OSX                0.608  how-to
    0.653  macosx             0.595  article
    0.650  Apple              0.571  useful

    Table 2: Selected Delicious tags and their nearest neighbors using
    cosine similarity and 50 Hebbian features extracted from the RT_tf
    (count) matrix

Initial examination of the nearest tags lends credence to the assumption that Hebbian deflation can be used to discover meaningful features. For example, the nearest tags to "photo" are clearly redundant tags. However, other techniques such as stemming and thesaurus tables could provide the same utility. The example "toread" demonstrates that this method goes beyond what other techniques might provide. Where "photo" had been a descriptive tag, "toread" is an organizational tag. The ability of the Hebbian features to match it with things that will in fact be read, and with other organizational tags, illustrates the effectiveness of Hebbian deflation.

Moreover, the example of "folksonomy" shows how highly related words can be discovered through the Hebbian features: "tags" are of course a crucial part of "folksonomies", which provide "classification" by providing "metadata". Domain specific words like "mac" or "osx" are difficult to relate, but again the Hebbian features are able to highlight their similarity. Perhaps the most intriguing example is the subjective tag "cool", for which the Hebbian features find two other subjective tags with high similarity, "interesting" and "fun".

Tags can also be modeled from features extracted from UT; similarly, users and resources may be modeled from features extracted from the projections. In all cases similar trends are observed.

Table 3 shows the same experiment completed with data from Citeulike. Again the RT matrix was built from count data, and the learning rate was set to 0.001. As before, 2000 epochs were selected, but RMSE testing on a holdout set halted the training of features when overfitting was detected. 100 features were extracted, but only the first 20 showed progress in reducing the RMSE.

    datamining                networks
    0.981  computational      0.840  social networks
    0.935  conceptual         0.792  classification
    0.933  data-mining        0.751  community
    0.930  webservices        0.740  socialnetworks
    0.925  tools              0.738  graph
    0.915  data mining        0.737  functional

    Table 3: Selected Citeulike tags and their nearest neighbors using
    cosine similarity and 20 Hebbian features extracted from the RT_tf
    (count) matrix

The related tags in Citeulike show trends similar to those found in Delicious. However, because of its sparsity and relatively small size, it appears to be more difficult to extract features from this dataset. Nevertheless, visual inspection of the tags reaffirms the assumption that Hebbian deflation and cosine similarity offer a method to discover meaningful features.

While this paper proposes using Hebbian features to model users, these features offer many other uses for Folksonomies. Hebbian features might provide a means for users to navigate the Folksonomy. After selecting a user, resource or tag, the system could present additional items based upon the Hebbian features. The user may then select an item from that list and explore other users, resources or tags that are similar. By repeating this process the user could traverse the Folksonomy or focus in on a particular domain. Hebbian features might be particularly useful in this task since they are able to uncover relationships in the Folksonomy that other methods, such as co-occurrence, are unable to discover.

The individual features themselves might also be interesting. A domain expert could analyze the features in an effort to understand how the Folksonomy is growing, what aspects are dominating, and which users are having the most impact. Moreover, this reduced yet rich feature space can be utilized in a variety of data mining tasks: recommendation, personalization, search, navigation, etc.
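The nearest-neighbor lists in Tables 2 and 3 can be reproduced from the TX factor alone. A sketch, assuming tx is a list of feature vectors indexed by tag and tag_index maps tag strings to row indices; both names are illustrative:

    import math

    def nearest_tags(tx, tag_index, query_tag, top=6):
        """Rank tags by cosine similarity of their Hebbian feature vectors."""
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        q = tx[tag_index[query_tag]]
        scores = {tag: cos(tx[i], q)
                  for tag, i in tag_index.items() if tag != query_tag}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]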
Experimental Results

Here we present our experimental results, beginning with the tuning of variables. We discuss the impact of the boost variable on the quality of the K-Nearest Neighbor algorithm, then provide an in-depth comparison of the recommendation techniques.

The experiments with K-Nearest Neighbor require the tuning of two key variables: k, the number of neighbors, and b, the boosting factor.

Figure 1 shows the relation between k and the evaluation metrics recall and precision for a recommendation set of size 5. The Delicious dataset was used for this experiment. Weights in the user vector are calculated as the frequency of tags, or tf. This experiment does not include any boosting factor for previously used tags. As k increases, so do recall and precision. However, this improvement suffers from diminishing returns, until a k of 100 offers little more benefit than a k of 50. This trend was observed for all K-Nearest Neighbor experiments. As such, all K-Nearest Neighbor experiments were completed using a k of 50. Similar results are observed for the Citeulike dataset.

    Figure 1: The effect of k in KNN on recall and precision for a
    recommendation set of 5 tags. Users are modeled as a vector over the
    tag space.

Figure 2 demonstrates the effectiveness of the boosting modification for K-Nearest Neighbor. The modification gives extra weight to those tags the user has previously applied to a resource. This experiment is completed with a k of 50; b is adjusted in the range of 0 through 0.20 at 0.025 increments. Both recall and precision for a recommendation set of 5 tags are sharply improved when b is increased to 0.05. Afterward, the effect of the boosting parameter slowly diminishes.

    Figure 2: The effect of boosting previously used tags in KNN on
    recall and precision for a recommendation set of 5 tags. Users are
    modeled as a vector over the tag space.

Table 4 provides a more detailed view of the effect boosting can have. For example, without any boosting factor the precision for a recommendation set of size 3 is 43.5%. With the boosting factor, precision is increased to 45.9%, an immediate 2.4% gain. For recommendation sets of every size, precision is increased; the boosting factor enables K-Nearest Neighbor to become more precise. Likewise, recall increases across the board. For example, recall given a recommendation set of size 10 jumps from 66.9% to 69.5%, a 2.6% increase. Boosting therefore appears to increase the completeness of the recommendation set.

This behavior is consistent with all K-Nearest Neighbor experiments conducted on the Delicious dataset, across all user models and all values of k: boosting results in an approximate 2.0% to 4.0% increase in recall and precision. The optimum value for b is between 0.05 and 0.075. For consistency, all other K-Nearest Neighbor experiments are run using a boosting factor of 0.05.

Similar results were discovered using the Citeulike data. Precision for a recommendation set of size 1 jumps from 40.4% to 50.9%, a dramatic increase of 10.5%. However, this improvement diminishes as the size of the recommendation set is increased. Precision for a recommendation set of 5 climbs from 18.4% to 20.8%, a 2.4% improvement, and for a recommendation set of size 10 the improvement shrinks to 0.3%. An examination of recall shows similar signs. For a recommendation set of size 1, the improvement is 5.4%. The improvement drops to 4.7% for a recommendation set of size 5 and drops further to 1.4% when N is increased to 10.

    Delicious
    KNN-UT, b = 0.00            KNN-UT, b = 0.05
    N    Recall   Precision     N    Recall   Precision
    1    0.211    0.567         1    0.232    0.606
    2    0.332    0.489         2    0.361    0.519
    3    0.418    0.435         3    0.450    0.459
    4    0.485    0.394         4    0.516    0.413
    5    0.538    0.361         5    0.568    0.376
    6    0.580    0.333         6    0.609    0.345
    7    0.613    0.307         7    0.641    0.317
    8    0.636    0.282         8    0.664    0.291
    9    0.653    0.259         9    0.682    0.268
    10   0.669    0.239         10   0.695    0.248

    Citeulike
    KNN-UT, b = 0.00            KNN-UT, b = 0.05
    N    Recall   Precision     N    Recall   Precision
    1    0.201    0.404         1    0.255    0.509
    2    0.292    0.309         2    0.355    0.377
    3    0.346    0.252         3    0.407    0.299
    4    0.383    0.213         4    0.439    0.247
    5    0.407    0.184         5    0.456    0.208
    6    0.427    0.162         6    0.465    0.178
    7    0.444    0.145         7    0.474    0.157
    8    0.457    0.131         8    0.479    0.139
    9    0.467    0.120         9    0.484    0.125
    10   0.474    0.110         10   0.488    0.113

    Table 4: The recall and precision for the top 10 recommended tags of
    K-Nearest Neighbor applied to the Delicious and Citeulike datasets.
    Users are modeled as vectors over the tag space; vector weights are
    computed as the tag frequency; k was set to 50. N is the number of
    tags recommended. Detailed results for two different boost values,
    0.00 and 0.05, are presented.
In general, boosting tags based upon previous usage is demonstrated to add additional utility to the K-Nearest Neighbor algorithm. Yet the differences in the improvements between Delicious and Citeulike offer additional insights. First, the average number of tags per post in Delicious is 3.82. Citeulike has fewer, with 2.50 tags per post, only 65% of the number found in Delicious. As a result, Citeulike presents a smaller target for tag recommendation. Moreover, since the holdout set is smaller for Citeulike, boosting will have a greater impact on smaller recommendation sets than on larger sets when contrasted with Delicious. This is observed as the improvement to precision garnered from boosting drops from 10.5% when the recommendation set contains 1 tag to only 0.3% when the recommendation set consists of 10 tags. Improvements in Delicious, on the other hand, are more stable, dropping from 3.9% to 1.8%. An examination of recall shows a parallel trend; in both cases recall improves as the size of the recommendation set increases, subject to diminishing returns. However, the effect of diminishing returns hampers Citeulike recommendations earlier and more strongly than it affects Delicious. These observations suggest that care must be taken when comparing recommendation algorithms across multiple Folksonomies.

Furthermore, the two datasets have a markedly different focus. Delicious users are able to annotate any URL on the Web, often tagging resources across many different topics. Citeulike users, on the other hand, are focused on scholarly publications and often concentrate primarily on their area of expertise. This observation suggests that recommendation algorithms giving added weight to resources may be more appropriate for Delicious, since user information may befuddle the recommendation algorithm by incorporating unrelated tags. Conversely, if a user in Citeulike has annotated items related to his research and does not stray from this topic, a user profile based on tags could offer exceptional utility.

Evidence for this analysis is provided by the difference in precision and recall between the two datasets. For a recommendation set of size 1, boosting a user's previously assigned tags offers a 3.9% gain in Delicious and a 10.5% gain in Citeulike. This trend continues as N increases, until the improvements gradually diminish. This stark contrast suggests that recommendation algorithms augmented by boosting tags offer more gain for focused Folksonomies such as Citeulike than for broader Folksonomies such as Delicious.

Having ascertained the optimal values for k and the boosting factor, we turn our attention to the user models for K-Nearest Neighbor. Experimental results are shown in Figure 3, detailing the recall and precision for recommendation sets of size 1 through 10.
For comparison purposes, recommendation algorithms based on popularity are included. "Most Popular" recommends the most popular tags in the dataset. "Most Popular by User" recommends the most popular tags for a particular user. "Most Popular by Resource" recommends the most popular tags for a given resource. K-Nearest Neighbor models users as vectors over the tag space (KNN-UT), vectors over the resource space (KNN-UR), a concatenation of the two (KNN-UR+T), a strict combination of every resource-tag pair (KNN-U(RT)), and as features derived through Hebbian deflation on either the UT or UR matrix (KNN-UT-Hebbian or KNN-UR-Hebbian). For all K-Nearest Neighbor models, k is set to 50 and the boost factor is set to 0.05. FolkRank is provided for further comparison; it adapts link analysis to the Folksonomy structure, recommending tags through manipulation of the preference vector.

    Figure 3: The comparison of tag recommender strategies for Delicious.

The approach that merely recommends tags that are popular throughout the Folksonomy achieves poor results on the Delicious dataset. Recommending popular tags for a specific user fares better, but is clearly outdone by recommending popular tags for a specific resource. Nearly all user models for K-Nearest Neighbor surpass these techniques based upon popularity. In particular, models that treat each user as a vector over the set of tags appear to perform best for the Delicious dataset. FolkRank offers additional completeness, as seen by its superior recall, but offers less specificity, as measured by its precision.

    Figure 4: The comparison of tag recommender strategies for Citeulike.

Similar trends are observed for Citeulike, with a few notable exceptions. First, recommendations that rely on the popularity of a tag given a user outperform recommendations based on the popularity of a resource. This reaffirms our notion that the focused nature of Citeulike is better suited to algorithms that rely on user-tag information, whereas resource-tag information is critical in broader Folksonomies where a user's annotations cover multiple topic areas. FolkRank outperforms the other methods as a measure of recall to a larger extent than in the Delicious dataset, but trails further as a measure of precision.

As in Delicious, K-Nearest Neighbor which treats users as vectors over the set of tags performs strongly in Citeulike, whereas the effectiveness of the other approaches varies. The ability of this model to outperform other methods in both datasets should be noted; this is due to the inherent comprehensiveness of the KNN-UT algorithm. Only those users that have tagged the query resource are considered for the neighborhood, resulting in an algorithm that focuses on user-resource information. Then, only those tags that have been applied to the query resource are considered for the recommendation set, focusing on the resource-tag connections. Finally, by treating user models as vectors over the tag space, the recommender incorporates user-tag relationships. Consequently, the KNN-UT algorithm accounts for all three aspects of the Folksonomy and is adaptable to many Folksonomies that may require an emphasis on specific relationships. Not surprisingly, the user models that perform nearly as well as KNN-UT are KNN-UR+T and KNN-UT-Hebbian, which share the same characteristic, though KNN-UT-Hebbian performs relatively poorly in Citeulike, likely because this dataset is sparse and the Hebbian features are more difficult to extract.

Beyond recall and precision, time constraints should be considered when evaluating tag recommendation algorithms. Recommenders based on popularity can perform much of the computation offline, thereby streamlining the recommendation process. K-Nearest Neighbor, on the other hand, is often referred to as a lazy algorithm since it performs the bulk of its computation during the query process. However, since the proposed modifications limit the number of similarities that must be calculated to only those users that have tagged the query resource, the algorithm scales very well with large datasets. The computational cost may be reduced further if Hebbian features are extracted from the Folksonomy, thereby reducing the length of the vectors used for calculating cosine similarity. Hebbian deflation, however, is appropriate only when the data is dense enough that meaningful features can be extracted.

FolkRank, while it performs well in tag recommendation, is hampered by computational costs, requiring a complete calculation of the PageRank vector for each query. For example, computing 18,105 recommendations required 80 hours of computation on a modern dual-core desktop. In contrast, 2,309,427 recommendations were completed in less than 1 hour using K-Nearest Neighbor.
Conclusions and Future Work

In this work we have proposed using K-Nearest Neighbor for tag recommendation in Folksonomies. Due to the unique data structure of Folksonomies, modifications are required to adapt the algorithm. Neighbors are selected only if they have tagged the query resource, and tags are selected for the recommendation set only if they have been applied by a neighbor to the query resource. These modifications tie user-resource and resource-tag information into the algorithm while dramatically reducing the computational costs. There exists a myriad of ways in which to calculate user similarity; we have found that cosine similarity between users modeled as vectors over the tag space performs well. This model incorporates user-tag information into the algorithm. By including all three relationships inherent in Folksonomies, the algorithm is robust for both broad and narrow Folksonomies. In addition, K-Nearest Neighbor can be improved by boosting tags the user has previously used. The performance of K-Nearest Neighbor exceeds that of recommendation algorithms based on popularity, while its running time makes it computationally viable for large real world Folksonomies.

In the future we plan to investigate alternative tag recommendation strategies and to study resource and user recommendation algorithms. Other approaches, such as association rule mining and neural networks, are worth considering for recommendation in Folksonomies. Probabilistic Latent Semantic Analysis offers an alternative means to derive features from the Folksonomy. Feature extraction of any sort presents intriguing opportunities in search, navigation, personalization and recommendation.

Acknowledgments

This work was supported in part by the National Science Foundation Cyber Trust program under Grant IIS-0430303 and a grant from the Department of Education, Graduate Assistance in the Area of National Need, P200A070536.
References

Adrian, B.; Sauermann, L.; and Roth-Berghofer, T. 2007. ConTag: A semantic tag recommendation system. In Pellegrini, T., and Schaffert, S., eds., Proceedings of I-Semantics '07, 297–304. JUCS.

Basile, P.; Gendarmi, D.; Lanubile, F.; and Semeraro, G. 2007. Recommending smart tags in a social bookmarking system. In Bridging the Gap between Semantic Web and Web 2.0 (SemNet 2007), 22–29.

Batagelj, V., and Zaveršnik, M. 2002. Generalized cores. arXiv preprint cs/0202039.

Golder, S. A., and Huberman, B. A. 2006. Usage patterns of collaborative tagging systems. Journal of Information Science 32(2):198.

Heymann, P.; Ramage, D.; and Garcia-Molina, H. 2008. Social tag prediction. In SIGIR '08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 531–538. New York, NY, USA: ACM.

Hotho, A.; Jaschke, R.; Schmitz, C.; and Stumme, G. 2006. Information retrieval in folksonomies: Search and ranking. The Semantic Web: Research and Applications 4011:411–426.

Jaschke, R.; Marinho, L.; Hotho, A.; Schmidt-Thieme, L.; and Stumme, G. 2007. Tag recommendations in folksonomies. Lecture Notes in Computer Science 4702:506.

Lipczak, M. 2008. Tag recommendation for folksonomies oriented towards individual users. In Proceedings of the ECML/PKDD 2008 Discovery Challenge Workshop, part of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.

Macgregor, G., and McCulloch, E. 2006. Collaborative tagging as a knowledge organisation and resource discovery tool. Library Review 55(5):291–300.

Mathes, A. 2004. Folksonomies: Cooperative classification and communication through shared metadata. Computer Mediated Communication (Doctoral Seminar), Graduate School of Library and Information Science, University of Illinois Urbana-Champaign, December.

Mika, P. 2007. Ontologies are us: A unified model of social networks and semantics. Web Semantics: Science, Services and Agents on the World Wide Web 5(1):5–15.

Nakamoto, R. Y.; Nakajima, S.; Miyazaki, J.; Uemura, S.; Kato, H.; and Inagaki, Y. 2008a. Reasonable tag-based collaborative filtering for social tagging systems. In WICOW '08: Proceedings of the 2nd ACM Workshop on Information Credibility on the Web, 11–18. New York, NY, USA: ACM.

Nakamoto, R. Y.; Nakajima, S.; Miyazaki, J.; Uemura, S.; and Kato, H. 2008b. Investigation of the effectiveness of tag-based contextual collaborative filtering in website recommendation. In Advances in Communication Systems and Electrical Engineering, 309–318. Springer.

Oja, E., and Karhunen, J. 1985. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications 106(1):69–84.

Salton, G., and Buckley, C. 1988. Term-weighting approaches in automatic text retrieval. Information Processing and Management 24(5):513–523.

Salton, G.; Wong, A.; and Yang, C. 1975. A vector space model for automatic indexing. Communications of the ACM 18(11):613–620.

Schmitz, C.; Hotho, A.; Jaschke, R.; and Stumme, G. 2006. Mining association rules in folksonomies. In Proceedings of the IFCS 2006 Conference, 261–270. Springer.

Sigurbjörnsson, B., and van Zwol, R. 2008. Flickr tag recommendation based on collective knowledge. In WWW '08: Proceedings of the 17th International Conference on World Wide Web, 327–336. New York, NY, USA: ACM.

Song, Y.; Zhuang, Z.; Li, H.; Zhao, Q.; Li, J.; Lee, W.-C.; and Giles, C. L. 2008. Real-time automatic tag recommendation. In SIGIR '08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 515–522. New York, NY, USA: ACM.

Tso-Sutter, K. H. L.; Marinho, L. B.; and Schmidt-Thieme, L. 2008. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In SAC '08: Proceedings of the 2008 ACM Symposium on Applied Computing, 1995–1999. New York, NY, USA: ACM.

Van Rijsbergen, C. 1979. Information Retrieval. Newton, MA, USA: Butterworth-Heinemann.

Xu, Z.; Fu, Y.; Mao, J.; and Su, D. 2006. Towards the semantic web: Collaborative tag suggestions. In Collaborative Web Tagging Workshop at WWW2006, Edinburgh, Scotland, May.