Understanding the user: Personomy translation for tag recommendation Robert Wetzker1 , Alan Said1 , and Carsten Zimmermann2 1 Technische Universität Berlin, Germany 2 University of San Diego, USA Abstract. This paper describes our approach to the challenge of graph- based tag recommendation in social bookmarking services. Along the ECML PKDD 2009 Discovery Challenge, we design a tag recommender that accurately predicts the tagging behavior of users within the Bibson- omy bookmarking service. We find that the tagging vocabularies among folksonomy users differ radically due to multilingual aspects as well as heterogeneous tagging habits. Our model overcomes the prediction prob- lem resulting from these heterogeneities by translating user vocabular- ies, so called personomies, to the global folksonomy vocabulary and vice versa. Furthermore we combine our user-centric translation approach with item-centric methods to achieve more accurate solutions. Since our method is purely graph-based, it can also readily be applied to other folksonomies. 1 Introduction Over the last years, social bookmarking services, such as Delicious3 , Bibsonomy4 and CiteULike5 , have grown rapidly in terms of usage and perceived value. One distinguishing feature provided by these services is the concept of tagging - the labeling of content with freely chosen keywords (tags). Tagging enables users to describe and categorize resources in order to organize their bookmark collections and ease later retrieval. Social bookmarking services are therefore the classic ex- ample of collaborative tagging communities, so called folksonomies [1][2]. The consumer-centric (collaborative) tagging aspect differentiates social bookmark- ing from other content sharing community services, such as Flickr6 or YouTube7 , where tags are generally assigned by the content creator [3]. Most folksonomy solutions assist users during the bookmarking process by recommending tags. Thus, a user can select recommended tags from various sets in addition to entering tags manually. Despite their positive effect on us- ability, these recommenders are effective tools to limit tag divergence within 3 http://delicious.com 4 http://bibsonomy.org 5 http://citeulike.org 6 http://flickr.com 7 http://www.youtube.com folksonomies as they are generally considered to lower the ratio of misspellings and to increase the likelihood of tag reassignments. The design of a folksonomy tag recommender was one of the tasks of the ECML PKDD 2009 Discovery Challenge8 . In the following sections, we describe our solution to this task as submitted. Our approach is based on the observation that the tag vocabularies of users, their personomies, differ within a folksonomy. This heterogeneity is mainly caused by differences in the tags users constantly assign to categorize content and the multilingualism of the user base, as apparent for Bibsonomy. To overcome the problems caused by this heterogeneity, we propose a tagging model that trans- lates the personomy of each user to the folksonomy vocabulary and vice versa. We find that our model is highly accurate as it characterizes an item by its tag spectrum before translating this spectrum to a user’s personomy. We then combine the translational model with item-centric tag models to improve per- formance. This paper is structured as follows: The introductory section presents a graph model for the underlying data structure and explains the actual goals of the chal- lenge. This is followed by an analysis of different properties of the Bibsonomy dataset with respect to their impact on tag recommendation. Section 3 intro- duces and discusses the tag vocabularies found within folksonomies, before we present our recommendation algorithm and evaluation results. 1.1 Modeling folksonomies According to [2], a folksonomy can be described as a tuple F := (I, T, U, Y ), where I = {i1 , . . . , ik }, T = {t1 , . . . , tl } and U = {u1 , . . . , um } are finite sets of items, tags and users, and Y is a ternary relation whose elements are called tag assignments. A tag assignment (TAS) is defined by the authors as relation Y ⊆ U × T × I, so that the tripartite folksonomy hypergraph is given by G = (V, E), where V = I ∪ T ∪ U and E = {(i, t, u)|(i, t, u) ∈ Y }. The set of all bookmarks is then given as BM = {(i, u)|∃t : (i, t, u) ∈ Y }. This graph structure is characteristic for all folksonomies. 1.2 The challenge: Graph-based tag recommendations (Task 2) The ECML PKDD 2009 Discovery Challenge consists of different tasks related to the problem of tag recommendation. Testbeds for all tasks are different snap- shots of the Bibsonomy bookmarking service. Our solution contributes to the task of ”Graph-Based Recommendations (Task 2)“ as it does not consider the content of the given resources. The recommendation task in this setting resem- bles the problem of link prediction within G given a user and an item node. The recommender thus needs to estimate P (t|i, u), the probability of observing a given tag when a combination of user and item has been observed. 8 http://www.kde.cs.uni-kassel.de/ws/dc09 1.3 Related work One of the first works on tag recommendation in folksonomies is [4], where the authors compare the performance of co-occurrence based tag recommenders and more complex recommenders based on the FolkRank algorithm [2]. Furthermore they report only minor improvements of the FolkRank method over the co- occurrence approaches. Parts of their analysis is performed on a snapshot of the Bibsonomy dataset. Further related work was presented by the participants of last years challenge on tag recommendation. The authors of [5] enrich tag vocabularies by terms extracted from all bookmarks’ meta data, such as user given descriptions or the abstract, author, year etc. information in case of publications. More similar to our work is the approach in [6]. The team combines the keywords found within a resource’s title and the actual tags previously assigned to a resource with the tags from a user’s personomy. Each vocabulary is mapped to the global tag vocabulary using co-occurrence tables. This mapping is similar to our translation process. However, the fusion of different sources differs from our method and no user-centric optimization of parameters was reported. 2 The dataset The Bibsonomy bookmarking service allows its users to bookmark URLs and publications in parallel. This hybrid approach makes Bibsonomy different from other bookmarking communities such as Delicious or CiteULike. For each web bookmark, participants are given the URL, the title and an optional descrip- tion of the resource as provided by the user during the bookmarking process. Bookmarked publications generally come with information about the title, the authors, the abstract or other common bibliographic attributes. The complete- ness of this information is not guaranteed, and many attribute fields are left empty. Table 1 gives an overview of the node statistics found within the dataset for p-core levels one and two9 . Table 1. Different node set sizes of the Bibsonomy dataset for p-core levels 1 and 2. p-core |E| |BM | |BMBIB | |BMU RL | |I| |T | |U | 1 1,401,104 421,928 263,004 158,924 378,378 93,756 3,617 2 253,615 64,120 41,268 22,852 22,389 13,276 1,185 For the construction of our recommender we ignore the meta-data attached to the bookmarked content and only consider the graph G. Furthermore, we do not distinguish between URLs and publications, but merge both node sets to the item set I. Both decisions result in a loss of information which potentially 9 The p-core of a folksonomy graph has the characteristic that all contained nodes appear in at least p bookmarks. See [4] for details. 106 106 items items tags tags 5 users 5 users 10 10 104 104 # count # count 103 103 102 102 101 101 100 0 100 0 10 101 102 103 104 105 10 101 102 103 104 105 node degree node degree (a) 1-core (b) 2-core Fig. 1. Node degree distributions within the full Bibsonomy dataset (a) and the dataset at p-core level 2. Node degree distributions found within folksonomies generally exhibit power law characteristics with few highly connected nodes and many nodes with low occurrence. This characteristic is obtained when cutting the graph to its level 2-core. reduces the overall accuracy of our recommender. However, we believe this loss to be compensated by the general applicability of our approach and the improved validity of the presented results. Figure 1 shows the node degree distributions of the full dataset and its 2- core. As previously reported for other folksonomies (e.g. [2][7]), we find the node degree distributions of Bibsonomy to exhibit power law characteristics, with very few and very frequent nodes on one end and many infrequent nodes on the other. This characteristic is basically obtained when reducing G to its 2-core. However, we find that this reduction drastically reduces the impact of some previously very influential users. These users tend to have a rather ”organizational“ background and bookmark large collections of similar resources which are of no interest to other users. Most of these resources therefore fall victim to the p-core reduction, which also explains the drastic cut on items in Table 1. 3 Tag vocabularies As users bookmark items only once10 , we cannot estimate P (t|i, u) directly from previous observations. Instead, we base our estimates on other distributions. The most basic sources are the overall tag distribution and the tags previously assigned by the user, or to the item. Global tag distributions. This is the distribution of all observed tag assign- ments within the training data. By definition, each tag of the test data has 10 Note that a small percentage of user item combinations found in the given dataset occur in more than one bookmark. to occur within this distribution. A recommender that only considers the global tag distribution would assume P (t|u, i) ≈ P (t). We will refer to such a recommender as MostPopular recommender. Item tag distributions. This is the distribution of previously assigned tags for a given item. It was shown that the tag distributions of items converge to a characteristic tag spectrum over time. Furthermore, as reported by [8] and [9], the resulting tag distributions often follow a power law with few tags being assigned very frequently and most tags occurring in the long tail. If we neglect the personalization aspect of tagging, we can recommend tags by assuming P (t|i, u) ≈ P (t|i). However, our observation of P (t|i) within the training data may be limited, i.e. information about most tags will be missing. User tag distributions (Personomies). Each user develops his own vocabu- lary of tags over time called his personomy. Users will generally be interested in reassigning previously used tags as this will simplify content search later on. The interest in tag convergence often results in the frequent assignment of a limited number of category tags, and it was shown that user vocabu- laries develop power law characteristics over time [10]. A personomy based recommender would assume P (t|i, u) ≈ P (t|u). Once again, the distribution P (t|u) estimated over the training data is likely to miss a variety of tags especially for users with few bookmarks. The authors of [4] report that a tag model which combines user and item tag distributions into a unified distribution achieves sufficient recommendation accuracy. We will consider a hybrid recommder with P (t|i, u) ≈ αP (t|i) + (1 − α)P (t|u) as an additional baseline approach during our evaluations (MostPopu- lar2d ). 4 Our approach The design of our tag recommender is based on two intuitive assumptions: 1. Tags are personalized. Different users will assign different tags to the same item. This effect cannot only be explained by statistical variance. In- stead, we find users developing their own category tags over time. One of the implications for the tag recommendation task is the problem of recommend- ing the right version of a tag, especially in cases where synonymous tags exist. This includes different spellings, such as ”web20“ versus ”web2.0“. Even though semantically equal, these will be different for a user who as- signs tags for content categorization. Furthermore, especially in multilingual folksonomies, we find that users often assign keywords from their mother tongue. This is of particular importance for Bibsonomy, where many users seem to come from Germany, with the effect that the tag distribution is a mixture of German and English words. Whereas some users tagged a site as ”searchengine“ related we also find the German translation (”suchmas- chine“) among the frequent tags. 2. Tags describe items. The authors of [11] report that the vast majority of assigned tags on Delicious identify the topic, the type or the owner of a URL. We can therefore assume that users will assign personomy tags depending on the item. This assumption is a simplification as it excludes tags, such as ”toread“ or ”self“, which actually refer to the user item relationship instead of the item itself. However, we believe that these tags cannot be easily pre- dicted based on the given training data but require deeper knowledge about the user. Luckily, as reported by [12] for Delicious, usage context and self reference tags are relatively scarce compared to descriptive tags. These assumptions directly influenced the basic design decisions for our rec- ommender which suggests tags from a user’s personomy with respect to the community opinion about the underlying item. 4.1 Translating personomies We assume that each user has a distinctive vocabulary of tags. These tags can be translated to the community tag vocabulary by looking at co-occurrences within the shared item space. We are thus interested in the probability P (tu |t, u) that a user will assign a tag tu from his personomy as next tag given that the item was tagged as t by another user. Based on previous knowledge about the users tagging behavior we can estimate P (tu |t, u) as X P (tu |t, u) = P (tu |i, u)P (i|t), (1) i∈Iu where Iu is the set of items previously bookmarked by the user. Based on P (tu |t, u) we can now translate the global folksonomy language to the personomy of a user. For the recommendation task we are interested in the probability P (tu |i, u) for previously unseen user item combinations. For a new item with an observed tag vocabulary of P (t|i), the probability that a user will assign one of his tags next is given as X P (tu |i, u) = P (tu |t, u)P (t|i). (2) t∈Ti Note that tu ∈Tu P (tu |i, u) = 1 is only true if P P all tags in Ti have a mapping in Tu which is rather unlikely. Instead we expect tu ∈Tu P (tu |i, u) to decrease the more the given item deviates from the items previously bookmarked by the user. 4.2 The tag recommender The tag recommender we propose, selects tags coming from three sources: the personomy of a user where tags are weighted by P (tu |i, u), the item vocabu- lary (P (t|i)) and the global vocabulary (P (t)). Including the item vocabulary is important, as many tags may be item specific and are thus unlikely contained within the personomy. We also include the global tag distribution P (t) for cases where little is known about user and item alike. We assume a weighted linear combination of sources and estimate P (t|i, u) as P (t|i, u) ≈ αu P (tu |i, u) + αi P (t|i) + (1 − αu − αi )P (t), (3) with 0 ≤ αu , αi , αu + αi ≤ 1. We then recommend the N tags with highest probability P (t|i, u), where N is a parameter that needs to be optimized together with αu and αi . Optimization can take place on a global or a user-centric scale. Global optimization. For the globally optimized model, we assume equal parameter settings for all users. As the Bibsonomy dataset is rather small, we can use a brute-force approach to find the combination of N , αu and αi that maximizes the F-measure. We do so by performing a 10-fold cross-validation on the training data. We call this user-centric tag recommender with global optimization U CG . User-centric optimization. As users are heterogeneous, it is not intuitive to assume shared parameter preferences. Instead, it seems straightforward to optimize parameters for each user separately. Once again, we do so by performing a cross-validation on the training data. We then use a brute-force approach to find the combination of N , αu and αi that maximizes the F-measure of each user averaged over all folds. We will refer to the user-centric tag recommender with local optimization as U CL . 5 Evaluation We trained the models of all recommender types on the 2-core version of the dataset. The parameters of the MostPopular2d, U CG and the U CL models were fine-tuned in a 10-fold cross-validation as described above. For the MostPopu- lar2d recommender we found an α value of 0.5 to perform best. For the user- centric tag recommender with global optimization the maximal F1 measure was achieved when setting αu = 0.6 and αi = 0.4. The weight of the global tag dis- tribution thus resulted 0 which means that including the global vocabulary did not yield performance gains. For the MostPopular2d as well as the U CG recom- mender the best number of tags to recommend was 5. Evaluating the accuracy of all recommender types during the cross validation, we found the user-centric tag recommender with local optimization (U CL ) to constantly outperform all other versions. We therefore submitted the predicted tags of the U CL approach as our solution to the challenge. The released test dataset consists of 778 bookmarks from 136 users linking to 667 items. Table 2 presents the achieved F1 measures on the first five ranks for the various recommender types. The U CL recommender we submitted achieved a Table 2. Performance of various recommender types on the test data. Underlined values represent the configuration that performed best during training. The submitted recommender (U CL ) achieved an F1 measure of 0.314. The best F1 measure could have been achieved with a U CG recommender always suggesting 3 tags (bold). Recommender F1 @1 F1 @2 F1 @3 F1 @4 F1 @5 M ostP opular 0.021 0.038 0.051 0.051 0.059 M ostP opular2dα=0.5 0.229 0.286 0.306 0.313 0.310 U CG,αu =0.6,αi =0.4 0.246 0.326 0.335 0.334 0.330 U CL 0.230 0.294 0.306 0.311 0.314 performance of 0.314. This result is somewhat disappointing as it is only slightly above the result of the simpler M ostP opular2d recommender. However, we find that the approach of vocabulary translation is generally superior as the results of the U CG recommender are significantly better. We observe similar performance patterns when looking at the precision/recall curves plotted in Figure 2. Investigating the reasons for the weak performance of the U CL recommender, we find that the user distribution of the test set deviates from the trained one as shown in Figure 3. This deviation is likely to have a negative impact on the prediction quality as parameters have been tuned in expectation of a user distribution similar to the one of the training set. However, this problem is not 0.5 0.45 0.4 0.35 0.3 Precision 0.25 0.2 UCG 0.15 UCL MostPopular2dα=0.5 MostPopular2dα=0 0.1 MostPopular2dα=1 MostPopular 0.05 0 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 Recall Fig. 2. Precision/Recall curves for various recommenders on the provided test data. The curve of the U CL recommender appears ”shorter“ as this recommender suggests a variable number of tags. 0.12 0.1 0.08 % test tiems 0.06 0.04 0.02 0 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 % training items Fig. 3. Relative number of bookmarks per user in test and training data. The correla- tion between the user participation in the test set and the trained distribution is rather low (ρ = 0.24). unique to the U CL recommender but is expected to have a negative impact on the performance of all recommender types. Instead, we believe that the weak performance of the U CL recommender is caused by an inadequate parameter tuning for users less present in the training data but frequent in the test set. Tuning αu , αi and N for these users often results in bad estimates due to missing data. Whereas the implications of these shortcomings are rather minor when users are distributed as in the training data, they seem to become major for the test distribution. The fact that the test set is dominated by users with rather small training vocabularies is also reflected by the performance of the M ostP opular2d rec- ommenders with α set to 0 and 1 as shown in Figure 2. Here, we find that a recommender which only suggest tags from a user’s personomy (α = 1) performs very bad, whereas an item based recommender (α = 0) achieves nearly as good results as the mixture model α = 0.5. This implies, that most tags of the test data are not present within a user’s personomy at training time or, less likely, that the tagging behavior of users drastically changed in the test phase. The inadequate modeling of infrequent users (and items) is an expected shortcoming of a purely graph-based recommendation approach. This is espe- cially true for our personomy translation approach which requires the tags of the given item to have a mapping within the conditional distribution P (tu |t, u) (see equation 2). Incorporating the provided item meta-data may be a promising alternative to improve accuracy in scenarios where little is known about users and items from a graph perspective. 6 Conclusions In this paper, we presented a novel approach to the challenge of graph-based tag recommendation in folksonomies. Building on the assumption that all users of a folksonomy have their own tag vocabulary, our approach translates the per- sonomies of users to the global folksonomy vocabulary. Evaluation results show that this translation helps to significantly improve tag prediction performance. Furthermore, we fine-tuned our model by estimating parameters on a per user basis. Even though this user-centric approach performed rather disappointing during the challenge, we believe that user-level optimization will be essential for the success of future (tag) recommenders. References 1. Wal, T.V.: Folksonomy coinage and definition, February 2, 2007, http://www.vanderwal.net/folksonomy.html. 2. Hotho, A., Jäschke, R., Schmitz, C., Stumme, G.: Information retrieval in folk- sonomies: Search and ranking. In: Proc. of the 3rd European Semantic Web Con- ference (ESWC), Springer (2006) 3. Marlow, C., Naaman, M., Boyd, D., Davis, M.: Ht06, tagging paper, taxonomy, flickr, academic article, to read. In: HYPERTEXT ’06, New York, NY, USA, ACM (2006) 4. Jäschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., Stumme, G.: Tag recom- mendations in folksonomies. In: Lernen - Wissensentdeckung - Adaptivität (LWA) Workshop Proc. (2007) 5. Tatu, M., Srikanth, M., D’Silva, T.: Rsdc’08: Tag recommendations using book- mark content. In: Proc. of the ECML PKDD Discovery Challenge (RSDC08). (2008) 6. Lipczak, M.: Tag recommendation for folksonomies oriented towards individual users. In: Proc. of the ECML PKDD Discovery Challenge (RSDC08). (2008) 7. Heymann, Paul Garcia-Molina, H.: Collaborative creation of communal hierar- chical taxonomies in social tagging systems. Technical report, Computer Science Department, Stanford University (2006) 8. Cattuto, C., Loreto, V., Pietronero, L.: Collaborative tagging and semiotic dy- namics. PNAS 104 (2007) 9. Halpin, H., Robu, V., Shepherd, H.: The complex dynamics of collaborative tag- ging. In: WWW ’07: Proc. of the 16th int. conf. on World Wide Web, New York, NY, USA, ACM (2007) 10. Zhang, D., Mao, R., Li, W.: The recurrence dynamics of social tagging. In: WWW ’09: Proc. of the 18th int. conf. on World wide web, New York, NY, USA, ACM (2009) 11. Golder, S.A., Huberman, B.A.: Usage patterns of collaborative tagging systems. J. of Information Science 32(2) (2006) 12. Bischoff, K., Firan, C.S., Nejdl, W., Paiu, R.: Can all tags be used for search? In: CIKM ’08: Proc. of the 17th ACM conf. on Information and knowledge manage- ment, New York, NY, USA, ACM (2008)