Combining Tag Recommendations Based on User History

Ilari T. Nieminen

ilari.nieminen@tkk.fi, Helsinki University of Technology

This paper describes our attempt at Task 2 of ECML PKDD Discovery Challenge 2009. The task was to predict which tags a given user would use on a given resource using methods that only utilize the graph structure of the training dataset, which was a snapshot of BibSonomy. The approach combines simple recommendation methods by weighting recommendations based on the tagging history of the user.

Introduction

Collaborative tagging systems, or folksonomies, have steadily gained popularity in recent years. Users are free to choose the tags they want to use, and while this may be a main reason behind the popularity of these systems, it is also one of the biggest problems they face. As users come up with new tags, they forget the tags they used before, which makes it difficult to find previously tagged content. Tag recommendation can help both in search and in keeping the users' tagging practices consistent. Tag recommendation can be defined as the problem of finding suitable tags or labels for a given resource and a given user.

Tag recommendation can be an important element in a folksonomy, as it helps users employ their tags consistently and use the same tags for similar resources. This can improve searching within the users' own resources as well as within the folksonomy as a whole.

We present a method for tag recommendation that combines several baseline methods and collaborative filtering. The combination makes use of the past performance of the recommenders.

Collaborative Filtering for Folksonomies

Let $U$, $R$ and $T$ denote the sets of users, resources and tags, and let $Y \subseteq U \times R \times T$ be the set of tag assignments. Using the projections $\pi_{UR} Y \in \{0,1\}^{|U| \times |R|}$, $(\pi_{UR} Y)_{u,r} := 1$ iff $\exists t \in T$ s.t. $(u,r,t) \in Y$, and $\pi_{UT} Y \in \{0,1\}^{|U| \times |T|}$, $(\pi_{UT} Y)_{u,t} := 1$ iff $\exists r \in R$ s.t. $(u,r,t) \in Y$, let us define the "tag neighbourhood" and "resource neighbourhood" of the users. The set of $k$ nearest neighbours for a user $u$ using the neighbourhood matrix $X$ is

$$N_u^k := \operatorname*{arg\,max}^{k}_{v \in U} \operatorname{sim}(x_u, x_v), \tag{1}$$

where sim is the cosine similarity

$$\operatorname{sim}(x, y) := \frac{x \cdot y}{\|x\| \, \|y\|}. \tag{2}$$

The set of recommendations for a given user-resource pair $(u, r)$ is

$$T'(u, r) := \operatorname*{arg\,max}^{n}_{t \in T} \sum_{v \in N_u^k} \operatorname{sim}(x_u, x_v) \, \delta(v, r, t), \tag{3}$$

where $\delta(v, r, t) := 1$ iff $(v, r, t) \in Y$, and $0$ otherwise.
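The neighbourhood-based recommendation above can be illustrated with a minimal sketch. It is not the implementation used in the challenge: the toy assignments, the function names and the default values of `k` and `n` are hypothetical, and the sketch assumes that the user is excluded from their own neighbourhood and that ties are broken alphabetically. Binary projection rows are stored as sets, so the dot product of two rows is the size of their intersection.

```python
from collections import defaultdict
from math import sqrt


def cosine(x, y):
    """Cosine similarity of two binary vectors stored as sets of nonzero indices."""
    if not x or not y:
        return 0.0
    return len(x & y) / sqrt(len(x) * len(y))


def user_rows(Y, use="tags"):
    """Rows of the user-tag or user-resource projection, one set per user."""
    rows = defaultdict(set)
    for user, resource, tag in Y:
        rows[user].add(tag if use == "tags" else resource)
    return rows


def neighbours(rows, u, k):
    """The k users most similar to u, as (similarity, user) pairs.

    Assumption: u itself is excluded and users with zero similarity are skipped.
    """
    scored = [(cosine(rows[u], rows[v]), v) for v in rows if v != u]
    scored = [(s, v) for s, v in scored if s > 0.0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


def recommend(Y, u, r, k=2, n=5, use="tags"):
    """Top-n tags for (u, r); each tag is scored by the summed similarity
    of the neighbours that assigned it to r."""
    rows = user_rows(Y, use)
    if u not in rows:
        return []
    scores = defaultdict(float)
    for s, v in neighbours(rows, u, k):
        for user, resource, tag in Y:
            if user == v and resource == r:
                scores[tag] += s
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


# Hypothetical toy folksonomy: (user, resource, tag) triples.
Y = {
    ("alice", "r1", "python"), ("alice", "r2", "ml"),
    ("bob", "r1", "python"), ("bob", "r3", "ml"), ("bob", "r3", "stats"),
    ("carol", "r2", "ml"), ("carol", "r3", "ml"), ("carol", "r3", "ai"),
}
print(recommend(Y, "alice", "r3", use="tags"))
print(recommend(Y, "alice", "r3", use="resources"))
```

Switching `use` between `"tags"` and `"resources"` selects the tag neighbourhood or the resource neighbourhood; on real data the two generally produce different rankings.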

Baseline Methods

The following simple recommendation methods do not produce very good recommendations on their own; their main redeeming quality is that they are computationally inexpensive.

Popular tags for a resource. If the users of the folksonomy are homogeneous, this method can be expected to perform almost as well as CF methods. However, if the users have very different tagging habits, or if people use tags from different languages, performance for the minorities can be expected to suffer.

Popular tags for a user. Some users use relatively few but obscure tags, which means that the "popular tags for a resource" recommender will not work for them. Collaborative recommendations will not work well either, as the user will probably have very few applicable "tag neighbours", and the "resource neighbours" will most likely not use the same tags. For example, user 483 used the tag "allgemein" a total of 2237 times in 9003 posts. In other words, given a post by this user at random, there is almost a 25% chance that it is tagged "allgemein".

Globally popular tags. Recommending the most used tags is perhaps the simplest possible method.

We used several variants of the aforementioned recommenders. These variants and the method used to combine the recommendations are described in Section 4.1, "Combining Recommendations".
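The three baseline recommenders amount to frequency counts over the tag assignments. The sketch below shows one possible reading; the function names, the toy data and the alphabetical tie-breaking are assumptions made for illustration, and the variants actually used in the challenge are not reproduced here.

```python
from collections import Counter


def top(counter, n):
    """The n most frequent keys; ties are broken alphabetically (an assumption)."""
    ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:n]]


def popular_for_resource(Y, r, n=5):
    """Tags most often assigned to resource r by any user."""
    return top(Counter(t for _, res, t in Y if res == r), n)


def popular_for_user(Y, u, n=5):
    """Tags most often used by user u on any resource."""
    return top(Counter(t for user, _, t in Y if user == u), n)


def globally_popular(Y, n=5):
    """Tags most often used in the whole folksonomy."""
    return top(Counter(t for _, _, t in Y), n)


# Hypothetical toy data: (user, resource, tag) triples.
Y = [
    ("u1", "r1", "web"), ("u2", "r1", "web"), ("u2", "r1", "css"),
    ("u3", "r2", "allgemein"), ("u3", "r3", "allgemein"), ("u3", "r3", "web"),
]
print(popular_for_resource(Y, "r1"))
print(popular_for_user(Y, "u3"))
print(globally_popular(Y, n=2))
```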

Data Description and Preprocessing

The provided training data contains three files: bibtex, bookmark and tas. The bookmark and bibtex files describe the content of the links and BibTeX entries, respectively. The tas file contains the tag assignments. Also provided was the post-core at level 2 [3], a reduced set containing only those users, resources and tags that appear in at least two posts. The test set for this task was known to contain only users, resources and tags from this set.
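A post-core of this kind can be computed by repeated pruning until nothing changes. The following sketch is only an illustration under stated assumptions: posts are represented as a mapping from (user, resource) pairs to tag sets, an infrequent tag is removed from a post without discarding the post's other tags, and a post left without tags is dropped. The procedure used to build the provided post-core may differ in these details.

```python
from collections import Counter


def post_core(posts, level=2):
    """Prune posts until every user, resource and tag occurs in >= level posts.

    posts: dict mapping (user, resource) -> set of tags.
    """
    posts = {key: set(tags) for key, tags in posts.items()}
    while True:
        users = Counter(u for u, _ in posts)
        resources = Counter(r for _, r in posts)
        tags = Counter(t for ts in posts.values() for t in ts)
        pruned = {}
        for (u, r), ts in posts.items():
            if users[u] < level or resources[r] < level:
                continue  # the whole post goes with its user or resource
            kept = {t for t in ts if tags[t] >= level}
            if kept:
                pruned[(u, r)] = kept
        if pruned == posts:  # fixed point reached
            return pruned
        posts = pruned


# Hypothetical toy posts.
posts = {
    ("u1", "r1"): {"a", "b"},
    ("u2", "r1"): {"a"},
    ("u1", "r2"): {"a", "c"},
    ("u2", "r2"): {"a"},
    ("u3", "r3"): {"a"},
}
core = post_core(posts)
print(core)
```

The loop is needed because removing one element can push another below the threshold; a single pass is not guaranteed to reach the core.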

We processed bookmarks and BibTeX entries identically. The only information extracted from the "bookmark" and "bibtex" tables was the hash values that identify the resources. We used the url_hash and simhash1 columns and did not attempt to combine duplicate resources. The url_hash column considers two resources different if there are any differences in the URL, such as a trailing slash.

To retain slightly better neighbourhoods for the collaborative filtering approach, we used the full training set to calculate the neighbourhoods, but removed the tags that could not appear in the results. The difference between this and the post-core at level 2 is that this left several partial posts in the training data.

No effort was made to separate functional tags (such as "myown" and "toread") from descriptive tags, which are considerably more interesting in tag recommendation.

Some of the most used tags in BibSonomy are used by a small minority, such as "juergen" (3101 posts, 2 users). In total, among the tags contained in the post-core at level 2, there are 273 tags that have been used at least 100 times by at most 5 people. A measure of tag popularity that takes the number of users of a tag into account can be defined as

$$\operatorname{popularity}(t) = \log(N_t) \, \log(N_t^u), \tag{4}$$

where $N_t$ is the number of times the tag $t$ has been used and $N_t^u$ is the number of users of the tag $t$.

This measure can be used to improve tag recommendation methods that would not otherwise give weights to different tags.
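As a worked example of the popularity measure, the sketch below reads Equation 4 as the product of the two logarithms and uses the natural logarithm; both choices are assumptions, since the extracted formula shows neither an operator nor a base. The figures for "juergen" are taken from the text, with the post count standing in for the usage count, while the second tag and its counts are hypothetical.

```python
from math import log


def popularity(uses, users):
    """Popularity of a tag used `uses` times by `users` distinct users.

    Assumption: product of natural logarithms.
    """
    return log(uses) * log(users)


narrow = popularity(3101, 2)   # "juergen": 3101 posts, 2 users
broad = popularity(500, 200)   # hypothetical tag: fewer uses, many users
single = popularity(1000, 1)   # hypothetical tag used by one person only

print(round(narrow, 2), round(broad, 2), single)
```

Under this reading, a tag used by a single person always scores zero, however often it is used, and a heavily used tag with two users scores well below a moderately used tag with many users.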

As can be seen from Table 1, sorting the tags by their \popularity" removes the unlikely tag \zzztosort" while preserving a sensible selection of popular tags.

Results

Combining Recommendations

The baseline methods can yield good results for certain users, but they are generally worse than the alternatives. However, combining the baseline results with results from collaborative filtering or other methods can improve the overall results. The difficulty in combining results lies in evaluating the trustworthiness of each recommender's results.

In tag recommendation, multiple "items" are recommended at once, and besides the similarity between the user and the user's neighbours there are few evident factors that could be used to weight the tags when combining different methods. In our method, we used the training data to predict the most recent posts of each user (1-100 posts, but at most 20% of all the user's posts) in order to estimate how well each method performs for that user.

In our approach, we took the arbitrary set of methods shown in Table 2 and, for each user and post, assigned a weight to each tag $t$ by summing over all recommenders:

$$w_t := \sum_{p} [t \in T'_p] \, 0.9^{k} f_p, \tag{5}$$

where $T'_p$ is the set of tags recommended by method $p \in \{1, \dots, 7\}$, $f_p$ is the F-measure of method $p$ on the validation set, and $k$ is the position of the tag in the recommendation of method $p$. The factor $0.9^{k}$ slightly reduces the weight of tags further down a list, so that methods with a smaller F-measure have a better chance of getting a likely tag into the final results. The final recommendation consists of the five tags $t \in T$ with the highest $w_t$.
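The weighted sum of Equation 5 can be sketched as follows. The method names, tag lists and F-measures are hypothetical, and the sketch assumes that positions are counted from zero, so that the first tag of each list keeps the full weight of its method; the text does not state where the count starts.

```python
from collections import defaultdict


def combine(recommendations, f_measures, n=5, decay=0.9):
    """Merge ranked tag lists into one list of at most n tags.

    recommendations: dict method -> ranked list of tags
    f_measures:      dict method -> F-measure of that method for the user
    """
    weights = defaultdict(float)
    for method, tags in recommendations.items():
        f = f_measures.get(method, 0.0)
        for k, tag in enumerate(tags):  # k = 0 for the first tag (assumption)
            weights[tag] += decay ** k * f
    return sorted(weights, key=lambda t: (-weights[t], t))[:n]


# Hypothetical output of three recommenders for one post.
recommendations = {
    "resource": ["web", "css", "html"],
    "user": ["allgemein", "web"],
    "global": ["web"],
}
f_measures = {"resource": 0.3, "user": 0.5, "global": 0.1}

print(combine(recommendations, f_measures, n=3))
```

Here "web" collects weight from all three methods (0.3 + 0.45 + 0.1), "allgemein" gets 0.5 from the user-based method alone, and "css" gets 0.27, so a tag from a single well-performing method can still outrank lower-placed tags from the others.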

Prior to the competition, we performed a test with the training data. The posts were divided into three sets based on the post date. The first 80% was selected as the training set, the following 10% as the validation set, and the last 10% was used for testing. The method weights were computed from the validation set. The resulting weights were tested on the test set, showing a modest 5% improvement in the F-measure over the best baseline method in the test.

The weights for the methods were assigned to the users in the competition set by generating recommendations for recent posts with all the methods listed in the previous section. The number of posts used was up to 100, but at most 20% of all the user's posts. After this, the F-measure of each method was used to generate a mixing profile for each user. Then the recommendations were made for the competition set, and these were combined using Equation 5. The results are summarized in Table 3.

One of the baselines (resource tags) outperforms the combined result slightly on the competition set. Some of the recommenders, such as "resource tags", can produce very unlikely tags when the resource itself has been tagged only a few times and with unpopular tags; this was not taken into account when combining the recommendations. A possible solution for this problem is to not recommend unlikely (unpopular) tags if the user has not used them in the past.
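The per-user mixing profile can be sketched as below. This is an illustration rather than the procedure used in the challenge: the F-measure is computed per post and then averaged, which is one of several possible conventions, and the helper names are hypothetical. The held-out size follows the rule of up to 100 posts but at most 20% of the user's posts; how users with fewer than five posts were handled is not stated, so the sketch simply returns zero for them.

```python
def f_measure(recommended, actual):
    """Harmonic mean of precision and recall for one post."""
    recommended, actual = set(recommended), set(actual)
    hits = len(recommended & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


def holdout_size(n_posts, cap=100):
    """Number of most recent posts to predict: 20% of the posts, at most cap."""
    return min(cap, n_posts // 5)


def mixing_profile(recommended, actual):
    """Mean per-post F-measure of each method over a user's held-out posts.

    recommended: dict method -> list of tag lists, one per held-out post
    actual:      list of true tag sets, in the same post order
    """
    profile = {}
    for method, tag_lists in recommended.items():
        scores = [f_measure(tags, truth) for tags, truth in zip(tag_lists, actual)]
        profile[method] = sum(scores) / len(scores) if scores else 0.0
    return profile


# Hypothetical held-out posts of one user and two methods.
actual = [{"a"}, {"a"}]
recommended = {"user": [["a", "b"], ["a"]], "global": [["x"], ["a"]]}
profile = mixing_profile(recommended, actual)

print(holdout_size(9003), holdout_size(40))
print(profile)
```

The resulting dictionary plays the role of the weights $f_p$ in Equation 5 for that user.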

Conclusion

In these experiments, the weights of the recommenders are based on their past performance, but it is likely that these weights could also be estimated from statistical features of the user, such as the average "popularity" of the user's tags and the number of distinct tags. We would like to study these numbers for correlations. Recommendations from other methods, such as FolkRank [1], could be added to improve the performance on the dense parts of the data.

The obtained results were less than stellar; in retrospect, more attention should have been paid to how the results were combined, and especially to the fact that the results of the recommenders were far from independent. Some method for filtering the results should have been applied, perhaps by modifying the weights of the individual tags according to whether the target user has used a certain tag before and how popular the tag is. Simple methods should not be completely neglected, as they can provide useful results for users who do not conform to the tagging practices of the mainline users of the folksonomy.

Discussion

F-measure works as a performance measure for tag recommendation to a certain extent, but the utility of tag recommendation methods for usability and search within a folksonomy should be confirmed with user tests. Combining different tag recommendation results with different weights at different times may cause the recommendations to feel inconsistent.

Searching within a folksonomy is sometimes unnecessarily difficult. A part of the problem is that users tend to use only a few tags per post. One improvement for these tagging systems would be to ask about the applicability of a set of tags that are similar to the ones the user has already chosen. It might make sense to distinguish between the problems of tag prediction, that is, predicting the tags the user will choose, and tag recommendation, the problem of finding descriptive tags for a resource.

Acknowledgements

The author acknowledges Heikki Kallasjoki's technical assistance and Mari-Sanna Paukkeri's comments. This work was supported by the Academy of Finland through the Adaptive Informatics Research Centre, which is a part of the Finnish Centre of Excellence Programme.

References

1. Jäschke, R., Marinho, L.B., Hotho, A., Schmidt-Thieme, L., Stumme, G.: Tag recommendations in folksonomies. In: Kok, J.N., Koronacki, J., de Mántaras, R.L., Matwin, S., Mladenič, D., Skowron, A. (eds.): PKDD. Volume 4702 of Lecture Notes in Computer Science, Springer (2007) 506–514

2. Hotho, A., Jäschke, R., Schmitz, C., Stumme, G.: BibSonomy: A social bookmark and publication sharing system. In: Proc. Workshop on Conceptual Structure Tool Interoperability at the Int. Conf. on Conceptual Structures (2006) 87–102

3. Batagelj, V., Zaveršnik, M.: Generalized cores (2002) cs.DS/0202039, http://arxiv.org/abs/cs/0202039