<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ECML PKDD Discovery Challenge 2009 (DC09)</article-title>
      </title-group>
      <fpage>264</fpage>
      <lpage>305</lpage>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Relational Classification for Personalized Tag Recommendation ... 7</title>
      <p>Leandro Balby Marinho, Christine Preisach, and Lars Schmidt-Thieme</p>
    </sec>
    <sec id="sec-2">
      <title>Measuring Vertex Centrality in Co-occurrence Graphs for Online Social Tag Recommendation ... 17</title>
      <p>Ivan Cantador, David Vallet, and Joemon M. Jose</p>
    </sec>
    <sec id="sec-3">
      <title>Social Tag Prediction Based on Supervised Ranking Model ... 35</title>
      <p>Hao Cao, Maoqiang Xie, Lian Xue, Chunhua Liu, Fei Teng, and
Yalou Huang</p>
    </sec>
    <sec id="sec-4">
      <title>Tag Recommendations Based on Tracking Social Bookmarking Systems ... 49</title>
      <p>Szymon Chojnacki</p>
      <p>A Fast Effective Multi-Channeled Tag Recommender ... 59</p>
      <p>Jonathan Gemmell, Maryam Ramezani, Thomas Schimoler, Laura Christiansen, and Bamshad Mobasher</p>
    </sec>
    <sec id="sec-5">
      <title>A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems ... 71</title>
      <p>Anestis Gkanogiannis and Theodore Kalamboukis</p>
    </sec>
    <sec id="sec-6">
      <title>Discriminative Clustering for Content-Based Tag Recommendation in Social Bookmarking Systems ... 85</title>
      <p>Malik Tahir Hassan, Asim Karim, Suresh Manandhar, and James Cussens</p>
    </sec>
    <sec id="sec-7">
      <title>Time based Tag Recommendation using Direct and Extended Users Sets ... 99</title>
      <p>Tereza Iofciu and Gianluca Demartini</p>
    </sec>
    <sec id="sec-8">
      <title>A Weighting Scheme for Tag Recommendation in Social Bookmarking Systems ... 109</title>
      <p>Sanghun Ju and Kyu-Baek Hwang</p>
    </sec>
    <sec id="sec-9">
      <title>ARKTiS - A Fast Tag Recommender System Based On Heuristics ... 119</title>
      <p>Thomas Kleinbauer and Sebastian Germesin</p>
    </sec>
    <sec id="sec-10">
      <title>Tag Recommendation using Probabilistic Topic Models ... 131</title>
      <p>Ralf Krestel and Peter Fankhauser</p>
    </sec>
    <sec id="sec-11">
      <title>A Probabilistic Ranking Approach for Tag Recommendation ... 143</title>
      <p>Zhen Liao, Maoqiang Xie, Hao Cao, and Yalou Huang</p>
    </sec>
    <sec id="sec-12">
      <title>Tag Sources for Recommendation in Collaborative Tagging Systems ... 157</title>
      <p>Marek Lipczak, Yeming Hu, Yael Kollet, and Evangelos Milios</p>
    </sec>
    <sec id="sec-13">
      <title>Collaborative Tag Recommendation System based on Logistic Regression ... 173</title>
      <p>E. Montañés, J. R. Quevedo, I. Díaz, and J. Ranilla</p>
    </sec>
    <sec id="sec-14">
      <title>Content- and Graph-based Tag Recommendation: Two Variations ... 189</title>
      <p>Johannes Mrosek, Stefan Bussmann, Hendrik Albers, Kai Posdziech,
Benedikt Hengefeld, Nils Opperman, Stefan Robert, and Gerrit
Spira</p>
    </sec>
    <sec id="sec-15">
      <title>A Two-Level Learning Hierarchy of Concept Based Keyword Extraction for Tag Recommendations ... 201</title>
      <p>Hendri Mur and Klaus Obermayer</p>
      <p>STaR: a Social Tag Recommender System ... 215</p>
      <p>Cataldo Musto, Fedelucio Narducci, Marco de Gemmis, Pasquale Lops, and Giovanni Semeraro</p>
    </sec>
    <sec id="sec-16">
      <title>Combining Tag Recommendations Based on User History ... 229</title>
      <p>Ilari T. Nieminen</p>
    </sec>
    <sec id="sec-17">
      <title>Factor Models for Tag Recommendation in BibSonomy ... 235</title>
      <p>Steffen Rendle and Lars Schmidt-Thieme</p>
      <p>Content-based and Graph-based Tag Suggestion ... 243</p>
      <p>Xiance Si, Zhiyuan Liu, Peng Li, Qixia Jiang, and Maosong Sun</p>
    </sec>
    <sec id="sec-18">
      <title>RSDC'09: Tag Recommendation Using Keywords and Association Rules ... 261</title>
      <p>Jian Wang, Liangjie Hong and Brian D. Davison</p>
    </sec>
    <sec id="sec-19">
      <title>Understanding the user: Personomy translation for tag recommendation ... 275</title>
      <p>Robert Wetzker, Alan Said, and Carsten Zimmermann</p>
      <p>A Tag Recommendation System based on contents ... 285</p>
      <p>Ning Zhang, Yuan Zhang, and Jie Tang</p>
    </sec>
    <sec id="sec-20">
      <title>A Collaborative Filtering Tag Recommendation System based on Graph ... 297</title>
      <p>Yuan Zhang, Ning Zhang, and Jie Tang</p>
    </sec>
    <sec id="sec-21">
      <title>Since 1999 the ECML PKDD embraces the tradition of organizing a Discovery</title>
    </sec>
    <sec id="sec-22">
      <title>Challenge, allowing researchers to develop and test algorithms for novel and</title>
      <p>real world datasets. This year's Discovery Challenge1 presents a dataset from
the eld of social bookmarking to deal with the recommendation of tags. The
results submitted by the challenge's participants are presented at an ECML</p>
    </sec>
    <sec id="sec-23">
      <title>PKDD workshop on September 7th, 2009, in Bled, Slovenia.</title>
    </sec>
    <sec id="sec-24">
      <title>The provided dataset has been created using data of the social bookmark</title>
      <p>and publication sharing system BibSonomy,2 operated by the organizers of the
challenge. The training data was released on March 25th 2009, the test data on</p>
    </sec>
    <sec id="sec-25">
      <title>July 6th. The participants had time until July 8th to submit their results. This</title>
      <p>gave researchers 14 weeks time to tune their algorithms on a snapshot of a real
world folksonomy dataset and 48 hours to compute results on the test data.</p>
    </sec>
    <sec id="sec-26">
      <title>To support the user during the tagging process and to facilitate tagging,</title>
      <p>BibSonomy includes a tag recommender. When a user finds an interesting web
page (or publication) and posts it to BibSonomy, the system offers up to five
recommended tags on the posting page. The goal of the challenge is to learn a
model which effectively predicts the keywords a user has in mind when describing
a web page (or publication). We divided the problem into three tasks, each of
which focuses on a certain aspect. All three tasks get the same dataset for
training. It is a snapshot of BibSonomy until December 31st 2008. The dataset
is cleaned and consists of two parts, the core part and the complete snapshot.</p>
    </sec>
    <sec id="sec-27">
      <title>The test dataset is different for each task.</title>
      <p>Task 1: Content-Based Tag Recommendations. The test data for this task
contains posts whose user, resource, or tags are not contained in the post-core at
level 2 of the training data. Thus, methods which cannot produce tag recommendations
for new resources, or are unable to suggest new tags, will very probably not
produce good results here.</p>
      <p>Task 2: Graph-Based Recommendations. This task is especially intended for
methods relying on the graph structure of the training data only. The user,
resource, and tags of each post in the test data are all contained in the training
data's post-core at level 2.</p>
      <p>Task 3: Online Tag Recommendations. This is a bonus task which will take place
after Tasks 1 and 2. The participants shall implement a recommendation service
which can be called via HTTP by BibSonomy's recommender infrastructure
when a user posts a bookmark or publication. All participating recommenders
are called on each posting process; one of them is chosen to actually deliver the
results to the user. We can then measure the performance of the recommenders
in an online setting, where timeouts are important and where we can measure
which tags the user has clicked on.</p>
      <sec id="sec-27-1">
        <title>1 http://www.kde.cs.uni-kassel.de/ws/dc09/ 2 http://www.bibsonomy.org</title>
      </sec>
    </sec>
    <sec id="sec-28">
      <title>Results</title>
      <p>More than 150 participants registered for the mailing list, which enabled them to download the dataset. In the end, we received 42 submissions, 21 for each of Tasks 1 and 2. Additionally, 24 participants submitted a paper that can be found in the proceedings at hand.</p>
    </sec>
    <sec id="sec-29">
      <title>We used the F1-Measure common in Information Retrieval to evaluate the</title>
      <p>submitted recommendations. Therefore, we rst computed for each post in the
test data precision and recall by comparing the rst ve recommended tags
against the tags the user has originally assigned to this post. Then we averaged
precision and recall over all posts in the test data and used the resulting precision
and recall to compute the F1-Measure as f1m = 2prperceicsiisoinon+ rreeccaallll .</p>
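      <p>As an illustration of this protocol, the following minimal sketch (function and variable names are ours, not part of the challenge infrastructure) compares the first five recommended tags of each post with the true tags, averages precision and recall over all posts, and combines them into f1m:</p>

```python
def evaluate(recommendations, true_tags):
    """Macro-averaged precision/recall over posts, combined into the F1-Measure."""
    precisions, recalls = [], []
    for post, recommended in recommendations.items():
        truth = true_tags[post]
        top5 = recommended[:5]                        # only the first five tags count
        hits = len(set(top5).intersection(truth))
        precisions.append(hits / len(top5))
        recalls.append(hits / len(truth))
    precision = sum(precisions) / len(precisions)     # averaged over all test posts
    recall = sum(recalls) / len(recalls)
    return 2 * precision * recall / (precision + recall)
```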
    </sec>
    <sec id="sec-30">
      <title>The winning team of Task 1 has an f1m of 0.18740, the second and third follow</title>
      <p>with 0.18001 and 0.17975. For Task 2, the winner achieved an f1m of 0.35594,
followed by 0.33185 and 0.32461. The winner of Task 3 will be announced at the
conference and later on the website of the challenge.</p>
      <p>Lipczak et al. from Dalhousie University, Halifax, Canada (cf. page 157) are
the winners of Task 1. With a method based on the combination of tags from
the resource's title, tags assigned to the resource by other users, and tags in the
user's profile, they reached an f1m of 0.18740 in Task 1 and additionally achieved
third place in Task 2 with an f1m of 0.32461. Their system is composed of six
recommenders, and the basic idea is to augment the tags from the title with related
tags extracted from two tag-tag co-occurrence graphs and from the user's profile,
and then to rescore and merge them.</p>
    </sec>
    <sec id="sec-31">
      <title>The winners of Task 2, Rendle and Schmidt-Thieme from University of</title>
    </sec>
    <sec id="sec-32">
      <title>Hildesheim, Germany (cf. page 235) achieved an f1m of 0.35594 with a</title>
      <p>statistical method based on factor models. They factorize the folksonomy
structure to find latent interactions between users, resources, and tags. Using
a variant of the stochastic gradient descent algorithm, the authors optimize an
adaptation of the Bayesian Personalized Ranking criterion. Finally, they estimate
how many tags to recommend to further improve precision.</p>
      <p>The second-placed team of Task 1 (Mrosek et al., page 189) harvests tags from sources
like Delicious, Google Scholar, and CiteULike. They also employ the full text of
web pages and PDFs. The third-placed team (Ju and Hwang, page 109) merges tags which
were earlier assigned to the resource or used by the user, as well as resource
descriptions, by a weighting scheme. Finally, the second-placed team of Task 2 (Balby Marinho
et al., page 7) uses relational classification methods in a semi-supervised learning
scenario to recommend tags.</p>
    </sec>
    <sec id="sec-33">
      <title>We thank all participants of the challenge for their contributions and the</title>
      <p>organizers of the ECML PKDD 2009 conference for their support. Furthermore,
we want to thank our sponsors Nokia3 and Tagora4 for supporting the challenge
by awarding prizes for the winners of each task. We are looking forward to a
very exciting and interesting workshop.</p>
    </sec>
    <sec id="sec-34">
      <title>Kassel, August 2009</title>
      <p>Folke Eisterlehner, Andreas Hotho, Robert Jäschke</p>
      <sec id="sec-34-1">
        <title>3 http://www.nokia.com/ 4 http://www.tagora-project.eu/</title>
        <sec id="sec-34-1-2">
          <title>Relational Classification for Personalized Tag Recommendation</title>
          <p>Leandro Balby Marinho, Christine Preisach, and Lars Schmidt-Thieme</p>
          <p>Information Systems and Machine Learning Lab (ISMLL)</p>
          <p>Samelsonplatz 1, University of Hildesheim, D-31141 Hildesheim, Germany
{marinho,preisach,schmidt-thieme}@ismll.uni-hildesheim.de
http://www.ismll.uni-hildesheim.com/</p>
          <p>Abstract. Folksonomy data is relational by nature, and therefore methods that
directly exploit these relations are prominent for the tag recommendation
problem. Relational methods have been successfully applied to areas in which
entities are linked in an explicit manner, like hypertext documents and scientific
publications. For approaching the graph-based tag recommendation task of the
ECML PKDD Discovery Challenge 2009, we propose to turn the folksonomy
graph into a homogeneous post graph and use relational classification techniques
for predicting tags. Our approach features adherence to multiple kinds of
relations, semi-supervised learning, and fast predictions.</p>
          <p>1 Introduction</p>
          <p>One might want tag recommendations for several reasons, for example: simplifying
the tagging process for the user, exposing different facets of a resource and helping
the tag vocabulary to converge. Given that users are free to tag, i.e., the same resource
can be tagged differently by different people, it is important to personalize the
recommended tags for an individual user.</p>
          <p>Tagging data forms a ternary relation between users, resources and tags, differently
from typical recommender systems in which the relation is usually binary between users
and resources. The best methods presented so far explore this ternary relation to
compute tag predictions, either by means of tensor factorization [8] or PageRank [3], on the
hypergraph induced by the ternary relational data. We, on the other hand, propose to
explore the underlying relational graph between posts by means of relational
classification.</p>
          <p>In this paper we describe our approaches for addressing the graph-based tag
recommendation task of the ECML PKDD Discovery Challenge 2009. We present two
basic algorithms: PWA* (probabilistic weighted average), an iterative relational
classification algorithm enhanced with relaxation labelling, and WA* (weighted average),
an iterative relational classification method without relaxation labelling. These
methods feature adherence to multiple kinds of relations, no training phase, fast predictions, and
semi-supervised classification. Semi-supervised classification is particularly appealing
because it allows us to potentially benefit from the information contained in the test dataset.
Furthermore, we propose to combine these methods through unweighted voting.</p>
          <p>The paper is organized as follows. Section 2 presents the notation used throughout
the paper. In Section 3 we show how we turned the folksonomy into a post relational
graph. Section 4 introduces the individual classifiers and the ensemble technique we
used. In Section 5 we elaborate on the evaluation and experiments conducted for tuning
the parameters of our models, and report the results obtained on the test dataset released
for the challenge. The paper closes with conclusions and directions for future work.</p>
          <p>2 Notation</p>
          <p>Folksonomy data usually comprises a set of users U, a set of resources R, a set of tags
T, and a set Y of ternary relations between them, i.e., Y ⊆ U × R × T.</p>
          <p>Let</p>
          <p>X := {(u, r) | ∃t ∈ T : (u, r, t) ∈ Y }
be the set of all unique user/resources combinations in the data, where each pair is called
a post. For convenience, let T (x = (u, r)) := {t ∈ T | (u, r, t) ∈ Y } be the set of all
tags assigned to a given post x ∈ X. We consider train/test splits based on posts, i.e.,
Xtrain, Xtest ⊂ X disjoint and covering all of X:</p>
          <p>Xtrain ∪˙Xtest = X</p>
          <p>For training, the learner has access to the set Xtrain of training posts and their true
tags T |Xtrain . The tag recommendation task is then to predict, for a given x ∈ Xtest, a set
Tˆ(x) ⊆ T of tags that are most likely to be used by the respective user for the respective resource.</p>
          <p>3 Relation Engineering</p>
          <p>We propose to represent folksonomy data as a homogeneous, undirected relational
graph over the post set, i.e., G := (X, E) in which edges are annotated with a weight
w : X × X → R denoting the strength of the relation. Besides making the input data
more compact – we have only a binary relation R ⊆ X ×X between objects of the same
type – this representation will allow us to trivially cast the problem of personalized tag
recommendations as a relational classification problem.</p>
          <p>Relational classifiers usually consider, in addition to the typical attribute-value data
of objects, relational information. A scientific paper, for example, can be connected to
another paper because both were written by the same author or because they share common
citations. It has been shown in many classification problems that relational classifiers
perform better than purely attribute-based classifiers [1, 4, 6].</p>
          <p>In our case, we assume that posts are related to each other if they share the same
user: Ruser := {(x, x′) ∈ X × X | user(x) = user(x′)}, the same resource: Rres :=
{(x, x′) ∈ X × X | res(x) = res(x′)}, or either the same user or the same resource:
Ruser∪res := Ruser ∪ Rres (see Figure 1). Here, user(x) and res(x) denote
the user and resource of post x, respectively. Thus, each post is connected to other posts
either via other users that tagged the same resource, or via other resources tagged by
the same user. Weights are discussed in Section 4.</p>
          <p>Note that it may happen that some of the related posts belong themselves to the
test dataset, allowing us to potentially profit from the unlabeled information of test nodes
through, e.g., collective inference (see Section 4). Thus, differently from other
approaches (e.g., [3, 8]) that are restricted to Xtrain, we can also exploit the set Xtest
of test posts, but of course not their associated true tags.</p>
          <p>Now, for a given x ∈ Xtest, one can use the tagging information of related instances
to estimate Tˆ(x). A simple way to do that is, e.g., through tag frequencies of related
posts:</p>
          <p>P(t|x) := |{x′ ∈ Nx | t ∈ T(x′)}| / |Nx|,   x ∈ X, t ∈ T   (1)</p>
          <p>where Nx is the neighborhood of x:</p>
          <p>Nx := {x′ ∈ X | (x, x′) ∈ R, T(x′) ≠ ∅}   (2)</p>
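          <p>Eq. (1) can be sketched as follows (the data layout is our assumption: a list of neighbor ids and a dict mapping posts to their tag sets; per eq. (2), only neighbors with a non-empty tag set count):</p>

```python
def tag_frequency(neighbors, tags_of):
    """Eq. (1): P(t|x) as the fraction of labeled neighbors of x that used tag t."""
    labeled = [n for n in neighbors if tags_of.get(n)]   # N_x: neighbors with known tags
    counts = {}
    for n in labeled:
        for t in tags_of[n]:
            counts[t] = counts.get(t, 0) + 1
    return {t: c / len(labeled) for t, c in counts.items()}
```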
          <p>In Section 4 we will present the actual relational classifiers we have used to approach
the challenge.</p>
          <p>We extract the relational information by adapting simple statistical relational methods,
usually used for the classification of hypertext documents, scientific publications, or movies,
to the tag recommendation scenario. The aim is to recommend tags to users by using the
neighborhood encoded in the homogeneous graph G(X, E). We therefore described a
very simple method in eq. (1), where the probability of a tag t ∈ T given a node x
(post) is computed by counting the frequency of neighboring posts x′ ∈ Nx that have
used the same tag t. In this case the strength of the relations is not taken into account,
i.e., all considered neighbors of x have the same influence on the probability of tag t
given x. But this is not optimal: the more similar two posts are to each other, the
higher the weight of their edge should be.</p>
          <p>Hence, a more suitable relational method for tag recommendation is the
Weighted Average (WA), which sums up the weights of all posts x′ ∈ Nx that share the
same tag t ∈ T and normalizes by the sum over all weights in the neighborhood:</p>
          <p>P(t|x) = Σ_{x′∈Nx, t∈T(x′)} w(x, x′) / Σ_{x′∈Nx} w(x, x′)   (3)</p>
          <p>Thus, WA only considers neighbors that belong to the training set.</p>
          <p>A more sophisticated relational method that takes probabilities into account is the
probabilistic weighted average (PWA). It calculates the probability of t given x by
building the weighted average of the tag probabilities of the neighbor nodes x′ ∈ Nx:</p>
          <p>P(t|x) = Σ_{x′∈Nx} w(x, x′) P(t|x′) / Σ_{x′∈Nx} w(x, x′)   (4)</p>
          <p>where P(t|x′) = 1 for x′ ∈ Xtrain, i.e., we are only exploiting nodes contained
in the training set (see eq. (2)). We will see in the next paragraph how one can exploit
these probabilities in a more clever way. Both approaches were introduced in [5]
and applied to relational datasets.</p>
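          <p>A sketch of eqs. (3)/(4) under assumed data structures (weights keyed by neighbor id, one tag distribution per neighbor; for WA, every training neighbor's tags simply carry probability 1):</p>

```python
def weighted_average(neighbors, w, P_neighbor):
    """Eqs. (3)/(4): weighted average of the neighbors' tag probabilities.
    WA: P_neighbor[n] assigns 1.0 to each tag of training neighbor n.
    PWA: P_neighbor[n] may hold estimated probabilities instead."""
    scores, total = {}, 0.0
    for n in neighbors:
        total += w[n]
        for t, p in P_neighbor[n].items():
            scores[t] = scores.get(t, 0.0) + w[n] * p
    return {t: s / total for t, s in scores.items()}
```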
          <p>Since we want to recommend more than one tag, we need to cast the tag recommendation
problem as a multilabel classification problem, i.e., assign one or more classes to
a test node. We solve the multilabel problem by sorting the calculated probabilities
P(t|x) for all x ∈ Xtest and recommending the top n tags with the highest probabilities.</p>
          <p>The proposed relational methods could either be applied on Rruesser, i.e., the union of
the user and resource relation or on each relation Ruser, Rres individually. If applied on
each relation, the results can be combined by using ensemble techniques.</p>
          <p>4.1 Semi-Supervised Learning</p>
          <p>As mentioned before, we would additionally like to exploit the unlabeled information
contained in the graph and use the test nodes that have not been tagged yet but are related
to other nodes. This can be achieved by applying collective inference methods: iterative
procedures which classify related nodes simultaneously and exploit relational
autocorrelation and unlabeled data. Relational autocorrelation is the correlation among
a variable of an entity to the same variable (here the class) of a related entity, i.e.,
connected entities are likely to have the same classes assigned. Collective Classification is
semi-supervised by nature, since one exploits the unlabeled part of the data. One of these
semi-supervised methods is relaxation labeling [1], which can be formally expressed as:
P(t|x)(i+1) = M((P(t|x′)(i))x′∈Nx)   (5)</p>
          <p>We first initialize the unlabeled nodes with the prior probability calculated using the
train set, then compute the probability of tag t given x iteratively using a relational
classification method M based on the neighborhood Nx in the inner loop. The procedure
stops when the algorithm converges (i.e., the difference of the tag probabilities between
iterations i and i + 1 is less than a very small ε) or a certain number of iterations is
reached.</p>
          <p>We used eq. (4) as the relational method inside the loop, where we do not require
the neighbors x′ to be in the training set but also use the probabilities of unlabeled
nodes. For PWA this means that in each iteration we collectively use the probabilities of the
neighborhood estimated in the previous iteration. PWA combined with collective
inference is denoted as PWA* in the following.</p>
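          <p>PWA* can be sketched as the following loop (a simplified reading of eq. (5) under assumed data structures: `P_train` holds the fixed tag distributions of training neighbors, test nodes start from a prior and are re-estimated until convergence):</p>

```python
def pwa_star(test_nodes, neighbors, w, P_train, prior, max_iter=50, eps=1e-4):
    """Relaxation labeling (eq. (5)) with PWA (eq. (4)) as the inner classifier."""
    P = {x: dict(prior) for x in test_nodes}          # initialize unlabeled nodes
    for _ in range(max_iter):
        new_P = {}
        for x in test_nodes:
            scores, total = {}, 0.0
            for n in neighbors[x]:
                dist = P_train.get(n) or P.get(n)     # training label or last estimate
                if not dist:
                    continue
                total += w[x, n]
                for t, p in dist.items():
                    scores[t] = scores.get(t, 0.0) + w[x, n] * p
            new_P[x] = {t: s / total for t, s in scores.items()} if total else dict(P[x])
        diff = max(abs(new_P[x].get(t, 0.0) - P[x].get(t, 0.0))
                   for x in test_nodes for t in set(new_P[x]) | set(P[x]))
        P = new_P
        if diff < eps:                                # converged
            break
    return P
```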
          <p>For the Weighted Average we did not use relaxation labeling but applied a so-called
one-shot estimation [5, 7]. We only use neighbors with known classes, i.e., in the first
iteration we exploit only nodes from the training set, while in subsequent iterations we
also use test nodes that were classified in previous iterations. The procedure stops
when all test nodes could be classified or a specific number of iterations is reached.
Hence, the tag probabilities are not re-estimated as in relaxation labeling
but estimated only once. Thus, WA combined with the one-shot estimation procedure is
denoted as WA*.</p>
          <p>4.2 Ensemble</p>
          <p>Ensemble classification may lead to a significant improvement in classification
accuracy, since uncorrelated errors made by the individual classifiers are removed by the
combination of different classifiers [2, 6]. Furthermore, ensemble classification reduces
variance and bias.</p>
          <p>We have decided to combine WA* and PWA* through a simple unweighted voting,
since voting performs particularly well when the results of individual classifiers are
similar; as we will see in Section 5, WA* and PWA* yielded very similar results in our
holdout set.</p>
          <p>After running the individual classifiers, we receive a probability distribution for
each classifier Kl as output and build the arithmetic mean of the tag-assignment
probabilities for each test post and tag:
P(t|x) := (1/|K|) Σ_{Kl∈K} PKl(t|x)   (6)</p>
          <p>4.3 Weighting Schemes</p>
          <p>
The weight w in eq. (3) and (4) is an important factor in the estimation of tag
probabilities, since it describes the strength of the relation between x and x0. There are several
ways to estimate these weights:
1. For two nodes (x, x′) ∈ Rres, compute their similarity by representing x and x′ as
user-tag profile vectors. Each component of the profile vector corresponds to the
count of co-occurrences between the user and the tags:
φuser-tag := (|Y ∩ ({user(x)} × R × {t})|)t∈T</p>
          <p>2. Similarly to 1, for two nodes (x, x′) ∈ Ruser, the node similarity is computed by
representing x and x′ as resource-tag profile vectors:
φres-tag := (|Y ∩ (U × {res(x)} × {t})|)t∈T</p>
          <p>3. Similar to 2, but x and x′ are represented as resource-user profile vectors, where
each component corresponds to the count of co-occurrences between the resource and the users:
φres-user := (|Y ∩ ({u} × {res(x)} × T)|)u∈U</p>
          <p>4. The same as in 1, but the node similarity is computed w.r.t. user-resource profile vectors:
φuser-res := (|Y ∩ ({user(x)} × {r} × T)|)r∈R</p>
          <p>The edge weight is finally computed by applying the cosine similarity to the
desired profile vectors:
sim(φ(x), φ(x′)) := ⟨φ(x), φ(x′)⟩ / (‖φ(x)‖ ‖φ(x′)‖)</p>
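          <p>Weighting scheme 1 can be sketched as follows (Y is assumed to be a list of (user, resource, tag) triples; function names are ours):</p>

```python
def user_tag_profile(Y, user):
    """Scheme 1: co-occurrence counts of a post's user with each tag."""
    profile = {}
    for (u, r, t) in Y:
        if u == user:
            profile[t] = profile.get(t, 0) + 1
    return profile

def cosine(a, b):
    """Cosine similarity of two sparse profile vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```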
          <p>In our experiments we basically used scheme 1, since there are no new users in the
data and therefore we can always build user-tag profile vectors.</p>
          <p>5 Evaluation</p>
          <p>All the results presented in this section are reported in terms of the F1-score, the official
measure used by the graph-based tag recommendation task of the ECML PKDD
Discovery Challenge 2009. For a given x ∈ Xtest the F1-Score is computed as follows:
F1-score(Tˆ(x)) = 2 · Recall(Tˆ(x)) · Precision(Tˆ(x)) / (Recall(Tˆ(x)) + Precision(Tˆ(x)))</p>
          <p>Although the methods presented in Section 4 usually do not have free parameters,
we realized that Ruser and Rres can have a different impact on the recommendation
quality (cf. Figures 2 and 3), and therefore we introduced a parameter to reward the better
relation in Ruser∪res by a factor c ∈ N: if Rres yields better recommendations than Ruser,
for example, all edge weights in Ruser∪res that refer to Rres are multiplied by c.</p>
          <p>To find the best c value, we performed a greedy search over the factor range
{1, ..., 4} on a holdout set created by randomly selecting 800 posts from the training
data. Tables 1 and 2 show the characteristics of the training and test/holdout datasets,
respectively. Figure 2 presents the results of WA*-Full1, i.e., WA* performed over Ruser∪res,
for different c values on the holdout set according to the F1-score. We also plot the
results of WA*-Res and WA*-Usr (i.e., WA* on Rres and Ruser, respectively).</p>
          <p>After finding the best c value on the holdout set, we applied WA*-Full, PWA*-Full,
and the ensemble (cf. eq. (6)) to the challenge test dataset (see Figure 3). Note that the
results on the challenge test dataset are much lower than those on the holdout set. This may
indicate either that our holdout set was not a good representative of the population or
that the challenge test dataset represents a concept drift. We plan to further investigate
the reasons underlying this large deviation.</p>
          <p>1 Since the results of PWA* and WA* are very similar, we just report on WA*.</p>
          <p>dataset: BibSonomy; |U| = 1,185; |R| = 22,389; |T| = 13,276; |Y| = 253,615; |X| = 64,406</p>
          <p>According to the rules of the challenge, the F1-score is measured over the Top-5
recommended tags, even though one is not forced to always recommend 5 tags. This is
an important remark because recommending more tags than the true number of tags
attached to a particular test post can lower precision. Therefore, we estimate the
number of tags to recommend for each test post by taking the average number of
tags the test user has assigned to his resources. If a given test user has tagged his resources
with 3 tags on average, for example, we recommend the Top-3 tags delivered by our
algorithms for all test posts in which this user appears.</p>
          <p>6 Conclusions</p>
          <p>In this paper we proposed to approach the graph-based tag recommendation task of
the ECML PKDD Discovery Challenge 2009 by means of relational classification. We
first turned the usual tripartite graph of social tagging systems into a homogeneous post
graph, whereby simple statistical relational methods can easily be applied. Our methods
are training-free, and the prediction runtime depends only on the number of neighbors
and tags, which is fast since the training data is sparse. The models we presented also
incorporate a semi-supervised component that can potentially benefit from test data. We
presented two relational classification methods, namely WA* and PWA*, and one ensemble
based on unweighted voting over the tag probabilities delivered by these methods.</p>
          <p>We also introduced a parameter in order to reward more informative relations, which
was learned through a greedy search in a holdout set.</p>
          <p>In future work we want to investigate new kinds of relations between the posts (e.g.
content-based), other ensemble techniques, and new methods for automatically learning
more informative weights.</p>
          <p>Fig. 2. Parameter tuning on the holdout test data: F1-score over Top-n (n = 1, ..., 5) for WA*-Full with c = 1, 2, 3.5, 4, and for WA*-Res and WA*-Usr.</p>
          <p>Acknowledgements</p>
          <p>This work is supported by CNPq, an institution of the Brazilian Government for
scientific and technological development, and by the X-Media project (www.x-media-project.org)
sponsored by the European Commission as part of the Information Society
Technologies (IST) programme under EC grant number IST-FP6-026978. The authors also
gratefully acknowledge the partial co-funding of their work through the European
Commission FP7 project MyMedia (www.mymediaproject.org) under the grant agreement no.
215006. For your inquiries please contact info@mymediaproject.org.</p>
          <p>Fig. 3. Results on the challenge test data: F1-score over Top-n.</p>
          <p>Measuring Vertex Centrality in Co-occurrence Graphs for Online Social Tag Recommendation</p>
          <p>Iván Cantador, David Vallet, Joemon M. Jose</p>
          <p>Department of Computing Science</p>
          <p>University of Glasgow
Lilybank Gardens, Glasgow, G12 8QQ, Scotland, UK</p>
          <p>{cantador, dvallet, jj}@dcs.gla.ac.uk
Abstract. We present a social tag recommendation model for collaborative
bookmarking systems. This model receives as input a bookmark of a web page
or scientific publication, and automatically suggests a set of social tags useful
for annotating the bookmarked document. Analysing and processing the
bookmark textual contents (document title, URL, abstract and descriptions), we
extract a set of keywords, forming a query that is launched against an index
to retrieve a number of similar tagged bookmarks. Afterwards, we take the
social tags of these bookmarks, and build their global co-occurrence sub-graph.
The tags (vertices) of this reduced graph that have the highest vertex centrality
constitute our recommendations, which are finally ranked based on TF-IDF and
personalisation-based techniques.</p>
          <p>Keywords: social tag recommendation, co-occurrence, graph vertex centrality,
collaborative bookmarking.
1 Introduction
Social tagging systems allow users to create or upload resources (web pages1,
scientific publications2, photos3, video clips4, music tracks5), annotate them with
freely chosen words – so called tags – and share them with others. The set of users,
resources, tags and annotations (i.e., triplets user-tag-resource) is commonly known as
folksonomy, and constitutes a collective unstructured knowledge classification. This
implicit classification is then used by users to organise, explore and search for
resources, and by systems to recommend users interesting resources.</p>
          <p>These systems usually include tag recommendation mechanisms to ease the finding
of relevant tags for a resource, and consolidate the tag vocabulary across users.
However, as stated in [7], no algorithmic details have been published, and it is
assumed that, in general, tag recommendations in current applications are based on
suggesting those tags that were most frequently assigned to the resource, or to similar
resources.
1 Delicious – Social bookmarking, http://delicious.com/
2 CiteULike – Scholarly reference management and discovery, http://www.citeulike.com/
3 Flickr – Photo sharing, http://www.flickr.com/
4 YouTube – Video sharing, http://www.youtube.com/
5 Last.fm – Personal online radio, http://www.last.fm/</p>
          <p>Recent works have proposed more sophisticated and accurate methods for tag
recommendation. These methods can be roughly classified into content-based and
collaborative approaches. Content-based techniques [3, 4, 10, 16] analyse the contents
and/or meta-information of the resources to extract keywords, which are directly
suggested to the user or matched with existing tags. Collaborative strategies [6, 7, 17],
on the other hand, exploit folksonomy relations between users, resources and tags to
infer which of the tags of the system folksonomy are most suitable for a particular
resource. Hybrid techniques, combining content and collaborative features, have been
also investigated [5, 15].</p>
          <p>In this paper, we present a hybrid tag recommendation model for an online
bookmarking system where users annotate online web pages and scientific
publications. The model receives as input a bookmark, analyses and processes its
textual contents – document title, URL, abstract and description – extracting a set of
keywords, and forms a query that is launched against an index to retrieve a number of
similar tagged bookmarks. Afterwards, it takes the social tags of these bookmarks,
and builds their global co-occurrence sub-graph. The tags (vertices) of this reduced
graph that have the highest vertex centrality constitute the recommendations, which
are finally ranked based on TF-IDF [14] and personalisation-based techniques.</p>
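          <p>The centrality-ranking step of this pipeline can be sketched as follows. This is a minimal illustration using degree centrality on the co-occurrence sub-graph of the retrieved bookmarks' tags; names and toy data are ours, and the actual centrality measure and ranking used by the model are described later in the paper.

```python
from collections import defaultdict
from itertools import combinations

def recommend_by_centrality(tag_sets, top_n=5):
    """Build the co-occurrence sub-graph of the tags attached to the
    retrieved bookmarks and rank tags by (degree) vertex centrality."""
    # Edge weight: number of bookmarks in which two tags co-occur.
    weight = defaultdict(int)
    for tags in tag_sets:
        for a, b in combinations(sorted(set(tags)), 2):
            weight[(a, b)] += 1
    # Degree centrality of a vertex: sum of the weights of its edges.
    centrality = defaultdict(int)
    for (a, b), w in weight.items():
        centrality[a] += w
        centrality[b] += w
    return sorted(centrality, key=centrality.get, reverse=True)[:top_n]

# Tag sets of three hypothetical similar bookmarks.
similar = [{"web", "semantic", "rdf"}, {"web", "rdf"}, {"web", "java"}]
print(recommend_by_centrality(similar, top_n=2))
```
</p>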
          <p>Participating in the ECML PKDD 2009 Discovery Challenge6, we have tested our
approach with a dataset from the BibSonomy system7, obtaining precision values of 42%
and 25% when, respectively, one and five tags are recommended per bookmark. As
we explain herein, the benefits of our approach are its low computational cost, and its
capability of suggesting diverse tags in comparison to selecting the most popular tags
matched with each bookmark.</p>
          <p>The rest of the paper is organised as follows. Section 2 summarises state-of-the-art
tag recommendation techniques. Section 3 describes the document and index models
used by our tag recommender. Section 4 explains the stages of the recommendation
process. Section 5 describes the experiments conducted to evaluate the proposal.
Finally, Section 6 provides some conclusions and future work.
2 Related work
Analogously to recommender systems [1], tag recommendation techniques can be
roughly classified into two categories: content-based and collaborative techniques.
Whereas content-based approaches focus on the suggestion of keywords extracted
from resource contents and meta-data, collaborative approaches exploit the relations
between users, resources and tags of the folksonomy graph to select the set of
recommended tags. Continuing with the previous analogy, tag recommendation
techniques that combine content-based and collaborative models can be called hybrid
approaches, and techniques that make tag recommendations biased by the user’s
(tag-based) profile can be called personalised models.</p>
          <p>Based on the previous classification, in this section, we describe state-of-the-art tag
recommendation techniques that have been proposed for social bookmarking systems.
6 ECML PKDD 2009 Discovery Challenge, http://www.kde.cs.uni-kassel.de/ws/dc09/
7 BibSonomy – Social bookmark and publication sharing, http://www.bibsonomy.org/
</p>
          <p>2.1 Content-based tag recommenders
Mishne [10] presents a simple content-based tag recommender. Once a user supplies a
new bookmark, bookmarks that are similar to it are identified. The tags assigned to
these bookmarks are aggregated, creating a ranked list of likely tags. Then, the system
filters and re-ranks the tag list. The top ranked tags are finally suggested to the user.
To find similar bookmarks, the author utilises a document index, and keywords of the
input bookmark to form a query that is launched against the index. The tags are
scored according to their frequencies in the top results of the above query, and those
tags that have been used previously by the user are boosted by a constant factor. Our
approach follows the same stages, also using an index to retrieve similar bookmarks.
It includes, however, more sophisticated methods of tag ranking based on tag
popularity and personalisation aspects.</p>
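          <p>Mishne's scheme as summarised above, frequency scoring of the tags in the top retrieval results plus a constant boost for tags the user has used before, can be sketched as follows; the function name, boost value, and toy data are ours.

```python
from collections import Counter

def mishne_style_rank(retrieved_tag_lists, user_tags, boost=2.0, top_n=5):
    """Score tags by their frequency in the top retrieval results and
    boost tags previously used by the user by a constant factor
    (a sketch of the scheme described above; parameters are illustrative)."""
    scores = Counter()
    for tags in retrieved_tag_lists:
        scores.update(tags)
    ranked = {t: c * (boost if t in user_tags else 1.0)
              for t, c in scores.items()}
    return sorted(ranked, key=ranked.get, reverse=True)[:top_n]

# Tags of the top retrieval results for a hypothetical query.
results = [["python", "web"], ["python", "tutorial"], ["python"]]
print(mishne_style_rank(results, user_tags={"tutorial"}, top_n=2))
```
</p>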
          <p>Byde et al. [3] present a personalised tag recommendation method on the basis of
similarity metrics between a new document and documents previously tagged by the
user. These metrics are derived either from tagging data, or from content analysis, and
are based on the cosine similarity metric [14]. Similar metrics are used by our
approach in some of its stages.</p>
          <p>Chirita et al. [4] suggest a method called P-TAG that automatically generates
personalised tags for web pages. Given a particular web page, P-TAG produces
keywords relevant both to the page contents and data residing on the user’s desktop,
thus expressing a personalised viewpoint. A number of techniques to extract
keywords from textual contents, and several metrics to compare web pages and
desktop documents, are investigated. Our approach applies natural language
processing techniques to extract keywords from bookmark attributes, but it can be
enriched with techniques like [4] to also analyse and exploit the textual contents of
the bookmarked documents.</p>
          <p>Tatu et al. [16] propose to extract important concepts from the textual metadata
associated to bookmarks, and use semantic analysis to generate normalised versions
of the concepts. For instance, European Union, EU and European
Community would be normalised to the concept european_union. Then, users
and resources are represented in terms of the created conceptual space, and
personalised tag recommendations are based on intersections between such
representations. In our approach, synonym relations and lexical derivations between
tags are implicitly taken into consideration through the exploitation of tag
co-occurrence graphs.
</p>
          <p>2.2 Collaborative tag recommenders
Xu et al. [17] propose a collaborative tag recommender that favours tags used by a
large number of users on the target resource (high authority in the HITS algorithm
[8]), and minimises the overlap of concepts among the recommended tags to allow for
high coverage of multiple facets. Our approach also attempts to take into account tag
popularity and diversity in the recommendations through the consideration of vertex
centralities in the tag co-occurrence graph.</p>
          <p>Hotho et al. [6] present a graph-based tag recommendation approach called
FolkRank, which is an adaptation of PageRank algorithm [12], and is applied in the
folksonomy user-resource-tag graph. Its basis is the idea that a resource tagged with
important tags by important users becomes important itself. The same holds,
symmetrically, for users and tags. Having a graph whose vertices are associated to
users, resources and tags, the algorithm reinforces each of them by spreading their
weights through the graph edges. In this work, we restrict our study to the original
folksonomy graph. As a future research goal, PageRank, HITS or other graph based
techniques could be applied to enhance the identification of tags with high graph
centrality values.</p>
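          <p>The weight-spreading idea mentioned above can be sketched with a plain PageRank power iteration on a tiny folksonomy-style graph. This is an illustration only, not the FolkRank implementation; the graph and all names are ours.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an adjacency dict
    {vertex: [neighbours]}; weights spread along the edges."""
    vertices = list(adjacency)
    rank = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(iterations):
        new = {}
        for v in vertices:
            # Each neighbour u passes on rank[u] split over its out-edges.
            incoming = sum(rank[u] / len(adjacency[u])
                           for u in vertices if v in adjacency[u])
            new[v] = (1 - damping) / len(vertices) + damping * incoming
        rank = new
    return rank

# A toy user-resource-tag triangle; in a symmetric graph all ranks converge
# to the same value.
graph = {"user": ["tag1", "resource"],
         "tag1": ["user", "resource"],
         "resource": ["user", "tag1"]}
ranks = pagerank(graph)
print(ranks)
```
</p>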
          <p>Jäschke et al. [7] evaluate and compare several tag recommendation algorithms: an
adaptation of user-based collaborative filtering [13], the FolkRank strategy [6], and
methods that are based on counting tag co-occurrences. The authors show that
graph-based and collaborative filtering approaches provide better results than
non-personalised methods, and state that methods based on counting co-occurrences have
low computational costs, thus being preferable for real time scenarios. Our approach
is computationally cheap because it is based on a simple analysis of tag co-occurrence
graphs, and includes a personalisation stage to better adjust the tag recommendations
to the user’s profile.
</p>
          <p>2.3 Hybrid tag recommenders
Heymann et al. [5] present a technique that predicts tags for a website based on page
text, anchor text, surrounding hosts, and other tags assigned to the website by users.
The tag predictions are based on association rules, which, as stated by the authors,
may serve as a way to link disparate vocabularies among users, and may indicate
synonym and polysemy cases. As a hybrid approach, our tag recommender makes use
of content-based and collaborative tag information. Nonetheless, we simplify the
process limiting it to the exploitation of meta-information of the contents available in
the bookmarks.</p>
          <p>Song et al. [15] suggest a tag recommendation method that combines clustering
and mixture models. Tagged documents are represented as a triplet (words,
documents, tags) by two bipartite graphs. These graphs are clustered into topics by a
spectral recursive embedding technique [18]. The sparsity of the obtained clusters is
dealt with by a two-way Poisson mixture model [9], which groups documents into
components and clusters words. Inference for new documents is based on the
posterior probability of topic distributions, and tag recommendations are given
according to the within-cluster tag rankings.
3 Document and index models
To suggest tags for an input bookmark, our recommender exploits meta-information
associated to it. The text contents of bookmarked documents (web pages or scientific
publications) could be also taken into account, but we decided to firstly study how
accurate tag recommendations can be by only using bookmarking meta-information.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-35">
      <title>In this paper, we presented a novel approach to the challenge of graph-based tag</title>
      <p>recommendation in folksonomies. Building on the assumption that all users of
a folksonomy have their own tag vocabulary, our approach translates the
personomies of users to the global folksonomy vocabulary. Evaluation results show
that this translation helps to significantly improve tag prediction performance.</p>
    </sec>
    <sec id="sec-36">
      <title>Furthermore, we fine-tuned our model by estimating parameters on a per-user basis. Even though this user-centric approach performed rather disappointingly during the challenge, we believe that user-level optimization will be essential for the success of future (tag) recommenders.</title>
      <p>A Tag Recommendation System based on contents
Ning Zhang, Yuan Zhang, and Jie Tang</p>
      <p>Knowledge Engineering Group
Department of Computer Science and Technology, Tsinghua University, Beijing, China
zntsinghua1117@gmail.com, fancyzy0526@gmail.com and</p>
      <p>jietang@tsinghua.edu.cn
Abstract. Social bookmarking tools are becoming more and more popular, and tagging
is used to organize information and allow users to recall or search for resources.
Users need to type tags whenever they post a resource, so a good tag
recommendation system can ease the process of finding useful and relevant
keywords. Researchers have done much relevant work on recommender systems, but
traditional collaborative systems do not fit our tag recommendation task. In this
paper, we present two different methods: a simple language model and an adaptation
of a topic model. We evaluate and compare these two approaches and show that a
combination of the two yields better results for task one of the PKDD Challenge 2009.
1 Introduction
With the advent of social resource sharing tools, such as BibSonomy 1 , Flickr 2 ,
del.icio.us3, tagging has become a popular way to organize the information and help
users to find other users with similar interests and useful information within a given
category. Tags posted by a user are relevant not only to the content of the bookmark
but also to the particular user. According to [3], the collection of a user's tag
assignments is his/her personomy, and a folksonomy consists of collections of
personomies. From the available training data, we can find that some tags might just
be words extracted from the title, some tags might be the concept or main topic of the
resource, and others might be very specific to a user (see Table 1). The last three lines
of the table show that user 293 posts tags like swss0603, swss0609, swss0602,
which are very specific to the user. Since the test data for task one contains posts
whose user, resource or tags are not in the training data, some traditional collaborative
recommendation systems might not perform well. It is because most of the
collaborative recommendation systems cannot recommend tags which are not in the
tag set of the training data. This paper presents our tag recommendation system,
which is a combination of two methods: a simple language model and an adaptation of a
topic model according to [7].
1 http://www.bibsonomy.org
2 http://www.flick.com
3 http://del.icio.us</p>
      <p>Table 1: Example posts from the training data and the sources of their tags.
USER | TITLE | TAGS | SOURCE OF TAGS
37 | SVG: Adobe | adobe, svg | title
787 | SourceForge.net: delicious java | api, java, delicious | title
173 | Reassessing Working Memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996) | psycholinguistics, review, workingmemory | concept or topic
293 | A Semantic Web Primer | swss0603 | specific to the user
293 | The ABCDE Format Enabling Semantic Conference Proceedings | swss0609 | specific to the user
293 | Learning of Ontologies for the Web: the Analysis of Existent Approaches | swss0602 | specific to the user</p>
      <p>The first method can extract some keywords from the description and other
information of the post and they constitute a candidate set. Then we use the relevance
of a word and a document to score the words in the candidate set and recommend the
words with highest scores. The second method uses an ACT model [7], and it can get
some conceptual or topic knowledge of the post. Given a test post, the model can
score all the tags which have been posted previously and recommend the tags with
highest scores.</p>
      <p>These two methods focus on two different aspects. The first method will probably
recommend some keywords extracted from the title while the second method uses a
probabilistic latent semantic method to recommend tags which are similar to the post
in terms of conceptual knowledge. Comparing these two methods, we can find that
the recommended tags are always different. Consequently, combining them is an
intuitively better way.</p>
      <p>This paper is organized as follows: Section 2 reviews recent development in the
area of social bookmark tag recommendation systems. Section 3 describes our
proposed system and the combination method in detail. In Section 4, we present and
evaluate our experimental results on the test data of the ECML PKDD challenge, and
we conclude in Section 5.
2 Related work
The recent rise of Web 2.0 technologies has aroused the interest of many researchers
in tag recommendation systems. Some approaches are based on collaborative
information. For example, AutoTag[9] and TagAssist[8] use some information
retrieval techniques to recommend tags for weblog posts. They recommend tags based on
the tags posted to similar weblogs, and they cannot recommend new tags which are
not in the training file. FolkRank [4,5], which is an adaptation of the famous PageRank
algorithm, is a graph-based recommendation system. It, too, cannot recommend new
tags not in the training file. The experimental results in [5] reveal that
FolkRank can outperform other collaborative methods, but to some extent
FolkRank relies on a dense core in the training file and might not fit our task.</p>
      <p>All the methods mentioned above are based on collaborative information and
similarity between users and resources. However, in the cases when there are many
new users and resources in the test data (our task one), those methods cannot perform
well. In the RSDC ’08 challenge, the participants [1,2] who used methods based on
words extracted from the title, semantic knowledge and the user’s personomy
outperformed other methods. Consequently, we propose a tag recommendation system
based mainly on the contents.
3
First, we define notations used in this paper. We group the data in bookmark by its
url_hash and data in bibtex by its simhash1. If some posts in bookmark or bibtex file
have the same url_hash or simhash1, they are mapped to one resource r. In bookmark,
we extract description, extended description while in bibtex, we extract journal,
booktitle, description, bibtexAbstract, title and author. We define this information as
the description of resource r. For each resource r and each user u who has posted tags
to resource r, we assume that its description contains a vector w_d of N_d words and a
vector t_d of T_d tags posted to this resource r by the user u_d. Then the training
dataset can be represented as</p>
      <p>D = {(w_1, t_1, u_1), ..., (w_D, t_D, u_D)}
Table 2 summarizes the notations.
Language model is widely used in natural language processing applications such as
speech recognition, machine translation and information retrieval. In our model, first
we pick some words to form a candidate set of recommended tags and then score all
the words in the candidate set and recommend the words with the highest scores.</p>
      <p>Table 2: Notations.
Symbol | Description
T | the collection of tags posted in the training data
R | the collection of resources posted in the training data (grouped by url_hash or simhash1)
U | the collection of users who posted tags in the training data
D | the training data set containing tagged resources: D = {(w_i, t_i, u_i)}, a set of pairs of resources and users with the tags assigned by the corresponding users
D’ | the test data set containing resources and users: D’ = {(r_j, u_i)}. Note that either the user u_i or the resource r_j may not appear in the training data set
N_d | number of word tokens in d ∈ D
T_d | number of tags posted by user u to resource r in d ∈ D
w_d | vector form of word tokens in d ∈ D
t_d | vector form of tags in d ∈ D
u_d | the user in d ∈ D
T(u,r) | the set of tags that will be recommended for a given user u and a given resource r
z | hidden topic layer in the ACT model</p>
      <p>The candidate set C is composed of two subsets, C1 and C2,
i.e. C = C1 ∪ C2.</p>
      <p>We extract useful words from the description of the active resource r∗ in the test
data. Then we remove all characters which are neither numbers nor letters and get
rid of the stop words in the English dictionary. The remaining words form the
candidate set C1. For each t1 ∈ C1, we have the following generative probability:
P1(t1 | r*) = (Nd / (Nd + λ)) · (tf(t1, d) / Nd) + (1 − Nd / (Nd + λ)) · (tf(t1, D’) / ND’)   (1)</p>
          <p>where Nd is the number of word tokens in the description d of r*, tf(t1, d) is the
frequency (i.e., number of occurrences) of word t1 in the description d of r*, ND’ is the
number of word tokens in the entire test dataset, and tf(t1, D’) is the frequency of
word t1 in the collection D’. λ is the Dirichlet smoothing factor, commonly set
according to the average document length, i.e. ND’/|D’| in our case.</p>
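          <p>The smoothed estimate of equation (1) can be sketched as follows; this reflects our reading of the mixture formula, and the function name and toy data are ours.

```python
def p1(tag, doc_words, corpus_words, lam):
    """Dirichlet-smoothed generative probability of a candidate word:
    a mixture of the document-level and corpus-level relative
    frequencies, weighted by Nd / (Nd + lambda)."""
    nd = len(doc_words)          # word tokens in the description
    n_corpus = len(corpus_words) # word tokens in the whole test set
    alpha = nd / (nd + lam)      # document weight
    tf_d = doc_words.count(tag)       # frequency in the description
    tf_c = corpus_words.count(tag)    # frequency in the collection
    return alpha * tf_d / nd + (1 - alpha) * tf_c / n_corpus

doc = ["semantic", "web", "semantic"]
corpus = ["semantic", "web", "java", "web"]
print(p1("semantic", doc, corpus, lam=1.0))
```
</p>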
      <p>As for C2, in order to get more information about the new resource, we take the
similarity between resources into consideration and add tags previously posted to the
similar resources into C2. The similarity of resources is determined by the url of the
resource. Each url can be split into several sections; for example,
‘http://www.kde.cs.uni-kassel.de/ws/dc09’ will be split into three sections:
‘www.kde.cs.uni-kassel.de’, ‘ws’ and ‘dc09’. The similarity between r1 and r2 is
defined as sim(r1, r2) = 2^(number of identical sections of url1 and url2 −
maximum number of sections of url1 and url2). For each resource r, we choose the
three urls most similar to the url of r, and their corresponding resources form the
neighborhood of the resource r, denoted neigh(r). C2 is composed of the tags previously
posted to neigh(r*). For each t2 ∈ C2, we have the following generative
probability:</p>
          <p>P2(t2 | r*) = Σ_{r ∈ neigh(r*)} [ (Tr / (Tr + λ)) · (n(t2, r) / Tr) + (1 − Tr / (Tr + λ)) · (n(t2, R) / |T|) ] · sim(r, r*)   (2)</p>
          <p>where n(t2, r) is the number of times that tag t2 has been posted to the resource r,
n(t2, R) is the number of times that tag t2 has been posted in the training data, Tr is
the number of tags posted to the resource r, and λ is the Dirichlet smoothing factor,
commonly set according to the average number of tags per resource, i.e. |T|/|R|.</p>
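          <p>The url-section similarity sim(r1, r2) = 2^(identical sections − max sections) used for C2 can be sketched as follows; comparing sections position by position is our assumption, and all names are ours.

```python
def url_sections(url):
    """Strip the protocol prefix and split the URL by '/'."""
    for prefix in ("http://", "https://", "ftp://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    return url.rstrip("/").split("/")

def sim(url1, url2):
    """sim(r1, r2) = 2^(identical sections - max number of sections);
    sections are compared position by position (our assumption)."""
    s1, s2 = url_sections(url1), url_sections(url2)
    identical = sum(a == b for a, b in zip(s1, s2))
    return 2 ** (identical - max(len(s1), len(s2)))

print(sim("http://www.kde.cs.uni-kassel.de/ws/dc09",
          "http://www.kde.cs.uni-kassel.de/ws/rsdc08"))
```
</p>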
      <p>Now, we have the definition of the candidate set C = C1 ∪ C2 and for an active
resource r*, we have P1(t1|r*) for t1 ∈ C1 and P2(t2|r*) for t2 ∈ C2 . Then the set of
recommended tags will be: T(u, r) := argmaxⁿ_{t ∈ C} (P1(t | r*) + P2(t | r*)), where n is the
number of recommended tags.
The model we use is called the Author-Conference-Topic (ACT) model [7], which is an
adaptation of the topic model. The original model was used to simultaneously model
papers, authors, and publication venues within a unified probabilistic topic model.
The model utilizes the topic distribution to represent the inter-dependencies among
authors, papers and publication venues. In our task, for each (w_d, t_d, u_d) ∈ D, we
map w_d to the paper content, t_d to the authors of the paper, and u_d to the
publication venue. Consequently, after modeling, we can obtain probabilistic relations
among description, tags and user given a post. In detail, we obtain the
interdependencies P(z|t), P(w|z) and P(u|z), where z is a hidden topic layer in the topic
model. From this model, we can utilize the hidden topic layer to learn some
conceptual knowledge of the post. The number of topics z can be set manually.
After obtaining the probabilities P(z|t), P(w|z) and P(u|z) from the model, we can
derive that for each word w ∈ w_d and each tag t ∈ t_d, we have:</p>
      <p>P(w | t) = Σ_z P(w | z) P(z | t)</p>
      <p>P(u | t) = Σ_z P(u | z) P(z | t)</p>
      <p>To recommend tags for a given user u’ and a given resource r’, we will score all
the tags t ∈ T using probabilistic methods as follows:
P(t | u’, r’) ∝ P(t, u’, r’) = P(t) P(u’ | t) P(r’ | t) = P(t) P(u’ | t) Π_{w ∈ r’} P(w | t)   (3)</p>
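          <p>The ACT-based tag scoring of equation (3), with P(w|t) and P(u|t) obtained by marginalising over the hidden topic layer, can be sketched as follows; the model output is represented as nested dicts, and all names and toy probabilities are ours.

```python
# Sketch of ACT-based tag scoring, assuming the trained model has
# produced P(z|t), P(w|z) and P(u|z) as nested dicts (toy values below).

def p_w_given_t(w, t, p_w_z, p_z_t):
    # Marginalize over topics: P(w|t) = sum_z P(w|z) P(z|t)
    return sum(p_w_z[z].get(w, 0.0) * p_z_t[t][z] for z in p_z_t[t])

def score(t, user, words, p_t, p_u_z, p_w_z, p_z_t):
    # P(t | u, r) ∝ P(t) P(u|t) ∏_w P(w|t); words unseen in training are skipped.
    p_u_t = sum(p_u_z[z].get(user, 0.0) * p_z_t[t][z] for z in p_z_t[t])
    s = p_t[t] * p_u_t
    for w in words:
        pw = p_w_given_t(w, t, p_w_z, p_z_t)
        if pw > 0:  # ignore words that never appeared in the training data
            s *= pw
    return s

# Toy model with two topics z1, z2 and one candidate tag "web".
p_z_t = {"web": {"z1": 0.7, "z2": 0.3}}
p_w_z = {"z1": {"semantic": 0.5}, "z2": {"semantic": 0.2}}
p_u_z = {"z1": {"u1": 0.4}, "z2": {"u1": 0.6}}
p_t = {"web": 0.1}
print(score("web", "u1", ["semantic"], p_t, p_u_z, p_w_z, p_z_t))
```
</p>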
      <p>There are two things that need to be mentioned here. First, if the given user
u′ ∉ U, the given user is a new user; then we cannot obtain P(u′|t) in
equation (3), in which case we set the value to 1 and equation (3)
becomes:</p>
      <p>P(t)wr ' P(w | t)  P(t)P(r ' | t)  P(t | r ')
(4)
Because the user is a new one, so that we have nothing about his/her history of
posting tags in our training data, in that case equation (4) is reasonable. Secondly, not
all the words in the given resource are in our training data, so when calculating
equation (3), we will ignore the word which has not appeared in the training data.</p>
      <p>The set of recommended tags for a given user u’ and a given resource r’ will be:
T(u′, r′) := argmaxⁿ_{t ∈ T} P(t | u′, r′), where n is the number of recommended tags and T is
the collection of tags posted in the training file.
3.4
We have proposed two different methods to recommend tags, model one focuses on
the useful words extracted from the description or title of the resource while the
second model focuses on the conceptual knowledge and probabilistic relations among
tags, resources and users. We are interested in the following problem: can we
combine these two models to achieve better results for tag recommendation?
Input: a given resource r and a given user u and the result of ATC model P(t),P(w|t)
and P(u|t) for all tags, users, words in the training file.</p>
      <p>Output: T(u, r)the set of recommended tags
begin
//Model one
  ← words in the candidate set C
foreach w ∈  1 do</p>
      <p>1  = P1 w r + P2 w r
end
max_score1 ← maxw∈w1score1[w]
//Model two
foreach t ∈ T do
score2 t = P t P(u|t)</p>
      <p>P(w|t)
w∈r
end</p>
      <p>T u, r ≔ argmaxtn∈Tscore[t]
end</p>
      <p>Algorithm 1: The combined tag recommendation system</p>
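      <p>The score combination of Algorithm 1, together with the infinity-norm normalization described in the text (scaling so the two maxima match, with missing tags scored 0), can be sketched as follows; all names and toy scores are ours.

```python
# Sketch of the combination step: normalize the two score maps to the
# same maximum (infinity norm), treat missing tags as 0, sum, and rank.

def combine(score1, score2, top_n=5):
    """score1 covers tags in the candidate set C (language model),
    score2 covers tags in T (ACT model); returns the top_n tags of
    the normalized sum over t ∈ C ∪ T."""
    m1 = max(score1.values())
    m2 = max(score2.values())
    scale = m1 / m2 if m2 > 0 else 1.0   # make the maxima equal
    tags = set(score1) | set(score2)      # t ∈ C ∪ T
    total = {t: score1.get(t, 0.0) + scale * score2.get(t, 0.0)
             for t in tags}
    return sorted(total, key=total.get, reverse=True)[:top_n]

s1 = {"semantic": 0.8, "web": 0.4}       # hypothetical language model scores
s2 = {"web": 0.02, "ontology": 0.01}     # hypothetical ACT model scores
print(combine(s1, s2, top_n=2))
```
</p>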
      <p>We have tried some different approaches to combine these two models. A simple
method is to combine the scores of these two models and recommend tags with
highest scores after combination (Algorithm 1). We can make use of the two scores
calculated in the two approaches, and there are two things worth noting here: 1)
model one only calculates the scores of tags in the candidate set, but model two
calculates scores for all tags t ∈ T; 2) due to the different distributions of the
scores, we need to normalize the two scores before combination. To solve the first
problem, we consider all the tags t ∈ C ∪ T, where C is the candidate set used in
model one. In terms of normalization, we scale the scores so that ‖score1‖∞ = ‖score2‖∞
and then add the two scores; if a tag t is in the candidate set C but not in T, then
score2[t] = 0, and if a tag t is in T but not in C, then score1[t] = 0.
</p>
      <p>4 Experimental Results
We evaluate our experimental results using the evaluation methods provided by the
organizers of ECML PKDD discovery challenge 2009. The training set and the test
set are strictly divided and we use the cleaned dump as our training set for our tag
recommendation system.</p>
      <p>Here are some statistical information about training data and test data: There are
1,401,104 tag assignments. 263,004 bookmarks are posted and among which there are
235,328 different url resources while 158,924 bibtex are posted and among which
there are 143,050 different publications. From this, we can see that many resources
appear just once in the training file. There are 3,617 users and 93,756 tags in all in the
training file. The average number of tags posted to bookmark is 3.48 and the average
number of tags posted to bibtex is 3.05.</p>
      <p>In the test data, there are 43,002 tag assignments, 16,898 posted bookmarks and
26,104 posted bibtex. Among all the posts, there are only 1,693 bookmark resources
and 2,239 bibtex resources which are in the training file. The average number of tags
posted to bookmark is 3.81 and the average number of tags posted to bibtex is 3.82.
The training data is provided by the organizers of ECML PKDD; we create
three tables, bookmark, bibtex and tas, in our MySQL database. In order to get the
similarity between resources, we need to preprocess the url field. For each url in the
bookmark, we eliminate the prefix such as ‘http://’, ‘https://’ and ‘ftp://’. Then we
split the url by the character ‘/’. For example, the url
‘http://www.kde.cs.uni-kassel.de/ws/dc09’ will be split into ‘www.kde.cs.uni-kassel.de’, ‘ws’ and ‘dc09’. As
we mentioned above, we define some information extracted from the table as the
description of a resource r. We eliminate the stop words in the English dictionary and
stem the words, for example, ‘knowledge’ will be stemmed to ‘knowledg’ and both
‘biology’ and ‘biologist’ will be stemmed to ‘biologi’.
</p>
      <p>4.3 Results and analysis
As performance measures we use precision, recall and f-measure. For a given user u
and a given resource r, the true tags are defined as TAG(u,r), then the precision, recall
and f-measure of the recommended tags T(u, r) are defined as follows:
recall(T(u, r)) = (1/|U|) Σ_{u ∈ U} |TAG(u, r) ∩ T(u, r)| / |TAG(u, r)|
precision(T(u, r)) = (1/|U|) Σ_{u ∈ U} |TAG(u, r) ∩ T(u, r)| / |T(u, r)|
f-measure(T(u, r)) = 2 × recall × precision / (recall + precision)
4.3.1 Performance of Language model
In table 3, we show the performance of Language model on the test data provided by
the organizers of ECML PKDD challenge 2009. We show the performance of
bookmark, bibtex and the whole data respectively. From the table, we can find that
the result for bookmark is better than that for bibtex, and we achieve the highest
f-measure of 13.949% when the number of recommended tags is 10.
4.3.2 Performance of ACT model
In table 4, we show the performance of the ACT model on the test data provided by
the organizers of the ECML PKDD challenge 2009. We show the performance for
bookmark, bibtex and the whole data respectively. From the table, we can see that the
performance of the ACT model is worse than that of the language model. We have the
highest f-measure of 3.077% when the number of recommended tags is set to five. Also,
the performance for bookmark is a little better than that for bibtex, and the reason might be
that description in bibtex has some information which is irrelevant to the main topic
of the publication.
In table 5, we show the performance after the combination of these two models. From
the table, we can see that after combination our recommendation system works a
little better than the language model, achieving the highest f-measure of 14.398% when
recommending five tags.
The result after combination is shown in Fig. 1, together with the results of the
previous two methods.
5 Conclusion
In this paper, we describe our tag recommendation system for the first task in the
ECML PKDD Challenge 2009. We exploit two different models to recommend tags.
The experimental results show that the Language model works much better than the
ACT model and the combination of these two methods can improve the results.</p>
      <p>Fig. 1: Recall and precision of the tag recommendation system.</p>
      <p>However, the poor result of the ACT model was unexpected: in previous tests,
using test data split off from the training file, the ACT model achieved an
f-measure of over 20%.</p>
      <p>We need to further analyze the results to see why the ACT model performs poorly on the test
data. We can also try changing the scoring scheme or expanding the candidate set in
the language model. Future work also includes new methods of combination.</p>
      <p>References</p>
      <p>Robert Jaschke, Leandro Marinho, Andreas Hotho, Lars Schmidt-Thieme, and Gerd
Stumme. Tag Recommendations in Folksonomies. In Knowledge Discovery in
Databases: PKDD 2007, 11th European Conference on Principles and Practice of
Knowledge Discovery in Databases, Warsaw, Poland, September 17-21, 2007,
Proceedings, volume 4702 of LNCS, pages 506-514. Springer, 2007.</p>
      <p>Mark Steyvers, Padhraic Smyth, and Thomas Griffiths. Probabilistic Author-Topic
Models for Information Discovery. In Proc. of KDD’04.</p>
      <p>Jie Tang, Ruoming Jin, and Jing Zhang. A Topic Modeling Approach and its Integration
into the Random Walk Framework for Academic Search. In Proceedings of the 2008 IEEE
International Conference on Data Mining (ICDM 2008), pp. 1055-1060.</p>
      <p>Sanjay C. Sood, Sara H. Owsley, Kristian J. Hammond, and Larry Birnbaum. TagAssist:
Automatic Tag Suggestion for Blog Posts. In Proc. of the International Conference on
Weblogs and Social Media (ICWSM 2007), 2007.</p>
      <p>Gilad Mishne. AutoTag: A Collaborative Approach to Automated Tag Assignment for Weblog
Posts. In WWW '06: Proceedings of the 15th International Conference on World Wide
Web, pages 953-954, New York, NY, USA, 2006. ACM Press.
A Collaborative Filtering Tag Recommendation System based on Graph
Yuan Zhang, Ning Zhang, and Jie Tang</p>
      <p>Knowledge Engineering Group
Department of Computer Science and Technology, Tsinghua University, Beijing, China
fancyzy0526@gmail.com, zntsinghua1117@gmail.com, and jietang@tsinghua.edu.cn</p>
      <p>
Abstract. With the rapid development of Web 2.0 technologies, tagging has become
increasingly important for organizing information and helping users find the
information they need with social bookmarking tools. For the
second task of the ECML PKDD Discovery Challenge 2009, we propose a graph-based
collaborative filtering tag recommendation system. We also employ an
algorithm called FolkRank, which is an adaptation of the well-known PageRank.
We evaluate and compare these two approaches and show that a combination of
the two methods yields better results for our task.
1 Introduction
Tagging is very useful for finding users with similar interests within
a given category: users with similar interests tend to post similar tags, and similar
resources tend to have similar tags posted to them. Collaborative filtering is widely
used in automatic prediction systems. The idea behind it is very simple: those who
agreed in the past tend to agree again in the future. Traditional collaborative filtering
systems have two steps. The first step is to look for users who share the same rating
patterns with the active user, the one for whom the prediction is made. Then, the system uses
the ratings from those like-minded users found in the first step to calculate a
prediction for the active user. Since all the tags, users, and resources in the test data
also appear in the training file, we can make use of the history of a user's tags, also called
the personomy [3], and of the tags previously posted to the resource in order to recommend tags for an
active post. This paper presents our proposed tag recommendation system, which is a
combination of two methods: one is an adaptation of item-based collaborative filtering,
and the other is FolkRank, following [4, 5].</p>
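The two-step scheme described above can be sketched in a few lines of Python. The `ratings` mapping and the helper names below are hypothetical toy illustrations of generic user-based collaborative filtering, not the tag-specific method of this paper:

```python
from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity of two sparse rating vectors given as dicts."""
    common = set(a) & set(b)
    num = sum(a[i] * b[i] for i in common)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def predict(ratings, active_user, item):
    """Step 1: find like-minded users; step 2: aggregate their ratings."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == active_user or item not in r:
            continue
        s = cosine_sim(ratings[active_user], r)  # similarity to the active user
        num += s * r[item]
        den += abs(s)
    return num / den if den else 0.0
```

With binary "ratings" (a user used a tag or not), the same two steps carry over directly to tag recommendation.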
      <p>As mentioned above, collaborative filtering performs well for automatic
prediction. However, the widely used collaborative filtering systems today predict the
ratings of products or recommend products to users; for example, well-known websites
such as Amazon.com, Last.fm, and eBay apply this method in
their recommendation systems. Our first method considers the tags previously posted
to the resource and the users' similarities to recommend tags. The second method is an
application of the FolkRank algorithm of [4, 5].</p>
      <p>These two methods share some features. Both use the history of the
user and the tags previously posted to the resource for recommendation, both are
suited to the case in which the test data also appear in the training data, and neither needs to
build a model in advance. But they differ in some respects. The first method
considers only the tags in the candidate set, while FolkRank considers all the tags in
the training data. Moreover, the first method focuses more on the collaborative
information, while the second focuses on the graph information.</p>
      <p>This paper is organized as follows: Section 2 introduces recent trends in the area of
social bookmark tag recommendation systems. Section 3 describes our proposed
system and the combination method in detail. In Section 4, we present and evaluate
our experimental results on the test data of the ECML PKDD Discovery Challenge 2009, and
we draw conclusions in Section 5.</p>
      <p>2 Related work
Some researchers have already used approaches based on collaborative
information for tag recommendation systems. For example, AutoTag [7] and
TagAssist [6] make use of information retrieval techniques to recommend tags for weblog
posts: they recommend tags based on the tags posted to similar weblogs. Our first
method is similar to these two approaches.</p>
      <p>FolkRank [4, 5] is a topic-specific ranking in folksonomies. The key idea of the
FolkRank algorithm is that a resource which is tagged with important tags by
important users becomes important itself. In [5], the authors compared the performance
of several baseline methods with their FolkRank algorithm and found that FolkRank
outperformed the other methods. Their experimental results relied on a dense core of the
training data; considering that our training data is a post-core 2 dataset, we
decided to adopt this algorithm in our proposed tag recommendation system.</p>
      <p>In the RSDC'08 challenge, the participants [1, 2] who made use of resource
similarities and users' personomies outperformed the other approaches. Consequently, we
also use the collaborative information of resource similarities and users'
personomies in our tag recommendation system.</p>
      <p>3.1 Notations
First, we define the notation used in this paper. We group the data in the bookmark file by its
url_hash and the data in the bibtex file by its simhash1. If some posts in the bookmark or bibtex file
have the same url_hash or simhash1, they are mapped to one resource r. For each
resource rd, we assume a vector τd of Td tags posted to the resource rd by the user ud.
The training dataset can then be represented as</p>
      <p>D = {(r1, τ1, u1), ..., (r|D|, τ|D|, u|D|)}
Table 1 summarizes the notation.</p>
      <p>Symbol: meaning
T: the set of tags
R: the set of resources
U: the set of users
D: the training dataset
D'
Nd
Tr: the number of tags posted to the resource r
τd: the vector of tags posted to the resource rd by the user ud
ud: the user who posted the tag vector τd
C(u, r): the candidate tag set for user u and resource r
T(u, r): the set of tags recommended to user u for resource r
n(t, r): the number of times tag t has been posted to the resource r</p>
      <p>3.2 Collaborative filtering method
Our proposed collaborative filtering method for tag recommendation has two steps.
First of all, for a given resource r and a given user u in the test dataset, we make use
of the tags previously posted to the resource r in the training dataset and define them
as the candidate set:</p>
      <p>C(u, r) = {ti | (r, ti, u′) ∈ D, u′ ∈ U}
The second step is to score all the tags in the candidate set and recommend the tags
with the highest scores. In our proposed tag recommendation system, we score the
tags in the candidate set using the following equation for all tags t ∈ C(u, r):</p>
      <p>P(t | r) = Tr/(Tr + λ) · n(t, r)/Tr + (1 − Tr/(Tr + λ)) · n(t, R)/|T|    (1)</p>
      <p>where n(t, r) is the number of times that tag t has been posted to the resource r and
n(t, R) is the number of times that tag t has been posted in the whole training data. Tr is the
number of tags posted to the resource r, and λ is the Dirichlet smoothing factor,
commonly set according to the average document length, i.e. λ = |T|/|R|.</p>
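A minimal sketch of the smoothed score in equation (1). The `posts` list of (resource, tag) pairs is hypothetical, |T| is read as the number of distinct tags, and the resource is assumed to occur in the training data (as the task guarantees):

```python
def smoothed_score(tag, resource, posts, lam=1.0):
    """P(t|r) as in equation (1): interpolate the resource-level tag
    frequency with the corpus-level frequency, weighted by Tr/(Tr + lam)."""
    n_tr = sum(1 for r, t in posts if r == resource and t == tag)  # n(t, r)
    n_tR = sum(1 for r, t in posts if t == tag)                    # n(t, R)
    T_r = sum(1 for r, t in posts if r == resource)                # Tr (assumed > 0)
    n_tags = len({t for r, t in posts})                            # |T|, distinct tags
    w = T_r / (T_r + lam)                                          # smoothing weight
    return w * n_tr / T_r + (1 - w) * n_tR / n_tags
```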
      <p>In order to take the users' similarities into consideration, we extend equation
(1) to the following equation:</p>
      <p>P(t | r, u) = Σ_{u′∈U′} sim(u, u′) · m(t, u′, r)    (2)</p>
      <p>For a given user u and a given resource r, the set of recommended tags is then
T(u, r) := argmax^n_{t∈C(u,r)} P(t | r, u), where n is the number of recommended tags.</p>
      <p>The user similarity is the cosine similarity over tag usage:
sim(u, u′) = Σ_{t∈T} n(t, u) · n(t, u′) / ( √(Σ_{t∈T} n(t, u)²) · √(Σ_{t∈T} n(t, u′)²) )
where n(t, u) is the number of times the user u has used the tag t.</p>
      <p>3.3 FolkRank algorithm
FolkRank is a graph-based algorithm whose basic idea is to rank all the tags and pick
out the tags that are relatively important for a given user u and a given resource r. It
is derived from the PageRank algorithm, used by the Google search
engine to assign a numerical weight to each element of a hyperlinked set of
documents in order to measure each element's relative
importance within the set. However, due to the structural differences between a
hyperlink graph and a folksonomy, PageRank cannot be applied directly to our
tag recommendation setting, and the FolkRank algorithm was introduced in [4, 5].</p>
      <p>In order to apply a weight-spreading ranking scheme to recommend tags, we need
to change the directed graph in PageRank to an undirected graph and change the
corresponding ranking approach.</p>
      <p>First, we convert the training dataset D into an undirected graph G = (V, E). V is
the set of nodes in the graph, composed of all the tags, resources, and
users in the training file, i.e. V = T ∪ R ∪ U. E is the set of edges in the graph,
defined by the co-occurrences of tags and users, users and resources, and tags and
resources: E = {{u, t}, {t, r}, {u, r} | (r, t, u) ∈ D}. Each edge {u, t} ∈ E has the weight
|{r ∈ R | (r, t, u) ∈ D}|, each edge {t, r} ∈ E has the weight |{u ∈ U | (r, t, u) ∈ D}|, and
each edge {u, r} ∈ E has the weight |{t ∈ T | (r, t, u) ∈ D}|. Having the posts in this graph
form, we can spread the weights as in PageRank:
w ← dAw + (1 − d)p    (3)
where A is the adjacency matrix of G, p is the random surfer component, and
d ∈ [0, 1] is a constant which controls the influence of the random surfer.</p>
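The edge weights above can be sketched by counting co-occurrences over the distinct posts. The node labels below are tagged with their type so that a tag and a resource sharing a name stay separate; the post tuples are hypothetical:

```python
from collections import Counter

def build_edges(posts):
    """Edge weights of the undirected graph G: for each distinct post
    (r, t, u), increment the {u,t}, {t,r} and {u,r} co-occurrence counts.
    Over distinct posts this yields exactly the set sizes in the text."""
    w = Counter()
    for r, t, u in set(posts):
        w[frozenset([("user", u), ("tag", t)])] += 1   # |{r | (r,t,u) in D}|
        w[frozenset([("tag", t), ("res", r)])] += 1    # |{u | (r,t,u) in D}|
        w[frozenset([("user", u), ("res", r)])] += 1   # |{t | (r,t,u) in D}|
    return w
```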
      <p>Usually, p is set to the vector with all values equal to 1. But in order to
recommend tags relevant to a certain user and a certain resource, we can change p to
express user preferences. In our tag recommendation system, each user, tag, and
resource gets a preference weight of 1, but the active user and the active resource
get preferences of 1 + |U| and 1 + |R|, respectively.</p>
      <p>The FolkRank algorithm takes a differential approach to see the ranking around the
topics defined in the preference vector: it compares the rankings with
and without the preference vector p. If w0 is the ranking after iterating
with d = 1 and w1 is the ranking after iterating with d = 0.625, the final weight
is w = w1 − w0. Details can be found in Algorithm 1.</p>
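The differential ranking can be sketched with a few lines of NumPy. The column normalization of A and the fixed iteration count are assumptions of this sketch, since the paper only says "until convergence":

```python
import numpy as np

def folkrank(A, p, d=0.625, iters=50):
    """Differential FolkRank weights: iterate the spreading
    w <- d*A*w + (1-d)*p, and subtract the preference-free
    baseline (d = 1) from the preference-biased ranking."""
    A = A / A.sum(axis=0, keepdims=True)   # column-normalize the edge weights
    p = p / p.sum()                        # normalized preference vector
    w0 = np.full(len(p), 1.0 / len(p))     # baseline ranking, d = 1
    w1 = w0.copy()                         # preference-biased ranking
    for _ in range(iters):
        w0 = A @ w0
        w1 = d * (A @ w1) + (1 - d) * p
    return w1 - w0                         # final differential weights
```

Nodes with a positive differential weight gained importance from the preference put on the active user and resource.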
      <p>Input: the graph form of the training file, i.e. G = (V, E) with V = T ∪ R ∪ U and
E = {{u, t}, {t, r}, {u, r} | (r, t, u) ∈ D}, the adjacency matrix A, the given resource r, and the
given user u.</p>
      <p>Output: the ranking w of all tags t ∈ T
begin
//initialize
foreach t ∈ T, r ∈ R and u ∈ U do
w0[t] = w1[t] = 1, w0[r] = w1[r] = 2 and w0[u] = w1[u] = 2
end
p[r] = 1 + |R|
p[u] = 1 + |U|
d = 0.625
//iteration for w1
repeat
w1 = dAw1 + (1 − d)p
until convergence
//iteration for w0
repeat
w0 = Aw0
until convergence
w = w1 − w0
end</p>
      <p>Algorithm 1: the FolkRank algorithm used in our tag recommendation system
3.4 Combination
We have proposed two different but related methods for our tag recommendation
system. Both are suited to our case, in which the test data have already appeared in the
training file, and both make use of the similarity of users and resources; however, the first
method focuses more on the collaborative information, while the second focuses
more on the graph nodes and can spread weights along the co-occurrences.
We hope to combine the two methods and obtain a better result.</p>
      <p>We have tried several approaches to combining the two methods. A simple
one is to multiply the scores of the two models and recommend the
tags with the highest combined scores. Details can be found in Algorithm 2.</p>
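The multiplicative combination can be sketched as follows; the tag names and score values are hypothetical:

```python
def combine_scores(score1, score2, n):
    """Multiply the collaborative score (0 outside the candidate set)
    with the FolkRank score, and return the n highest-scoring tags."""
    combined = {t: score1.get(t, 0.0) * s for t, s in score2.items()}
    return sorted(combined, key=combined.get, reverse=True)[:n]
```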
      <p>Input: a given resource r, a given user u, and the scores from the two methods
Output: the set of recommended tags T(u, r)
begin
//collaborative method
the candidate set C(u, r) ← {t | (r, t, u′) ∈ D, u′ ∈ U}
foreach t ∈ C(u, r) do
score1[t] = P(t | r, u) from equation (2)
end
//FolkRank algorithm
foreach t ∈ T do
score2[t] = w[t], the output of Algorithm 1
end
//combination
foreach t ∈ T do
score[t] = score1[t] × score2[t]
end
T(u, r) := argmax^n_{t∈T} score[t]
end</p>
      <p>Algorithm 2: the combination method used in our tag recommendation system</p>
      <p>4 Experimental Results
We evaluate our experimental results using the evaluation methods provided by the
organizers of the ECML PKDD Discovery Challenge 2009. The training set and the test
set are strictly separated, and we use the post-core level-2 training file as the training
dataset for our tag recommendation system.</p>
      <p>The general statistics of the training data and the test data can be found in
Table 2 and Table 3, broken down into bookmark, bibtex, and total.</p>
      <p>As performance measures we use precision, recall, and f-measure. For a given user u
and a given resource r, let TAG(u, r) denote the true tags; the recall, precision,
and f-measure of the recommended tags T(u, r) are then defined as follows:
recall(T(u, r)) = (1/|U|) Σ_{u∈U} |TAG(u, r) ∩ T(u, r)| / |TAG(u, r)|
precision(T(u, r)) = (1/|U|) Σ_{u∈U} |TAG(u, r) ∩ T(u, r)| / |T(u, r)|
f-measure(T(u, r)) = 2 × recall × precision / (recall + precision)
4.2.1 Performance of the collaborative filtering method
In Table 4, we show the performance of the collaborative filtering method on the test data
provided by the organizers of the ECML PKDD Discovery Challenge 2009. From the table, we can
see that this method achieves its highest f-measure of 30.002% when the number of
recommended tags is 5.
4.2.2 Performance of the FolkRank method
In Table 4, we show the performance of the FolkRank algorithm on the test data. From the
table, we can see that the first method performs slightly better than FolkRank, which
achieves its highest f-measure of 28.837% when the number of recommended tags
is 4.</p>
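The measures defined in Section 4 can be sketched per post; the averaging over users is then a simple mean over these values:

```python
def recall_precision_f(true_tags, recommended):
    """Recall, precision and f-measure for a single (user, resource) post."""
    hits = len(set(true_tags) & set(recommended))
    recall = hits / len(true_tags) if true_tags else 0.0
    precision = hits / len(recommended) if recommended else 0.0
    f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f
```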
      <p>4.2.3 Performance of the combination
In Table 4, we show the performance after combining the previous two
methods. The results after combination outperform both individual
methods: we observe a 2% increase compared to the first method and a 4% increase
compared to the second, with the highest f-measure of 32.622% when
recommending 10 tags. The precision-recall plot in Fig. 1 shows the quality of our
recommendation system.</p>
      <p>Fig. 1 Recall and precision of the tag recommendation system
5 Conclusion
In this paper, we described our tag recommendation system for the second task of the
ECML PKDD Challenge 2009. We exploit two different methods to recommend tags
for the setting in which the tags, resources, and users in the test data also appear in the
training file. The experimental results show that the combination of the two methods
yields better results.</p>
      <p>We need to further analyze the results to see which kind of information in the
graph contributes more to the final ranking. Also, we can try to change the scoring
scheme or expand the candidate set in our collaborative filtering method. Future work
also includes some adaptations of PageRank for the tag recommendation system.
References</p>
      <p>Marta Tatu, Munirathnam Srikanth, and Thomas D'Silva. RSDC'08: Tag
Recommendations using Bookmark Content. In Proc. of the ECML PKDD Discovery
Challenge (RSDC 08), pp. 96-107.
Marek Lipczak. Tag Recommendation for Folksonomies Oriented towards Individual
Users. In Proc. of the ECML PKDD Discovery Challenge (RSDC 08), pp. 84-95.
Andreas Hotho, Robert Jaschke, Christoph Schmitz, and Gerd Stumme. BibSonomy: A
Social Bookmark and Publication Sharing System. In Proc. of the First Conceptual
Structures Tool Interoperability Workshop at the 14th Int. Conf. on Conceptual Structures,
pages 87-102, Aalborg, 2006. Aalborg Universitetsforlag.</p>
      <p>Andreas Hotho, Robert Jaschke, Christoph Schmitz, and Gerd Stumme. FolkRank: A
Ranking Algorithm for Folksonomies. In Proc. of FGIR 2006.
Robert Jaschke, Leandro Marinho, Andreas Hotho, Lars Schmidt-Thieme, and Gerd
Stumme. Tag Recommendations in Folksonomies. In Knowledge Discovery in
Databases: PKDD 2007, 11th European Conference on Principles and Practice of
Knowledge Discovery in Databases, Warsaw, Poland, September 17-21, 2007,
Proceedings, volume 4702 of LNCS, pages 506-514. Springer, 2007.</p>
      <p>Sanjay C. Sood, Sara H. Owsley, Kristian J. Hammond, and Larry Birnbaum. TagAssist:
Automatic Tag Suggestion for Blog Posts. In Proc. of the International Conference on
Weblogs and Social Media (ICWSM 2007), 2007.</p>
      <p>Gilad Mishne. AutoTag: A Collaborative Approach to Automated Tag Assignment for Weblog
Posts. In WWW '06: Proceedings of the 15th International Conference on World Wide
Web, pages 953-954, New York, NY, USA, 2006. ACM Press.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Marta</given-names>
            <surname>Tatu</surname>
          </string-name>
          , Munirathnam Srikanth, and
          <string-name>
            <surname>Thomas D'Silva</surname>
          </string-name>
          . RSDC'
          <volume>08</volume>
          :
          <article-title>Tag Recommendations using Bookmark Content</article-title>
          .
          <source>In Proc. of ECML PKDD Discovery Challenge (RSDC 08)</source>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>107</lpage>
          Marek Lipczak.
          <article-title>Tag Recommendation for Folksonomies Oriented towards Individual Users</article-title>
          .
          <source>In Proc. of ECML PKDD Discovery Challenge(RSDC 08)</source>
          , pp.
          <fpage>84</fpage>
          -
          <lpage>95</lpage>
          Andreas Hotho, Robert Jaschke, Christoph Schmitz, and Gerd Stumme.
          <article-title>BibSonomy: A Social Bookmark and Publication Sharing System</article-title>
          .
          <source>In Proc. the First Conceptual Structures Tool Interoperability Workshop at the 14th Int. Conf. on Conceptual Structures</source>
          , pages
          <fpage>87</fpage>
          -
          <lpage>102</lpage>
          , Aalborg,
          <year>2006</year>
          . Aalborg Universitetsforlag.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Hotho</surname>
          </string-name>
          , Robert Jaschke, Christoph Schmitz, and Gerd Stumme.
          <article-title>FolkRank: A Ranking Algorithm for Folksonomies</article-title>
          .
          <source>In Proc. of FGIR 2006 Robert Jaschke</source>
          , Leandro Marinho, Andreas Hotho, Lars Schmidt-Thieme, and
          <string-name>
            <given-names>Gerd</given-names>
            <surname>Stumme</surname>
          </string-name>
          .
          <article-title>Tag Recommendations in Folksonomies</article-title>
          . In Knowledge Discovery in
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>