
STaR: a Social Tag Recommender System

Cataldo Musto

musto@di.uniba.it

Fedelucio Narducci

narducci@di.uniba.it

Marco de Gemmis

degemmis@di.uniba.it

Pasquale Lops

lops@di.uniba.it

Giovanni Semeraro

semeraro@di.uniba.it

Department of Computer Science, University of Bari "Aldo Moro", Italy

The continuous growth of collaborative platforms that we have recently witnessed has enabled the passage from an 'elitist' Web, written by few and read by many, to the so-called Web 2.0, a more 'user-centric' vision in which users become active contributors to Web dynamics. In this context, collaborative tagging systems are rapidly emerging: on these platforms users can annotate resources they like with freely chosen keywords (called tags) in order to make information retrieval and serendipitous browsing easier. However, as tags are handled in a purely syntactic way, collaborative tagging systems suffer from typical Information Retrieval (IR) problems such as polysemy and synonymy. To reduce the impact of these drawbacks, and at the same time to aid the so-called tag convergence, systems that assist the user in the task of tagging are required. The goal of these systems (called tag recommenders) is to suggest a set of relevant keywords for the resources to be annotated by exploiting different approaches. In this paper we present a tag recommender developed for the ECML-PKDD 2009 Discovery Challenge. Our approach is based on two assumptions. First, if two or more resources share some common patterns (e.g. the same features in the textual description), we can exploit this information by supposing that they could be annotated with similar tags. Second, since each user has a typical manner of labeling resources, a tag recommender might exploit this information to give more weight to the tags she already used to annotate similar resources.

Keywords: Recommender Systems, Web 2.0, Collaborative Tagging Systems, Folksonomies

1 Introduction

The coming of Web 2.0 has changed the role of Internet users and the shape of the services offered by the World Wide Web. Since web sites tend to be more interactive and user-centric than in the past, users are shifting from passive consumers of information to active producers. By using Web 2.0 applications, users can easily publish content such as photos, videos, political opinions and reviews, so they are identified as Web prosumers: producers and consumers of knowledge. One of the forms of user-generated content (UGC) that has drawn most attention from the research community is tagging, the act of annotating resources of interest with free keywords, called tags, in order to help users organize, browse and search resources through the building of a socially constructed classification schema, called a folksonomy [18]. In contrast to systems where information about resources is provided only by a small set of experts, collaborative tagging systems take into account the way individuals conceive the information contained in a resource [19]. Well-known examples of platforms that embed tagging activity are Flickr¹ to share photos, YouTube² to share videos, Del.icio.us³ to share bookmarks, Last.fm⁴ to share music listening habits and BibSonomy⁵ to share bookmarks and lists of literature. Although these systems provide heterogeneous contents, they have a common core: once a user is logged in, she can post a new resource and choose some significant keywords to identify it. Besides, users can label resources previously posted by other users. This phenomenon represents a very important opportunity to categorize the resources on the web, a task that would otherwise be hardly feasible. The fact that different users tag resources is the social aspect of this activity; in this way tags create a connection among users and items. Users that label the same resource with the same tags could have similar tastes, and items labeled with the same tags could have common characteristics.

Many would argue that the power of tagging lies in the ability of people to freely determine the appropriate tags for a resource without having to rely on a predefined lexicon or hierarchy [11]. Indeed, folksonomies are fully free and reflect the user's mind, but they suffer from the same problems as any uncontrolled vocabulary. Golder and Huberman [5] identified three major problems with current tagging systems: polysemy, synonymy, and level variation. Polysemy refers to situations where tags can have multiple meanings: for example, a resource tagged with the term turkey could indicate a news item about politics taken from an online newspaper or a recipe for Thanksgiving Day. When multiple tags share a single meaning we refer to it as synonymy. In collaborative tagging systems we can have simple morphological variations (for example we can find 'blog', 'blogs' and 'web log' to identify a common blog) but also semantic similarity (like resources tagged with 'arts' versus 'cultural heritage'). The third problem, called level variation, refers to the phenomenon of tagging at different levels of abstraction. Some people can annotate a web page containing a recipe for roast turkey with the tag 'roastturkey', others with a simple 'recipe'.

In order to avoid these problems, in recent years many tools have been developed to facilitate the user in the task of tagging and to aid tag convergence [4]: these systems are known as tag recommenders. When a user posts a resource on a Web 2.0 platform, a tag recommender suggests some significant keywords to label the item, following some criteria to filter out the noise from the complete tag space.

1 http://www.flickr.com
2 http://www.youtube.com
3 http://delicious.com/
4 http://www.last.fm/
5 http://www.bibsonomy.org/

This paper presents STaR (Social Tag Recommender system), a tag recommender system developed for the ECML-PKDD 2009 Discovery Challenge. The idea behind our work is that folksonomies create connections among users and items, so we focused on two concepts:

- Resources with similar content could be annotated with similar tags;
- A tag recommender needs to take into account the previous tagging activity of users, giving more weight to the tags already used to annotate similar resources.

In this work we identify two main aspects of the tag recommendation task. First, each user has a typical manner of labeling resources (for example using personal tags such as 'beautiful', 'ugly' or 'pleasant', which are not connected to the content of the item, or simply using general tags like 'politics' or 'sport'). Second, similar resources usually share common tags: when a user posts a resource r on the platform, our system takes into account how she (if she is already stored in the system) and the entire community previously tagged resources similar to r in order to suggest relevant tags. We developed this model and tested it on a dataset extracted from BibSonomy.

The paper is organized as follows. Section 2 analyzes related work. The general problem of tag recommendation is introduced in Section 3. Section 4 explains the architecture of the system and how the recommendation approach is implemented. The experimental evaluation is described in Section 5.1, while conclusions and future work are drawn in the last section.

2 Related Work

Previous work in the tag recommendation area can be broadly divided into three classes: content-based, collaborative and graph-based approaches.

In the content-based approach, a system exploits some textual source with Information Retrieval-related techniques [1] in order to extract relevant unigrams or bigrams from the text. Brooks and Montanez [3], for example, developed a tag recommender system that automatically suggests tags for a blog post by extracting the top three terms according to TF/IDF scoring [14]. The system presented by Lee and Chun [8] recommends tags retrieved from the content of a blog using artificial neural networks. The network is trained on statistical information about word frequencies and lexical information about word semantics extracted from WordNet.

The collaborative approach to tag recommendation, instead, presents some analogies with collaborative filtering methods [2]. In the model proposed by Mishne and implemented in AutoTag [12], the system suggests tags based on the other tags associated with similar posts in a given collection. The recommendation process is performed in three steps: first, the tool finds similar posts and extracts their tags. All the tags are then merged, building a general folksonomy that is filtered and reranked. The top-ranked tags are suggested to the user, who selects the most appropriate ones to attach to the post. TagAssist [16] improves AutoTag's approach by performing a lossless compression over existing tag data. It finds similar blog posts and suggests a subset of the associated tags through a Tag Suggestion Engine (TSE), which leverages previously tagged posts to provide appropriate suggestions for new content. In [10] the tag recommendation task is performed through a user-based collaborative filtering approach. The method seems to produce good results when applied to the user-tag matrix, which suggests that users with a similar tag vocabulary tend to tag alike.

The problem of tag recommendation through graph-based approaches was first addressed by Jäschke et al. in [7]. They compared several recommendation techniques, including collaborative filtering, PageRank and FolkRank. The key idea behind the FolkRank algorithm is that a resource which is tagged with important tags by important users becomes important itself. The same concept holds for tags and users, thus the approach uses a graph whose vertices mutually reinforce each other by spreading their weights. The evaluation showed that FolkRank outperforms the other approaches. Schmitz et al. [15] proposed association rule mining as a technique that might be useful in the tag recommendation process.

In the literature we can also find some hybrid methods that integrate two or more approaches (mainly content-based and collaborative ones) in order to reduce their typical drawbacks and emphasize their strengths. Heymann et al. [6] present a tag recommender that exploits social knowledge and textual sources at the same time. They suggest tags based on page text, anchor text and surrounding hosts, adding tags used by other users to label the URL. The effectiveness of this approach is also confirmed by the use of a large dataset crawled from del.icio.us for the experimental evaluation. A hybrid approach is also proposed by Lipczak in [9]. First, the system extracts tags from the title of the resource. Afterwards, based on an analysis of co-occurrences, the set of candidate tags is expanded by adding tags that usually co-occur with terms in the title. Finally, tags are filtered and reranked by exploiting the information stored in a so-called "personomy", the set of the tags previously used by the user.

Finally, in [17] the authors proposed a model based on both the textual content and the tags associated with the resource. They introduce the concept of conflated tags to indicate a set of related tags (like blog, blogs, etc.) used to annotate a resource. By modeling the existing tag space in this way, they are able to suggest various tags for a given bookmark, exploiting both user and document models. They won the previous edition of the Tag Recommendation Challenge.

3 Description of the Task

STaR has been designed to participate in the ECML-PKDD 2009 Discovery Challenge⁶. In this section we first introduce a formal model for recommendation in folksonomies, then we analyze the specific requirements of the task proposed for the Challenge.

6 http://www.kde.cs.uni-kassel.de/ws/dc09

3.1 Recommendation in Folksonomies

A collaborative tagging system is a platform composed of users, resources and tags that allows users to freely assign tags to resources. Following the definition introduced in [7], a folksonomy can be described as a triple $(U, R, T)$ where:

- $U$ is a set of users;
- $R$ is a set of resources;
- $T$ is a set of tags.

We can also define a tag assignment function $tas: U \times R \to T$. The tag recommendation task for a given user $u \in U$ and a resource $r \in R$ can finally be described as the generation of a set of tags $tas(u, r) \subseteq T$ according to some relevance model. In our approach these tags are generated from a ranked set of candidate tags, from which the top $n$ elements are suggested to the user.
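As an illustration only, the formal model can be sketched in Python as a collection of (user, resource, tag) assignments from which $U$, $R$, $T$ and $tas(u, r)$ are derived. The class `Folksonomy`, its methods and the helper `top_n` are hypothetical names introduced for this sketch; they are not part of STaR.

```python
from collections import defaultdict


class Folksonomy:
    """Toy folksonomy (U, R, T) built from (user, resource, tag) triples."""

    def __init__(self):
        self.users = set()       # U
        self.resources = set()   # R
        self.tags = set()        # T
        self._tas = defaultdict(set)  # (u, r) -> tags assigned by u to r

    def assign(self, user, resource, tag):
        """Record that `user` annotated `resource` with `tag`."""
        self.users.add(user)
        self.resources.add(resource)
        self.tags.add(tag)
        self._tas[(user, resource)].add(tag)

    def tas(self, user, resource):
        """Return the set of tags tas(u, r); empty if u never tagged r."""
        return set(self._tas.get((user, resource), set()))


def top_n(ranked_candidates, n):
    """Suggest the n highest-scoring tags from a {tag: score} mapping."""
    ordered = sorted(ranked_candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ordered[:n]]
```

Storing assignments keyed by the (user, resource) pair keeps the lookup of $tas(u, r)$ direct, while the ranked candidate set is represented simply as a mapping from tags to scores.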

3.2 Description of the ECML-PKDD 2009 Discovery Challenge

The 2009 edition of the Discovery Challenge consists of three recommendation tasks in the area of social bookmarking. We competed in the first task, content-based tag recommendation, whose goal is to exploit content-based recommendation approaches in order to provide a relevant set of tags to the user when she submits a new item (bookmark or BibTeX entry) to BibSonomy.

The organizers made available a training set with some examples of tag assignment: the dataset contains 263,004 bookmark posts and 158,924 BibTeX entries submitted by 3,617 different users. For each of the 235,328 different URLs and the 143,050 different BibTeX entries, some textual metadata were also provided (such as the title of the resource, the description, the abstract and so on).

Each candidate recommender is evaluated by comparing the real tags (namely, the tags a user adopts to annotate an unseen resource) with the suggested ones. The accuracy is finally computed using classical IR metrics, such as Precision, Recall and F1-Measure (Section 5.1).

By analyzing the aforementioned requirements, we designed STaR with a prediction task in mind rather than a recommendation one. Consequently, we try to emphasize the previous tagging activity of the user, while also looking for connections and patterns among resources. All these decisions are thoroughly analyzed in the next section, which describes the architecture of STaR.

4 STaR: a Social Tag Recommender System

STaR (Social Tag Recommender) is a content-based tag recommender system developed at the University of Bari. The initial idea behind STaR is to improve the model implemented in systems like TagAssist [16] or AutoTag [12]. Although we agree with the idea that resources with similar content could be annotated with similar tags, in our opinion Mishne's approach presents two important drawbacks:

1. The tag reranking formula simply sums the occurrences of each tag among all the folksonomies, without considering the similarity with the resource to be tagged. In this way, tags often used to annotate resources with a low similarity level could be ranked first.
2. The proposed model does not take into account the previous tagging activity performed by users. If two users bookmark the same resource, they will receive the same suggestions, since the folksonomies built from similar resources are the same.

We try to overcome these drawbacks by proposing an approach based on the analysis of similar resources that is also capable of giving more weight to the tags already selected by the user during her previous tagging activity. Figure 1 shows the general architecture of STaR. The recommendation process is performed in four steps, each of which is handled by a separate component.

4.1 Indexing of Resources

Given a collection of resources (corpus), a preprocessing step is performed by the Indexer module, which exploits Apache Lucene⁷ to perform the indexing step. For bookmarks we indexed the title of the web page and the extended description provided by users. For BibTeX entries we indexed the title of the publication and the abstract. Let $U$ be the set of users and $N$ the cardinality of this set; the indexing procedure is repeated $N + 1$ times: we build an index for each user (Personal Index), storing the information on her previously tagged resources, and an index for the whole community (Social Index), storing the information about all the resources previously tagged by the community.

7 http://lucene.apache.org

Following the definitions presented in Section 3.1, given a user $u \in U$ we define $\mathit{PersonalIndex}(u)$ as:

\[ \mathit{PersonalIndex}(u) = \{ r \in R \mid \exists t \in T : tas(u, r) = t \} \tag{1} \]

where $tas$ is the tag assignment function $tas: U \times R \to T$ which assigns tags to a resource annotated by a given user. $\mathit{SocialIndex}$ represents the union of all the user personal indexes:

\[ \mathit{SocialIndex} = \bigcup_{i=1}^{N} \mathit{PersonalIndex}(u_i) \tag{2} \]
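The two definitions above can be illustrated with a small Python sketch. It captures only the set-theoretic content of Equations (1) and (2): the actual system stores textual fields in Lucene indexes, which is not reproduced here. The function name `build_indexes` and the triple-based input format are assumptions made for the example.

```python
from collections import defaultdict


def build_indexes(assignments):
    """Build PersonalIndex(u) for every user and the SocialIndex.

    `assignments` is an iterable of (user, resource, tag) triples.
    Returns (personal, social): `personal` maps each user to the set of
    resources she tagged (Equation 1); `social` is the union of all
    personal indexes (Equation 2).
    """
    personal = defaultdict(set)
    for user, resource, _tag in assignments:
        personal[user].add(resource)
    social = set().union(*personal.values())
    return dict(personal), social
```

With $N$ users this yields $N$ personal sets plus one social set, mirroring the $N + 1$ indexing passes described above.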

4.2 Retrieving of Similar Resources

At the end of the preprocessing step STaR is able to take into account users' requests. Every user interacts with STaR by providing information about a resource to be tagged. In the Query Processing step the system acquires data about the user (her language, the tags she uses most, the number of tags she usually uses to annotate resources, etc.) before processing the query (through the elimination of useless characters and punctuation) and submitting it against the SocialIndex stored in Lucene. If the user is recognized by the system because she has previously tagged some other resources, the same query is also submitted against her own PersonalIndex. We used as query the title of the web page (for bookmarks) or the title of the publication (for BibTeX entries). In order to improve the performance of the Lucene Querying Engine we replaced the original Lucene scoring function with an Okapi BM25 implementation⁸. BM25 is nowadays considered one of the state-of-the-art retrieval models by the IR community [13].

Let $D$ be a corpus of documents and $d \in D$. BM25 returns the top-$k$ resources with the highest similarity value given a resource $r$ (tokenized as a set of terms $t_1 \dots t_m$), and is defined as follows:

\[ sim(r, d) = \sum_{i=1}^{m} \frac{n_r^{t_i}}{k_1 \left( (1 - b) + b \cdot \frac{\mathit{length}_r}{\mathit{avgLength}_r} \right) + n_r^{t_i}} \cdot idf(t_i) \tag{3} \]

where $n_r^{t_i}$ represents the occurrences of the term $t_i$ in the document $d$, $\mathit{length}_r$ is the length of the resource $r$ and $\mathit{avgLength}_r$ is the average length of resources in the corpus. Finally, $k_1$ and $b$ are two parameters typically set to 2.0 and 0.75 respectively, and $idf(t_i)$ represents the inverse document frequency of the term $t_i$, defined as follows:

\[ idf(t_i) = \log \frac{N + df(t_i) + 0.5}{df(t_i) + 0.5} \tag{4} \]

where $N$ is the number of resources in the collection and $df(t_i)$ is the number of resources in which the term $t_i$ occurs.

8 http://nlp.uned.es/~jperezi/Lucene-BM25/
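To make the scoring concrete, the following Python sketch evaluates Equations (3) and (4) over a tiny in-memory corpus of tokenized documents. It is not the Lucene-BM25 code used in STaR. It assumes that term occurrences and length are measured on the document being scored, and the function names `idf`, `bm25` and `top_k` are illustrative.

```python
import math
from collections import Counter


def idf(term, docs):
    """Equation (4): log((N + df + 0.5) / (df + 0.5)), as given in the text."""
    n_docs = len(docs)
    df = sum(1 for d in docs if term in d)
    return math.log((n_docs + df + 0.5) / (df + 0.5))


def bm25(query_terms, doc, docs, k1=2.0, b=0.75):
    """Equation (3): score of the tokenized `doc` for the tokenized query."""
    avg_len = sum(len(d) for d in docs) / len(docs)
    counts = Counter(doc)
    score = 0.0
    for term in query_terms:
        n = counts[term]  # occurrences of the term in the scored document
        if n == 0:
            continue
        norm = k1 * ((1 - b) + b * len(doc) / avg_len)
        score += n / (norm + n) * idf(term, docs)
    return score


def top_k(query_terms, docs, k):
    """Return the indices of the k best-scoring documents with score > 0."""
    scored = [(bm25(query_terms, d, docs), i) for i, d in enumerate(docs)]
    scored = [(s, i) for s, i in scored if s > 0]
    scored.sort(key=lambda si: (-si[0], si[1]))
    return [i for _, i in scored[:k]]
```

The length normalization in the denominator means that, for the same number of term occurrences, shorter documents score higher than longer ones.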

Given a user $u \in U$ and a resource $r$, Lucene returns the resources whose similarity with $r$ is greater than or equal to a threshold $\beta$. To perform this task Lucene uses both the PersonalIndex of the user $u$ and the SocialIndex. More formally:

\[ \mathit{PersonalRes}(u, q) = \{ r \in \mathit{PersonalIndex}(u) \mid sim(q, r) \geq \beta \} \tag{5} \]

\[ \mathit{SocialRes}(q) = \{ r \in \mathit{SocialIndex} \mid sim(q, r) \geq \beta \} \tag{6} \]

In the next step the Tag Extractor gets the most similar resources returned by the Apache Lucene engine and produces the set of candidate tags to be suggested, by computing for each tag a score obtained by weighting the similarity score returned by Lucene with the normalized occurrence of the tag. If the Tag Extractor also gets the list of the most similar resources from the user's PersonalIndex, it will produce two partial folksonomies that are merged, assigning a weight to each folksonomy in order to boost the user's previously used tags.

Formally, for each query $q$ (namely, the resource to be tagged), we can define a set of tags to recommend by building two sets: $\mathit{candTags}_p$ and $\mathit{candTags}_s$. These sets are defined as follows:

\[ \mathit{candTags}_p(u, q) = \{ t \in T \mid t = tas(u, r) \wedge r \in \mathit{PersonalRes}(u, q) \} \tag{7} \]

\[ \mathit{candTags}_s(q) = \{ t \in T \mid t = tas(u, r) \wedge r \in \mathit{SocialRes}(q) \wedge u \in U \} \tag{8} \]

In the same way we can compute the relevance of each tag with respect to the query $q$ as:

\[ rel_p(t, u, q) = \sum_{r \in \mathit{PersonalRes}(u, q)} \frac{n_r^t}{n^t} \cdot sim(r, q) \tag{9} \]

\[ rel_s(t, q) = \sum_{r \in \mathit{SocialRes}(q)} \frac{n_r^t}{n^t} \cdot sim(r, q) \tag{10} \]

where $n_r^t$ is the number of occurrences of the tag $t$ in the annotation for resource $r$ and $n^t$ is the sum of the occurrences of tag $t$ among all similar resources.

Finally, the set of Candidate Tags can be defined as:

\[ \mathit{candTags}(u, q) = \mathit{candTags}_p(u, q) \cup \mathit{candTags}_s(q) \tag{11} \]

where for each tag $t$ the global relevance can be defined as:

\[ rel(t, q) = \alpha \cdot rel_p(t, u, q) + (1 - \alpha) \cdot rel_s(t, q) \tag{12} \]

where $\alpha$ (PersonalTagWeight) and $(1 - \alpha)$ (SocialTagWeight) are the weights of the personal and social tags respectively.
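The relevance computation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the STaR code: each retrieved resource is represented as a pair (similarity, tag counts), the function names are hypothetical, and the sample numbers in the usage note are invented rather than taken from Figure 3.

```python
from collections import Counter, defaultdict


def partial_relevance(similar):
    """Per-tag relevance: sum over resources r of (n_r^t / n^t) * sim(r, q).

    `similar` is a list of (sim, tag_counts) pairs, one per retrieved
    resource, where tag_counts maps each tag t to n_r^t.
    """
    totals = Counter()  # n^t: occurrences of t among all similar resources
    for _sim, tag_counts in similar:
        totals.update(tag_counts)
    rel = defaultdict(float)
    for sim, tag_counts in similar:
        for tag, n_rt in tag_counts.items():
            rel[tag] += n_rt / totals[tag] * sim
    return dict(rel)


def global_relevance(personal, social, alpha=0.7):
    """Weighted merge: alpha * rel_p + (1 - alpha) * rel_s per candidate tag."""
    rel_p = partial_relevance(personal)
    rel_s = partial_relevance(social)
    candidates = set(rel_p) | set(rel_s)  # union of personal and social tags
    return {
        t: alpha * rel_p.get(t, 0.0) + (1 - alpha) * rel_s.get(t, 0.0)
        for t in candidates
    }
```

For instance, with one personal resource `(0.8, {"sport": 1, "newspaper": 1})` and two social resources `(0.6, {"newspaper": 2, "football": 1})` and `(0.4, {"newspaper": 2})`, a tag found only in the social set is scaled by the social weight alone, whereas a tag that appears in both sets accumulates both contributions.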

Figure 3 depicts the procedure performed by the Tag Extractor: in this case we have a set of 4 Social Tags (Newspaper, Online, Football and Inter) and 3 Personal Tags (Sport, Newspaper and Tuttosport). These sets are then merged, building the set of Candidate Tags. This set contains 6 tags, since the tag newspaper appears among both the social and the personal tags. The system associates with each tag a score that indicates its effectiveness for the target resource. The scores of the Candidate Tags are then weighted again according to the PersonalTagWeight ($\alpha$) and SocialTagWeight ($1 - \alpha$) values (in the example, 0.7 and 0.3 respectively), in order to boost the tags already used by the user in the final tag rank. Indeed, the social tag 'football' gets the same score as the personal tag 'tuttosport', although its original weight was twice as high.

The Tag Extractor produces the set of Candidate Tags, a ranked set of tags with their relevance scores. This set is exploited by the Filter, the component that performs the last step of the recommendation task: removing the tags that do not match specific conditions. We fix a threshold for the relevance score between 0.20 and 0.25 and we return at most 5 tags. These parameters are strictly dependent on the training data.

Formally, given a user $u \in U$, a query $q$ and a threshold value $\gamma$, the goal of the filtering component is to build $\mathit{recommendation}(u, q)$, defined as follows:

\[ \mathit{recommendation}(u, q) = \{ t \in \mathit{candTags}(u, q) \mid rel(t, q) > \gamma \} \tag{13} \]

In the example in Figure 3, setting a threshold $\gamma = 0.20$, the system would suggest the tags sport and newspaper.
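The filtering step is simple enough to be sketched in a few lines of Python. The function name `recommend` and the tie-breaking by tag name are assumptions of this example. The default threshold of 0.20 and the cap of 5 tags follow the values reported in the text.

```python
def recommend(relevance, threshold=0.20, max_tags=5):
    """Filter candidate tags by relevance and cap the size of the result.

    Keeps the tags whose relevance is strictly greater than `threshold`,
    orders them by decreasing relevance (ties broken alphabetically)
    and returns at most `max_tags` of them.
    """
    kept = [(tag, score) for tag, score in relevance.items() if score > threshold]
    kept.sort(key=lambda ts: (-ts[1], ts[0]))
    return [tag for tag, _ in kept[:max_tags]]
```

Because the comparison is strict, a tag whose relevance is exactly equal to the threshold is discarded.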

5 Experimental Evaluations

5.1 Experimental Session

In this experiment we measure the performance of STaR in Task 1 of the ECML-PKDD 2009 Discovery Challenge. This experimental evaluation was carried out according to the instructions provided by the organizers of the 2009 Challenge. The test set was released 48 hours before the end of the competition. Every participant uploaded a file containing the tag predictions, and for each post only five tags were considered. F1-Measure was used to evaluate the accuracy of recommendations, thus for each post Precision and Recall were computed by comparing the recommended tags with the true tags assigned by the users. The case of tags was ignored and all characters which are neither numbers nor letters were removed. Results are presented in Table 1.
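The per-post comparison described above can be sketched in Python as follows. This is an illustrative reading of the protocol, not the official evaluation script: the function names are hypothetical, removing duplicates before truncating to five tags is an assumption, and the way per-post values are aggregated into the final score is not reproduced.

```python
def normalize(tag):
    """Lower-case a tag and drop every character that is not a letter or digit."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def evaluate_post(recommended, true_tags, max_tags=5):
    """Return (precision, recall, f1) for a single post."""
    rec = []
    for tag in recommended:
        norm = normalize(tag)
        if norm and norm not in rec:  # skip empty and duplicate tags
            rec.append(norm)
    rec = rec[:max_tags]  # only the first five tags are considered
    truth = {normalize(t) for t in true_tags} - {""}
    hits = sum(1 for t in rec if t in truth)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

Under this normalization a recommended tag such as `Web-2.0` matches a true tag `web20`, so spelling variants that differ only in case or punctuation are counted as hits.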

STaR finished the ECML-PKDD Discovery Challenge 2009 with an overall F-measure of 13.55. As shown in Table 1, exploiting only the first recommended tag the system reaches almost 20% in precision. The value of the recall increases with the number of recommended tags, reaching 13.5% with the fourth and fifth tags. In the future we will perform a more in-depth study in order to compare the predictive accuracy of STaR with different configurations of parameters.

6 Conclusions and Future Work

In this paper we presented STaR, a tag recommender designed and implemented to participate in the ECML-PKDD 2009 Discovery Challenge. The idea behind our work was to discover similarity among resources in order to exploit community and user tagging behavior. In this way our recommender system was able to suggest tags for users and items not yet stored in the training set. The experimental sessions showed that users tend to reuse their own tags to annotate similar resources, so this kind of recommendation model could benefit from the use of the user's personal tags before extracting the social tags of the community (we called this approach user-based).

In the future we will implement a methodology to suggest tags when the set of similar items returned by Lucene is empty. The system should be able to extract significant keywords from the textual content associated with a resource (title, description, etc.) that has no similar items, possibly by exploiting structured data or domain ontologies. Another issue to investigate is the application of our methodology in different domains, such as multimedia environments. In this field, discovering similarity among items just on the grounds of textual content might not be sufficient. Finally, textual content suffers from syntactic problems like polysemy (a keyword with two or more meanings) and synonymy (two or more keywords with the same meaning). These problems hurt the performance of the recommender. We will try to establish whether semantic document indexing could improve the performance of the recommender.

References

1. R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
2. D. Billsus and M. J. Pazzani. Learning collaborative information filters. In Proceedings of the 15th International Conference on Machine Learning, pages 46–54. Morgan Kaufmann, San Francisco, CA, 1998.
3. C. H. Brooks and N. Montanez. Improved annotation of the blogosphere via autotagging and hierarchical clustering. In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 625–632, New York, NY, USA, 2006. ACM Press.
4. C. Cattuto, C. Schmitz, A. Baldassarri, V. D. P. Servedio, V. Loreto, A. Hotho, M. Grahl, and G. Stumme. Network properties of folksonomies. AI Communications, 20(4):245–262, December 2007.
5. S. Golder and B. A. Huberman. The Structure of Collaborative Tagging Systems. Journal of Information Science, 32(2):198–208, 2006.
6. P. Heymann, D. Ramage, and H. Garcia-Molina. Social tag prediction. In SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 531–538, New York, NY, USA, 2008. ACM.
7. R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in folksonomies. In Alexander Hinneburg, editor, Workshop Proceedings of Lernen - Wissensentdeckung - Adaptivität (LWA 2007), pages 13–20, September 2007.
8. Sigma On Kee Lee and Andy Hon Wai Chun. Automatic tag recommendation for the web 2.0 blogosphere using collaborative tagging and hybrid ANN semantic structures. In ACOS'07: Proceedings of the 6th Conference on WSEAS International Conference on Applied Computer Science, pages 88–93, Stevens Point, Wisconsin, USA, 2007. World Scientific and Engineering Academy and Society (WSEAS).
9. M. Lipczak. Tag recommendation for folksonomies oriented towards individual users. In Proceedings of ECML PKDD Discovery Challenge (RSDC08), pages 84–95, 2008.
10. Leandro Marinho and Lars Schmidt-Thieme. Collaborative tag recommendations. Pages 533–540, 2008.
11. Adam Mathes. Folksonomies - cooperative classification and communication through shared metadata. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html, December 2004.
12. Gilad Mishne. AutoTag: a collaborative approach to automated tag assignment for weblog posts. In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 953–954, New York, NY, USA, 2006. ACM.
13. Stephen Robertson, Steve Walker, Micheline H. Beaulieu, Aarron Gull, and Marianna Lau. Okapi at TREC. In Text REtrieval Conference, pages 21–30, 1992.
14. G. Salton. Automatic Text Processing. Addison-Wesley, 1989.
15. Christoph Schmitz, Andreas Hotho, Robert Jäschke, and Gerd Stumme. Mining association rules in folksonomies. In V. Batagelj, H.-H. Bock, A. Ferligoj, and A. Žiberna, editors, Data Science and Classification (Proc. IFCS 2006 Conference), Studies in Classification, Data Analysis, and Knowledge Organization, pages 261–270, Berlin/Heidelberg, July 2006. Springer. Ljubljana.
16. Sanjay Sood, Sara Owsley, Kristian Hammond, and Larry Birnbaum. TagAssist: Automatic Tag Suggestion for Blog Posts. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2007), 2007.
17. M. Tatu, M. Srikanth, and T. D'Silva. RSDC'08: Tag recommendations using bookmark content. In Proceedings of ECML PKDD Discovery Challenge (RSDC08), pages 96–107, 2008.
18. Thomas Vander Wal. Folksonomy coinage and definition. Website, February 2007. http://vanderwal.net/folksonomy.html.
19. Harris Wu, Mohammad Zubair, and Kurt Maly. Harvesting social knowledge from folksonomies. In HYPERTEXT '06: Proceedings of the seventeenth conference on Hypertext and hypermedia, pages 111–114, New York, NY, USA, 2006. ACM Press.