An IR-based approach to Tag Recommendation

Cataldo Musto, Fedelucio Narducci, Marco De Gemmis, Pasquale Lops, Giovanni Semeraro
Dept. of Computer Science, University of Bari 'Aldo Moro', Italy
cataldomusto@di.uniba.it, narducci@di.uniba.it, degemmis@di.uniba.it, lops@di.uniba.it, semeraro@di.uniba.it

ABSTRACT
Thanks to the continuous growth of collaborative platforms like YouTube, Flickr and Delicious, we are witnessing a rapid evolution of Web dynamics towards a more 'social' vision, called Web 2.0. In this context, collaborative tagging systems are rapidly emerging as one of the most promising tools. However, since tags are handled in a purely syntactical way, collaborative tagging systems suffer from typical Information Retrieval (IR) problems such as polysemy and synonymy. In order to reduce the impact of these drawbacks, and at the same time to aid the so-called tag convergence, systems that assist the user in the task of tagging are required.

In this paper we present a system, called STaR, that implements an IR-based approach for tag recommendation. Our approach, mainly based on the exploitation of a state-of-the-art IR model called BM25, relies on two assumptions: first, if two or more resources share some common patterns (e.g. the same features in the textual description), we can exploit this information by supposing that they could be annotated with similar tags. Furthermore, since each user has a typical manner of labeling resources, a tag recommender might exploit this information to give more weight to the tags she has already used to annotate similar resources. We also present an experimental evaluation, carried out on a large dataset gathered from Bibsonomy.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Indexing methods; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Information filtering

Keywords
Recommender Systems, Web 2.0, Collaborative Tagging Systems, Folksonomies

1. INTRODUCTION
We are witnessing a transformation of the Web towards a more user-centric vision called Web 2.0. Web 2.0 applications let users publish self-produced content such as photos, videos, political opinions and reviews; users are therefore identified as Web prosumers: producers plus consumers of knowledge. Recently the research community has thoroughly analyzed the dynamics of tagging, the act of annotating resources with free labels, called tags. These systems provide heterogeneous content (photos, videos, musical habits, etc.), but they all share a common core: they let users post new resources and annotate them with tags. Besides the simple act of annotation, the tagging of resources also has a key social aspect: the connections between users, resources and tags generate a tripartite graph that can be exploited to analyze the dynamics of collaborative tagging systems. Since folksonomies do not rely on a predefined lexicon or hierarchy, they have the advantage of being completely free, but at the same time they generate a very noisy tag space that is hard to exploit for retrieval or recommendation tasks without some form of processing.

This problem hinders the full exploitation of the expressive power of folksonomies, so in recent years many tools have been developed to assist the user in the task of tagging and, at the same time, to aid tag convergence: we refer to them as tag recommenders.

This paper presents STaR, a tag recommender system implementing an IR-based approach that relies on a state-of-the-art IR model called BM25.
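The tripartite user-resource-tag structure mentioned above is straightforward to manipulate programmatically. The following minimal sketch (our own illustration, not part of STaR) models a folksonomy as (user, resource, tag) triples and derives the projections a tag recommender typically works with:

```python
from collections import defaultdict

# Toy folksonomy: a list of (user, resource, tag) assignments
posts = [
    ("alice", "gazzetta.it", "sport"),
    ("alice", "gazzetta.it", "newspaper"),
    ("bob", "gazzetta.it", "football"),
    ("bob", "inter.it", "football"),
]

# Project the tripartite graph onto its three useful "views"
tags_per_resource = defaultdict(set)   # the resource's folksonomy
tags_per_user = defaultdict(set)       # the user's personal vocabulary
users_per_resource = defaultdict(set)  # who bookmarked what
for user, resource, tag in posts:
    tags_per_resource[resource].add(tag)
    tags_per_user[user].add(tag)
    users_per_resource[resource].add(user)

print(sorted(tags_per_resource["gazzetta.it"]))  # ['football', 'newspaper', 'sport']
```

Each projection corresponds to one side of the graph; recommenders such as STaR exploit both the resource view (social tags) and the user view (personal tags).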
In this work, already presented [5] at the ECML-PKDD 2009 Discovery Challenge (http://www.kde.cs.uni-kassel.de/ws/dc09), we tried to point out two concepts:

• resources with similar content should be annotated with similar tags;

• a tag recommender needs to take into account the previous tagging activity of users, increasing the weight of the tags already used to annotate similar resources.

General Terms
Algorithms, Experimentation

Appears in the Proceedings of the 1st Italian Information Retrieval Workshop (IIR'10), January 27–28, 2010, Padova, Italy (http://ims.dei.unipd.it/websites/iir10/index.html). Copyright owned by the authors.

The paper is organized as follows. Section 2 analyzes related work. Section 3 explains the architecture of the system and how the recommendation approach is implemented. The experimental evaluation is described in Section 4, while conclusions and future work are drawn in the last section.

2. RELATED WORK
Work in the tag recommendation area is usually divided into three broad classes: content-based, collaborative and graph-based approaches.

In the content-based approach, a system exploits Information Retrieval techniques to extract relevant unigrams or bigrams from text. Brooks et al. [2], for example, developed a tag recommender that exploits TF/IDF scoring in order to automatically suggest tags for a blog post.

AutoTag [4] is one of the most important systems implementing the collaborative approach to tag recommendation. It presents some analogies with collaborative filtering methods: just as collaborative recommender systems generate recommendations from the ratings provided by similar users (called neighbors), AutoTag suggests tags based on the tags associated with similar posts.

The problem of tag recommendation through graph-based approaches was first addressed by Jäschke et al. [3]. The key idea behind their FolkRank algorithm is that a resource tagged with important tags by important users becomes important itself. Furthermore, Schmitz et al. [7] proposed association rule mining as a technique that might be useful in the tag recommendation process.

3. STAR: A SOCIAL TAG RECOMMENDER SYSTEM
STaR (Social Tag Recommender) is a content-based tag recommender system developed at the University of Bari. The inceptive idea behind STaR is to improve the model implemented in systems like TagAssist [8] or AutoTag [4]. Although we agree that similar resources usually share similar tags, in our opinion Mishne's approach presents two important drawbacks:

1. the tag re-ranking formula simply sums the occurrences of each tag across all the folksonomies, without considering the similarity with the resource to be tagged. In this way, tags often used to annotate resources with a low similarity level could be ranked first;

2. the proposed model does not take into account the previous tagging activity performed by users. If two users bookmark the same resource, they will receive the same suggestions, since the folksonomies built from similar resources are the same.

3.1 Indexing of Resources
Given a collection of resources (corpus) with some textual metadata (such as the title of the resource, the authors, the description, etc.), STaR first invokes the Indexer module to perform a preprocessing step on these data by exploiting Apache Lucene (http://lucene.apache.org). Obviously, the kind of metadata to be indexed is strictly dependent on the nature of the resources. Let U be the set of users and N the cardinality of this set; the indexing procedure is repeated N + 1 times: we build an index for each user (Personal Index), storing information on the resources she previously tagged, and an index for the whole community (Social Index), storing information about all the tagged resources by merging the Personal Indexes.

3.2 Retrieval of Similar Resources
STaR takes users' requests into account in order to produce personalized tag recommendations for each resource. First, every user has to provide some information about the resource to be tagged, such as the title of the Web page or its URL, so that the system can crawl the textual metadata associated with it. Next, if the system can identify the user because she has already posted other resources, it exploits data about her (language, the tags she uses most, the number of tags she usually uses to annotate resources, etc.) in order to refine the query to be submitted against both the Social and Personal indexes stored in Lucene.

In order to improve the performance of the Lucene querying engine, we replaced the original Lucene scoring function with an Okapi BM25 implementation (http://nlp.uned.es/~jperezi/Lucene-BM25/). BM25 is nowadays considered one of the state-of-the-art retrieval models by the IR community [6].

Let D be a corpus of documents and d ∈ D. Given a resource r, tokenized as a set of terms t1 ... tm, BM25 returns the top-k resources with the highest similarity value, computed as follows:

  sim(r, d) = Σ_{i=1..m} [ nr_{ti} / (k1 · ((1 − b) + b · l) + nr_{ti}) ] · idf(ti)    (1)

where nr_{ti} is the number of occurrences of the term ti in the document d, and l is the ratio between the length of the resource and the average length of resources in the corpus. The parameters k1 and b are typically set to 2.0 and 0.75 respectively, and idf(ti) is the inverse document frequency of the term ti, defined as:

  idf(ti) = log( (N − df(ti) + 0.5) / (df(ti) + 0.5) )    (2)

where N is the number of resources in the collection and df(ti) is the number of resources in which the term ti occurs. Given a user u and a resource r, Lucene returns the resources whose similarity with r is greater than or equal to a threshold β; to perform this task it uses both the PersonalIndex of user u and the SocialIndex.
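For illustration, Equation (1) can be implemented directly. The sketch below is a simplified stand-in for the Lucene BM25 plugin used by STaR (function and variable names are ours), with k1 = 2.0 and b = 0.75 as in the paper; documents are represented as pre-tokenized term lists:

```python
import math

def bm25_score(resource_terms, doc_terms, corpus, k1=2.0, b=0.75):
    """Score a document against the target resource's terms, following Eq. (1) and (2)."""
    N = len(corpus)                                   # resources in the collection
    avg_len = sum(len(d) for d in corpus) / N
    l = len(doc_terms) / avg_len                      # document length / average length
    score = 0.0
    for t in set(resource_terms):
        nr = doc_terms.count(t)                       # occurrences of t in the document
        if nr == 0:
            continue
        df = sum(1 for d in corpus if t in d)         # resources containing t
        idf = math.log((N - df + 0.5) / (df + 0.5))   # Eq. (2)
        score += nr / (k1 * ((1 - b) + b * l) + nr) * idf   # one term of Eq. (1)
    return score

# Toy corpus of pre-tokenized resources
corpus = [["sport", "newspaper"], ["football", "club"], ["politics"]]
s = bm25_score(["sport"], corpus[0], corpus)  # positive, since 'sport' occurs in the document
```

Note that, like Equation (1) as printed in the paper, this variant omits the (k1 + 1) numerator factor found in some BM25 formulations; that factor only rescales every term's contribution by a constant, so the ranking of documents is unaffected.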
To overcome the drawbacks of Mishne's model discussed above, our approach is based first on the analysis of similar resources, and it is also able to leverage the tags already selected by the user during her previous tagging activity, putting them at the top of the tag rank. Figure 1 shows the general architecture of STaR.

Figure 1: Architecture of STaR

For example, suppose that the target resource is Gazzetta.it, one of the most famous Italian sports newspapers. Querying the SocialIndex, Lucene could return as most similar resources an online newspaper (Corrieredellosport.it) and the official web site of an Italian football club (Inter.it). The PersonalIndex, instead, could return another online newspaper (Tuttosport.com).

3.3 Extraction of Candidate Tags
The role of the Tag Extractor is to produce as output the list of so-called "candidate tags" (namely, the tags considered 'relevant' by the tag recommender). In this step the system gets the most similar resources returned by the Apache Lucene engine and builds their folksonomies (namely, the tags they have been annotated with). Next, it produces the list of candidate tags by computing, for each tag in the folksonomy, a score that weights the similarity score returned by Lucene with the normalized occurrence of the tag. If the Tag Extractor also gets a list of most similar resources from the user's PersonalIndex, it produces two partial folksonomies, which are then merged, assigning a weight to each folksonomy in order to boost the tags previously used by the user.

Figure 2 depicts the procedure performed by the Tag Extractor. In this case we have a set of 4 social tags (newspaper, online, football and inter) and 3 personal tags (sport, newspaper and tuttosport). These sets are merged into the set of candidate tags, which contains 6 tags since the tag newspaper appears among both the social and the personal tags. The system associates with each tag a score indicating its effectiveness for the target resource. The scores of the candidate tags are then weighted again according to the SocialTagWeight (α) and PersonalTagWeight (1 − α) values (in the example, 0.3 and 0.7 respectively), in order to boost the tags already used by the user in the final tag rank. Note, for instance, that the social tag 'football' ends up with the same score as the personal tag 'tuttosport', although its original weight was twice as high.

Figure 2: Description of the process performed by the Tag Extractor

3.4 Tag Recommendation
The last step of the recommendation process is performed by the Filter. It removes from the list of candidate tags those not matching specific conditions, such as a threshold on the relevance score computed by the Tag Extractor. Obviously, the value of the threshold and the maximum number of tags to be recommended are strictly dependent on the training data. In the example in Figure 2, setting a threshold γ = 0.20, the system would suggest the tags sport and newspaper.

4. EXPERIMENTAL EVALUATION
The goal of the experimental session was to tune the system parameters in order to obtain the best effectiveness of the tag recommender. We exploited a large dataset gathered from Bibsonomy.

4.1 Description of the dataset
The dataset used for the experimental evaluation contains 263,004 bookmark posts and 158,924 BibTeX entries submitted by 3,617 different users. For each of the 235,328 different URLs and 143,050 different BibTeX entries, some textual metadata was also provided (such as the title of the resource, the description, the abstract and so on). We evaluated STaR by comparing the real tags (namely, the tags a user adopted to annotate an unseen resource) with the suggested ones. Accuracy was computed using classical IR metrics: Precision, Recall and F1-measure.

4.2 Experimental Session
First, we evaluated the influence of different Lucene scoring functions on the performance of STaR. We randomly chose 10,000 resources from the dataset and compared the results returned by two different scoring functions (the original Lucene function and BM25) in order to find the best one. We performed the same steps previously described, retrieving the most similar items with the two similarity functions and comparing the tags suggested by the system in both cases. Results are presented in Table 1. In general, there is a small improvement when adopting BM25 over the original Lucene similarity function: BM25 improved the recall for bookmarks (+6.95%) and for BibTeX entries (+1.46%).

Table 1: Results comparing the original Lucene scoring function with BM25

  Scoring   Resource   Pr     Re     F1
  Original  bookmark   25.26  29.67  27.29
  Original  bibtex     14.06  21.45  16.99
  BM25      bookmark   25.62  36.62  30.15
  BM25      bibtex     13.72  22.91  17.16
  Original  overall    16.43  23.58  19.37
  BM25      overall    16.45  26.46  20.29

Next, using BM25 as the scoring function, we compared the predictive accuracy of STaR under different combinations of system parameters, namely:

• the maximum number of similar documents retrieved by Lucene;

• the value of α for the PersonalTagWeight and SocialTagWeight parameters;

• the threshold γ used to establish whether a tag is relevant;

• the fields of the target resource used to compose the query.

Tuning the number of similar documents to retrieve from the PersonalIndex and SocialIndex is very important, since too high a value can introduce noise into the retrieval process, while too low a value can exclude documents containing relevant tags. By analyzing the results returned by some test queries, we decided to set this value between 5 and 10, depending on the training data.

Next, we estimated the values for the PersonalTagWeight (PTW) and SocialTagWeight (STW) parameters. A higher weight for the personal tags means that the recommendation process weighs more the tags previously used by the target user, while a higher value for the social tags gives more importance to the tags used by the community (namely, the whole folksonomy) on the target resource. These parameters are biased by user practice: if the tags often used by the user are very different from those used by the community, PTW should be higher than STW. Since it is difficult to model user behavior at run time, we performed an empirical study, testing the system with several combinations of values: i) PTW = 0.7, STW = 0.3; ii) PTW = 0.5, STW = 0.5; iii) PTW = 0.3, STW = 0.7.

Another parameter that can influence system performance is the set of fields used to compose the query. Each resource in the dataset has many textual fields, such as title, abstract, description, extended description, etc. In this case we used as query the title of the web page (for bookmarks) and the title of the publication (for BibTeX entries).

The last parameter to tune is the threshold γ used to deem a tag relevant. We performed tests suggesting both 4 and 5 tags and decided to recommend only 4, since the fifth was usually noisy; we also fixed the threshold value between 0.20 and 0.25.

To carry out this experimental session we used the aforementioned dataset both as training and test set. We executed the test over 50,000 bookmarks and 50,000 BibTeX entries. Results are presented in Table 2 and Table 3.

Table 2: Predictive accuracy of STaR over 50,000 bookmarks

  Approach     STW  PTW  Pr     Re     F1
  Comm.-based  1.0  0.0  23.96  24.60  24.28
  User-based   0.0  1.0  32.12  28.72  30.33
  Hybrid       0.7  0.3  24.96  26.30  25.61
  Hybrid       0.5  0.5  24.10  25.16  24.62
  Hybrid       0.3  0.7  23.85  25.12  25.08
  Baseline     -    -    35.58  10.42  16.11

Table 3: Predictive accuracy of STaR over 50,000 BibTeXs

  Approach     STW  PTW  Pr     Re     F1
  Comm.-based  1.0  0.0  34.44  35.89  35.15
  User-based   0.0  1.0  44.73  40.53  42.53
  Hybrid       0.7  0.3  32.31  38.57  35.16
  Hybrid       0.5  0.5  32.36  37.55  34.76
  Hybrid       0.3  0.7  35.47  39.68  37.46
  Baseline     -    -    42.03  13.23  20.13

Analyzing the results, it emerges that the approach we called user-based outperformed the others. In this configuration we set PTW to 1.0 and STW to 0.0, so we suggest only the tags already used by the user in tagging similar resources; no query is submitted against the SocialIndex. The first remark we can make is that each user has her own mental model and her own vocabulary: she usually prefers to tag resources with labels she has already used. Getting tags from the SocialIndex only, instead, often introduces noise into the recommendation process, as the results of the community-based approach show. The hybrid approaches outperformed the community-based one, but their predictive accuracy is still worse than that of the user-based approach. Finally, all the approaches outperformed the F1-measure of the baseline, computed by recommending for each resource only its most popular tags (obviously, for resources never tagged before, the baseline cannot suggest anything). This analysis substantially confirms the results we obtained in other studies performed in the area of tag-based recommendation [1].

5. CONCLUSIONS AND FUTURE WORK
Collaborative tagging systems are powerful tools, but they suffer from some drawbacks, since the complete tag space is too noisy to be exploited for retrieval and filtering tasks. In this paper we presented STaR, a social tag recommender system. The idea behind our work was to discover similarities among resources by exploiting a state-of-the-art IR model called BM25. The experimental sessions showed that users tend to reuse their own tags to annotate similar resources, so this kind of recommendation model can benefit from using the user's personal tags before extracting the social tags of the community (we called this approach user-based).

This approach has one main drawback: it cannot suggest any tags when the set of similar items returned by Lucene is empty. We plan to extend the system to extract significant keywords from the textual content associated with a resource (title, description, etc.) that has no similar items, possibly exploiting structured data or domain ontologies. Furthermore, since tags suffer from typical Information Retrieval problems (polysemy, etc.), we will investigate whether the integration of Word Sense Disambiguation algorithms or a semantic representation of documents could improve the performance of the recommender.

Overall, our approach proved promising compared with existing state-of-the-art approaches for tag recommendation; our submission ranked 6th in the final results of the ECML-PKDD 2009 Discovery Challenge (id: 29723, http://www.kde.cs.uni-kassel.de/ws/dc09/results).

6. REFERENCES
[1] P. Basile, M. de Gemmis, P. Lops, G. Semeraro, M. Bux, C. Musto, and F. Narducci. FIRSt: a Content-based Recommender System Integrating Tags for Cultural Heritage Personalization. In P. Nesi, K. Ng, and J. Delgado, editors, Proceedings of the 4th International Conference on Automated Solutions for Cross Media Content and Multi-channel Distribution (AXMEDIS 2008) - Workshop Panels and Industrial Applications, Florence, Italy, pages 103–106. Firenze University Press, November 17-19, 2008.
[2] C. H. Brooks and N. Montanez. Improved annotation of the blogosphere via autotagging and hierarchical clustering. In WWW '06: Proceedings of the 15th International Conference on World Wide Web, pages 625–632, New York, NY, USA, 2006. ACM Press.
[3] R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in folksonomies. In Alexander Hinneburg, editor, Workshop Proceedings of Lernen - Wissensentdeckung - Adaptivität (LWA 2007), pages 13–20, September 2007.
[4] G. Mishne. AutoTag: a collaborative approach to automated tag assignment for weblog posts. In WWW '06: Proceedings of the 15th International Conference on World Wide Web, pages 953–954, New York, NY, USA, 2006. ACM.
[5] C. Musto, F. Narducci, M. de Gemmis, P. Lops, and G. Semeraro. STaR: a Social Tag Recommender System. In Folke Eisterlehner, Andreas Hotho, and Robert Jäschke, editors, ECML PKDD Discovery Challenge 2009 (DC09), volume 497 of CEUR Workshop Proceedings, September 7, 2009.
[6] S. E. Robertson, S. Walker, M. H. Beaulieu, A. Gull, and M. Lau. Okapi at TREC. In Text REtrieval Conference, pages 21–30, 1992.
[7] C. Schmitz, A. Hotho, R. Jäschke, and G. Stumme. Mining association rules in folksonomies. In V. Batagelj, H.-H. Bock, A. Ferligoj, and A. Žiberna, editors, Data Science and Classification (Proc. IFCS 2006 Conference, Ljubljana), Studies in Classification, Data Analysis, and Knowledge Organization, pages 261–270. Springer, Berlin/Heidelberg, July 2006.
[8] S. Sood, S. Owsley, K. Hammond, and L. Birnbaum. TagAssist: Automatic Tag Suggestion for Blog Posts. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2007), 2007.