An IR-based approach to Tag Recommendation

Cataldo Musto, Fedelucio Narducci, Marco De Gemmis, Pasquale Lops, Giovanni Semeraro
Dept. of Computer Science, University of Bari 'Aldo Moro', Italy
cataldomusto@di.uniba.it, narducci@di.uniba.it, degemmis@di.uniba.it, lops@di.uniba.it, semeraro@di.uniba.it

ABSTRACT
Thanks to the continuous growth of collaborative platforms like YouTube, Flickr and Delicious, we are witnessing a rapid evolution of Web dynamics towards a more 'social' vision, called Web 2.0. In this context, collaborative tagging systems are rapidly emerging as one of the most promising tools. However, since tags are handled in a purely syntactical way, collaborative tagging systems suffer from typical Information Retrieval (IR) problems such as polysemy and synonymy. In order to reduce the impact of these drawbacks, and at the same time to aid the so-called tag convergence, systems that assist the user in the task of tagging are required.

In this paper we present a system, called STaR, that implements an IR-based approach for tag recommendation. Our approach, mainly based on the exploitation of a state-of-the-art IR model called BM25, relies on two assumptions: first, if two or more resources share some common patterns (e.g. the same features in the textual description), we can exploit this information by supposing that they could be annotated with similar tags. Furthermore, since each user has a typical manner of labeling resources, a tag recommender might exploit this information to give more weight to the tags she has already used to annotate similar resources. We also present an experimental evaluation, carried out on a large dataset gathered from Bibsonomy.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Indexing methods; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Information filtering

Keywords
Recommender Systems, Web 2.0, Collaborative Tagging Systems, Folksonomies

1. INTRODUCTION
We are witnessing a transformation of the Web towards a more user-centric vision called Web 2.0. Web 2.0 applications let users publish self-produced content such as photos, videos, political opinions and reviews; users are therefore identified as Web prosumers: producers plus consumers of knowledge. Recently the research community has thoroughly analyzed the dynamics of tagging, the act of annotating resources with free labels, called tags. These systems provide heterogeneous content (photos, videos, musical habits, etc.), but they all share a common core: they let users post new resources and annotate them with tags. Besides the simple act of annotation, the tagging of resources also has a key social aspect: the connections between users, resources and tags generate a tripartite graph that can be exploited to analyze the dynamics of collaborative tagging systems. Since folksonomies do not rely on a predefined lexicon or hierarchy, they have the advantage of being completely free, but at the same time they generate a very noisy tag space that is hard to exploit for retrieval or recommendation tasks without some form of processing.

This problem hinders the full exploitation of the expressive power of folksonomies, so in recent years many tools have been developed to assist the user in the task of tagging and, at the same time, to aid tag convergence: we refer to them as tag recommenders.

This paper presents STaR, a tag recommender system implementing an IR-based approach that relies on a state-of-the-art IR model called BM25.
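The tripartite user-resource-tag structure mentioned above is straightforward to manipulate programmatically. The following minimal sketch (our own illustration, not part of STaR) models a folksonomy as (user, resource, tag) triples and derives the projections a tag recommender typically works with:

```python
from collections import defaultdict

# Toy folksonomy: a list of (user, resource, tag) assignments
posts = [
    ("alice", "gazzetta.it", "sport"),
    ("alice", "gazzetta.it", "newspaper"),
    ("bob", "gazzetta.it", "football"),
    ("bob", "inter.it", "football"),
]

# Project the tripartite graph onto its three useful "views"
tags_per_resource = defaultdict(set)   # the resource's folksonomy
tags_per_user = defaultdict(set)       # the user's personal vocabulary
users_per_resource = defaultdict(set)  # who bookmarked what
for user, resource, tag in posts:
    tags_per_resource[resource].add(tag)
    tags_per_user[user].add(tag)
    users_per_resource[resource].add(user)

print(sorted(tags_per_resource["gazzetta.it"]))  # ['football', 'newspaper', 'sport']
```

Each projection corresponds to one side of the graph; recommenders such as STaR exploit both the resource view (social tags) and the user view (personal tags).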
In this work, already presented [5] at the ECML-PKDD 2009 Discovery Challenge (http://www.kde.cs.uni-kassel.de/ws/dc09), we tried to point out two concepts:

• resources with similar content should be annotated with similar tags;

• a tag recommender needs to take into account the previous tagging activity of users, increasing the weight of the tags already used to annotate similar resources.

General Terms
Algorithms, Experimentation

Appears in the Proceedings of the 1st Italian Information Retrieval Workshop (IIR'10), January 27–28, 2010, Padova, Italy (http://ims.dei.unipd.it/websites/iir10/index.html). Copyright owned by the authors.

The paper is organized as follows. Section 2 analyzes related work. Section 3 explains the architecture of the system and how the recommendation approach is implemented. The experimental evaluation is described in Section 4, while conclusions and future work are drawn in the last section.

2. RELATED WORK
Work in the tag recommendation area is usually divided into three broad classes: content-based, collaborative and graph-based approaches.

In the content-based approach, a system exploits Information Retrieval techniques to extract relevant unigrams or bigrams from text. Brooks et al. [2], for example, developed a tag recommender that exploits TF/IDF scoring in order to automatically suggest tags for a blog post.

AutoTag [4] is one of the most important systems implementing the collaborative approach to tag recommendation. It presents some analogies with collaborative filtering methods: just as collaborative recommender systems generate recommendations from the ratings provided by similar users (called neighbors), AutoTag suggests tags based on the tags associated with similar posts.

The problem of tag recommendation through graph-based approaches was first addressed by Jäschke et al. [3]. The key idea behind their FolkRank algorithm is that a resource tagged with important tags by important users becomes important itself. Furthermore, Schmitz et al. [7] proposed association rule mining as a technique that might be useful in the tag recommendation process.

3. STAR: A SOCIAL TAG RECOMMENDER SYSTEM
STaR (Social Tag Recommender) is a content-based tag recommender system developed at the University of Bari. The inceptive idea behind STaR is to improve the model implemented in systems like TagAssist [8] or AutoTag [4]. Although we agree that similar resources usually share similar tags, in our opinion Mishne's approach presents two important drawbacks:

1. the tag re-ranking formula simply sums the occurrences of each tag across all the folksonomies, without considering the similarity with the resource to be tagged. In this way, tags often used to annotate resources with a low similarity level could be ranked first;

2. the proposed model does not take into account the previous tagging activity performed by users. If two users bookmark the same resource, they will receive the same suggestions, since the folksonomies built from similar resources are the same.

3.1 Indexing of Resources
Given a collection of resources (corpus) with some textual metadata (such as the title of the resource, the authors, the description, etc.), STaR first invokes the Indexer module to perform a preprocessing step on these data by exploiting Apache Lucene (http://lucene.apache.org). Obviously, the kind of metadata to be indexed is strictly dependent on the nature of the resources. Let U be the set of users and N the cardinality of this set; the indexing procedure is repeated N + 1 times: we build an index for each user (Personal Index), storing information on the resources she previously tagged, and an index for the whole community (Social Index), storing information about all the tagged resources by merging the Personal Indexes.

3.2 Retrieval of Similar Resources
STaR takes users' requests into account in order to produce personalized tag recommendations for each resource. First, every user has to provide some information about the resource to be tagged, such as the title of the Web page or its URL, so that the system can crawl the textual metadata associated with it. Next, if the system can identify the user because she has already posted other resources, it exploits data about her (language, the tags she uses most, the number of tags she usually uses to annotate resources, etc.) in order to refine the query to be submitted against both the Social and Personal indexes stored in Lucene.

In order to improve the performance of the Lucene querying engine, we replaced the original Lucene scoring function with an Okapi BM25 implementation (http://nlp.uned.es/~jperezi/Lucene-BM25/). BM25 is nowadays considered one of the state-of-the-art retrieval models by the IR community [6].

Let D be a corpus of documents and d ∈ D. Given a resource r, tokenized as a set of terms t1 ... tm, BM25 returns the top-k resources with the highest similarity value, computed as follows:

  sim(r, d) = Σ_{i=1..m} [ nr_{ti} / (k1 · ((1 − b) + b · l) + nr_{ti}) ] · idf(ti)    (1)

where nr_{ti} is the number of occurrences of the term ti in the document d, and l is the ratio between the length of the resource and the average length of resources in the corpus. The parameters k1 and b are typically set to 2.0 and 0.75 respectively, and idf(ti) is the inverse document frequency of the term ti, defined as:

  idf(ti) = log( (N − df(ti) + 0.5) / (df(ti) + 0.5) )    (2)

where N is the number of resources in the collection and df(ti) is the number of resources in which the term ti occurs. Given a user u and a resource r, Lucene returns the resources whose similarity with r is greater than or equal to a threshold β; to perform this task it uses both the PersonalIndex of user u and the SocialIndex.
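For illustration, Equation (1) can be implemented directly. The sketch below is a simplified stand-in for the Lucene BM25 plugin used by STaR (function and variable names are ours), with k1 = 2.0 and b = 0.75 as in the paper; documents are represented as pre-tokenized term lists:

```python
import math

def bm25_score(resource_terms, doc_terms, corpus, k1=2.0, b=0.75):
    """Score a document against the target resource's terms, following Eq. (1) and (2)."""
    N = len(corpus)                                   # resources in the collection
    avg_len = sum(len(d) for d in corpus) / N
    l = len(doc_terms) / avg_len                      # document length / average length
    score = 0.0
    for t in set(resource_terms):
        nr = doc_terms.count(t)                       # occurrences of t in the document
        if nr == 0:
            continue
        df = sum(1 for d in corpus if t in d)         # resources containing t
        idf = math.log((N - df + 0.5) / (df + 0.5))   # Eq. (2)
        score += nr / (k1 * ((1 - b) + b * l) + nr) * idf   # one term of Eq. (1)
    return score

# Toy corpus of pre-tokenized resources
corpus = [["sport", "newspaper"], ["football", "club"], ["politics"]]
s = bm25_score(["sport"], corpus[0], corpus)  # positive, since 'sport' occurs in the document
```

Note that, like Equation (1) as printed in the paper, this variant omits the (k1 + 1) numerator factor found in some BM25 formulations; that factor only rescales every term's contribution by a constant, so the ranking of documents is unaffected.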
To overcome the drawbacks of Mishne's model discussed above, our approach is based first on the analysis of similar resources, and it is also able to leverage the tags already selected by the user during her previous tagging activity, putting them at the top of the tag rank. Figure 1 shows the general architecture of STaR.

Figure 1: Architecture of STaR

For example, suppose that the target resource is Gazzetta.it, one of the most famous Italian sports newspapers. Querying the SocialIndex, Lucene could return as most similar resources an online newspaper (Corrieredellosport.it) and the official web site of an Italian football club (Inter.it). The PersonalIndex, instead, could return another online newspaper (Tuttosport.com).

3.3 Extraction of Candidate Tags
The role of the Tag Extractor is to produce as output the list of so-called "candidate tags" (namely, the tags considered 'relevant' by the tag recommender). In this step the system gets the most similar resources returned by the Apache Lucene engine and builds their folksonomies (namely, the tags they have been annotated with). Next, it produces the list of candidate tags by computing, for each tag in the folksonomy, a score that weights the similarity score returned by Lucene with the normalized occurrence of the tag. If the Tag Extractor also gets a list of most similar resources from the user's PersonalIndex, it produces two partial folksonomies, which are then merged, assigning a weight to each folksonomy in order to boost the tags previously used by the user.

Figure 2 depicts the procedure performed by the Tag Extractor. In this case we have a set of 4 social tags (newspaper, online, football and inter) and 3 personal tags (sport, newspaper and tuttosport). These sets are merged into the set of candidate tags, which contains 6 tags since the tag newspaper appears among both the social and the personal tags. The system associates with each tag a score indicating its effectiveness for the target resource. The scores of the candidate tags are then weighted again according to the SocialTagWeight (α) and PersonalTagWeight (1 − α) values (in the example, 0.3 and 0.7 respectively), in order to boost the tags already used by the user in the final tag rank. Note, for instance, that the social tag 'football' ends up with the same score as the personal tag 'tuttosport', although its original weight was twice as high.

Figure 2: Description of the process performed by the Tag Extractor

3.4 Tag Recommendation
The last step of the recommendation process is performed by the Filter. It removes from the list of candidate tags those not matching specific conditions, such as a threshold on the relevance score computed by the Tag Extractor. Obviously, the value of the threshold and the maximum number of tags to be recommended are strictly dependent on the training data. In the example in Figure 2, setting a threshold γ = 0.20, the system would suggest the tags sport and newspaper.

4. EXPERIMENTAL EVALUATION
The goal of the experimental session was to tune the system parameters in order to obtain the best effectiveness of the tag recommender. We exploited a large dataset gathered from Bibsonomy.

4.1 Description of the dataset
The dataset used for the experimental evaluation contains 263,004 bookmark posts and 158,924 BibTeX entries submitted by 3,617 different users. For each of the 235,328 different URLs and 143,050 different BibTeX entries, some textual metadata was also provided (such as the title of the resource, the description, the abstract and so on). We evaluated STaR by comparing the real tags (namely, the tags a user adopted to annotate an unseen resource) with the suggested ones. Accuracy was computed using classical IR metrics: Precision, Recall and F1-measure.

4.2 Experimental Session
First, we evaluated the influence of different Lucene scoring functions on the performance of STaR. We randomly chose 10,000 resources from the dataset and compared the results returned by two different scoring functions (the original Lucene function and BM25) in order to find the best one. We performed the same steps previously described, retrieving the most similar items with the two similarity functions and comparing the tags suggested by the system in both cases. Results are presented in Table 1. In general, there is a small improvement when adopting BM25 over the original Lucene similarity function: BM25 improved the recall for bookmarks (+6.95%) and for BibTeX entries (+1.46%).

Table 1: Results comparing the original Lucene scoring function with BM25

  Scoring   Resource   Pr     Re     F1
  Original  bookmark   25.26  29.67  27.29
  Original  bibtex     14.06  21.45  16.99
  BM25      bookmark   25.62  36.62  30.15
  BM25      bibtex     13.72  22.91  17.16
  Original  overall    16.43  23.58  19.37
  BM25      overall    16.45  26.46  20.29

Next, using BM25 as the scoring function, we compared the predictive accuracy of STaR under different combinations of system parameters, namely:

• the maximum number of similar documents retrieved by Lucene;

• the value of α for the PersonalTagWeight and SocialTagWeight parameters;

• the threshold γ used to establish whether a tag is relevant;

• the fields of the target resource used to compose the query.

Tuning the number of similar documents to retrieve from the PersonalIndex and SocialIndex is very important, since too high a value can introduce noise into the retrieval process, while too low a value can exclude documents containing relevant tags. By analyzing the results returned by some test queries, we decided to set this value between 5 and 10, depending on the training data.

Next, we estimated the values for the PersonalTagWeight (PTW) and SocialTagWeight (STW) parameters. A higher weight for the personal tags means that the recommendation process weighs more the tags previously used by the target user, while a higher value for the social tags gives more importance to the tags used by the community (namely, the whole folksonomy) on the target resource. These parameters are biased by user practice: if the tags often used by the user are very different from those used by the community, PTW should be higher than STW. Since it is difficult to model user behavior at run time, we performed an empirical study, testing the system with several combinations of values: i) PTW = 0.7, STW = 0.3; ii) PTW = 0.5, STW = 0.5; iii) PTW = 0.3, STW = 0.7.

Another parameter that can influence system performance is the set of fields used to compose the query. Each resource in the dataset has many textual fields, such as title, abstract, description, extended description, etc. In this case we used as query the title of the web page (for bookmarks) and the title of the publication (for BibTeX entries).

The last parameter to tune is the threshold γ used to deem a tag relevant. We performed tests suggesting both 4 and 5 tags and decided to recommend only 4, since the fifth was usually noisy; we also fixed the threshold value between 0.20 and 0.25.

To carry out this experimental session we used the aforementioned dataset both as training and test set. We executed the test over 50,000 bookmarks and 50,000 BibTeX entries. Results are presented in Table 2 and Table 3.

Table 2: Predictive accuracy of STaR over 50,000 bookmarks

  Approach     STW  PTW  Pr     Re     F1
  Comm.-based  1.0  0.0  23.96  24.60  24.28
  User-based   0.0  1.0  32.12  28.72  30.33
  Hybrid       0.7  0.3  24.96  26.30  25.61
  Hybrid       0.5  0.5  24.10  25.16  24.62
  Hybrid       0.3  0.7  23.85  25.12  25.08
  Baseline     -    -    35.58  10.42  16.11

Table 3: Predictive accuracy of STaR over 50,000 BibTeXs

  Approach     STW  PTW  Pr     Re     F1
  Comm.-based  1.0  0.0  34.44  35.89  35.15
  User-based   0.0  1.0  44.73  40.53  42.53
  Hybrid       0.7  0.3  32.31  38.57  35.16
  Hybrid       0.5  0.5  32.36  37.55  34.76
  Hybrid       0.3  0.7  35.47  39.68  37.46
  Baseline     -    -    42.03  13.23  20.13

Analyzing the results, it emerges that the approach we called user-based outperformed the others. In this configuration we set PTW to 1.0 and STW to 0.0, so we suggest only the tags already used by the user in tagging similar resources; no query is submitted against the SocialIndex. The first remark we can make is that each user has her own mental model and her own vocabulary: she usually prefers to tag resources with labels she has already used. Getting tags from the SocialIndex only, instead, often introduces noise into the recommendation process, as the results of the community-based approach show. The hybrid approaches outperformed the community-based one, but their predictive accuracy is still worse than that of the user-based approach. Finally, all the approaches outperformed the F1-measure of the baseline, computed by recommending for each resource only its most popular tags (obviously, for resources never tagged before, the baseline cannot suggest anything). This analysis substantially confirms the results we obtained in other studies performed in the area of tag-based recommendation [1].

5. CONCLUSIONS AND FUTURE WORK
Collaborative tagging systems are powerful tools, but they suffer from some drawbacks, since the complete tag space is too noisy to be exploited for retrieval and filtering tasks. In this paper we presented STaR, a social tag recommender system. The idea behind our work was to discover similarities among resources by exploiting a state-of-the-art IR model called BM25. The experimental sessions showed that users tend to reuse their own tags to annotate similar resources, so this kind of recommendation model can benefit from using the user's personal tags before extracting the social tags of the community (we called this approach user-based).

This approach has one main drawback: it cannot suggest any tags when the set of similar items returned by Lucene is empty. We plan to extend the system to extract significant keywords from the textual content associated with a resource (title, description, etc.) that has no similar items, possibly exploiting structured data or domain ontologies. Furthermore, since tags suffer from typical Information Retrieval problems (polysemy, etc.), we will investigate whether the integration of Word Sense Disambiguation algorithms or a semantic representation of documents could improve the performance of the recommender.

Overall, our approach proved promising compared with existing state-of-the-art approaches for tag recommendation; our submission ranked 6th in the final results of the ECML-PKDD 2009 Discovery Challenge (id: 29723, http://www.kde.cs.uni-kassel.de/ws/dc09/results).

6. REFERENCES
[1] P. Basile, M. de Gemmis, P. Lops, G. Semeraro, M. Bux, C. Musto, and F. Narducci. FIRSt: a Content-based Recommender System Integrating Tags for Cultural Heritage Personalization. In P. Nesi, K. Ng, and J. Delgado, editors, Proceedings of the 4th International Conference on Automated Solutions for Cross Media Content and Multi-channel Distribution (AXMEDIS 2008) - Workshop Panels and Industrial Applications, Florence, Italy, pages 103–106. Firenze University Press, November 17-19, 2008.
[2] C. H. Brooks and N. Montanez. Improved annotation of the blogosphere via autotagging and hierarchical clustering. In WWW '06: Proceedings of the 15th International Conference on World Wide Web, pages 625–632, New York, NY, USA, 2006. ACM Press.
[3] R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in folksonomies. In Alexander Hinneburg, editor, Workshop Proceedings of Lernen - Wissensentdeckung - Adaptivität (LWA 2007), pages 13–20, September 2007.
[4] G. Mishne. AutoTag: a collaborative approach to automated tag assignment for weblog posts. In WWW '06: Proceedings of the 15th International Conference on World Wide Web, pages 953–954, New York, NY, USA, 2006. ACM.
[5] C. Musto, F. Narducci, M. de Gemmis, P. Lops, and G. Semeraro. STaR: a Social Tag Recommender System. In Folke Eisterlehner, Andreas Hotho, and Robert Jäschke, editors, ECML PKDD Discovery Challenge 2009 (DC09), volume 497 of CEUR Workshop Proceedings, September 7, 2009.
[6] S. E. Robertson, S. Walker, M. H. Beaulieu, A. Gull, and M. Lau. Okapi at TREC. In Text REtrieval Conference, pages 21–30, 1992.
[7] C. Schmitz, A. Hotho, R. Jäschke, and G. Stumme. Mining association rules in folksonomies. In V. Batagelj, H.-H. Bock, A. Ferligoj, and A. Žiberna, editors, Data Science and Classification (Proc. IFCS 2006 Conference, Ljubljana), Studies in Classification, Data Analysis, and Knowledge Organization, pages 261–270. Springer, Berlin/Heidelberg, July 2006.
[8] S. Sood, S. Owsley, K. Hammond, and L. Birnbaum. TagAssist: Automatic Tag Suggestion for Blog Posts. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2007), 2007.