<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>STaR: a Social Tag Recommender System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cataldo Musto</string-name>
          <email>musto@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fedelucio Narducci</string-name>
          <email>narducci@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco de Gemmis</string-name>
          <email>degemmis@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Lops</string-name>
          <email>lops@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <email>semeraro@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Bari "Aldo Moro"</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The continuous growth of collaborative platforms that we have recently been witnessing made possible the passage from an 'elitist' Web, written by few and read by many, towards the so-called Web 2.0, a more 'user-centric' vision, where users become active contributors in Web dynamics. In this context, collaborative tagging systems are rapidly emerging: in these platforms users can annotate resources they like with freely chosen keywords (called tags) in order to make information retrieval and serendipitous browsing easier. However, as tags are handled in a purely syntactic way, collaborative tagging systems suffer from typical Information Retrieval (IR) problems like polysemy and synonymy: so, in order to reduce the impact of these drawbacks and at the same time aid the so-called tag convergence, systems that assist the user in the task of tagging are required. The goal of these systems (called tag recommenders) is to suggest a set of relevant keywords for the resources to be annotated by exploiting different approaches. In this paper we present a tag recommender developed for the ECML-PKDD 2009 Discovery Challenge. Our approach is based on two assumptions: firstly, if two or more resources share some common patterns (e.g. the same features in the textual description), we can exploit this information, supposing that they could be annotated with similar tags. Furthermore, since each user has a typical manner of labeling resources, a tag recommender might exploit this information to give more weight to the tags she already used to annotate similar resources.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Web 2.0</kwd>
        <kwd>Collaborative Tagging Systems</kwd>
        <kwd>Folksonomies</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The coming of Web 2.0 has changed the role of Internet users and the shape of
services offered by the World Wide Web. Since web sites tend to be more
interactive and user-centric than in the past, users are shifting from passive consumers
of information to active producers. By using Web 2.0 applications, users are able
to easily publish content such as photos, videos, political opinions and reviews, so
they are identified as Web prosumers: producers + consumers of knowledge.
One of the forms of user-generated content (UGC) that has drawn most
attention from the research community is tagging, which is the act of annotating
resources of interest with free keywords, called tags, in order to help users in
organizing, browsing and searching resources through the building of a
socially-constructed classification schema, called folksonomy [18]. In contrast to systems
where information about resources is only provided by a small set of experts,
collaborative tagging systems take into account the way individuals conceive the
information contained in a resource [19]. Well-known examples of platforms that
embed tagging activity are Flickr (http://www.flickr.com) to share photos, YouTube (http://www.youtube.com) to share videos,
Del.icio.us (http://delicious.com/) to share bookmarks, Last.fm (http://www.last.fm/) to share music listening habits and
BibSonomy (http://www.bibsonomy.org/) to share bookmarks and lists of literature. Although these systems
provide heterogeneous contents, they have a common core: once a user is logged
in, she can post a new resource and choose some significant keywords to identify
it. Besides, users can label resources previously posted by other users. This
phenomenon represents a very important opportunity to categorize the resources
on the web, which would otherwise be hardly feasible. The act of tagging resources by different
users is the social aspect of this activity; in this way tags create a connection
among users and items. Users that label the same resource by using the same
tags could have similar tastes, and items labeled with the same tags could have
common characteristics.</p>
      <p>
        Many would argue that the power of tagging lies in the ability for people to
freely determine the appropriate tags for a resource without having to rely on a
predefined lexicon or hierarchy [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Indeed, folksonomies are fully free and reflect
the user's mind, but they suffer from the same problems as any uncontrolled vocabulary.
Golder et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] identified three major problems with current tagging systems:
polysemy, synonymy, and level variation. Polysemy refers to situations where
tags can have multiple meanings: for example, a resource tagged with the term
'turkey' could indicate a news article about politics taken from an online newspaper or
a recipe for Thanksgiving Day. When multiple tags share a single meaning we
refer to it as synonymy. In collaborative tagging systems we can have simple
morphological variations (for example, we can find 'blog', 'blogs', 'web log' to
identify a common blog) but also semantic similarity (like resources tagged with
'arts' versus 'cultural heritage'). The third problem, called level variation, refers
to the phenomenon of tagging at different levels of abstraction. Some people can
annotate a web page containing a recipe for roast turkey with the tag
'roastturkey' but also with a simple 'recipe'.
      </p>
      <p>
        In order to avoid these problems, in the last years many tools have been
developed to assist the user in the task of tagging and to aid tag
convergence [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: these systems are known as tag recommenders. When a user posts
a resource in a Web 2.0 platform, a tag recommender suggests some significant
keywords to label the item, following some criteria to filter out the noise from
the complete tag space.
      </p>
    </sec>
    <sec id="sec-2">
      <title>1 http://www.flickr.com</title>
    </sec>
    <sec id="sec-3">
      <title>2 http://www.youtube.com</title>
    </sec>
    <sec id="sec-4">
      <title>3 http://delicious.com/</title>
    </sec>
    <sec id="sec-5">
      <title>4 http://www.last.fm/</title>
    </sec>
    <sec id="sec-6">
      <title>5 http://www.bibsonomy.org/</title>
      <p>This paper presents STaR (Social Tag Recommender system), a tag
recommender system developed for the ECML-PKDD 2009 Discovery Challenge. The
idea behind our work is that folksonomies create connections among users and
items, so we tried to point out two concepts:
- Resources with similar content could be annotated with similar tags;
- A tag recommender needs to take into account the previous tagging activity
of users, by giving more weight to tags already used to annotate similar resources.</p>
      <p>In this work we identify two main aspects in the tag recommendation task:
firstly, each user has a typical manner of labeling resources (for example using
personal tags such as 'beautiful', 'ugly', 'pleasant', etc., which are not connected
to the content of the item, or simply tagging using general tags like 'politics',
'sport', etc.); secondly, similar resources usually share common tags. When a user
posts a resource r on the platform, our system takes into account how she (if
she is already stored in the system) and the entire community previously tagged
resources similar to r, in order to suggest relevant tags. We developed this
model and tested it on a dataset extracted from BibSonomy.</p>
      <p>The paper is organized as follows. Section 2 analyzes related work. The
general problem of tag recommendation is introduced in Section 3. Section 4 explains
the architecture of the system and how the recommendation approach is
implemented. The experimental evaluation is described in Section 5, while
conclusions and future work are drawn in the last section.</p>
      <sec id="sec-6-2">
        <title>2 Related Work</title>
        <p>Previous work in the tag recommendation area can be broadly divided into three
classes: content-based, collaborative and graph-based approaches.</p>
        <p>
          In the content-based approach, a system exploits some textual source with
Information Retrieval-related techniques [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] in order to extract relevant unigrams
or bigrams from the text. Brooks et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], for example, develop a tag
recommender system that automatically suggests tags for a blog post by extracting the
top three terms according to TF/IDF scoring [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The system presented by Lee
and Chun [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] recommends tags retrieved from the content of a blog using artificial
neural networks. The network is trained on statistical information about
word frequencies and lexical information about word semantics extracted from
WordNet. The collaborative approach to tag recommendation, instead, presents
some analogies with collaborative filtering methods [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In the model proposed
by Mishne and implemented in AutoTag [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], the system suggests tags based on
the other tags associated with similar posts in a given collection. The
recommendation process is performed in three steps: first, the tool finds similar posts and
extracts their tags. All the tags are then merged, building a general folksonomy
that is filtered and reranked. The top-ranked tags are suggested to the user, who
selects the most appropriate ones to attach to the post. TagAssist [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] improves
AutoTag's approach by performing a lossless compression over existing tag data.
It finds similar blog posts and suggests a subset of the associated tags through a
Tag Suggestion Engine (TSE), which leverages previously tagged posts to provide
appropriate suggestions for new content. In [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] the tag recommendation task
is performed through a user-based collaborative filtering approach. The method
seems to produce good results when applied to the user-tag matrix, showing
that users with a similar tag vocabulary tend to tag alike. The problem of tag
recommendation through graph-based approaches was first addressed by
Jäschke et al. in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. They compared some recommendation techniques including
collaborative filtering, PageRank and FolkRank. The key idea behind the FolkRank
algorithm is that a resource which is tagged with important tags by
important users becomes important itself. The same concept holds for tags and users,
thus the approach uses a graph whose vertices mutually reinforce each other
by spreading their weights. The evaluation showed that FolkRank outperforms
the other approaches. Schmitz et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] proposed association rule mining as a
technique that might be useful in the tag recommendation process. In the literature we
can also find some hybrid methods integrating two or more approaches (mainly,
content-based and collaborative ones) in order to reduce their typical drawbacks and
exploit their strengths. Heymann et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] present a tag recommender that
exploits social knowledge and textual sources at the same time. They suggest tags
based on page text, anchor text and surrounding hosts, adding tags used by other
users to label the URL. The effectiveness of this approach is also confirmed by
the use of a large dataset crawled from del.icio.us for the experimental
evaluation. A hybrid approach is also proposed by Lipczak in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Firstly, the system
extracts tags from the title of the resource. Afterwards, based on an analysis
of co-occurrences, the set of candidate tags is expanded by also adding tags that
usually co-occur with terms in the title. Finally, tags are filtered and reranked
exploiting the information stored in a so-called "personomy", the set of the tags
previously used by the user.
        </p>
        <p>
          Finally, in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] the authors proposed a model based on both the textual content
and the tags associated with the resource. They introduce the concept of conflated
tags to indicate a set of related tags (like blog, blogs, etc.) used to annotate a
resource. Modeling the existing tag space in this way, they are able to suggest
various tags for a given bookmark exploiting both user and document models.
They won the previous edition of the Tag Recommendation Challenge.
        </p>
      </sec>
      <sec id="sec-6-3">
        <title>3 Description of the Task</title>
        <p>STaR has been designed to participate in the ECML-PKDD 2009 Discovery
Challenge (http://www.kde.cs.uni-kassel.de/ws/dc09). In this section we will first introduce a formal model for
recommendation in folksonomies, then we will analyze the specific requirements of the
task proposed for the Challenge.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6 http://www.kde.cs.uni-kassel.de/ws/dc09</title>
      <sec id="sec-7-1">
        <title>3.1 Recommendation in Folksonomies</title>
        <p>
          A collaborative tagging system is a platform composed of users, resources and
tags that allows users to freely assign tags to resources. Following the definition
introduced in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], a folksonomy can be described as a triple $(U, R, T)$ where:
- $U$ is a set of users;
- $R$ is a set of resources;
- $T$ is a set of tags.
        </p>
        <p>We can also define a tag assignment function $tas: U \times R \rightarrow T$. The tag
recommendation task for a given user $u \in U$ and a resource $r \in R$ can finally be
described as the generation of a set of tags $tas(u, r) \subseteq T$ according to some
relevance model. In our approach these tags are generated from a ranked set of
candidate tags, from which the top $n$ elements are suggested to the user.</p>
      </sec>
      <sec id="sec-7-2">
        <title>3.2 Description of the ECML-PKDD 2009 Discovery Challenge</title>
        <p>The 2009 edition of the Discovery Challenge consists of three recommendation
tasks in the area of social bookmarking. We competed in the first task,
content-based tag recommendation, whose goal is to exploit content-based
recommendation approaches in order to provide a relevant set of tags to the user when she
submits a new item (Bookmark or BibTeX entry) into BibSonomy.</p>
        <p>The organizers made available a training set with some examples of tag
assignment: the dataset contains 263,004 bookmark posts and 158,924 BibTeX
entries submitted by 3,617 different users. For each of the 235,328 different URLs
and the 143,050 different BibTeX entries, some textual metadata were also provided
(such as the title of the resource, the description, the abstract and so on).</p>
        <p>Each candidate recommender is evaluated by comparing the real tags (namely,
the tags a user adopts to annotate an unseen resource) with the suggested ones.
The accuracy is finally computed using classical IR metrics, such as Precision,
Recall and F1-Measure (Section 5.1).</p>
        <p>By analyzing the aforementioned requirements, we designed STaR treating the task as
a prediction task rather than a recommendation one. Consequently, we try to
emphasize the previous tagging activity of the user, also looking for connections
and patterns among resources. All these decisions are thoroughly analyzed
in the next section, which describes the architecture of STaR.</p>
        <sec id="sec-7-2-1">
          <title>4 STaR: a Social Tag Recommender System</title>
          <p>
            STaR (Social Tag Recommender) is a content-based tag recommender system,
developed at the University of Bari. The initial idea behind STaR is to
improve the model implemented in systems like TagAssist [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] or AutoTag [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ].
Although we agree with the idea that resources with similar content could be
annotated with similar tags, in our opinion Mishne's approach presents two
important drawbacks:
1. The tag reranking formula simply performs a sum of the occurrences of each
tag among all the folksonomies, without considering the similarity with the
resource to be tagged. In this way, tags often used to annotate resources with
a low similarity level could be ranked first.
2. The proposed model does not take into account the previous tagging
activity performed by users. If two users bookmarked the same resource, they
would receive the same suggestions, since the folksonomies built from similar
resources are the same.
          </p>
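          <p>To make the first drawback concrete, the sketch below compares a reranking that simply counts tag occurrences over the retrieved neighbours with one that weights each occurrence by the neighbour's similarity to the resource being tagged. The neighbours, similarity values and tags are hypothetical toy data, and the similarity-weighted variant is a deliberately simplified stand-in, not the STaR scoring function described in Section 4.2.</p>
          <preformat>
```python
from collections import Counter

# Hypothetical neighbours of a new resource: (similarity to it, tags of the neighbour).
Neighbour = tuple[float, list[str]]
neighbours: list[Neighbour] = [
    (0.9, ["football", "inter"]),
    (0.2, ["politics"]),
    (0.1, ["politics"]),
    (0.1, ["politics"]),
]


def rank_by_count(neigh: list[Neighbour]) -> list[str]:
    """Occurrence-sum reranking: the similarity of each neighbour is ignored."""
    counts = Counter(tag for _, tags in neigh for tag in tags)
    return [tag for tag, _ in counts.most_common()]


def rank_by_similarity(neigh: list[Neighbour]) -> list[str]:
    """Each occurrence contributes the similarity of the resource it comes from."""
    scores: Counter[str] = Counter()
    for sim, tags in neigh:
        for tag in tags:
            scores[tag] += sim
    return [tag for tag, _ in scores.most_common()]


print(rank_by_count(neighbours))       # ['politics', 'football', 'inter']
print(rank_by_similarity(neighbours))  # ['football', 'inter', 'politics']
```
          </preformat>
          <p>Under the plain occurrence sum, 'politics' is ranked first only because three weakly related resources carry it, which is exactly the situation described in the first drawback.</p>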
          <p>We try to overcome these drawbacks by proposing an approach based on
the analysis of similar resources that is also capable of giving more weight to the tags already
selected by the user during her previous tagging activity. Figure 1 shows the
general architecture of STaR. The recommendation process is performed in four
steps, each of which is handled by a separate component.</p>
        </sec>
      </sec>
      <sec id="sec-7-3">
        <title>4.1 Indexing of Resources</title>
        <p>Given a collection of resources (corpus), a preprocessing step is performed by the
Indexer module, which exploits Apache Lucene (http://lucene.apache.org) to perform the indexing step.
For bookmarks we indexed the title of the web page and the extended
description provided by users. For BibTeX entries we indexed the title of
the publication and the abstract. Let U be the set of users and N the
cardinality of this set; the indexing procedure is repeated N + 1 times: we build an
index for each user (Personal Index) storing the information on her previously
tagged resources, and an index for the whole community (Social Index) storing
the information about all the resources previously tagged by the community.</p>
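        <p>As a rough illustration of the N + 1 indexes, the sketch below keeps, for each user, the set of resource identifiers she has tagged (her Personal Index) and builds the Social Index as the union of all of them. It assumes tag assignments are available as (user, resource, tag) triples, and it only tracks which resources belong to which index, whereas STaR stores the indexed text fields in Apache Lucene. The names build_indexes and Assignment are hypothetical.</p>
        <preformat>
```python
from collections import defaultdict

# A tag assignment as a (user, resource, tag) triple; the names are illustrative.
Assignment = tuple[str, str, str]


def build_indexes(assignments: list[Assignment]) -> tuple[dict[str, set[str]], set[str]]:
    """Return (personal, social): one resource set per user plus their union."""
    personal: defaultdict[str, set[str]] = defaultdict(set)
    for user, resource, _tag in assignments:
        personal[user].add(resource)  # Personal Index: resources this user has tagged
    social: set[str] = set()
    for resources in personal.values():
        social |= resources  # Social Index: union of all personal indexes
    return dict(personal), social


triples = [
    ("alice", "r1", "python"),
    ("alice", "r2", "ir"),
    ("bob", "r2", "search"),
    ("bob", "r3", "lucene"),
]
personal, social = build_indexes(triples)
print(sorted(personal["alice"]))  # ['r1', 'r2']
print(sorted(social))  # ['r1', 'r2', 'r3']
```
        </preformat>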
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7 http://lucene.apache.org</title>
      <p>Following the definitions presented in Section 3.1, given a user $u \in U$ we
define $PersonalIndex(u)$ as:</p>
      <p>$$PersonalIndex(u) = \{ r \in R \mid \exists t \in T : tas(u, r) = t \} \qquad (1)$$
where $tas$ is the tag assignment function $tas: U \times R \rightarrow T$, which assigns tags
to a resource annotated by a given user. $SocialIndex$ represents the union of all
the user personal indexes:</p>
      <p>$$SocialIndex = \bigcup_{i=1}^{N} PersonalIndex(u_i) \qquad (2)$$</p>
      <sec id="sec-8-1">
        <title>4.2 Retrieval of Similar Resources</title>
        <p>
          At the end of the preprocessing step STaR is able to take user
requests into account. Every user interacts with STaR by providing information about a
resource to be tagged. In the Query Processing step the system acquires data about
the user (her language, the tags she uses most, the number of tags she usually
uses to annotate resources, etc.) before processing the query (through the elimination of
useless characters and punctuation) and submitting it against the
SocialIndex stored in Lucene. If the user is recognized by the system because she has
previously tagged some other resources, the same query is submitted against
her own PersonalIndex as well. We used as query the title of the web page
(for bookmarks) or the title of the publication (for BibTeX entries). In order
to improve the performance of the Lucene querying engine we replaced the
original Lucene scoring function with an Okapi BM25 implementation (http://nlp.uned.es/~jperezi/Lucene-BM25/). BM25
is nowadays considered one of the state-of-the-art retrieval models by the IR
community [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>Let $D$ be a corpus of documents and $d \in D$. Given a resource $r$ (tokenized as a set of terms
$t_1 \ldots t_m$), BM25 returns the top-$k$ resources with the highest similarity value, where the similarity is defined as follows:</p>
        <p>$$sim(r, d) = \sum_{i=1}^{m} \frac{n_{t_i}^{r}}{k_1 \left( (1 - b) + b \cdot \frac{length_r}{avgLength_r} \right) + n_{t_i}^{r}} \cdot idf(t_i) \qquad (3)$$
where $n_{t_i}^{r}$ represents the occurrences of the term $t_i$ in the document $d$, $length_r$
is the length of the resource $r$ and $avgLength_r$ is the average length of resources
in the corpus. Finally, $k_1$ and $b$ are two parameters, typically set to 2.0 and 0.75
respectively, and $idf(t_i)$ represents the inverse document frequency of the term
$t_i$, defined as follows:</p>
        <p>$$idf(t_i) = \log \frac{N + df(t_i) + 0.5}{df(t_i) + 0.5} \qquad (4)$$</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>8 http://nlp.uned.es/~jperezi/Lucene-BM25/</title>
      <p>where $N$ is the number of resources in the collection and $df(t_i)$ is the number of
resources in which the term $t_i$ occurs.</p>
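      <p>The following sketch scores resources with the BM25 similarity and idf formulas given above and then keeps those whose similarity reaches a threshold, as the next step describes. It is a plain in-memory stand-in, not the Lucene BM25 implementation used by STaR; the tokenizer (lower-casing and keeping alphanumeric runs) is an assumed stand-in for the Query Processing step, and the sketch reads $n_{t_i}^{r}$ as the frequency of a query term in the candidate resource and the length as that resource's length in tokens. The widely used Robertson/Spärck Jones form of BM25 has N - df(t_i) + 0.5 in the idf numerator and a (k_1 + 1) factor in each term; this sketch follows the formulas as printed here. BM25Index and tokenize are hypothetical names.</p>
      <preformat>
```python
import math
import re
from collections import Counter

K1 = 2.0  # k_1 value reported in the text
B = 0.75  # b value reported in the text


def tokenize(text: str) -> list[str]:
    """Hypothetical query processing: lower-case and keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """In-memory stand-in for a Lucene index scored with the formulas above."""

    def __init__(self, corpus: dict[str, str]) -> None:
        self.tf = {rid: Counter(tokenize(text)) for rid, text in corpus.items()}
        self.length = {rid: sum(c.values()) for rid, c in self.tf.items()}
        # Average resource length; fall back to 1.0 when there are no tokens at all.
        self.avg_length = sum(self.length.values()) / max(len(self.tf), 1) or 1.0
        self.df: Counter[str] = Counter()
        for counts in self.tf.values():
            self.df.update(counts.keys())  # each resource counts once per term

    def idf(self, term: str) -> float:
        n, df = len(self.tf), self.df[term]
        return math.log((n + df + 0.5) / (df + 0.5))  # idf formula as printed above

    def sim(self, query: str, rid: str) -> float:
        counts = self.tf[rid]
        norm = K1 * ((1 - B) + B * self.length[rid] / self.avg_length)
        # The query is treated as a set of terms t_1 ... t_m.
        return sum(
            counts[t] / (norm + counts[t]) * self.idf(t) for t in set(tokenize(query))
        )

    def search(self, query: str, threshold: float) -> dict[str, float]:
        """Resources whose similarity to the query is at least `threshold`."""
        scores = {rid: self.sim(query, rid) for rid in self.tf}
        return {rid: s for rid, s in scores.items() if s >= threshold}


index = BM25Index({
    "r1": "Apache Lucene scoring",
    "r2": "Tag recommendation in folksonomies",
    "r3": "Lucene BM25 for tag search",
})
print(sorted(index.search("lucene", threshold=1e-9)))  # ['r1', 'r3']
```
      </preformat>
      <p>Running the same search over a user's own resources and over all resources gives the personal and social result sets defined next.</p>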
      <p>Given a user $u \in U$ and a resource $r$, Lucene returns the resources whose
similarity with $r$ is greater than or equal to a threshold $\beta$. To perform this task
Lucene uses both the PersonalIndex of the user $u$ and the SocialIndex. More
formally:</p>
      <p>$$PersonalRes(u, q) = \{ r \in PersonalIndex(u) \mid sim(q, r) \geq \beta \} \qquad (5)$$</p>
      <p>$$SocialRes(q) = \{ r \in SocialIndex \mid sim(q, r) \geq \beta \} \qquad (6)$$</p>
      <p>In the next step the Tag Extractor gets the most similar resources returned by
the Apache Lucene engine and produces the set of candidate tags to be
suggested, by computing for each tag a score obtained by weighting the similarity
score returned by Lucene with the normalized occurrence of the tag. If the Tag
Extractor also gets the list of the most similar resources from the user
PersonalIndex, it produces two partial folksonomies that are merged, assigning a
weight to each folksonomy in order to boost the tags the user has previously used.</p>
      <p>Formally, for each query $q$ (namely, the resource to be tagged), we can define
a set of tags to recommend by building two sets, $candTags_p$ and $candTags_s$,
defined as follows:</p>
      <p>$$candTags_p(u, q) = \{ t \in T \mid t = tas(u, r) \wedge r \in PersonalRes(u, q) \} \qquad (7)$$</p>
      <p>$$candTags_s(q) = \{ t \in T \mid t = tas(u, r) \wedge r \in SocialRes(q) \wedge u \in U \} \qquad (8)$$</p>
      <p>We can then compute the relevance of each tag with respect to
the query $q$ as:</p>
      <p>$$rel_p(t, u, q) = \sum_{r \in PersonalRes(u, q)} \frac{n_r^t}{n^t} \cdot sim(r, q) \qquad (9)$$</p>
      <p>$$rel_s(t, q) = \sum_{r \in SocialRes(q)} \frac{n_r^t}{n^t} \cdot sim(r, q) \qquad (10)$$</p>
      <p>where $n_r^t$ is the number of occurrences of the tag $t$ in the annotation for resource
$r$ and $n^t$ is the sum of the occurrences of tag $t$ among all similar resources.</p>
      <p>Finally, the set of Candidate Tags can be defined as:</p>
      <p>$$candTags(u, q) = candTags_p(u, q) \cup candTags_s(q) \qquad (11)$$</p>
      <p>where for each tag $t$ the global relevance can be defined as:</p>
      <p>$$rel(t, q) = \alpha \cdot rel_p(t, u, q) + (1 - \alpha) \cdot rel_s(t, q) \qquad (12)$$</p>
      <p>where $\alpha$ (PersonalTagWeight) and $(1 - \alpha)$ (SocialTagWeight) are the weights of
the personal and social tags respectively.</p>
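      <p>The relevance and merging formulas above can be sketched as follows. The sketch assumes the retrieval step has already produced a map from each similar resource to its similarity with the query, and that the tags of each resource are available with their occurrence counts; $n^t$ is computed over the resources passed in, so it is computed separately for the personal and the social result sets. The functions relevance and merge, and the toy data, are hypothetical.</p>
      <preformat>
```python
from collections import Counter


def relevance(similar: dict[str, float], tags_of: dict[str, Counter[str]]) -> dict[str, float]:
    """rel(t, q): sum over similar resources r of (n_r^t / n^t) * sim(r, q)."""
    n_t: Counter[str] = Counter()  # n^t: occurrences of t among all similar resources
    for r in similar:
        n_t.update(tags_of.get(r, Counter()))
    rel: dict[str, float] = {}
    for r, sim in similar.items():
        for tag, n_rt in tags_of.get(r, Counter()).items():
            if n_rt > 0:
                rel[tag] = rel.get(tag, 0.0) + n_rt / n_t[tag] * sim
    return rel


def merge(rel_p: dict[str, float], rel_s: dict[str, float], alpha: float) -> dict[str, float]:
    """Candidate tags (union of personal and social ones) with alpha-weighted relevance."""
    candidates = rel_p.keys() | rel_s.keys()
    return {t: alpha * rel_p.get(t, 0.0) + (1 - alpha) * rel_s.get(t, 0.0) for t in candidates}


tags_of = {
    "r1": Counter({"sport": 2, "newspaper": 1}),
    "r2": Counter({"newspaper": 1}),
    "r3": Counter({"football": 1}),
}
rel_p = relevance({"r1": 0.8, "r2": 0.4}, tags_of)  # the user's own similar resources
rel_s = relevance({"r2": 0.4, "r3": 0.6}, tags_of)  # the community's similar resources
scores = merge(rel_p, rel_s, alpha=0.7)
print(sorted((t, round(s, 2)) for t, s in scores.items()))
# [('football', 0.18), ('newspaper', 0.54), ('sport', 0.56)]
```
      </preformat>
      <p>Here alpha = 0.7 mirrors the 0.7 personal weight used in the Figure 3 example; because each $n_r^t / n^t$ is a share of the tag's occurrences, a tag's relevance is a weighted average of the similarities of the resources that carry it.</p>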
      <p>Figure 3 depicts the procedure performed by the Tag Extractor: in this case
we have a set of 4 Social Tags (Newspaper, Online, Football and Inter) and 3
Personal Tags (Sport, Newspaper and Tuttosport). These sets are then merged,
building the set of Candidate Tags. This set contains 6 tags, since the tag
newspaper appears both in social and personal tags. The system associates a score
to each tag that indicates its effectiveness for the target resource. Moreover, the
scores for the Candidate Tags are weighted again according to the SocialTagWeight
($1 - \alpha$) and PersonalTagWeight ($\alpha$) values (in the example, 0.3 and 0.7
respectively), in order to boost the tags already used by the user in the final tag rank.
Indeed, we can point out that the social tag 'football' gets the same score as the
personal tag 'tuttosport', although its original weight was twice as high.
The Tag Extractor produces the set of the Candidate Tags, a ranked set of
tags with their relevance scores. This set is exploited by the Filter, a component
which performs the last step of the recommendation task, that is, removing those
tags not matching specific conditions: we fix a threshold for the relevance score
between 0.20 and 0.25 and we return at most 5 tags. These parameters are strictly
dependent on the training data.</p>
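      <p>A minimal sketch of the Filter: it keeps the candidate tags whose relevance exceeds the threshold and returns at most five of them, best first. The default threshold of 0.20 is the lower end of the range reported above; breaking ties alphabetically is an assumption of this sketch, and filter_tags and the example scores are hypothetical.</p>
      <preformat>
```python
def filter_tags(candidates: dict[str, float], threshold: float = 0.20, max_tags: int = 5) -> list[str]:
    """Return up to max_tags tags with relevance strictly above threshold, highest first."""
    # Sort by descending relevance, then alphabetically to make ties deterministic.
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, score in ranked if score > threshold][:max_tags]


print(filter_tags({"python": 0.41, "lucene": 0.33, "web": 0.12}))  # ['python', 'lucene']
```
      </preformat>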
      <p>Formally, given a user $u \in U$, a query $q$ and a threshold value $\epsilon$, the goal of
the filtering component is to build $recommendation(u, q)$, defined as follows:
$$recommendation(u, q) = \{ t \in candTags(u, q) \mid rel(t, q) &gt; \epsilon \} \qquad (13)$$</p>
      <p>In the example in Figure 3, setting a threshold $\epsilon = 0.20$, the system would
suggest the tags sport and newspaper.</p>
      <sec id="sec-9-1">
        <title>5 Experimental Evaluation</title>
        <sec id="sec-9-1-1">
          <title>5.1 Experimental Session</title>
          <p>In this experiment we measure the performance of STaR in Task 1 of the
ECML-PKDD 2009 Discovery Challenge. The experimental evaluation was
carried out according to the instructions provided by the organizers of the
Challenge. The test set was released 48 hours before the end of the competition.
Every participant uploaded a file containing the tag predictions, and for each
post only five tags were considered. F1-Measure was used to evaluate the
accuracy of recommendations; to this end, Precision and Recall were computed for each post
by comparing the recommended tags with the true tags assigned by the users.
The case of tags was ignored and all characters which are neither numbers nor
letters were removed. Results are presented in Table 1.</p>
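          <p>The evaluation protocol above can be sketched as follows: tags are normalized by ignoring case and dropping every character that is neither a letter nor a digit, only the first five recommended tags count, and precision and recall are computed per post. How the per-post values are aggregated into the final F1 is not stated here; this sketch assumes that precision and recall are averaged over posts and F1 is computed from the two averages. The function names are hypothetical.</p>
          <preformat>
```python
def normalize(tag: str) -> str:
    """Ignore case and keep only letters and digits."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def post_precision_recall(recommended: list[str], true_tags: list[str], k: int = 5) -> tuple[float, float]:
    """Precision and recall of the first k recommended tags for a single post."""
    rec = {normalize(t) for t in recommended[:k]} - {""}
    gold = {normalize(t) for t in true_tags} - {""}
    hits = len(rec.intersection(gold))
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def f1_measure(posts: list[tuple[list[str], list[str]]], k: int = 5) -> float:
    """F1 from precision and recall averaged over posts (an assumed aggregation)."""
    if not posts:
        return 0.0
    pairs = [post_precision_recall(rec, gold, k) for rec, gold in posts]
    p = sum(pr[0] for pr in pairs) / len(pairs)
    r = sum(pr[1] for pr in pairs) / len(pairs)
    return 2 * p * r / (p + r) if p + r else 0.0


print(post_precision_recall(["Python", "IR", "web-2.0"], ["python", "Web20", "lucene", "search"]))
# (0.6666666666666666, 0.5)
```
          </preformat>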
          <p>STaR finished the ECML-PKDD Discovery Challenge 2009 with an overall
F-measure of 13.55. As shown in Table 1, exploiting only the first
recommended tag the system reaches almost 20% precision. Recall
increases with the number of recommended tags, reaching 13.5% with
the fourth and fifth tags. In the future we will perform a more in-depth study in
order to compare the predictive accuracy of STaR with different configurations
of parameters.</p>
        </sec>
      </sec>
      <sec id="sec-9-2">
        <title>6 Conclusions and Future Work</title>
        <p>In this paper we presented STaR, a tag recommender designed and implemented
to participate in the ECML-PKDD 2009 Discovery Challenge. The idea behind
our work was to discover similarity among resources in order to exploit
community and user tagging behavior. In this way our recommender system was
able to suggest tags for users and items not yet stored in the training set. The
experimental sessions showed that users tend to reuse their own tags to annotate
similar resources, so this kind of recommendation model could benefit from
using the user's personal tags before extracting the social tags of the community
(we called this approach user-based).</p>
        <p>In the future we will implement a methodology to suggest tags when the
set of similar items returned by Lucene is empty. The system should be able to
extract significant keywords from the textual content associated with a resource
(title, description, etc.) that has no similar items, possibly exploiting structured
data or domain ontologies. Another issue to investigate is the application of our
methodology in different domains, such as multimedia environments. In this field,
discovering similarity among items just on the basis of textual content might
not be sufficient. Finally, textual content suffers from syntactic problems like
polysemy (a keyword with two or more meanings) and synonymy (two or more
keywords with the same meaning). These problems hurt the performance of the
recommender. We will try to establish whether a semantic document indexing could
improve the performance of the recommender.</p>
        <p>18. Thomas Vander Wal. Folksonomy coinage and definition. Website, February 2007.
http://vanderwal.net/folksonomy.html.</p>
        <p>19. Harris Wu, Mohammad Zubair, and Kurt Maly. Harvesting social knowledge from
folksonomies. In HYPERTEXT '06: Proceedings of the seventeenth conference on
Hypertext and hypermedia, pages 111-114, New York, NY, USA, 2006. ACM Press.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Ribeiro-Neto</surname>
          </string-name>
          .
          <article-title>Modern Information Retrieval</article-title>
          .
          <source>Addison-Wesley</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Billsus</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          .
          <article-title>Learning collaborative information filters</article-title>
          .
          <source>In Proceedings of the 15th International Conference on Machine Learning</source>
          , pages
          <fpage>46</fpage>
          -
          <lpage>54</lpage>
          . Morgan Kaufmann, San Francisco, CA,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Brooks</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Montanez</surname>
          </string-name>
          .
          <article-title>Improved annotation of the blogosphere via autotagging and hierarchical clustering</article-title>
          .
          <source>In WWW '06: Proceedings of the 15th international conference on World Wide Web</source>
          , pages
          <fpage>625</fpage>
          -
          <lpage>632</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Cattuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baldassarri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. D. P.</given-names>
            <surname>Servedio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Loreto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hotho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grahl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Stumme</surname>
          </string-name>
          .
          <article-title>Network properties of folksonomies</article-title>
          .
          <source>AI Communications</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <fpage>245</fpage>
          –
          <lpage>262</lpage>
          ,
          <year>December 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.</given-names>
            <surname>Golder</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Huberman</surname>
          </string-name>
          .
          <article-title>The Structure of Collaborative Tagging Systems</article-title>
          .
          <source>Journal of Information Science</source>
          ,
          <volume>32</volume>
          (
          <issue>2</issue>
          ):
          <fpage>198</fpage>
          –
          <lpage>208</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P.</given-names>
            <surname>Heymann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ramage</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          .
          <article-title>Social tag prediction</article-title>
          .
          <source>In SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>531</fpage>
          –
          <lpage>538</lpage>
          , New York, NY, USA,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. R. Jaschke, L.
          <string-name>
            <surname>Marinho</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Hotho</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Schmidt-Thieme</surname>
            , and
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Stumme</surname>
          </string-name>
          .
          <article-title>Tag recommendations in folksonomies</article-title>
          . In Alexander Hinneburg, editor,
          <source>Workshop Proceedings of Lernen - Wissensentdeckung - Adaptivit?t (LWA</source>
          <year>2007</year>
          ), pages
          <fpage>13</fpage>
          {
          <fpage>20</fpage>
          ,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Sigma On Kee</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andy Hon Wai</given-names>
            <surname>Chun</surname>
          </string-name>
          .
          <article-title>Automatic tag recommendation for the web 2.0 blogosphere using collaborative tagging and hybrid ANN semantic structures</article-title>
          .
          <source>In ACOS'07: Proceedings of the 6th Conference on WSEAS International Conference on Applied Computer Science</source>
          , pages
          <fpage>88</fpage>
          –
          <lpage>93</lpage>
          , Stevens Point, Wisconsin, USA,
          <year>2007</year>
          . World Scientific and Engineering Academy and Society (WSEAS).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Lipczak</surname>
          </string-name>
          .
          <article-title>Tag recommendation for folksonomies oriented towards individual users</article-title>
          .
          <source>In Proceedings of ECML PKDD Discovery Challenge (RSDC08)</source>
          , pages
          <fpage>84</fpage>
          –
          <lpage>95</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Leandro B.</given-names>
            <surname>Marinho</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lars</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <article-title>Collaborative tag recommendations</article-title>
          . Pages
          <fpage>533</fpage>
          –
          <lpage>540</lpage>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Adam</given-names>
            <surname>Mathes</surname>
          </string-name>
          .
          <article-title>Folksonomies - cooperative classification and communication through shared metadata</article-title>
          . http://www.adammathes.com/academic/computermediated-communication/folksonomies.html,
          <year>December 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Gilad</given-names>
            <surname>Mishne</surname>
          </string-name>
          .
          <article-title>Autotag: a collaborative approach to automated tag assignment for weblog posts</article-title>
          .
          <source>In WWW '06: Proceedings of the 15th international conference on World Wide Web</source>
          , pages
          <fpage>953</fpage>
          –
          <lpage>954</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Steve</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Micheline H.</given-names>
            <surname>Beaulieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aarron</given-names>
            <surname>Gull</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Marianna</given-names>
            <surname>Lau</surname>
          </string-name>
          .
          <article-title>Okapi at TREC</article-title>
          .
          <source>In Text REtrieval Conference</source>
          , pages
          <fpage>21</fpage>
          –
          <lpage>30</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          .
          <source>Automatic Text Processing</source>
          . Addison-Wesley,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Schmitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Hotho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Robert</given-names>
            <surname>Jäschke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gerd</given-names>
            <surname>Stumme</surname>
          </string-name>
          .
          <article-title>Mining association rules in folksonomies</article-title>
          . In V. Batagelj, H.-H. Bock, A. Ferligoj, and A. Žiberna, editors,
          <source>Data Science and Classification (Proc. IFCS 2006 Conference)</source>
          , Studies in Classification, Data Analysis, and Knowledge Organization, pages
          <fpage>261</fpage>
          –
          <lpage>270</lpage>
          , Berlin/Heidelberg, July 2006. Springer. Ljubljana.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
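          <string-name>
            <given-names>Sanjay</given-names>
            <surname>Sood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sara</given-names>
            <surname>Owsley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kristian</given-names>
            <surname>Hammond</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Larry</given-names>
            <surname>Birnbaum</surname>
          </string-name>
          .
          <article-title>TagAssist: Automatic Tag Suggestion for Blog Posts</article-title>
          .
          <source>In Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2007)</source>
          ,
          <year>2007</year>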
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>M.</given-names>
            <surname>Tatu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Srikanth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>D'Silva</surname>
          </string-name>
          .
          <article-title>RSDC'08: Tag recommendations using bookmark content</article-title>
          .
          <source>In Proceedings of ECML PKDD Discovery Challenge (RSDC08)</source>
          , pages
          <fpage>96</fpage>
          –
          <lpage>107</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>