<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An IR-based approach to Tag Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cataldo Musto</string-name>
          <email>cataldomusto@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fedelucio Narducci</string-name>
          <email>narducci@di.uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco De Gemmis</string-name>
          <email>degemmis@di.uniba.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Lops</string-name>
          <email>lops@di.uniba.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <email>semeraro@di.uniba.it</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, University of Bari 'Aldo Moro'</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Computer Science, University of Bari 'Aldo Moro'</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dept. of Computer Science, University of Bari 'Aldo Moro'</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Dept. of Computer Science, University of Bari 'Aldo Moro'</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Dept. of Computer Science, University of Bari 'Aldo Moro'</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>Thanks to the continuous growth of collaborative platforms like YouTube, Flickr and Delicious, we are witnessing a rapid evolution of web dynamics towards a more 'social' vision, called Web 2.0. In this context, collaborative tagging systems are rapidly emerging as one of the most promising tools. However, since tags are handled in a purely syntactic way, collaborative tagging systems suffer from typical Information Retrieval (IR) problems such as polysemy and synonymy. To reduce the impact of these drawbacks and, at the same time, to aid the so-called tag convergence, systems that assist the user in the task of tagging are required. In this paper we present a system, called STaR, that implements an IR-based approach to tag recommendation. Our approach, mainly based on a state-of-the-art IR model called BM25, relies on two assumptions. First, if two or more resources share some common patterns (e.g. the same features in the textual description), we can exploit this information by supposing that they could be annotated with similar tags. Second, since each user has a typical manner of labeling resources, a tag recommender might exploit this information to give more weight to the tags she has already used to annotate similar resources. We also present an experimental evaluation, carried out using a large dataset gathered from Bibsonomy.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Web 2.0</kwd>
        <kwd>Collaborative Tagging Systems</kwd>
        <kwd>Folksonomies</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.3.1 [Information Storage and Retrieval]: Content
Analysis and Indexing: Indexing methods; H.3.3 [Information
Storage and Retrieval]: Information Search and Retrieval:
Information filtering</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>We are witnessing a transformation of the Web towards
a more user-centric vision called Web 2.0. Using Web 2.0
applications, users can publish self-produced
content such as photos, videos, political opinions and reviews; hence
they are identified as Web prosumers: producers + consumers
of knowledge. Recently the research community has
thoroughly analyzed the dynamics of tagging, which is the act
of annotating resources with free labels, called tags. Collaborative
tagging systems provide heterogeneous content (photos, videos,
musical habits, etc.), but they all share a common core: they
let users post new resources and annotate them with
tags. Besides the simple act of annotation, the tagging of
resources also has a key social aspect: the connection
between users, resources and tags generates a tripartite graph
that can be easily exploited to analyze the dynamics of
collaborative tagging systems. Since folksonomies do not rely
on a predefined lexicon or hierarchy, their main
advantage is that they are completely free; at the same time, however, they generate
a very noisy tag space that is hard to exploit for retrieval
or recommendation tasks without some form of
processing.</p>
      <p>This problem hinders the full exploitation of the
expressive power of folksonomies, so in recent years many
tools have been developed to assist the user in the task of
tagging and, at the same time, to aid tag convergence: we
refer to them as tag recommenders.</p>
      <p>
        This paper presents STaR, a tag recommender system
implementing an IR-based approach that relies on a
state-of-the-art IR model called BM25. In this work, already
presented [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] within the ECML-PKDD 2009 Discovery
Challenge<sup>1</sup>, we tried to point out two concepts:
        <list list-type="bullet">
          <list-item><p>resources with similar content should be annotated with similar tags;</p></list-item>
          <list-item><p>a tag recommender needs to take into account the previous tagging activity of users, increasing the weight of the tags already used to annotate similar resources.</p></list-item>
        </list>
      </p>
      <p>The paper is organized as follows. Section 2 analyzes
related work. Section 3 explains the architecture of the system
and how the recommendation approach is implemented. The
experimental evaluation is described in Section
4, while conclusions and future work are drawn in the last
section.</p>
      <fn-group>
        <fn><label>1</label><p>http://www.kde.cs.uni-kassel.de/ws/dc09</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>Work in the tag recommendation area is usually
divided into three broad classes: content-based,
collaborative and graph-based approaches.</p>
      <p>
        In the content-based approach, exploiting some
Information Retrieval-related techniques, a system is able to
extract relevant unigrams or bigrams from the text. Brooks
et. al [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], for example, develop a tag recommender system
that exploits TF/IDF scoring in order to automatically
suggests tags for a blog post.
      </p>
      <p>
        AutoTag [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is one of the most important systems
implementing the collaborative approach for tag recommendation.
It presents some analogies with collaborative filtering
methods. As in the collaborative recommender systems the
recommendations are generated based on the ratings provided
by similar users (called neighbors), in AutoTag the system
suggests tags based on the other tags associated with similar
posts.
      </p>
      <p>
        The problem of tag recommendation through graph-based
approaches has been firstly addressed by Ja¨schke et al. in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
The key idea behind their FolkRank algorithm is that a
resource which is tagged by important tags from important
users becomes important itself. Furthermore, Schmitz et
al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed association rule mining as a technique that
might be useful in the tag recommendation process.
      </p>
    </sec>
    <sec id="sec-4">
      <title>STAR: A SOCIAL TAG RECOMMENDER SYSTEM</title>
      <p>
        STaR (Social Tag Recommender) is a content-based tag
recommender system, developed at the University of Bari.
The inceptive idea behind STaR is to improve the model
implemented in systems like TagAssist [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or AutoTag [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Although we agree that similar resources usually share
similar tags, in our opinion Mishne’s approach (AutoTag) presents two
important drawbacks:
        <list list-type="order">
          <list-item><p>the tag re-ranking formula simply sums the occurrences of each tag across all the folksonomies, without considering the similarity with the resource to be tagged. In this way, tags often used to annotate resources with a low similarity level could be ranked first;</p></list-item>
          <list-item><p>the proposed model does not take into account the previous tagging activity performed by users. If two users bookmark the same resource, they will receive the same suggestions, since the folksonomies built from similar resources are the same.</p></list-item>
        </list>
      </p>
      <p>We try to overcome these drawbacks by proposing an
approach that is primarily based on the analysis of similar resources
and that also leverages the tags already selected by the
user during her previous tagging activity, placing them
at the top of the tag ranking.</p>
      <p>Figure 1 shows the general architecture of STaR.</p>
    </sec>
    <sec id="sec-6">
      <title>Indexing of Resources</title>
      <p>Given a collection of resources (corpus) with some textual
metadata (such as the title of the resource, the authors, the
description, etc.), STaR first invokes the Indexer module
to preprocess these data
using Apache Lucene<sup>2</sup>. The kind of metadata
to be indexed depends strictly on the nature of the
resources. Let U be the set of users and N the cardinality
of this set. The indexing procedure is repeated N + 1 times:
we build an index for each user (Personal Index) storing the
information on the resources she previously tagged, and an
index for the whole community (Social Index) storing the
information about all the tagged resources, obtained by merging the
Personal Indexes.</p>
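      <p>As a rough illustration of the N + 1 indexes, the sketch below builds one Personal Index per user and a Social Index obtained by merging them. It uses plain Python data structures instead of Apache Lucene; the post layout (user, resource id, text, tags), the whitespace tokenizer and the function name build_indexes are assumptions made for this example and are not part of STaR.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def build_indexes(posts):
    """Build one Personal Index per user plus a merged Social Index.

    posts: iterable of (user, resource_id, text, tags) tuples (assumed layout).
    Each index is a list of entries holding the tokenized text and the tags.
    """
    personal = defaultdict(list)
    for user, resource_id, text, tags in posts:
        entry = {
            "user": user,
            "resource": resource_id,
            "tokens": text.lower().split(),  # naive tokenizer, for illustration only
            "tags": set(tags),
        }
        personal[user].append(entry)

    # The Social Index is the merge of all Personal Indexes.
    social = [entry for entries in personal.values() for entry in entries]
    return dict(personal), social


posts = [
    ("alice", "r1", "Italian sport newspaper", ["sport", "newspaper"]),
    ("alice", "r2", "Official football club site", ["football"]),
    ("bob", "r1", "Italian sport newspaper", ["news"]),
]
personal, social = build_indexes(posts)
print(len(personal) + 1)  # number of indexes built: N + 1, with N = 2 users here
```
]]></preformat>
      <p>Keeping the Personal Indexes separate is what later allows the tags of a single user to be weighted differently from those of the community.</p>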
    </sec>
    <sec id="sec-7">
      <title>Retrieval of Similar Resources</title>
      <p>STaR can take into account user requests in order to
produce personalized tag recommendations for each resource.
First, the user has to provide some information about the
resource to be tagged, such as the title of the Web page or
its URL, so that the textual metadata associated
with it can be crawled. Next, if the system can identify the user because she
has already posted other resources, it exploits data about
her (language, the tags she uses most, the number of tags
she usually uses to annotate resources, etc.) in order to
refine the query to be submitted against both the Social and
Personal indexes stored in Lucene.</p>
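      <p>The user-related data mentioned above could be derived from the previous posts of a user along the following lines. This is only a sketch: the two profile fields (most used tags, typical number of tags per post) follow the text, while the function name user_profile, the input layout and the use of a rounded mean are assumptions. How STaR actually turns these data into a refined query is not described here, so that step is left out.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def user_profile(tag_lists, top_n=3):
    """Summarize a user's tagging habits from the tags of her previous posts.

    tag_lists: list of tag lists, one per previously tagged resource.
    Returns her most used tags and the typical number of tags per post.
    """
    usage = Counter(tag for tags in tag_lists for tag in tags)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(usage.items(), key=lambda kv: (-kv[1], kv[0]))
    most_used = [tag for tag, _ in ranked[:top_n]]
    if tag_lists:
        tags_per_post = round(sum(len(tags) for tags in tag_lists) / len(tag_lists))
    else:
        tags_per_post = 0
    return {"most_used": most_used, "tags_per_post": tags_per_post}


history = [
    ["sport", "newspaper"],
    ["sport", "football", "inter"],
    ["sport", "newspaper", "online"],
]
profile = user_profile(history)
```
]]></preformat>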
      <p>
        To improve the performance of the Lucene
querying engine, we replaced the original Lucene scoring function
with an Okapi BM25 implementation<sup>3</sup>. BM25 is nowadays
considered one of the state-of-the-art retrieval models by
the IR community [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Let D be a corpus of documents and d ∈ D. Given
a resource r (tokenized as a set of terms t<sub>1</sub> . . . t<sub>m</sub>), BM25 returns
the top-k resources with the highest similarity value, where the similarity is
defined as follows:
        <disp-formula><label>(1)</label><tex-math>sim(r, d) = \sum_{i=1}^{m} \frac{n^{r}_{t_i}}{k_1 ((1 - b) + b \cdot l) + n^{r}_{t_i}} \cdot idf(t_i)</tex-math></disp-formula>
where <inline-formula><tex-math>n^{r}_{t_i}</tex-math></inline-formula> represents the occurrences of the term t<sub>i</sub> in the
document d, and l is the ratio between the length of the resource
and the average length of resources in the corpus. Finally, k<sub>1</sub>
and b are two parameters typically set to 2.0 and 0.75
respectively, and idf(t<sub>i</sub>) represents the inverse document frequency
of the term t<sub>i</sub>, defined as follows:
        <disp-formula><label>(2)</label><tex-math>idf(t_i) = \log \frac{N - df(t_i) + 0.5}{df(t_i) + 0.5}</tex-math></disp-formula>
where N is the number of resources in the collection and
df(t<sub>i</sub>) is the number of resources in which the term t<sub>i</sub> occurs.
Given a user u and a resource r, Lucene returns the resources
whose similarity with r is greater than or equal to a threshold
β. To perform this task Lucene uses both the PersonalIndex
of the user u and the SocialIndex.</p>
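      <p>The following sketch shows how Equations (1) and (2) and the threshold β could be computed outside Lucene. It is not the BM25 implementation plugged into STaR: the function names, the representation of resources as token lists, the interpretation of l as the length of the scored document divided by the average length, and the toy corpus are all assumptions made for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def idf(term, corpus):
    """Equation (2): log((N - df + 0.5) / (df + 0.5))."""
    n_resources = len(corpus)
    df = sum(1 for doc in corpus if term in doc)
    return math.log((n_resources - df + 0.5) / (df + 0.5))


def bm25_sim(query_terms, doc, corpus, k1=2.0, b=0.75):
    """Equation (1): similarity between a tokenized resource and a document."""
    avg_len = sum(len(d) for d in corpus) / len(corpus)
    rel_len = len(doc) / avg_len  # the ratio l in Equation (1), as assumed above
    counts = Counter(doc)
    score = 0.0
    for term in set(query_terms):
        n = counts[term]  # occurrences of the term in the document
        if n:
            score += n / (k1 * ((1 - b) + b * rel_len) + n) * idf(term, corpus)
    return score


def retrieve(query_terms, corpus, beta):
    """Return (index, score) pairs with score >= beta, best first."""
    scored = [(i, bm25_sim(query_terms, doc, corpus)) for i, doc in enumerate(corpus)]
    kept = [(i, s) for i, s in scored if s >= beta]
    return sorted(kept, key=lambda pair: -pair[1])


corpus = [
    ["sport", "newspaper", "italy"],
    ["football", "club", "italy"],
    ["recipe", "pasta", "cooking"],
    ["travel", "guide", "rome"],
    ["sport", "newspaper", "online"],
]
similar = retrieve(["sport", "newspaper"], corpus, beta=0.1)
```
]]></preformat>
      <p>Note that, with Equation (2) as written, idf becomes negative for terms occurring in more than half of the resources, so very common terms lower the similarity instead of raising it.</p>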
      <p>For example, suppose that the target resource is
Gazzetta.it, one of the most famous Italian sports
newspapers. Lucene queries the SocialIndex and could
return as the most similar resources an online newspaper
(Corrieredellosport.it) and the official web site of an Italian
football club (Inter.it). The PersonalIndex, instead, could
return another online newspaper (Tuttosport.com).</p>
      <fn-group>
        <fn><label>2</label><p>http://lucene.apache.org</p></fn>
        <fn><label>3</label><p>http://nlp.uned.es/~jperezi/Lucene-BM25/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-8">
      <title>Extraction of Candidate Tags</title>
      <p>The role of the Tag Extractor is to produce as output
the list of the so-called “candidate tags” (namely, the tags
considered ‘relevant’ by the tag recommender). In this
step the system takes the most similar resources returned
by the Apache Lucene engine and builds their folksonomies
(namely, the tags they have been annotated with). Next, it
produces the list of candidate tags by computing, for each
tag in the folksonomy, a score obtained by weighting the
similarity score returned by Lucene with the normalized
occurrence of the tag. If the Tag Extractor also gets the list of
the most similar resources from the user's PersonalIndex, it
produces two partial folksonomies that are merged,
assigning a weight to each folksonomy in order to boost the
tags previously used by the user.</p>
      <p>Figure 2 depicts the procedure performed by the Tag
Extractor: in this case we have a set of 4 Social Tags
(Newspaper, Online, Football and Inter) and 3 Personal Tags (Sport,
Newspaper and Tuttosport). These sets are then merged,
building the set of Candidate Tags. This set contains 6 tags,
since the tag newspaper appears in both the social and the personal
tags. The system associates with each tag a score that
indicates its effectiveness for the target resource. The
scores of the Candidate Tags are then weighted again according
to the SocialTagWeight (α) and PersonalTagWeight (1 − α)
values (in the example, 0.3 and 0.7 respectively), in order to
boost the tags already used by the user in the final tag ranking.
Indeed, the social tag ‘football’ gets
the same score as the personal tag ‘tuttosport’, although its
original weight was twice as high.</p>
    </sec>
    <sec id="sec-9">
      <title>Tag Recommendation</title>
      <p>Finally, the last step of the recommendation process is
performed by the Filter. It removes from the list of
candidate tags those that do not satisfy specific conditions, such as
a threshold on the relevance score computed by the Tag
Extractor. The value of the threshold and the
maximum number of tags to be recommended depend strictly
on the training data. In the example in Figure
2, setting a threshold γ = 0.20, the system would suggest
the tags sport and newspaper.</p>
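      <p>The extraction, merging and filtering of candidate tags can be sketched as follows. The exact scoring formula is not spelled out in the text, so the sketch assumes that the score of a tag is the sum of the similarities of the retrieved resources carrying it, divided by the number of retrieved resources. The function names and the similarity values are hypothetical; only the tag sets and the weights 0.3 and 0.7 come from the running example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def partial_folksonomy(similar):
    """Score the tags of the retrieved resources.

    similar: list of (similarity, tags) pairs, one per retrieved resource.
    Assumed scoring: sum of the similarities of the resources carrying the
    tag, divided by the number of retrieved resources.
    """
    scores = defaultdict(float)
    for similarity, tags in similar:
        for tag in tags:
            scores[tag] += similarity
    return {tag: total / len(similar) for tag, total in scores.items()}


def merge(social, personal, alpha):
    """Weight social scores by alpha and personal scores by (1 - alpha)."""
    tags = set(social) | set(personal)
    return {
        tag: alpha * social.get(tag, 0.0) + (1 - alpha) * personal.get(tag, 0.0)
        for tag in tags
    }


def filter_tags(candidates, gamma, max_tags):
    """Keep at most max_tags candidates whose score is at least gamma."""
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, score in ranked if score >= gamma][:max_tags]


# Hypothetical similarity values; only the tag sets follow the running example.
social = partial_folksonomy([
    (0.8, ["newspaper", "online"]),
    (0.6, ["football", "inter"]),
])
personal = partial_folksonomy([(0.9, ["sport", "newspaper", "tuttosport"])])
candidates = merge(social, personal, alpha=0.3)
suggested = filter_tags(candidates, gamma=0.20, max_tags=4)
```
]]></preformat>
      <p>With these made-up similarities the merged set contains 6 candidate tags and the filter keeps newspaper, sport and tuttosport. The outcome is not meant to reproduce the scores of Figure 2, which are not available here.</p>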
    </sec>
    <sec id="sec-10">
      <title>EXPERIMENTAL EVALUATION</title>
      <p>The goal of the experimental session was to tune the system
parameters in order to maximize the effectiveness of the
tag recommender. We exploited a large dataset gathered
from Bibsonomy.</p>
    </sec>
    <sec id="sec-11">
      <title>Description of the dataset</title>
      <p>The dataset used for the experimental evaluation contains
263,004 bookmark posts and 158,924 BibTeX entries
submitted by 3,617 different users. For each of the 235,328
different URLs and the 143,050 different BibTeX entries,
some textual metadata (such as the title of the
resource, the description, the abstract and so on) were also provided. We
evaluated STaR by comparing the real tags (namely, the tags a
user adopts to annotate an unseen resource) with the
suggested ones. The accuracy was finally computed using
classical IR metrics, such as Precision, Recall and F1-Measure.</p>
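      <p>For a single post, the three metrics can be computed by comparing the set of suggested tags with the set of real tags, as in the sketch below. The function names are hypothetical, and averaging the per-post values over all posts is an assumption about how the overall figures are aggregated.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def precision_recall_f1(suggested, real):
    """Compare the suggested tags with the tags the user actually adopted."""
    suggested, real = set(suggested), set(real)
    hits = len(suggested & real)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(real) if real else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_scores(posts):
    """Average the three metrics over (suggested, real) pairs (assumed aggregation)."""
    scores = [precision_recall_f1(s, r) for s, r in posts]
    return tuple(sum(column) / len(scores) for column in zip(*scores))


p, r, f1 = precision_recall_f1(
    ["sport", "newspaper", "online", "football"],
    ["sport", "newspaper", "italy"],
)
```
]]></preformat>
      <p>In this example two of the four suggested tags are correct and two of the three real tags are found, so precision is 2/4 and recall is 2/3.</p>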
    </sec>
    <sec id="sec-12">
      <title>Experimental Session</title>
      <p>First, we evaluated the influence of different Lucene
scoring functions on the performance of STaR. We randomly
chose 10,000 resources from the dataset and compared
the results obtained with two different scoring
functions (the original Lucene one and BM25) in order to
find the best one. We performed the same steps previously
described, retrieving the most similar items using the two
similarity functions and comparing the tags
suggested by the system in both cases. Results are presented in
Table 1. In general, adopting BM25 yields a small
improvement over the original Lucene similarity
function. We can note that BM25 improved the recall for
bookmarks (+6.95%) and BibTeX entries (+1.46%).</p>
      <p>Next, using BM25 as the scoring function, we
compared the predictive accuracy of STaR with different
combinations of system parameters, namely:
        <list list-type="bullet">
          <list-item><p>the maximum number of similar documents retrieved by Lucene;</p></list-item>
          <list-item><p>the value of α for the PersonalTagWeight and SocialTagWeight;</p></list-item>
          <list-item><p>the threshold γ to establish whether a tag is relevant;</p></list-item>
          <list-item><p>which fields of the target resource to use to compose the query.</p></list-item>
        </list>
      </p>
      <p>Tuning the number of similar documents to retrieve from
the PersonalIndex and SocialIndex is very important, since
a value too high can introduce noise in the retrieval process,
while a value too low can exclude documents containing
relevant tags. By analyzing the results returned by some test
queries, we decided to set this value between 5 and 10,
depending on the training data.</p>
      <p>Next, we tried to estimate the values for
PersonalTagWeight (PTW) and SocialTagWeight (STW). A higher
weight for the Personal Tags means that in the
recommendation process the system will give more weight to the tags previously
used by the target user, while a higher value for the
Social Tags will give more importance to the tags used by the
community (namely, the whole folksonomy) on the target
resource. These parameters depend on the user's tagging practice:
if the tags often used by the user are very different from those
used by the community, PTW should be higher than
STW. We performed an empirical study, since it is difficult to
characterize the user behavior at run time. We tested the system
setting the parameters with several combinations of values:
        <list list-type="roman-lower">
          <list-item><p>PTW = 0.7, STW = 0.3;</p></list-item>
          <list-item><p>PTW = 0.5, STW = 0.5;</p></list-item>
          <list-item><p>PTW = 0.3, STW = 0.7.</p></list-item>
        </list>
      </p>
      <sec id="sec-12-2">
        <p>Another parameter that can influence the system
performance is the set of fields used to compose the query. For
each resource in the dataset there are many textual fields,
such as title, abstract, description, extended description, etc.
In this case we used as query the title of the web page (for
bookmarks) and the title of the publication (for BibTeX
entries). The last parameter we need to tune is the threshold
to deem a tag relevant (γ). We performed some tests
suggesting both 4 and 5 tags, and we decided to recommend
only 4 tags since the fifth was usually noisy. We also fixed
the threshold value between 0.20 and 0.25. To carry
out this experimental session we used the aforementioned
dataset both as training and test set. We executed the test
over 50,000 bookmarks and 50,000 BibTeX entries. Results are
presented in Table 2 and Table 3.</p>
        <p>
          Analyzing the results, it emerges that the approach we
called user-based outperformed the other ones. In this
configuration we set PTW to 1.0 and STW to 0, so we suggest
only the tags already used by the user in tagging similar
resources. No query was submitted against the
SocialIndex. The first remark we can make is that each user has
her own mental model and her own vocabulary: she
usually prefers to tag resources with labels she has already used.
Conversely, getting tags from the SocialIndex only (as shown
by the results of the community-based approach) often
introduces some noise in the recommendation process. The
hybrid approaches outperformed the community-based one,
but their predictive accuracy is still worse than that of
the user-based approach. Finally, all the approaches
outperformed the baseline in terms of F1-measure. We computed
the baseline by recommending for each resource only its most
popular tags. Obviously, for resources never tagged we could
not suggest anything. This analysis substantially confirms
the results we obtained from other studies performed in the
area of tag-based recommendation [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
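        <p>A most-popular-tags baseline of the kind described above can be sketched in a few lines. The function name build_baseline, the (resource, tags) input layout and the cut-off of 4 tags are assumptions; the text only states that the most popular tags of each resource are recommended and that nothing is suggested for resources never tagged.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict


def build_baseline(training_posts, max_tags=4):
    """Return a recommender suggesting the most popular tags of each resource.

    training_posts: iterable of (resource_id, tags) pairs (assumed layout).
    """
    popularity = defaultdict(Counter)
    for resource_id, tags in training_posts:
        popularity[resource_id].update(tags)

    def recommend(resource_id):
        counts = popularity.get(resource_id)
        if counts is None:
            return []  # resource never tagged: nothing to suggest
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [tag for tag, _ in ranked[:max_tags]]

    return recommend


recommend = build_baseline([
    ("r1", ["sport", "newspaper"]),
    ("r1", ["sport", "italy"]),
    ("r2", ["football"]),
])
```
]]></preformat>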
      </sec>
    </sec>
    <sec id="sec-13">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>Nowadays, collaborative tagging systems are powerful tools,
but they suffer from some drawbacks, since the
complete tag space is too noisy to be exploited for retrieval and
filtering tasks. In this paper we presented STaR, a social tag
recommender system. The idea behind our work was to
discover similarity among resources by exploiting a state-of-the-art
IR model called BM25. The experimental sessions showed
that users tend to reuse their own tags to annotate similar
resources, so this kind of recommendation model could
benefit from using the user's personal tags before extracting
the social tags of the community (we called this approach
user-based).</p>
      <p>This approach has one main drawback: it cannot
suggest any tags when the set of similar items returned by
Lucene is empty. We are planning to extend the system so
that it extracts significant keywords from the textual
content associated with a resource (title, description, etc.) that
has no similar items, possibly exploiting structured data or
domain ontologies.</p>
      <p>Furthermore, since tags usually suffer from typical
Information Retrieval problems (polysemy, etc.), we will try to
establish whether the integration of Word Sense Disambiguation
algorithms or a semantic representation of documents could
improve the performance of the recommender.</p>
      <p>Overall, our approach proved promising compared with
existing state-of-the-art approaches for tag
recommendation. Indeed, our work ranked 6th in
the final results of the ECML-PKDD 2009 Discovery
Challenge (id: 29723).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          , M. de Gemmis, P. Lops, G. Semeraro,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Musto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Narducci</surname>
          </string-name>
          .
          <article-title>FIRSt: a Content-based Recommender System Integrating Tags for Cultural Heritage Personalization</article-title>
          . In P. Nesi,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and J. Delgado, editors,
          <source>Proceedings of the 4th International Conference on Automated Solutions for Cross Media Content and Multi-channel Distribution (AXMEDIS 2008) - Workshop Panels and Industrial Applications</source>
          , Florence, Italy, Firenze University Press, pages
          <fpage>103</fpage>
          -
          <lpage>106</lpage>
          , November 17-19,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Brooks</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Montanez</surname>
          </string-name>
          .
          <article-title>Improved annotation of the blogosphere via autotagging and hierarchical clustering</article-title>
          .
          <source>In WWW '06: Proceedings of the 15th international conference on World Wide Web</source>
          , pages
          <fpage>625</fpage>
          -
          <lpage>632</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Jäschke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Marinho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hotho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Stumme</surname>
          </string-name>
          .
          <article-title>Tag recommendations in folksonomies</article-title>
          . In Alexander Hinneburg, editor,
          <source>Workshop Proceedings of Lernen - Wissensentdeckung - Adaptivität (LWA 2007)</source>
          , pages
          <fpage>13</fpage>
          -
          <lpage>20</lpage>
          , September
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishne</surname>
          </string-name>
          .
          <article-title>Autotag: a collaborative approach to automated tag assignment for weblog posts</article-title>
          .
          <source>In WWW '06: Proceedings of the 15th international conference on World Wide Web</source>
          , pages
          <fpage>953</fpage>
          -
          <lpage>954</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Musto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Narducci</surname>
          </string-name>
          , M. de Gemmis, P. Lops, and
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <article-title>STaR: a Social Tag Recommender System</article-title>
          . In Folke Eisterlehner, Andreas Hotho, and Robert Jäschke, editors,
          <source>ECML PKDD Discovery Challenge 2009 (DC09)</source>
          , volume
          <volume>497</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , September 7,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Beaulieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gull</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lau</surname>
          </string-name>
          .
          <article-title>Okapi at TREC</article-title>
          .
          <source>In Text REtrieval Conference</source>
          , pages
          <fpage>21</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hotho</surname>
          </string-name>
          , R. Jäschke, and
          <string-name>
            <given-names>G.</given-names>
            <surname>Stumme</surname>
          </string-name>
          .
          <article-title>Mining association rules in folksonomies</article-title>
          . In V. Batagelj, H.-H. Bock, A. Ferligoj, and A. Žiberna, editors,
          <source>Data Science and Classification (Proc. IFCS 2006 Conference)</source>
          ,
          <source>Studies in Classification, Data Analysis, and Knowledge Organization</source>
          , pages
          <fpage>261</fpage>
          -
          <lpage>270</lpage>
          , Berlin/Heidelberg, July 2006. Springer. Ljubljana.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Owsley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hammond</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Birnbaum</surname>
          </string-name>
          .
          <article-title>TagAssist: Automatic Tag Suggestion for Blog Posts</article-title>
          .
          <source>In Proceedings of the International Conference on Weblogs and Social Media (ICWSM</source>
          <year>2007</year>
          ),
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>