<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Collaboration-based Social Tag Prediction in the Graph of Annotated Web Pages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hossein Rahmani</string-name>
          <email>hrahmani@liacs.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Behrooz Nobakht</string-name>
          <email>bnobakht@liacs.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hendrik Blockeel</string-name>
          <email>blockeel@liacs.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Katholieke Universiteit Leuven</institution>
          ,
          <addr-line>Celestijnenlaan 200A, 3001 Leuven</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leiden Institute of Advanced Computer Science, Universiteit Leiden</institution>
          ,
          <addr-line>Niels Bohrweg 1, 2333 CA Leiden</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Di erent approaches based on content or tag information have been proposed to address the problem of tag recommendation for a web page. In this paper, we analyze two approaches in a graph of web pages. Each node is a web page and edges represent hyperlinks. The rst approach uses the content while the second one uses tag information in the graph. The second approach makes two assumptions about the tag set of two interacting nodes. The Tag Similarity Assumption claims that two interacting nodes discuss about rather similar topics; therefore, the chance of having more similar tag set is higher. The Tag Collaboration Assumption says that two interacting nodes complement each others topics. We apply algorithms such as Self Organizing Map (SOM), Reinforcement Learning (RL) and K-means clustering to compare methods on several datasets. We conclude that tag-based tag predictors outperform their content-based peers by more than ten percent with respect to the cosine similarity between predicted and actual tag sets.</p>
      </abstract>
      <kwd-group>
        <kwd>Social Tag Prediction</kwd>
        <kwd>Tag Similarity Assumption</kwd>
        <kwd>Tag Collaboration Assumption</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Social tagging systems have become increasingly popular in recent years. Various
domain applications such as search engines, recommendation systems, spam
detection and many others have improved their performance by considering these
types of information.</p>
      <p>Although using user's meta data has been shown useful for di erent
purposes, these new types of describing resources have their own problems. The
uncontrolled vocabulary nature of these data has inherent ambiguity and may
make it hard to get a coherent vision of di erent users.</p>
      <p>One possible way to solve the problems is to help users choose better
informative tags. The system could suggest some \good tags" to users and users may
or may not use those tags. In this paper, we proposed two di erent approaches
for the task of social tag prediction in a graph of web pages. For each approach
there are di erent implementations. The rst approach uses only the content
of the web pages for the task of tag prediction. The second approach considers
two assumptions about the tag set of two interacting web pages in a graph.
The rst assumption says that two interacting web pages have similar tag sets
(Tag Similarity Assumption) while the second one assumes that two
interacting web pages have \collaborative" tag sets (Tag Collaboration Assumption),
collaborative meaning in this case that the tag sets somehow complement each
other.</p>
      <p>The rest of the paper is organized as follows: Section 2 brie y reviews
motivations and related works. Section 3 discusses our approach for social tag
prediction. In Section 4, rst, we tune the parameters of our methods and then, we
present experimental results on ve web page graphs. We present our conclusions
in Section 5.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>
        Various works have explored social annotations for di erent purposes. A lot of
methods designed to improve web search results by considering data from social
bookmarking systems ([
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]). Social annotation could be seen as a new way of
organizing information and categorizing resources ([
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]). Users of social tagging
systems could be connected to each other based on their areas of interests. The
term Folksonomy, a combination of folk and taxonomy, was rst proposed by
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A general introduction of folksonomy could be found in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Ranking and
recommender systems for folksonomies are proposed in [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. While the above
mentioned approaches might look similar to our work but they are actually
orthogonal, since the authors do not directly address the problem of predicting
tags for resources.
      </p>
      <p>
        The approaches in [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] predicts tags based on the content and tag
cooccurance respectively, while our methods consider also the neighborhood
context of each web page for the tag prediction task. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposes
neighborhoodbased tag prediction by using content similarity. They apply a straightforward
scoring model to select the candidate tags, however, we use machine learning
methods for tag prediction. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] predicts tags from the set of 100 most frequent
tags found in del.icio.us by training the binary classi er for each tag. We do not
restrict ourselves to a prede ned set of tags which are predicted.
      </p>
      <p>
        Another important di erence between our method and [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ] is that our
method predicts the set of tags without considering any number of known tags
for a given web page, whereas the methods in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] are more \Social Tag
Expansion" methods; they start with some known tags for a web page and then
try to expand that set. In other words, we do not assume any knowledge about
the document's own tags, in contrast to [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>We introduce a new assumption \Tag Collaboration", which to our
knowledge is not discussed in any previous work. We also de ne the \Topic Locality"
characteristic of a web page graph, which is shown to a ect the optimal
parameter settings and the performance of the approach. These novelties sets aside our
methods from the related works.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Methods</title>
      <p>The graph of web pages is represented by G(P; E) in which P is a set of web
pages and E is a set of interactions between web pages. Each epq 2 E shows a
hyperlink between two web pages p 2 P and q 2 P . Let T be the set of all the
tags that occur in one dataset. Each classi ed web page p 2 P is annotated with
a jT j-dimensional vector T Sp that indicates the tag set of this web page: T Sp(ti)
is 1 if ti 2 T is assigned to the web page p, and 0 otherwise. T Sp can also be
seen as the set of all tags ti for which T Sp(ti) = 1. Similarly, the jT j-dimensional
vector N Bp describes how often each tag occurs in the neighborhood (all the
web pages that are reachable from p with a path of length at most 1) of web
page p. N Bp(ti) = n means that among all the web pages that interact with p,
n are annotated with tag ti.</p>
      <p>In this section, we discuss di erent approaches for solving the problem of
social tag prediction in a graph of web pages. We analyze two main approaches
for this purpose. The rst approach uses the content while the second one uses
tag information in the graph.
3.1</p>
      <sec id="sec-3-1">
        <title>Content-Based Tag Predictor</title>
        <p>In this approach, we use only the content of a web page for predicting its tags. We
use the standard bag-of-words approach: after stemming and stop word removal,
a vector is constructed in which each component is the frequency with which a
particular word occurs on the page.</p>
        <p>Most similar: Our rst implementation for this approach is a standard nearest
neighbor approach, which we refer to as \Most similar". In this implementation,
rst, we compare the content of the unannotated web page with the content of
all the annotated web pages in the graph, using cosine similarity (Formula 1).
After nding the most similar annotated web page, we select the top tags of the
annotated web page and assign them to the unannotated one.</p>
        <p>Cosine similarity (p; q) =
(1)
p:q
jpj jqj
K-means: In this approach, we rst cluster the web pages in the graph based on
their content. Then, we nd the most frequently occurring tags in each cluster.
The popular (most frequent) tags of each cluster will be assigned to all the
unannotated web pages in that cluster.</p>
        <p>
          We use K-means [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] for clustering the graph of web pages. We start
clustering the network with k random centers and iteratively assign the web pages to
these k clusters based on the content similarity between web pages and the
cluster centers. In each iteration, cluster centers move towards more balanced points
in the cluster. As similarity metrics we use both Jaccard similarity (Formula 2)
and Cosine similarity (Formula 1) of the tag sets.
        </p>
        <p>J accard similarity (p; q) = jp \ qj
jp [ qj
(2)
3.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Tag-Based Tag Predictors</title>
        <p>Contrary to the previous approach, here, we use only tag information available
in the graph for the task of tag prediction. We consider two assumptions behind
the web page interactions. In the rst assumption, two interacting web pages
discuss about same topics and as a result their tag set will be similar (Tag
Similarity assumption). Second assumption claims that interacting web pages
complement the topics of each other and they do not necessarily have the same
set of tags (Tag Collaboration assumption). We propose \Majority Rule Tag
Predictor" as one of the possible implementations for tag similarity assumption.
Two methods based on Reinforcement Learning (RL) and Self Organizing Map
(SOM) are proposed for implementing tag collaboration assumption. We describe
each proposed implementation in detail in the following:
Majority Rule Tag Predictor: This method simply implements tag similarity
assumption by nding the most common tag(s) among the neighbors of the
unannotated web page. Typically, a xed number of tags is predicted, for instance
the ve most frequently occurring tags in the neighborhood are predicted for the
unannotated web page.</p>
        <p>The hypothesis behind tag similarity based approaches is that web pages
with similar tags are always topologically close in the graph which all web pages
in actual graphs not necessarily corroborate this hypothesis.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Reinforcement Based Tag Predictor: In this method, we try to quan</title>
        <p>tify how strongly two tags ti and tj collaborate, in the following way. Let
T agColV al(ti; tj ) denote the strength of collaboration between ti and tj .</p>
        <p>We consider each annotated web page p 2 P in turn. If tag tj occurs in the
neighborhood of web page p (i.e., N Bp(tj ) &gt; 0) then we increase the
collaboration value between tag tj and all the tags in T Sp:
8ti 2 T Sp : T agColV al(ti; tj )+=</p>
        <p>N Bp(tj )</p>
        <p>R
support(tj )
If tag tj does not occur in the neighborhood of p (N Bp(tj ) = 0), we decrease
the collaboration value between tag tj and all the tags belonging to T Sp:
8ti 2 T Sp : T agColV al(ti; tj ) =</p>
        <p>P
support(tj )
support(tj) is the total number of times that tag tj appears on a side of
an edge epq in the graph. R and P are \Reward" and \Punish" coe cients
determined by the user.</p>
        <p>Next, we determine the candidate tags for an unannotated web page p and
rank them based on how well they collaborate with the neighborhood of web
page p. As an example of a candidate tag strategy, consider Majority Rule: this
method nominates all tags that appear in the direct neighborhood of the
unannotated web page (and among these, will select the most frequently occurring
ones). Here, we consider a number of extensions to this candidate tag strategy:
{ First Tag Level Strategy (First-TL): This strategy selects the tags that
appear in the direct neighborhood of the web page as candidate tags. This
strategy nominates tags similar to the Majority Rule method.
{ Second, Third, Fourth Tag Level Strategy (Second-TL, Third-TL,
FourthTL): De ne the n-neighborhood of a web page p as all the web pages that are
reachable from p with a path of length at most n (thus, the 2-neighborhood
includes all neighbors of neighbors of p, etc.). In Second-TL, Third-TL,
Fourth-TL, all the tags occurring in the 2-, 3- or 4-neighborhood of p,
respectively, are considered as candidate tags.
{ All Tag Strategy (All-TL): All the tags are taken into account in this strategy.</p>
        <p>After selecting candidate tags, we rank them based on how well they
collaborate with the neighborhood of unannotated web page p. Formula (3) assigns a
collaboration score to each candidate tag tc:</p>
        <p>Score(tc) = X N Bp(tj) T agColV al(tj; tc)
(3)
High score candidate tag(s) collaborate(s) better with the neighborhood of p and
are predicted as its tags.</p>
        <p>We call the above method the \Reinforcement based tag predictor", as it is
based on reinforcing collaboration values between tags as they are observed.
SOM Based Tag Predictor: Our second method makes use of a Self
Organizing Map (SOM) for the task of tag prediction in a graph of web pages. We
map the web page graph to a SOM as follows:
{ Input Layer: The number of input neurons equals the number of tags in
the web page graph. So, if inputN eurons is the set of all neurons in the
input layer then jinputN euronsj = jT j. The values we put in the input
layer are extracted from the neighborhood tag vector of the web page: if
inputN euron(i) is the i'th neuron in the input layer then inputN euron(i) =
N Bp(ti).
{ Output Layer: The number of output neurons equals to the number of
tags in the web page graph (joutputN euronsj = jT j). The values we put
in the output layer are extracted from the tag vector of the web page: if
outputN euron(i) is the i'th neuron in the output layer then outputN euron(i) =
T Sp(ti).
{ Network Initialization: Weights of the neurons can be initialized to small
random values; in our implementation we initialized all the weights to zero.
{ Adaption: Weights of winner neurons and neurons close to them in the
SOM lattice should be adjusted towards the input vector. The magnitude
of the change decreases with time and with the distance from the winner
neuron. Here, we take some new parameters into consideration which are
LearningRate(LR), DecreasingLearningRate(DecLR) and T erminateCriteria(T C)
parameters. LR is the change rate of the weights toward the input vector
and DecLR determines the change rate of LR in di erent iterations. T C
is the criteria in which the learning phase of SOM will terminate. Here, we
think of TC as the minimum amount of change required in one iteration:
when there is less change, the training procedure stops. We use Formula (4)
for updating weights of output neurons.</p>
        <p>Wij;New = Wi;j;Current + LR (N Bp(j)</p>
        <p>Wi;j;Current)
(4)
{ Testing: For each web page p in the graph that we did not use in the training
phase, we nd the Euclidean distance between N Bp and the weight vectors.
We select the output neurons which have the shortest Euclidean distance to
N Bp and predict them as the tag set of web page p. The number of predicted
tags is xed and determined by the user.
4
4.1</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Empirical Results</title>
      <sec id="sec-4-1">
        <title>Dataset</title>
        <p>We were interested in graphs of web pages in which a) nodes are reasonably
interconnected to each other at least as form of a tree; and b) nodes are tagged in a
tagging system such as Delicious. Seemingly, there is no current data set available
providing such information for web pages. Starting with ECML PKDD 09 Data
Set 3 as the base, the rst issue was to update the weighted tag assignments on
web pages; we used URL's in the original data set to fetch their weighted tag
assignments from Delicious 4. The next consideration was the fact that there are
many web pages in the data set that are sparsely tagged at Delicious; thus, we
constrained the data set to the web pages annotated in Delicious with a minimum
tag assignment weight. To build the desired graphs, starting from a web page
in the data set, we included those neighboring URLs of the web page (and
the links to them) that were tagged at Delicious. The same procedure was then
applied to those neighboring URLs, and so on, up to a maximum crawling depth.
Table 1 shows brief statistical information of datasets we used for evaluating our
methods. Detailed information on data construction and preprocessing phase is
available at http://www.liacs.nl/~bnobakht/social-tag-prediction/.
3 http://www.kde.cs.uni-kassel.de/ws/dc09/dataset/#files
4 http://delicious.com/help/feeds</p>
        <p>Number of web page graph 27
Average page count of each web page graph 140.40
Average link count of each web page graph 311.03</p>
        <p>Average number of tags for each page 6.92
The methods we want to evaluate have some parameters for which good values
need to be found. In order to tune the method's parameters, we select ve
di erent graphs and tune the parameters on those graphs and then use the
tuned values for the other graphs.</p>
        <p>While Majority Rule assigns only tags from the direct neighborhood to a
web page, we are interested to nd out whether using candidate tags from a
wider neighborhood (including neighbors of neighbors) would be advantageous.
We tested this by extending the Majority Rule so that it can consider not only
direct neighbors, but also neighbors in the 2- or 3-neighborhood (as de ned in
Section 3.2). Figure (1) shows the e ect of considering a wider neighborhood in
Majority Rule in di erent datasets. There is no improvement over the base MR
method by considering the second and third neighborhood level.
feature in a graph of web pages. We de ne topicLocality(G) as the number of
distinct cluster tags divided by the number of clusters in the graph G (Formula
5).</p>
        <p>T opicLocality(G) =</p>
        <p>N umber of distinct cluster tags</p>
        <p>N umber of clusters
(5)</p>
        <p>For instance, if all the web pages in the graph talk about the same topic
then independent of di erent values for k, applying the K-means tag predictor
will always lead to the same result. In this case we have small topic locality.
But, if di erent parts of the graph discuss di erent topics, then we expect that
by increasing the number of clusters, we increase the topic locality and this
results in better tag prediction. Figure 3 shows the value of topicLocality in
three di erent graphs. We cluster each graph into four clusters. In graph G1, all
four clusters are about topic \Music", so the value of topicLocality is 41 in this
graph. Two clusters of graph G2 are about \Music", while the topics of the other
two clusters are \Sport" and \Food". There are 3 distinct topics and 4 clusters,
therefore the topicLocality of graph G2 equals 43 . In G3, all four clusters are
about distinct topics, so the topicLocality of G3 equals 44 . So, choosing the right
value for k in K-means algorithm is completely dependent to the topic locality
of that speci c graph.</p>
        <p>Figure 4-(a) shows the result of applying K-means tag predictor with di erent
values of k to graph G(142; 292) with high topic locality. In a case of k 5, there
are k big \topic divergent" clusters, so the average cosine similarity is small
(around 2%). As we increase the number of clusters the average cosine similarity
value improves which means topic locality value of this graph is high. The best
setting for K-means tag predictor is 60 K 64.</p>
        <p>Figure 4-(b) shows the e ect of choosing di erent values for k in graph
G(362; 934) with low topic locality. In this graph, the topic of the most of the
web pages are similar to each other and changing the value of k does not e ect
the average cosine similarity a lot.
4.3</p>
      </sec>
      <sec id="sec-4-2">
        <title>Comparison of Di erent Methods</title>
        <p>In this section, we compare content-based tag predictors (i.e., Most Similar and
K-means) with tag-based tag predictors (i.e., Majority Rule, RL and SOM) on
several datasets. We use average cosine similarity as the evaluation criterion.
We predict 5 tags for each unannotated web page in all methods and then we
compare the cosine similarity of di erent methods.</p>
        <p>In the proposed implementations, we use the parameter values tuned in the
previous section. Majority Rule (MR) selects the ve most frequently occurring
tags in the neighborhood of the web page in the network. As we discussed in
the previous section, choosing the right value for k in K-means algorithm is
completely dependent to the topic locality nature of that speci c graph. In this
section, we use xed value for k(= N5 ) for clustering all the graph. N equals the
number of web pages in the graph.</p>
        <p>Figure 5 compares content-based tag predictors with tag-based tag
predictors. Tag-based tag predictors outperform their content-based peers by more
than 10 percent with respect to cosine similarity metric.</p>
        <p>
          By assuming number of known tags for a given web page (\Social Tag
Expansion" methods [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ]) or limit our methods predicting just small set of frequent
tags (like [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]), we could expect a gain in precision; however this could drastically
restrict the generality of our framework.
        </p>
        <p>After analyzing each graph individually, we believe that there is no xed
approach (or parameters) which works best in all di erent types of web-page
graphs. For example, considering \Topic locality" as a one out of many graph
characteristics, in the graph with low Topic Locality (G1 in Figure 3) it might
be better to use methods working based on the Tag Similarity assumption, while
in high Topic Locality graphs (G3 in Figure 3), it is recommended to use Tag
Collaboration based methods.</p>
        <p>(a) Graph G(142; 292) with high \Topic Locality" characteristics.
(b) Graph G(362; 934) with low \Topic Locality" characteristics.
To our knowledge, this is the rst study that considers graph of annotated web
pages for the task of tag prediction. We proposed two di erent approaches for
the task of social tag prediction in a graph of web pages. For each approach, we
recommend di erent implementations. The rst approach uses only the content
of the web pages for the task of tag prediction. For this approach we propose
\Most Similar" and \K-means" methods. Contrary to the rst approach, the
second approach uses only tag information available in the graph for the task of
tag prediction. It considers two assumptions about the tag set of two interacting
web pages in a graph. The rst assumption says that two interacting web pages
have similar tag sets (Tag Similarity Assumption) while the second one assumes
that two interacting web pages have collaborative tag sets (Tag Collaboration
Assumption). We proposed \Majority Rule" method as one of the possible
implementations for that similarity assumption. This method simply predicts most
frequently occurring tags in the neighborhood of unannotated web page. We
used two machine learning methods Self Organizing Map (SOM) and
Reinforcement Based Tag Predictor for implementing tag collaboration assumption. Both
methods rst learn the collaboration value between each pair of tags and then
at prediction time, they rank candidate tags based on how well they collaborate
with the neighborhood of unannotated web page.</p>
        <p>We compared content-based tag predictors with tag-based tag predictors and
we found out that Tag-based tag predictors outperform their content-based peers
by more than 10 percent with respect to cosine similarity metric. Among the
tag-based tag predictors, Majority Rule method predicts the best tags for
unannotated web pages which means Tag Similarity Assumption dominates Tag
Collaboration Assumption in the graph of web pages. So, in general, web pages in
the our dataset tend to discuss more about similar topics rather than
complementary topics.</p>
        <p>We also analyzed each graph individually and we concluded that graph
characteristics have direct impact on choosing the right method for social tag
prediction. We observed that in low Topic Locality graphs, results of Tag Similarity
methods outperform the results Tag Collaboration methods while in graphs with
high topic locality, it is better to apply Tag Collaboration methods.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Funding</title>
      <p>This research is funded by the Dutch Science Foundation (NWO) through VIDI
grant 639.022.605.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Han,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            ,
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            ,
            <surname>Kramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          :
          <article-title>Substitution or complement: An empirical analysis on the impact of collaborative tagging on web search</article-title>
          .
          <source>In: WI '06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence</source>
          , Washington, DC, USA, IEEE Computer Society (
          <year>2006</year>
          )
          <volume>757</volume>
          {
          <fpage>760</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fei</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Optimizing web search using social annotations</article-title>
          .
          <source>In: WWW '07: Proceedings of the 16th international conference on World Wide Web</source>
          , New York, NY, USA, ACM (
          <year>2007</year>
          )
          <volume>501</volume>
          {
          <fpage>510</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Aliakbary</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abolhassani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahmani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nobakht</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Web page classi cation using social tags</article-title>
          .
          <source>In: CSE '09: Proceedings of the 2009 International Conference on Computational Science and Engineering</source>
          , Washington, DC, USA, IEEE Computer Society (
          <year>2009</year>
          )
          <volume>588</volume>
          {
          <fpage>593</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Zubiaga</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mart</surname>
            <given-names>nez</given-names>
          </string-name>
          , R.,
          <string-name>
            <surname>Fresno</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Getting the most out of social annotations for web page classi cation</article-title>
          .
          <source>In: DocEng '09: Proceedings of the 9th ACM symposium on Document engineering</source>
          , New York, NY, USA, ACM (
          <year>2009</year>
          )
          <volume>74</volume>
          {
          <fpage>83</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Vander</given-names>
            <surname>Wal</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>Folksonomy de nition and wikipedia</article-title>
          .
          <source>(November</source>
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Quintarelli</surname>
          </string-name>
          , E.:
          <article-title>Folksonomies: power to the people</article-title>
          . ISKO
          <string-name>
            <surname>Italy-UniMIB Meeting</surname>
          </string-name>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hotho</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Jaschke, R.,
          <string-name>
            <surname>Schmitz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Information retrieval in folksonomies: Search and ranking</article-title>
          . In Sure, Y.,
          <string-name>
            <surname>Domingue</surname>
          </string-name>
          , J., eds.
          <source>: ESWC</source>
          . Volume
          <volume>4011</volume>
          of Lecture Notes in Computer Science., Springer (
          <year>2006</year>
          )
          <volume>411</volume>
          {
          <fpage>426</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jschke</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marinho</surname>
            ,
            <given-names>L.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hotho</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt-Thieme</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stumme</surname>
          </string-name>
          , G.:
          <article-title>Tag recommendations in folksonomies</article-title>
          . In Kok,
          <string-name>
            <given-names>J.N.</given-names>
            ,
            <surname>Koronacki</surname>
          </string-name>
          , J., de Mntaras,
          <string-name>
            <given-names>R.L.</given-names>
            ,
            <surname>Matwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Mladenic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Skowron</surname>
          </string-name>
          , A., eds.
          <source>: Knowledge Discovery in Databases: PKDD</source>
          <year>2007</year>
          ,
          <source>11th European Conference on Principles and Practice of Knowledge Discovery in Databases</source>
          , Warsaw, Poland,
          <source>September 17-21</source>
          ,
          <year>2007</year>
          , Proceedings. Volume
          <volume>4702</volume>
          of Lecture Notes in Computer Science., Springer (
          <year>2007</year>
          )
          <volume>506</volume>
          {
          <fpage>514</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chirita</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Costache</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nejdl</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Handschuh</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>P-tag: large scale automatic generation of personalized annotation tags for the web</article-title>
          .
          <source>In: WWW '07: Proceedings of the 16th international conference on World Wide Web</source>
          , New York, NY, USA, ACM (
          <year>2007</year>
          )
          <volume>845</volume>
          {
          <fpage>854</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Sigurbjornsson, B.,
          <string-name>
            <surname>van</surname>
            <given-names>Zwol</given-names>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          :
          <article-title>Flickr tag recommendation based on collective knowledge</article-title>
          .
          <source>In: WWW '08: Proceeding of the 17th international conference on World Wide Web</source>
          , New York, NY, USA, ACM (
          <year>2008</year>
          )
          <volume>327</volume>
          {
          <fpage>336</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Budura</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cudr-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aberer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Neighborhood-based tag prediction</article-title>
          . In Aroyo, L.,
          <string-name>
            <surname>Traverso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hyvnen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mizoguchi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oren</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sabou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simperl</surname>
          </string-name>
          , E.P.B., eds.
          <source>: ESWC</source>
          . Volume
          <volume>5554</volume>
          of Lecture Notes in Computer Science., Springer (
          <year>2009</year>
          )
          <volume>608</volume>
          {
          <fpage>622</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Heymann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramage</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
          </string-name>
          , H.:
          <article-title>Social tag prediction</article-title>
          . In Myaeng,
          <string-name>
            <given-names>S.H.</given-names>
            ,
            <surname>Oard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.W.</given-names>
            ,
            <surname>Sebastiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Chua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.S.</given-names>
            ,
            <surname>Leong</surname>
          </string-name>
          , M.K., eds.: SIGIR,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2008</year>
          )
          <volume>531</volume>
          {
          <fpage>538</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>MacQueen</surname>
            ,
            <given-names>J.B.:</given-names>
          </string-name>
          <article-title>Some methods for classi cation and analysis of multivariate observations</article-title>
          . In Cam,
          <string-name>
            <given-names>L.M.L.</given-names>
            ,
            <surname>Neyman</surname>
          </string-name>
          , J., eds.
          <source>: Proc. of the fth Berkeley Symposium on Mathematical Statistics and Probability. Volume</source>
          <volume>1</volume>
          ., University of California Press (
          <year>1967</year>
          )
          <volume>281</volume>
          {
          <fpage>297</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>