Content-based and Graph-based Tag Suggestion

Xiance Si, Zhiyuan Liu, Peng Li, Qixia Jiang, Maosong Sun
State Key Lab on Intelligent Technology and Systems
National Lab for Information Science and Technology
Dept. of Computer Science & Technology, Tsinghua University, Beijing 100084, China
{adam.si, lzy.thu, pengli09, qixia.jiang}@gmail.com, sms@tsinghua.edu.cn
http://nlp.csai.tsinghua.edu.cn

Abstract. Social tagging is a popular and convenient way to organize information. Automatic tag suggestion can ease the user's tagging activity. In this paper, we examine both content-based and graph-based methods for tag suggestion using the BibSonomy dataset, and describe our methods for the ECML/PKDD Discovery Challenge 2009 submissions. For content-based tag suggestion, we propose a fast yet accurate method named Feature-Driven Tagging. For graph-based tag suggestion, we apply DiffusionRank and obtain a better result than current state-of-the-art methods in cross-validation.

1 Introduction

Social tagging, also known as folksonomy, is a popular way to organize resources such as documents, bookmarks and photos. Resources, tags and users are the three essential parts of a social tagging system: a user uses tags to describe resources. A tag suggestion system eases the process of social tagging by suggesting tags for new resources based on previously tagged resources.

To promote related research, ECML/PKDD organized an open contest of tag suggestion systems, named Discovery Challenge 2009 (DC09 for short). In this contest, a snapshot of the users, documents and tags in the online bookmarking system BibSonomy is provided. Each team trains its suggestion system on the snapshot and tests its performance on the same test dataset. There are three tasks in the contest. Task 1 focuses on suggesting tags from the content of the resources, i.e., content-based tag suggestion. Task 2 focuses on suggesting tags from the tripartite links between resources, tags and users, i.e., graph-based tag suggestion. Task 3 puts the suggestion system into a real-life situation by integrating it with the BibSonomy website, to see which system predicts the user's intention best.

In this paper, we describe our methods for the three tasks. For Tasks 1 and 3, we propose a fast tag suggestion method called Feature-Driven Tagging (FDT). FDT indexes tags by features, where a feature can be a word, a resource ID, a user ID or other information. For each feature, FDT keeps a list of weighted tags: the higher the weight, the more likely the tag is to be suggested by the feature. For a new resource, each feature in it suggests a list of weighted tags, and the suggestions are combined according to the importance of the features to obtain the final suggestion. Compared to other methods, FDT provides suggestions faster, and its speed depends only on the number of features in the resource (e.g., the number of words in its content).

For Task 2, we apply two existing methods, most popular tags and FolkRank, for graph-based suggestion. Furthermore, we propose to use a new graph-based ranking model, DiffusionRank, for tag suggestion. The method of "most popular tags" is the simplest collaborative-filtering-based method: it recommends the tags that other users have most often assigned to the resource. FolkRank is based on PageRank [1] on the user-resource-tag tripartite graph, and was first proposed as a tag suggestion method in [2]. DiffusionRank was originally proposed for combating web spam [3], and has also been successfully used in social network analysis [4] and search query suggestion [5].
DiffusionRank is motivated by the heat diffusion process. It can be used for ranking because activities flowing on the graph can be imagined as heat flow, and the edge from one vertex to another can be treated as a pipe through which heat flows. Compared to PageRank, DiffusionRank provides a more flexible mechanism to make the ranking scores depend on the initial values of the vertices, which is important for graph-based tag suggestion.

The paper is organized as follows. Section 2 formulates the problem of tag suggestion. Section 3 introduces our method for content-based tag suggestion. Section 4 introduces our method for graph-based tag suggestion. Section 5 describes the dataset, experiment settings and results. Section 6 introduces related work on tag suggestion. Section 7 concludes the paper.

2 Problem Formulation

We adopt the model of social tagging proposed by Jaschke et al. [2]. A social tagging data set is defined as a tuple F := (U, T, R, Y), where U is the set of users, T is the set of tags and R is the set of resources. Y is a ternary relation between U, T and R, i.e., Y ⊆ U × T × R. A triple (u, t, r) ∈ Y is called a tag assignment, which means that user u assigned the tag t to resource r. A resource r ∈ R can be described with a piece of text, such as the title of a paper or a user-edited description of a website. We denote the words in the text as {w_i}.

Resources, users and tags form a graph G = (V, E), where V = U ∪ R ∪ T and E = {{u, t}, {u, r}, {r, t} | (u, t, r) ∈ Y}. The goal of tag suggestion is to predict the set of tags {t} for a given pair of user and resource (u, r). In the related literature, social tags are also called a folksonomy, and the pair of a resource and a user is also called a post.

3 Content-based Tag Suggestion

In this section, we propose a content-based tag suggestion method named Feature-Driven Tagging (FDT). Briefly speaking, FDT is a voting model in which each feature of the resource votes for its favorite tags, and the final scores of the tags are combined according to the importance of the features. Figure 1 illustrates the tagging procedure of FDT; it consists of three steps: feature extraction, feature weighting and tag voting. For a resource with content, FDT first extracts features from the content. Features include, but are not limited to, words, the resource ID and the user ID. Then, FDT weights each feature by its importance in the resource; we explain different ways to compute the importance of features later in this section. In the voting step, each feature contributes a weighted list of tags: the higher the weight, the more likely we should suggest the tag. The weights that a tag receives from different features are combined according to the importance of each feature, which yields the final weighted list of tags. In the tagging process, all parameters are indexed by feature, so we do not need to iterate over all tags (as in text categorization approaches) or over all resources (as in neighborhood-based approaches); hence the name Feature-Driven Tagging.

Fig. 1. The procedure of Feature-Driven Tagging.
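To make the voting step concrete, the following minimal sketch implements it in Python. It assumes that the feature importance weights and the sparse feature-tag matrix Θ (both described in the following subsections) have already been computed; the function names and the toy values are illustrative only, not a description of our actual implementation.

```python
from collections import defaultdict

def fdt_suggest(features, feature_weight, theta, top_n=5):
    """Feature-Driven Tagging, voting step: every feature of the post votes for
    its associated tags, and votes are scaled by the feature's importance."""
    scores = defaultdict(float)
    for f in features:                              # words, "UID-...", simhash1, ...
        importance = feature_weight.get(f, 0.0)     # TF*IDF or TF*ITF weight of f
        for tag, correlation in theta.get(f, {}).items():  # sparse row of Theta
            scores[tag] += importance * correlation
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy example (all numbers are made up for illustration):
theta = {"bittorrent": {"p2p": 1.0},
         "p2p":        {"p2p": 1.0, "peertopeer": 0.8},
         "UID-264":    {"p2p": 1.0}}
weights = {"bittorrent": 0.43, "p2p": 0.27, "UID-264": 0.53}
print(fdt_suggest(["UID-264", "bittorrent", "p2p"], weights, theta))
```

Because only the rows of Θ belonging to features that actually occur in the post are touched, the cost of one suggestion is independent of the total number of resources and tags, as discussed in Section 3.3.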
3.1 Feature Extraction

We extract features from different sources. Word features are extracted from the textual content of resources; we use them to capture the relationship between words and tags. For bibtex, the textual content is title + bibtexAbstract + journal + booktitle + annote + note + description; for bookmark, it is description + extended. We also include the simhash1 and the user ID of a resource as features. The same publication or website shares the same simhash1, so we use it to capture the tags assigned by other users. We use the user ID as a feature to model a user's tagging preferences.

3.2 Computing the Importance of Features

We use two methods to assess the importance of features in a resource. The first and most intuitive one is TF×IDF, which is widely used in information retrieval, text categorization and keyword extraction [6]. We use the log version of TF×IDF, computed as

    TF×IDF(f) = log(n(f)/N + 1) · log(|R|/df(f) + 1)    (1)

where n(f) is the number of occurrences of f in this resource, N is the total number of features that occur in this resource, |R| is the total number of resources, and df(f) is the number of resources in which f occurs. The +1 terms avoid zero or negative weights.

The other method we use is TF×ITF, where ITF stands for Inverse Tag Frequency. It is computed as

    TF×ITF(f) = log(n(f)/N + 1) · log(|T|/ntag(f) + 1)    (2)

where |T| is the total number of tags and ntag(f) is the number of tags f has co-occurred with. ITF implies that the more tags a feature co-occurs with, the less specific and important the feature is.

3.3 Feature-Tag Correlation

In FDT, each feature is associated with a weighted list of tags. We denote this as a matrix Θ, where θ_{i,j} is the weight of tag t_j for feature f_i. The size of Θ is |F| × |T|, where F is the set of all features. Although Θ is large, it is extremely sparse: each feature is associated with only a small number of tags. We use three different methods to compute Θ offline: co-occurrence count (CC), Mutual Information (MI) and χ2 statistics (χ2).

The co-occurrence count is computed as

    CC(f, t) = n(f, t) / n(t)    (3)

where n(f, t) is the number of co-occurrences of feature f and tag t, and n(t) is the total number of occurrences of tag t. CC is a naive way to find the most important tags for a feature.

For MI, we model each feature or tag as a binary-valued random variable whose value indicates whether it occurs in a document (1) or not (0). Then we can compute the Mutual Information between a feature and a tag as

    MI(f, t) = Σ_{f' ∈ {f, f̄}} Σ_{t' ∈ {t, t̄}} p(f', t') log( p(f', t') / (p(f') p(t')) )    (4)

where f' = f means that feature f occurs in the resource and f' = f̄ means that it does not occur, and likewise for t'. MI measures the shared information between f and t: the higher it is, the more correlated f and t are.
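As an illustration of Eq. (4), the sketch below estimates the MI of a feature-tag pair from simple document-level counts. The way the counts are passed in (four integers) is an assumption made for the example, not a description of how our system stores its statistics.

```python
import math

def mutual_information(n_ft, n_f, n_t, n_docs):
    """MI between the binary variables 'f occurs in a post' and 't occurs in a post'.
    n_ft: posts containing both f and t; n_f: posts containing f;
    n_t: posts containing t; n_docs: total number of posts."""
    mi = 0.0
    for f_occurs in (True, False):
        for t_occurs in (True, False):
            # joint count for this cell of the 2x2 contingency table
            if f_occurs and t_occurs:
                joint = n_ft
            elif f_occurs:
                joint = n_f - n_ft
            elif t_occurs:
                joint = n_t - n_ft
            else:
                joint = n_docs - n_f - n_t + n_ft
            if joint == 0:
                continue                      # 0 * log(...) contributes nothing
            p_joint = joint / n_docs
            p_f = (n_f if f_occurs else n_docs - n_f) / n_docs
            p_t = (n_t if t_occurs else n_docs - n_t) / n_docs
            mi += p_joint * math.log(p_joint / (p_f * p_t))
    return mi
```

For instance, mutual_information(40, 50, 60, 1000) is relatively large, because the feature and the tag co-occur far more often than they would under independence.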
χ2 statistics have been used for feature selection in text categorization [7]; they also measure the correlation between a feature and a category, and here we use the tag as the category. χ2 is computed as

    χ2(f, t) = N (AD − BC)^2 / [ (A + C)(B + D)(A + B)(C + D) ]    (5)

where A = n(f, t), B = n(f, t̄), C = n(f̄, t) and D = n(f̄, t̄).

After we obtain Θ by one of the above methods, we make Θ sparse by keeping the largest K values in Θ and setting all other values to 0. We tested K = 30000, 50000 and 100000; as K increases, the F1 measure increases, but when K > 50000 the F1 measure does not change much, so we use K = 50000 in all experiments. For each row of Θ, we first find the largest value θ_{i,max} and then normalize every value in the row as θ_{i,j} := θ_{i,j} / θ_{i,max}. We compare the performance of these three methods in the experiment section.

FDT has low computational complexity at tagging time. For a resource with n features, the complexity of tagging is O(nm), where m is the average number of tags per feature in Θ. m is usually small; in our model it is 4.63 for bibtex and 5.81 for bookmark. Note that the complexity of FDT does not depend on the total number of training documents, tags or users. Nearest neighbor methods have to search the entire training data set, so their complexity is at least O(|R|). Multi-label classifier methods have to train a classifier for each tag, so their complexity is at least O(|T|). Furthermore, the size of the FDT model is determined by K, which is around 10^5, small enough to be loaded into main memory.

4 Graph-based Tag Suggestion

4.1 Method Preliminaries

The basic idea of graph-based tag suggestion is to construct a graph with users, resources and tags as vertices and to build edges according to users' tagging behaviors. After building the graph, we can adopt graph-based ranking algorithms to rank tags for a specific user and resource; the top-ranked tags are then recommended to the user.

To describe the graph-based methods more clearly, we first give some mathematical notation. For the folksonomy F := (U, T, R, Y), we first convert it into an undirected tripartite graph G_F = (V, E). In G_F, the vertices consist of users, resources and tags, i.e., V = U ∪ R ∪ T. For each tagging behavior of user u assigning tag t to resource r, we add edges between u, r and t, i.e., E = {{u, r}, {u, t}, {r, t} | (u, t, r) ∈ Y}.

In G_F, we have the set of vertices V = {v_1, v_2, ..., v_N} and the set of edges E = {(v_i, v_j) | there is an edge between v_i and v_j}. For a given vertex v_i, let N(v_i) be the set of its neighbors, and let w(v_i, v_j) be the weight of the edge (v_i, v_j). For an undirected graph, w(v_i, v_j) = w(v_j, v_i). Let w(v_i) be the degree of v_i, so that

    w(v_i) = Σ_{v_j ∈ N(v_i)} w(v_j, v_i) = Σ_{v_j ∈ N(v_i)} w(v_i, v_j)    (6)

Based on this graph, we can employ various graph-based ranking methods to recommend tags. In this paper, we first introduce two existing methods, "most popular tags" and FolkRank. Furthermore, we propose to use a new ranking model, DiffusionRank, for graph-based tag suggestion.

4.2 Most Popular Tags

We first introduce a simple but effective method for tag suggestion. Some notation is given below, identical to that in [2]. For a user u ∈ U, we denote all of his/her tag assignments as Y_u := Y ∩ ({u} × T × R). Accordingly, we have Y_r and Y_t. In the same way, we define Y_{u,t} := Y ∩ ({u} × {t} × R) for u ∈ U and t ∈ T, and we have Y_{t,r} accordingly.
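The counts |Y_{t,r}| and |Y_{u,t}| used by the variants described below can be accumulated in a single pass over the tag assignments. The following sketch assumes that Y is simply available as a list of (u, t, r) triples; this representation is an assumption made for illustration.

```python
from collections import Counter

def popularity_counts(Y):
    """One pass over the tag assignments Y = [(u, t, r), ...]:
    tag_resource[(t, r)] = |Y_{t,r}|, user_tag[(u, t)] = |Y_{u,t}|."""
    tag_resource, user_tag = Counter(), Counter()
    for u, t, r in Y:
        tag_resource[(t, r)] += 1
        user_tag[(u, t)] += 1
    return tag_resource, user_tag

def most_popular_by_resource(r, tag_resource, top_n=5):
    """Tags most often assigned to resource r by any user (Eq. (7) below)."""
    counts = Counter({t: c for (t, rr), c in tag_resource.items() if rr == r})
    return [t for t, _ in counts.most_common(top_n)]
```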
Furthermore, we denote all tags that user u ∈ U has assigned as T_u := {t ∈ T | ∃r ∈ R : (u, t, r) ∈ Y}.

There are several variants of "most popular tags", as shown in [8], which are usually restricted to different statistical ranges. For example, most popular tags of the folksonomy recommends the most popular tags of the whole folksonomy. It therefore recommends the same set of tags for every user and resource, suffers from cold-start problems and takes no account of personalization.

A more reasonable variant of "most popular tags" recommends the tags that are globally most specific to the resource. This method is named most popular tags by resource:

    T̂(u, r) := argmax^n_{t ∈ T} |Y_{t,r}|    (7)

where argmax^n returns the n tags with the highest counts. Since a user might have specific preferences for tags that he/she has used before, we can also use the most popular tags by user. As shown in [8], the performance is poor if most popular tags by user is used in isolation, but if we mix the most popular tags of the user and of the resource, the performance is much better than either of them alone. The simplest way to mix the effect of users and resources on tags is to combine the counts and then sort:

    T̂(u, r) := argmax^n_{t ∈ T} (|Y_{t,r}| × |Y_{u,t}|)    (8)

4.3 FolkRank

FolkRank was originally proposed in [2] and is based on the user-resource-tag tripartite graph. In FolkRank, the random surfer model of PageRank is employed on the tripartite graph. The ranking values of the vertices are computed using the following formula:

    PR(v_i) = λ Σ_{v_j ∈ N(v_i)} ( w(v_j, v_i) / w(v_j) ) PR(v_j) + (1 − λ) p(v_i)    (9)

where PR(v_i) is the PageRank value of v_i and p(v_i) is the preference value of v_i. Suppose we have an adjacency matrix A representing the graph G_F:

    A(i, j) = w(v_i, v_j) / w(v_j) if (v_i, v_j) ∈ E, and A(i, j) = 0 otherwise.

With this matrix, we can rewrite Equation 9 as

    s = λAs + (1 − λ)p    (10)

where s is the vector of PageRank scores of the vertices and p is the vector of preferences of the vertices.

A straightforward idea for graph-based tag suggestion is to set the preference to the user and resource we suggest tags for, and then compute the ranking values using PageRank as in Eq. (10). However, as pointed out in [8], this makes it difficult for vertices other than those with high degrees to become highly ranked, no matter what the preference values are. Based on the above analysis, FolkRank works as follows. To generate tags for user u and resource r:

1. Let s^(0) be the stable result of Eq. (10) with p = 1, i.e., the vector of all ones.
2. Let s^(1) be the stable result of Eq. (10) with p = 0, except that p(u) = 1 and p(r) = 1.
3. Compute s := s^(1) − s^(0).

We then rank tags according to their final values in s, and the top-ranked tags are suggested to user u for resource r.
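A minimal sketch of this procedure is given below. It assumes that the adjacency matrix A of Eq. (10) and an index for every vertex are already available, and it uses dense NumPy arrays for brevity; whether and how the preference vectors are normalized is left exactly as in the description above.

```python
import numpy as np

def pagerank(A, p, lam, iters=100):
    """Iterate s <- lam * A s + (1 - lam) * p (Eq. 10) for a fixed number of steps."""
    s = np.full(len(p), 1.0 / len(p))
    for _ in range(iters):
        s = lam * A.dot(s) + (1.0 - lam) * p
    return s

def folkrank(A, u_idx, r_idx, tag_indices, lam=0.01, iters=100, top_n=5):
    """Difference between a preference-biased run and a baseline run (steps 1-3)."""
    n = A.shape[0]
    p_base = np.ones(n)              # step 1: p = 1
    p_pref = np.zeros(n)             # step 2: p = 0 except the given user and resource
    p_pref[u_idx] = 1.0
    p_pref[r_idx] = 1.0
    s = pagerank(A, p_pref, lam, iters) - pagerank(A, p_base, lam, iters)
    return sorted(tag_indices, key=lambda i: s[i], reverse=True)[:top_n]
```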
4.4 DiffusionRank

DiffusionRank was originally proposed for combating web spam [3], and has also been successfully used in social network analysis [4] and search query suggestion [5]. DiffusionRank is motivated by the heat diffusion process, which can be used for ranking because activities flowing on the graph can be imagined as heat flow, and the edge from one vertex to another can be treated as a pipe through which heat flows. For a graph G = (V, E), let f_i(t) denote the heat on vertex v_i at time t. We construct DiffusionRank as follows. Suppose that at time t each vertex v_i receives an amount of heat M(v_i, v_j, t, Δt) from its neighbor v_j during a period Δt. The received heat is proportional to the time period Δt and to the heat difference between v_j and v_i, namely f_j(t) − f_i(t). Based on this, we write M(v_i, v_j, t, Δt) as

    M(v_i, v_j, t, Δt) = γ (f_j(t) − f_i(t)) Δt

where γ is the heat diffusion factor, i.e., the thermal conductivity. Therefore, the heat difference at vertex v_i between time t + Δt and time t equals the sum of the heat it receives from all its neighbors. This is formulated as

    f_i(t + Δt) − f_i(t) = Σ_{v_j ∈ N(v_i)} γ (f_j(t) − f_i(t)) Δt    (11)

The process can also be expressed in matrix form:

    (f(t + Δt) − f(t)) / Δt = γ H f(t)    (12)

where f(t) is the vector of heat values of the vertices at time t, and H is

    H(i, j) = −1 if i = j;  H(i, j) = w(v_i, v_j) / w(v_j) if (v_i, v_j) ∈ E;  H(i, j) = 0 otherwise.    (13)

In the limit Δt → 0, the process becomes

    d f(t) / dt = γ H f(t)    (14)

Solving this differential equation, we have f(t) = e^{γtH} f(0). We can expand e^{γtH} as

    e^{γtH} = I + γtH + (γ^2 t^2 / 2!) H^2 + (γ^3 t^3 / 3!) H^3 + ...    (15)

The matrix e^{γtH} is called the diffusion kernel, in the sense that the heat diffusion process continues infinitely from the initial heat diffusion. γ is an important factor in the diffusion process: if γ is large, the heat diffuses quickly; if γ is small, the heat diffuses slowly. When γ → +∞, heat diffuses immediately and DiffusionRank degenerates into PageRank.

As in PageRank, there are random jumps among vertices. To capture them, we use a uniform random jump among the vertices, as in PageRank. Let 1 − λ denote the probability of a random jump and λ the probability of following the edges. Based on the above discussion, we modify DiffusionRank into

    f(t) = e^{γtR} f(0),  with R = λH + (1 − λ) (1/N) 1    (16)

In applications, computing e^{γtR} is time consuming, so we usually approximate it with a discrete form:

    f(t) = (I + (γ/M) R)^{Mt} f(0)    (17)

Without loss of generality, we use one unit of time for heat diffusion between vertices and their neighbors, which gives

    f(1) = (I + (γ/M) R)^M f(0)    (18)

We can compute (I + (γ/M) R)^M f(0) iteratively by repeatedly applying the operator (I + (γ/M) R) to f(0). Therefore, in each iteration we diffuse the heat values at the vertices using the following formulation:

    s := (1 − γ/M) s + (γ/M) (λ A s + (1 − λ) (1/N) 1)    (19)

where M is the number of iterations. As analyzed in [3], for a given threshold ε we can compute M such that ||((I + (γ/M) R)^M − e^{γR}) f(0)|| < ε for any f(0) whose entries sum to one. Similar to [3], in this paper we set M = 100 for DiffusionRank.

Different from FolkRank, in DiffusionRank we set the initial values f(0) of the vertices to indicate the preferences. To suggest tags to user u for resource r, we set f(0) = 0, except that f_u(0) = 1 and f_r(0) = 1. After running DiffusionRank on the tripartite graph, we rank tags according to their ranking scores, and the top-ranked tags are suggested to user u for resource r.
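The discrete iteration of Eq. (19) can be sketched as follows, again assuming the adjacency matrix A and vertex indices introduced above and dense NumPy arrays; the parameter defaults follow the values reported in Section 5.4, and the rest of the setup is illustrative.

```python
import numpy as np

def diffusion_rank(A, u_idx, r_idx, tag_indices, gamma=0.1, lam=0.85, M=100, top_n=5):
    """Discrete heat diffusion (Eq. 19): in each of the M steps, a fraction gamma/M
    of the heat is redistributed along the damped edges and the uniform jump term."""
    n = A.shape[0]
    f = np.zeros(n)
    f[u_idx] = 1.0                    # initial heat on the given user ...
    f[r_idx] = 1.0                    # ... and on the given resource
    step = gamma / M
    jump = np.full(n, 1.0 / n)        # uniform random-jump component
    for _ in range(M):
        f = (1.0 - step) * f + step * (lam * A.dot(f) + (1.0 - lam) * jump)
    return sorted(tag_indices, key=lambda i: f[i], reverse=True)[:top_n]
```

Smaller values of γ keep the result closer to the initial heat placed on the given user and resource, which matches the behavior analyzed in Section 5.4.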
5 Experiments

5.1 Data Set

We use the given BibSonomy data set to validate our methods; it is a snapshot of the BibSonomy system up to Jan 1, 2009. The data set contains two parts, bibtex and bookmark. In bibtex, the resources are citations of research papers or books, with title, author and other information. In bookmark, the resources are website URLs with user-provided short descriptions. Additionally, the contest organizer provides two postcore-2 data sets, in which all users, tags and resources that appear in only one post have been removed. The removal process was iterated until convergence, yielding a core in which each user, tag and resource occurs in at least two posts. Batagelj et al. [9] provide a detailed explanation of post-core construction. The basic statistics of these data sets are listed in Table 1.

Table 1. Basic statistics of the full bibtex and bookmark data sets. Mean Length is the mean number of words in the corresponding text content.

Name                #posts    #tags    #users   #words    Mean Length   Mean #tags/user
bibtex              158,912   50,855   1,790    278,106   47.67         60.75
bookmark            263,004   56,424   2,679    293,026   11.83         57.78
bibtex(pcore2)      22,852    5,816    788      48,401    59.21         31.75
bookmark(pcore2)    41,268    10,702   861      47,689    12.23         60.26

To validate and tune our methods, we randomly split each of the four data sets into 5 equal-sized subsets and perform 5-fold cross-validation on them.

5.2 Evaluation Metrics

We use precision, recall and F1 measure as the evaluation metrics. Precision is the number of correctly suggested tags divided by the total number of suggested tags. Recall is the number of correctly suggested tags divided by the total number of tags of the original post. The F1 measure is the harmonic mean of precision and recall, F1 = 2 × Precision × Recall / (Precision + Recall). For each post, we only consider the first 5 suggested tags.

5.3 Content-based Tag Suggestion

To test the performance of our content-based method, we run 5-fold cross-validation using the given training data. Additionally, for each fold, we remove all posts in the postcore set from the test data, since posts in the postcore will not appear in the final test data. We remove stopwords, punctuation marks and all words shorter than 2 letters from the data set, and convert all text to lowercase. We remove words, resource IDs and user IDs that appear in fewer than 5 posts. We treat bibtex and bookmark separately.

We use search-based kNN as our baseline method, which was proposed by Mishne [10] for suggesting tags for blog posts. In our experiment, we index the training data with the Lucene indexing package (http://lucene.apache.org). For a test post, we use TF×IDF to select the top 10 words. Then, we use these words to construct a weighted query and search the training data with it. We take all tags from the top-k documents returned by Lucene, weight each tag by the corresponding document's relevance score, and sum the weights of duplicate tags. We take the first 5 tags as the suggested tags. In search-based kNN, k is a parameter to tune. After trying k = 1, 2, 3, 4, 5, we choose k = 1, since it gives the best F1 measure.

We list the mean precision, recall and F1 values for the bibtex and bookmark data in Tables 2 and 3 respectively. We experimented with different combinations of methods for weighting features and estimating the Θ matrix. On the bibtex dataset, FDT(TFITF+MI) has performance similar to search-based kNN. On the bookmark dataset, FDT(TFIDF+CC) has the best performance, which is over 3 percentage points higher in F1 than search-based kNN.

Table 2. P, R and F1 of search-based kNN and different learning methods for FDT on the bibtex dataset. All averaged over 5 folds.

Method              Precision   Recall   F1
search-based kNN    0.2792      0.2324   0.2537
FDT(TFIDF+CC)       0.2517      0.2152   0.2320
FDT(TFIDF+MI)       0.1822      0.1652   0.1733
FDT(TFIDF+χ2)       0.2261      0.2235   0.2248
FDT(TFITF+CC)       0.2513      0.2173   0.2330
FDT(TFITF+MI)       0.2432      0.2526   0.2478
FDT(TFITF+χ2)       0.2216      0.2246   0.2231

In the training data, the number of posts from each user roughly follows a power-law distribution: most users have fewer than 100 posts, and the top 4 users account for 50% of all posts. If we treat all posts equally, the model may be biased toward the preferences of a few super users.
To assess the performance of the methods on super users versus common users, we run two additional experiments. In the first experiment, we train the model using posts from all users, then check its performance on each of the top n users and on all the remaining users separately. In the second experiment, we train and test models using only posts from each of the top n users and from all the remaining users separately. For the bibtex dataset we choose n = 4; for the bookmark dataset we choose n = 5. The results for bibtex and bookmark are listed in Table 4 and Table 5 respectively. In these experiments, we use FDT(TFITF+MI) for the bibtex data and FDT(TFIDF+CC) for the bookmark data.

Table 3. P, R and F1 of search-based kNN and different learning methods for FDT on the bookmark dataset. All averaged over 5 folds.

Method              Precision   Recall   F1
kNN                 0.2935      0.2409   0.2646
FDT(TFIDF+CC)       0.4036      0.2411   0.3018
FDT(TFIDF+MI)       0.3580      0.1892   0.2475
FDT(TFIDF+χ2)       0.2675      0.2486   0.2577
FDT(TFITF+CC)       0.3633      0.2404   0.2893
FDT(TFITF+MI)       0.3143      0.2610   0.2852
FDT(TFITF+χ2)       0.2284      0.1831   0.2033

In the result tables, the column Trained(ALL) means that the model is trained on the full training data, while Trained(USER) means that the model is trained using only posts from the corresponding group of users. In the method names, kNN(2463) means search-based kNN tested on all posts of user 2463, and "rest" means all other users; the same naming rule applies to FDT(...). For each group of test data we therefore have 4 different models: kNN trained on all users, kNN trained on this group, FDT trained on all users and FDT trained on this group.

As the results show, for the groups of super users, kNN-based models perform best, while for common users (the "rest" group), FDT-based models perform better. This result matches our intuition. In this data set, super users have different tag preferences than common users. kNN suggests tags using the most similar resources and is less affected by the overall distribution of resources, so it fits the preferences of super users better. FDT relies on the global statistics of the feature-tag relationship, so it is less effective at fitting a particular user's preferences. In a practical setting, we can get the best performance by choosing a different model for each group of users.

One interesting observation concerns user #2732. When trained with all posts, FDT performs much better on #2732 (0.6308 vs. 0.2300) than when trained with #2732's own posts. We examined the posts of #2732 and found that many posts contain only three tags (genetic, programming and algorithm) and that the number of posts by #2732 is large. When we use all posts to train FDT(TFITF+MI), these three tags have a large Mutual Information value with many features, especially the user ID feature "UID-2732", so FDT can predict tags for the posts of #2732 with high accuracy. When trained only with #2732's posts, the Mutual Information between features and these three tags is much smaller, since the three tags appear everywhere and act like stopwords among the tags. The small Mutual Information of these three tags means that FDT makes wrong predictions for most posts of #2732, which leads to a decrease in F1-measure.
Table 4. P, R and F1 of search-based kNN and FDT on different sets of users on the bibtex dataset. Trained(ALL) means that we train the model using posts from all users and test the performance on the given set of users. Trained(USER) means that both training and testing use posts from the given set of users. The % column indicates the size of the corresponding user group, as a percentage of all posts. All averaged over 5 folds.

Trained(ALL)   P       R       F1       Trained(USER)   P       R       F1       %
kNN(2463)      0.1323  0.1352  0.1337   kNN(2463)       0.1376  0.1377  0.1376   24.27
kNN(2651)      0.1561  0.1573  0.1567   kNN(2651)       0.1953  0.1910  0.1932   12.40
kNN(3180)      0.4278  0.2771  0.3364   kNN(3180)       0.4440  0.2807  0.3440   9.20
kNN(2732)      0.6267  0.3915  0.4819   kNN(2732)       0.6517  0.4422  0.5269   3.78
kNN(rest)      0.3207  0.2530  0.2829   kNN(rest)       0.3202  0.2579  0.2857   50.35
FDT(2463)      0.1066  0.1869  0.1358   FDT(2463)       0.1100  0.2055  0.1429   24.27
FDT(2651)      0.1022  0.1285  0.1138   FDT(2651)       0.1126  0.1818  0.1391   12.40
FDT(3180)      0.3656  0.3334  0.3488   FDT(3180)       0.3688  0.3274  0.3469   9.20
FDT(2732)      0.8763  0.4927  0.6308   FDT(2732)       0.3142  0.1814  0.2300   3.78
FDT(rest)      0.3101  0.2516  0.2778   FDT(rest)       0.3260  0.2559  0.2867   50.35

Table 5. P, R and F1 of search-based kNN and FDT on different sets of users on the bookmark dataset. Trained(ALL) means that we train the model using posts from all users and test the performance on the given set of users. Trained(USER) means that both training and testing use posts from the given set of users. The % column indicates the size of the corresponding user group, as a percentage of all posts. All averaged over 5 folds.

Trained(ALL)   P       R       F1       Trained(USER)   P       R       F1       %
kNN(1747)      0.5523  0.4877  0.5180   kNN(1747)       0.6513  0.5566  0.6003   19.90
kNN(2977)      0.4554  0.4072  0.4299   kNN(2977)       0.5567  0.5154  0.5353   9.48
kNN(483)       0.1002  0.1365  0.1156   kNN(483)        0.2375  0.2227  0.2299   3.56
kNN(275)       0.2102  0.1947  0.2022   kNN(275)        0.3413  0.3059  0.3226   3.40
kNN(421)       0.2749  0.0867  0.1318   kNN(421)        0.2787  0.1080  0.1557   2.26
kNN(rest)      0.1921  0.1627  0.1762   kNN(rest)       0.2007  0.1643  0.1807   61.41
FDT(1747)      0.5306  0.4169  0.4670   FDT(1747)       0.3592  0.2325  0.2823   19.90
FDT(2977)      0.4437  0.3622  0.3988   FDT(2977)       0.4162  0.3367  0.3722   9.48
FDT(483)       0.1684  0.2653  0.2060   FDT(483)        0.1642  0.2637  0.2024   3.56
FDT(275)       0.1887  0.1610  0.1738   FDT(275)        0.2531  0.1760  0.2076   3.40
FDT(421)       0.4044  0.1258  0.1920   FDT(421)        0.4328  0.1339  0.2045   2.26
FDT(rest)      0.2462  0.2133  0.2286   FDT(rest)       0.2502  0.2204  0.2344   61.41

For the final test, we use FDT(TFITF+MI) for bibtex and FDT(TFIDF+CC) for bookmark. The test data of DC09 has a different distribution from the training data: most top-ranked users do not appear in the test data. We therefore removed the top-ranked users from the training data and used the remaining group of users to train the model for the final suggestion. The precision/recall/F1 on the final test data are 0.1388/0.1049/0.1189 respectively. Compared to the cross-validation results, the performance drops considerably on the final test data. One reason is that FDT does not suggest tags that are not in the training data: there are 93,756 tags in the training data and 34,051 tags in the test data, but only 15,194 tags overlap. To achieve better performance, suggesting new tags should be considered in the future.

5.4 Graph-based Tag Suggestion

In the experiments, we compare the results of three graph-based methods: most popular tags, FolkRank and DiffusionRank. We first demonstrate the results using 5-fold cross-validation on the training dataset. In Table 6, we show the best performance of the various methods on the bibtex dataset. In this table, we also show the performance of the content-based method kNN, which achieves its best result when k = 2. For the method of most popular tags, we use "mpt+resource" to denote most popular tags by resource, and "mpt+mix" to denote most popular tags by mixing resource and user.
For FolkRank, the best result is achieved with damping factor λ = 0.01 and 100 iterations. DiffusionRank obtains its best result with damping factor λ = 0.85, maximum number of iterations max_it = 10 and diffusion factor γ = 0.1. From the table, we can see that most popular tags by mix achieves the best F1-measure and the largest precision, while DiffusionRank achieves the best recall.

Table 6. Best performance of various methods on the bibtex training dataset. All values are averaged over 5 folds.

Method          Precision   Recall   F1-measure
kNN             0.3664      0.4307   0.3959
mpt+resource    0.3949      0.3765   0.3855
mpt+mix         0.4211      0.4014   0.4110
FolkRank        0.3222      0.4459   0.3741
DiffusionRank   0.3347      0.4630   0.3885

In Table 7, we show the best performance of the various methods on the bookmark dataset. kNN achieves its best performance when k = 2. For FolkRank, the best result is achieved with damping factor λ = 0.0001 and 10 iterations. DiffusionRank obtains its best result with damping factor λ = 0.85, maximum number of iterations max_it = 10 and diffusion factor γ = 0.01. Furthermore, we also require that the scores of suggested tags be no less than 1/5 of the score of the first-ranked tag. From the table, we can see that DiffusionRank achieves the best F1-measure and the largest precision.

Table 7. Best performance of various methods on the bookmark training dataset. All values are averaged over 5 folds.

Method          Precision   Recall   F1-measure
kNN             0.2855      0.2892   0.2873
mpt+resource    0.3345      0.2798   0.3047
mpt+mix         0.3606      0.3017   0.3285
FolkRank        0.3288      0.3309   0.3298
DiffusionRank   0.3772      0.3266   0.3501

From the above two tables, we find that on the bibtex dataset the method of most popular tags by mix is the best, while on the bookmark dataset DiffusionRank achieves the best result. Therefore, for task 2 of rsdc'09, we use these two methods to train ranking models separately on bibtex and bookmark. Using the original results and the evaluation program provided by the challenge organizer, we obtain the evaluation results on the test dataset shown in Table 8. From the table, we find that the absolute values are much smaller than those shown in Tables 6 and 7.

Table 8. Evaluation results on the test dataset of rsdc'09.

Tag Number   Precision   Recall   F1-measure
1            0.1483      0.4229   0.2196
2            0.2301      0.3477   0.2769
3            0.2960      0.3113   0.3034
4            0.3418      0.2840   0.3102
5            0.3760      0.2601   0.3075

Besides the above analysis, we also investigate the performance of FolkRank and DiffusionRank as their parameters change. In Tables 9 and 10, we show the performance of FolkRank on the bibtex and bookmark training datasets as its parameters, the damping factor λ and the maximum number of iterations (denoted "max-it" in the tables), change. From both tables, we find that the performance of FolkRank improves as the damping factor shrinks, which indicates that the effect of the preference values grows larger. That is to say, the generalization of FolkRank by passing values iteratively on the graph may harm the performance. Moreover, the maximum number of iterations of FolkRank does not seem to affect the results significantly.
Table 9. Performance of FolkRank on the bibtex training dataset. All values are averaged over 5 folds.

λ        max-it   Precision   Recall   F1-measure
0.85     10       0.2943      0.4072   0.3417
0.5      10       0.3053      0.4225   0.3545
0.1      10       0.3198      0.4425   0.3713
0.01     10       0.3222      0.4459   0.3741
0.01     100      0.3222      0.4459   0.3741
0.001    10       0.3219      0.4455   0.3738
0.0001   10       0.3219      0.4455   0.3738

Table 10. Performance of FolkRank on the bookmark training dataset. All values are averaged over 5 folds.

λ        max-it   Precision   Recall   F1-measure
0.85     10       0.2989      0.3008   0.2998
0.5      10       0.3038      0.3058   0.3048
0.1      10       0.3198      0.3218   0.3208
0.01     10       0.3275      0.3297   0.3286
0.01     100      0.3275      0.3297   0.3286
0.001    10       0.3288      0.3309   0.3298
0.0001   10       0.3288      0.3309   0.3298

In Tables 11 and 12, we show the performance of DiffusionRank on the bibtex and bookmark training datasets as its parameters, the diffusion factor γ and the maximum number of iterations (denoted "max-it" in the tables), change. Here the damping factor λ is set to 0.85. We find that the performance of DiffusionRank also improves as the diffusion factor shrinks, which indicates that the effect of the initial values grows larger. Similar to FolkRank, the generalization of DiffusionRank by passing values iteratively on the graph may also harm the performance. As with FolkRank, the maximum number of iterations of DiffusionRank does not affect the results significantly.

From the experiments on both the bibtex and bookmark training datasets, we can see that DiffusionRank always outperforms FolkRank with some specific parameters, and the improvement is more significant on the bookmark dataset. Although on this dataset FolkRank does not outperform the method of most popular tags, we know from [8] that on some datasets FolkRank outperforms most simple methods, including the method of most popular tags. Therefore, more experiments still need to be done to investigate the effectiveness of DiffusionRank compared to FolkRank and other graph-based methods for tag suggestion.

Furthermore, the number of suggested tags has to be specified in advance in FolkRank and DiffusionRank. However, in some conditions we do not have to recommend as many tags as specified. For DiffusionRank, we set the maximum number of suggested tags to 5. If we further require that the suggested tags have ranking values no less than 1/5 of the ranking value of the first-ranked tag, precision, recall and F1-measure improve to 0.3772, 0.3266 and 0.3501 on the bookmark training dataset. Therefore, we use this altered DiffusionRank for the bookmark test set of task 2 in rsdc'09.

Table 11. Performance of DiffusionRank on the bibtex training dataset. In all experiments, the damping factor λ is set to 0.85. All values are averaged over 5 folds.

γ       max-it   Precision   Recall   F1-measure
2.0     10       0.3279      0.4537   0.3807
1.0     10       0.3331      0.4609   0.3867
0.1     10       0.3347      0.4630   0.3885
0.1     100      0.3347      0.4630   0.3885
0.01    10       0.3347      0.4630   0.3885

Table 12. Performance of DiffusionRank on the bookmark training dataset. In all experiments, the damping factor λ is set to 0.85. All values are averaged over 5 folds.

γ       max-it   Precision   Recall   F1-measure
2.0     10       0.3336      0.3357   0.3346
1.0     10       0.3370      0.3392   0.3381
0.1     10       0.3403      0.3425   0.3414
0.1     100      0.3403      0.3425   0.3414
0.01    10       0.3406      0.3428   0.3417

6 Related Work

Ohkura et al. [11] proposed a Support Vector Machine-based tag suggestion system. They train a binary classifier for each tag to decide whether the tag should be suggested. Katakis et al. [12] use a hierarchical multi-label text classifier to find the proper tags for a document. They cluster all tags using a modified k-means, use one classifier to decide which clusters a document belongs to, and then use another cluster-specific classifier to decide which tags in the cluster belong to the document.
Mishne [10] uses a search-based nearest neighbor method to suggest tags, where the tags of a new document are collected from the most relevant documents in the training set. Lipczak [13] extracts keywords from the title of a document, then filters them against the user's previously used tags to get the final suggestion. These methods all use the content of a document, so we call them content-based methods. Tatu et al. [14] combine tags from similar documents with extracted keywords to provide tag suggestions. They achieved the best performance in the first ECML/PKDD Discovery Challenge task.

Another class of tag suggestion systems is based on the links between users, tags and resources, and does not take the content of resources into consideration. Since the method of "most popular tags" also does not consider the content of resources, in this paper we regard it as a member of the graph-based tag suggestion approaches. Xu et al. [15] use collaborative filtering to suggest tags for URL bookmarks. Jaschke et al. [2] proposed FolkRank, a PageRank-like iterative algorithm that finds the tags most related to a resource among its neighboring users and tags. PageRank was originally used for ranking web pages only according to the topology of the web graph. However, in PageRank we can set preference values for a subset of pages to make the PageRank values biased toward these pages and their neighbors. In fact, FolkRank computes the relatedness between tags and a specific user and resource by assigning high preference values to the given user and resource in PageRank.

Recently, a new graph-based ranking method, DiffusionRank [3], was proposed for combating web spam. DiffusionRank is motivated by the heat diffusion process, which can be used for ranking because activities flowing on the graph can be imagined as heat flow, and the edge from one vertex to another can be treated as a pipe through which heat flows. Based on the property that heat always flows from high to low, the ranking values of DiffusionRank are related to the initial values of the vertices. Therefore, DiffusionRank provides a more flexible way to rank tags by setting high initial values for the given user and resource. In this paper, we propose for the first time to use DiffusionRank for graph-based tag suggestion.

7 Conclusion

In this paper, we study the problem of tag suggestion and describe our methods for content-based and graph-based suggestion. For content-based tag suggestion, we propose a new method named Feature-Driven Tagging (FDT) for fast content-based tag suggestion. Cross-validation on the training data shows that FDT outperforms the widely used search-based kNN, especially when suggesting tags for long-tail users. For graph-based tag suggestion, we study most popular tags and FolkRank, and propose a DiffusionRank-based method. Experiments show that on the bibtex dataset the method of most popular tags by mixing user and resource performs best, while on the bookmark dataset DiffusionRank outperforms the other methods.

Work remains to be done. First, we currently use empirical methods, such as CC, MI and ITF, to estimate the parameters of FDT. We will consider learning the Θ matrix directly by optimizing a tag-related loss function. Second, evaluation on the final test data of DC09 shows that the F1 value drops considerably compared to cross-validation on the training data, especially for the content-based methods. This suggests we should pay attention to out-of-vocabulary tags.
Third, more information, such as time-stamps, should be considered in order to suggest better tags in real-world situations.

Acknowledgments

This work is supported by the National Science Foundation of China under Grant No. 60621062 and 60873174, and the National 863 High-Tech Project under Grant No. 2007AA01Z148.

References

1. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project (1998)
2. Jaschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., Stumme, G.: Tag recommendations in folksonomies. In: Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, Springer-Verlag, Berlin, Heidelberg (2007) 506-514
3. Yang, H., King, I., Lyu, M.R.: DiffusionRank: a possible penicillin for web spamming. In: Proceedings of SIGIR (2007) 431-438
4. Ma, H., Yang, H., Lyu, M.R., King, I.: Mining social networks using heat diffusion processes for marketing candidates selection. In: Proceedings of CIKM (2008) 233-242
5. Ma, H., Yang, H., King, I., Lyu, M.R.: Learning latent semantic relations from clickthrough data for query suggestion. In: Proceedings of CIKM (2008) 709-718
6. Manning, C., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA (2008)
7. Yang, Y., Pedersen, J.: A comparative study on feature selection in text categorization. In: Proceedings of the International Conference on Machine Learning (ICML), Morgan Kaufmann (1997) 412-420
8. Jaschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., Stumme, G.: Tag recommendations in social bookmarking systems. AI Communications 21(4) (2008) 231-247
9. Batagelj, V., Zaversnik, M.: Generalized cores (2002)
10. Mishne, G.: AutoTag: a collaborative approach to automated tag assignment for weblog posts. In: Proceedings of the 15th International Conference on World Wide Web, ACM, New York, NY, USA (2006) 953-954
11. Ohkura, T., Kiyota, Y., Nakagawa, H.: Browsing system for weblog articles based on automated folksonomy. In: Proceedings of the WWW 2006 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics (2006)
12. Katakis, I., Tsoumakas, G., Vlahavas, I.: Multilabel text classification for automated tag suggestion. ECML PKDD Discovery Challenge 2008, 75
13. Lipczak, M.: Tag recommendation for folksonomies oriented towards individual users. ECML PKDD Discovery Challenge 2008, 84
14. Tatu, M., Srikanth, M., D'Silva, T.: RSDC'08: Tag recommendations using bookmark content. ECML PKDD Discovery Challenge 2008
15. Xu, Z., Fu, Y., Mao, J., Su, D.: Towards the semantic web: Collaborative tag suggestions. In: Collaborative Web Tagging Workshop at WWW 2006 (2006)