A Two-Level Learning Hierarchy of Concept Based Keyword Extraction for Tag Recommendations

Hendri Murfi and Klaus Obermayer
Neural Information Processing Group, TU Berlin
Franklinstr. 28/29, 10587 Berlin, Germany
{henri,oby}@cs.tu-berlin.de
http://ni.cs.tu-berlin.de

Abstract. Textual contents associated with resources are considered as sources of candidate tags for improving the performance of tag recommenders in social tagging systems. In this paper, we propose a two-level learning hierarchy of a concept based keyword extraction method that filters the candidate tags and ranks them based on their occurrences in concepts present in the given resources. Incorporating user-created tags to extract the hidden concept-document relationships distinguishes the two-level version from the one-level learning version, which extracts concepts directly from the terms in the textual contents. Our experiments show that a multi-concept approach, which considers more than one concept for each resource, improves on the performance of a single-concept approach, which takes into account only the most relevant concept. Moreover, the experiments also show that the proposed two-level learning hierarchy performs better than the one-level version.

Key words: Recommender system, Social tagging, Machine learning, Keyword extraction, Concept extraction

1 Introduction

Social tagging is intended to make resources increasingly easy to discover and recover over time. Discovery enables users to find new content of interest shared by other users. This social indexing promises good index quality because it is done by human beings, who understand the content of the resource, as opposed to software, which algorithmically attempts to determine its meaning. Moreover, it is done collectively among users; that is, it uses collective human intelligence as an index extractor. Recovery enables a user to recall content that was discovered before. It should be easier because the tags are both created by, and familiar to, their primary users. However, Golder and Huberman [9] identify three major problems with current social tagging systems: polysemy, synonymy, and level variation. The first two inherit the problems of natural language, while the third refers to the phenomenon of users tagging content at different levels of abstraction. Other problems concern word forms: singular and plural nouns, abbreviations, and misspelled words.

To direct users towards consistent tags, the system usually offers a service that assists users in the tagging process by automatically recommending an appropriate set of tags. The service is a mediated suggestion system; that is, it does not apply the recommended tags automatically, but rather suggests a set of appropriate tags and lets the user select those tags from the set that they find appropriate. Moreover, tag recommendation can serve many purposes, such as consolidating the vocabulary across users, giving a second opinion on what a resource is about, and, most importantly, increasing the success of searching thanks to this consistency [14]. In practice, the standard tag recommenders are services that recommend the most popular tags used for either a particular resource or the whole system. Other methods have been proposed from a diversity of approaches to recommend tags from user-created tags (the folksonomy), such as information retrieval [23, 28], graph-based approaches [11], collaborative filtering [14], and machine learning [10, 15].
Recently, textual contents associated with resources have been considered as sources of candidate tags to improve the performance of tag recommenders. For example, Xu et al. [31] suggest content-based (and context-based) tags based on analysis and classification of the tagged content and context. This not only solves the cold start problem, but also increases the tag quality of objects that are less popular. Tatu et al. [29] use natural language processing tools to extract important terms (nouns, adjectives, and named entities) from the textual contents. They conclude that understanding the contents improves the quality of the tag recommendations.

In this paper, we also consider the textual contents associated with resources as sources of candidate tags to improve the performance of the tag recommender in a social tagging system. To achieve this goal, we propose a two-level learning hierarchy of concept based keyword extraction as a tag recommendation method. First, the method extracts concepts, each of which can be considered a set of related words, from training document collections using nonnegative matrix factorization (NMF) in a two-level learning hierarchy: at the lower level, the method extracts concepts and concept-document relationships from the user-created tags; given these relationships, at the higher level it populates the concepts with terms existing in the textual contents of the resources. Next, the tag recommender finds the concepts relevant to a given resource and scales the terms of the resource based on their occurrences in those concepts. The terms with the highest scores are selected as keywords and recommended as tags.

Incorporating the user-created tags to extract the hidden concept-document relationships distinguishes the two-level version from the one-level learning version, which extracts concepts directly from the terms in the textual contents. The main advantage of this approach is that the NMF algorithm decomposes more compact document representations. In addition, the concept extraction from textual contents is handled by a nonnegative least squares algorithm, which is much more efficient than the NMF algorithm. Therefore, the two-level learning hierarchy approach is not only more efficient but also more reliable, because it uses tags created by users who understand the content of the documents. Moreover, the approach may have richer vocabularies because it can combine the vocabularies of the tag space and the content space. Our experiments show that a multi-concept approach, which considers more than one concept for each resource, improves the F-measure values of a single-concept approach, which takes into account only the most relevant concept, by about 10%. Moreover, the experiments also show that the proposed two-level learning hierarchy achieves F-measure values 13% better than those of the one-level version.

The rest of the paper is organized as follows: Section 2 discusses a concept extraction method using nonnegative matrix factorization and our proposed two-level learning hierarchy method. Section 3 describes existing keyword extraction methods and the proposed concept based keyword extraction methods. In Section 4, we describe a tag recommendation algorithm that combines keywords, extracted by the keyword extraction methods, with user-created tags from the training data. In Section 5, we present our experiments and results. We conclude and give a summary in Section 6.
2 Concept Extraction

Many definitions of the notion of a concept have been proposed; in this section, we adopt one that defines a concept as a set of related terms. This definition is proposed and used by several researchers, e.g., [21] and [27], who use clustering methods to extract the concepts from training document collections. Formal concept analysis (FCA) [5, 26] and latent semantic analysis (LSA) [3, 7] are other methods for performing this task.

2.1 A One-Level Learning Hierarchy for Concept Extraction

The singular value decomposition (SVD) used by LSA has a disadvantage for extracting concepts from a document collection: what we would really like to say is that a concept is mostly concerned with some subset of terms, but the negative values produced by the SVD make such a semantic interpretation difficult. To circumvent this problem, a method has been proposed that maintains the nonnegative structure of the original documents. It uses nonnegative matrix factorization (NMF) [17] rather than the SVD to extract the concepts from document collections.

Let $V$ be an $m \times n$ term-by-document matrix whose columns are document vectors, and let $k < \min(m, n)$ be a positive integer. In this paper, we use NMF to extract concepts from the term-by-document matrix $V$. The NMF problem is to find a nonnegative $m \times k$ matrix $W$ and a nonnegative $k \times n$ matrix $H$ that minimize the functional [2]:

\[
\min_{W,H} f(W, H) = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{n} \bigl( V_{ij} - (WH)_{ij} \bigr)^2
\quad \text{subject to } W_{ia} \ge 0,\; H_{aj} \ge 0,\; \forall i, a, j. \tag{1}
\]

The constrained optimization problem above is convex in either $W$ or $H$, but not in both; hence realistic solutions usually correspond to local minima. The product $WH$ is called a nonnegative matrix factorization of $V$, although $V$ is not necessarily equal to $WH$; rather, $WH$ is a rank-$k$ approximation to $V$. An appropriate choice of the value of $k$ is critical in practice, but it is very often problem dependent. In most cases, however, $k$ is chosen such that $k \ll \min(m, n)$.

The most popular approach to the NMF problem is the multiplicative update algorithm proposed by Lee and Seung [18]. To overcome shortcomings related to its convergence properties or to speed it up, researchers have proposed modifications of the algorithm or created new ones [2]. In general, the algorithms can be divided into three classes: multiplicative update algorithms [18, 19], gradient descent algorithms [6, 12], and alternating least squares algorithms [20, 24].

Because all elements of the matrices $W$ and $H$ are nonnegative, we can interpret them immediately as follows: each column of $W$ corresponds to a set of related terms, called a concept, and each element $w_{ia}$ of $W$ represents the degree to which term $i$ belongs to concept $a$. Each element $h_{aj}$ of $H$ represents the degree to which document $j$ is associated with concept $a$. In the following, we call this type of concept extraction a one-level learning hierarchy method.
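To make the factorization concrete, the following is a minimal sketch of the multiplicative update rules of Lee and Seung [18] for problem (1), written in Python with NumPy. The random initialization, iteration budget, and the small constant guarding the divisions are illustrative assumptions; they are not the settings of our experiments, which use an NNDSVD initialization and a projected gradient solver (Section 5.1).

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative (m x n) matrix V into W (m x k) and H (k x n)
    using the Lee-Seung multiplicative updates for the objective (1)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))  # random nonnegative initialization (assumption)
    H = rng.random((k, n))
    for _ in range(n_iter):
        # The updates keep W and H nonnegative; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each column of the returned $W$ can then be read as a concept: its largest entries identify the terms most strongly associated with that concept, and the columns of $H$ give the concept-document relationships.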
2.2 A Two-Level Learning Hierarchy for Concept Extraction

In cases where the training documents are accompanied by user-created keywords, it is a good idea to incorporate this valuable information into the learning process. For this reason, we propose a new learning scheme that uses the keywords for extracting concepts from a document collection. The scheme consists of a two-level learning hierarchy. At the lower level, concepts and concept-document relationships are discovered using the user-created keywords. Given these relationships, the concepts are populated with terms existing in the textual contents of the documents at the higher level. We expect this method to be successful because the hidden document structures are discovered using keywords collectively created by users. Another advantage of this approach is that the NMF algorithm operates on more compact document representations, while the concept extraction from textual contents is handled by a nonnegative least squares algorithm, which is much more efficient than the NMF algorithm. Therefore, this two-level learning hierarchy approach is not only more efficient but also more reliable, because it uses tags created by users who understand the content of the documents. Moreover, the approach may have richer vocabularies because it can combine the vocabularies of the tag space and the content space. The detailed procedure is described in Algorithm 1.

Algorithm 1 A two-level learning hierarchy for concept extraction
1: Let $V$ be the tag-by-document matrix and $X$ be the term-by-document matrix
2: Find the tag-by-concept matrix $W$ and the concept-by-document matrix $H$ from $V \approx WH$ using nonnegative matrix factorization (see Section 2.1) to minimize the functional $f(W, H) = \frac{1}{2} \|V - WH\|^2$
3: Find the term-by-concept matrix $T$ from $X \approx TH$ using a nonnegative least squares algorithm, e.g. [2]:
   - Solve for $T$ in the matrix equation $H H^T T^T = H X^T$
   - Set all negative elements of $T$ to 0
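A minimal sketch of Algorithm 1 in Python/NumPy follows, reusing the nmf_multiplicative function sketched in Section 2.1. The second level solves the normal equations $H H^T T^T = H X^T$ directly and then clips negative entries, as in [2]; treating $H H^T$ as nonsingular is our simplifying assumption (a least squares solver such as numpy.linalg.lstsq would be the safer general choice).

```python
import numpy as np

def two_level_concepts(V, X, k):
    """Two-level concept extraction (Algorithm 1).
    V: tag-by-document matrix (t x n), X: term-by-document matrix (m x n).
    Returns W (tag-by-concept), H (concept-by-document), T (term-by-concept)."""
    # Lower level: NMF on the tag space yields the concepts and the
    # concept-document relationships.
    W, H = nmf_multiplicative(V, k)
    # Higher level: populate the concepts with content terms by solving
    # H H^T T^T = H X^T for T (assumes H H^T is nonsingular) ...
    T = np.linalg.solve(H @ H.T, H @ X.T).T
    # ... and restore nonnegativity by setting negative elements to 0.
    T = np.maximum(T, 0.0)
    return W, H, T
```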
3 Concept Based Keyword Extraction

Keyword extraction is the task of automatically selecting a small set of important, topical terms within the textual content of a document. The fact that the keywords are extracted means that the selected terms are present in the document [16]. In general, the task of automatically extracting keywords can be divided into two stages:

1. Selecting candidate terms in the document
2. Filtering out the most significant ones to serve as keywords and rejecting those that are inappropriate

Various methods have been proposed for selecting candidate terms. The first is n-gram extraction, that is, extracting uni-, bi-, or trigrams and removing those that begin or end with a stop word [8]. Another is more linguistically oriented and uses natural language processing (NLP) methods such as an NP-chunker or part-of-speech (PoS) tagging [13]. Filtering uses either simple statistics, where a weighting scheme is applied to rank words according to their scores [1, 25], or machine learning, where the ranking function is defined by a statistical model derived from a training set with manually assigned keywords [13, 22, 30].

In this section, we propose a machine learning based filtering method that uses concepts extracted from the textual contents of documents. The method finds the concepts relevant to a given document and then scales the terms of the document based on their occurrences in those concepts. The terms with the highest scores are selected as keywords. The method can be considered unsupervised learning when we use the one-level learning hierarchy, as it then needs no labeled data for training. It becomes a supervised method when the user-created tags are used in the learning process of the two-level learning hierarchy approach. The two variants of the concept based keyword extraction method are described in detail in the following sections.

3.1 Single-Concept Based Keyword Extraction

The learning hierarchy of Section 2 extracts concepts from a training document collection. Given these concepts, the single-concept based keyword extraction method finds the concept most relevant to a given document and then scales the candidate terms of the document based on their occurrences in that concept. The relevance of a concept $c$ to a document $d$ is calculated using the cosine similarity measure

\[
\mathrm{rel}(d, c) = \frac{d^T W_c}{\|d\|\, \|W_c\|} \tag{2}
\]

where $W_c$, column $c$ of $W$, represents concept $c$. The detailed procedure is described in Algorithm 2.

Algorithm 2 The single-concept based keyword extraction
1: Let column $c$ of $W$ ($W_c$) be concept $c$ and let $d$ be a document
2: Let $\mathrm{rel}(d, c) = \frac{d^T W_c}{\|d\|\,\|W_c\|}$
3: Find the concept most relevant to document $d$, i.e., concept $c'$ where $c' = \mathrm{argmax}_c\, \mathrm{rel}(d, c)$
4: Scale the terms of the document based on the most relevant concept, i.e., $d1_i = w_{ic'}\, d_i$ and $d2_j = t_{jc'}\, d_j$
5: Combine the normalized terms: $\tilde{d} = (1 - \alpha)\, d1/\|d1\| + \alpha\, d2/\|d2\|$
6: Select the first $n$ nonzero terms of the ranked $\tilde{d}$ as keywords

3.2 Multi-Concept Based Keyword Extraction

The multi-concept based keyword extraction assumes that a document may contain more than one relevant concept. The detailed procedure is described in Algorithm 3.

Algorithm 3 The multi-concept based keyword extraction
1: Let column $c$ of $W$ ($W_c$) be concept $c$ and let $d$ be a document
2: Let $\mathrm{rel}(d, c) = \frac{d^T W_c}{\|d\|\,\|W_c\|}$
3: Scale the terms of the document using all concepts, i.e., $d1_i = \sum_c \mathrm{rel}(d, c)\, w_{ic}\, d_i$ and $d2_j = \sum_c \mathrm{rel}(d, c)\, t_{jc}\, d_j$
4: Combine the normalized terms: $\tilde{d} = (1 - \alpha)\, d1/\|d1\| + \alpha\, d2/\|d2\|$
5: Select the first $n$ nonzero terms of the ranked $\tilde{d}$ as keywords
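As an illustration of Algorithm 3, the sketch below scores all concepts at once in Python/NumPy. It makes one simplifying assumption, stated here and in the comments: the tag and term vocabularies are aligned, so the document vector d, the tag-by-concept matrix W, and the term-by-concept matrix T share the same row indexing (the paper combines the two vocabularies; the general case needs an explicit index mapping between them).

```python
import numpy as np

def multi_concept_keywords(d, W, T, alpha=0.05, n_keywords=5):
    """Multi-concept based keyword scoring (Algorithm 3), as a sketch.
    d: document term vector; W: tag-by-concept; T: term-by-concept.
    Assumes W, T, and d share one aligned vocabulary (simplification)."""
    eps = 1e-12
    # rel(d, c) = d^T W_c / (||d|| ||W_c||), computed for every concept c
    rel = (W.T @ d) / (np.linalg.norm(d) * np.linalg.norm(W, axis=0) + eps)
    # d1_i = sum_c rel(d, c) w_ic d_i and d2_j = sum_c rel(d, c) t_jc d_j
    d1 = (W @ rel) * d
    d2 = (T @ rel) * d
    # Combine the normalized score vectors with the mixing weight alpha.
    d_tilde = ((1 - alpha) * d1 / (np.linalg.norm(d1) + eps)
               + alpha * d2 / (np.linalg.norm(d2) + eps))
    # Select the first n nonzero entries of the ranked scores as keywords.
    order = np.argsort(d_tilde)[::-1]
    return [j for j in order if d_tilde[j] > 0][:n_keywords]
```

The single-concept variant (Algorithm 2) is obtained by replacing the relevance vector rel with an indicator vector of its largest entry.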
4 A Hybrid Tag Recommender

In our experiment, we use a hybrid recommender, described in detail in Algorithm 4. The recommender checks whether a given resource exists in the training data. If so, the tag space based recommenders are used: the collaborative recommender if the given user has a profile in the system, and otherwise the most popular tags by resource method. If the resource appears for the first time, the recommender examines the content of the resource using the concept based keyword extraction algorithm, boosting the extracted tags that have been used by the user before. If neither tags nor keywords can be suggested, the most popular tags in the training data are recommended. Using the user-created tags aims at standardization and consistency of the supplied tags, while using the tags extracted from textual contents is intended especially to overcome the cold start problem.

Algorithm 4 A hybrid tag recommendation algorithm
1: input: a post $P\langle user, resource \rangle$
2: if P.resource exists in system then
3:   if P.user exists in system then
4:     P.tags ⇐ collaborative tags by P.user
5:   else
6:     P.tags ⇐ most popular tags by P.resource
7:   end if
8: else
9:   P.tags ⇐ the keywords by P.resource
10:  Boost tags of P.tags that exist in the tag space of P.user
11: end if
12: if P.tags = ∅ then
13:  P.tags ⇐ most popular tags
14: end if
15: Select top $n$ of the ranked P.tags as recommended tags

5 Experiment

We apply our proposed recommender method (Algorithm 4) to the ECML PKDD Discovery Challenge 2009¹. The task of the competition requires the development of a content-based tag recommendation method for BibSonomy², a web based social bookmarking system that enables users to tag both web pages (bookmarks) and scientific publications (bibtexs).

The organizers of the competition made available a training set of examples consisting of resources accompanied by their user-created tags; a test set was provided to evaluate the proposed recommenders. Each bookmark is described by its URL, a description of the URL (usually the title of the web page), and an extended description of the bookmark supplied by the user. Each bibtex is associated with values of bibtex fields such as title, author, booktitle, journal, series, volume, and number; bibtexKey, bibtexAbstract, URL, and a description of the publication can also be specified. Some statistics of the data are shown in Table 1.

Table 1. Statistics of the experiment data

           Bookmark   Bibtex
Training    181,491   72,124
Testing      16,898   26,104

In our experiment, we use the textual contents associated with each resource as the content of the resource. For a bookmark, the contents are the description of the URL and the extended description; the title and abstract are the textual contents associated with a bibtex. A bookmark is identified by its URL address (url hash) attribute and a bibtex by its title (simhash1) attribute. Therefore, a document, whether bookmark or bibtex, is represented by the descriptions given to it by all users that bookmarked it.

Let $D$ be a test set consisting of $|D|$ examples $(r_i, T_i)$, $i = 1, \ldots, |D|$. Let $T_i$ be the set of tags created by users for resource $r_i$ and $P_i$ the set of tags predicted by a recommender for resource $r_i$. The precision, recall, and F-measure of a recommender on the test set $D$ are calculated as follows:

\[
\mathrm{Precision} = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|T_i \cap P_i|}{|P_i|}, \qquad
\mathrm{Recall} = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|T_i \cap P_i|}{|T_i|}, \qquad
\mathrm{F\text{-}Measure} = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{2\, |T_i \cap P_i|}{|P_i| + |T_i|}
\]

We perform our experiment on the Java platform and use Lucene³ for creating the tag-by-resource matrix and the term-by-resource matrix. The other processing steps are conducted within the Weka⁴ framework, an open source machine learning package.

¹ http://www.kde.cs.uni-kassel.de/ws/dc09
² http://www.bibsonomy.org
³ http://lucene.apache.org/
⁴ http://www.cs.waikato.ac.nz/ml/weka/
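A small Python sketch of these averaged metrics follows; the tag sets are represented as Python sets. Counting a zero for an empty prediction set is our assumption, since the formulas leave that case implicit; the F-measure denominator is always positive here because every post carries at least one user-created tag.

```python
def evaluate(examples):
    """examples: list of (true_tags, predicted_tags) pairs of sets.
    Returns the averaged precision, recall, and F-measure defined above."""
    n = len(examples)
    precision = recall = f_measure = 0.0
    for true_tags, pred_tags in examples:
        hits = len(true_tags & pred_tags)  # |T_i intersect P_i|
        precision += hits / len(pred_tags) if pred_tags else 0.0
        recall += hits / len(true_tags) if true_tags else 0.0
        # true_tags is nonempty by construction, so this never divides by 0
        f_measure += 2 * hits / (len(pred_tags) + len(true_tags))
    return precision / n, recall / n, f_measure / n
```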
5.1 Experiment Settings

The settings used for each of the methods in our experiment are as follows.

Concept-based keyword extraction. For creating the term-by-resource matrix, the resources are parsed and a dictionary of terms is created using a standard word tokenization method. The terms are words; special characters are removed, and Snowball Porter stemming and standard English and German stop word lists are applied. Finally, the term-by-resource matrix is created using a term frequency weighting scheme.

Extracting concepts from the term-by-resource matrix is an important step in finding keywords for new resources. The optimal number of concepts ($k$), which captures most of the concepts in the training document collection, remains difficult to find; the method usually used in practice is a heuristic approach. Because of memory usage, however, simulations are usually conducted with the maximum number of concepts that can be extracted. In our experiment, we extract 200 concepts from the training document collection. For this task, we use the nonnegative double SVD initialization method [4], which involves no randomization, and the projected gradient method [20], which converges to a local minimum. We expect this combination of methods to converge to a unique solution with a minimal error.

Another parameter to be optimized in the two-level learning hierarchy approach is $\alpha$, which reflects the proportions of the tag space and the content space as sources of tags for the recommender. In our experiment, we set $\alpha = 0.25$ for the single-concept method and $\alpha = 0.05$ for the multi-concept method, the optimal values obtained using a heuristic method.

Collaborative recommendation [14]. For a given tag-by-user matrix $X$, a given user $u$, a given resource $r$, and integers $k$ and $n$, the set $T(u, r)$ of $n$ recommended tags is calculated by

\[
T(u, r) = \operatorname*{argmax}^{n}_{t \in T} \sum_{v \in N_u^k} \mathrm{sim}(X_u, X_v)\, \delta(v, t, r) \tag{3}
\]

where $N_u^k$ is the set of $k$ nearest neighbors of $u$ in $X$, and $\delta(v, t, r) = 1$ if $(v, t, r) \in \mathit{folksonomy}$ and $0$ otherwise. The only parameter to be tuned is therefore the number of neighbors $k$. For that, multiple runs were performed in which $k$ was incremented until no further improvement in the results was observed.

Most popular tags by resource. For a given resource, we count the number of posts in which each tag occurs together with that resource, and use the tags that occur most often together with that resource as the recommendation.

Most popular tags. For each tag, we count the number of posts in which it occurs, and use the tags that occur most often as the recommendation.

5.2 Single- vs. Multi-Concept Method

Fig. 1 shows the performance of the single- and multi-concept based keyword extraction on the test data. From Fig. 1, we can calculate that the recall, precision, and F-measure of the multi-concept approach are, on average, 10%, 15%, and 12%, respectively. The recall increases as the number of recommended tags gets bigger, while the precision decreases for bigger numbers of tags. Fig. 1 also shows that the performance of the single-concept approach follows a similar pattern, with an F-measure of 11% on average. From both curves, we conclude that the multi-concept approach, which assumes that a resource may contain more than one concept, improves the F-measure of the single-concept method by about 10% on average, and the improvement occurs for all numbers of recommended tags. These results verify that associating resources with more than one concept gives better performance than considering only the main concept of a resource. In other words, some minor concepts of a resource should also be examined to obtain better keyword extraction performance.

Fig. 1. Performance comparison of single- and multi-concept approach (recall, precision, and F-measure vs. the number of recommended tags, 1-5; plot omitted)

5.3 One- vs. Two-Level Learning Hierarchy

In this section, we examine the performance of our proposed two-level learning hierarchy approach compared to the one-level version. Fig. 2 shows the performance of the one-level and the two-level learning hierarchy multi-concept based keyword extraction. From the figure, we see that the two-level learning method has better recall, precision, and F-measure; its F-measure values are, on average, 13% better than those of the one-level learning approach. The detailed recall, precision, and F-measure values of the optimal performance of the two-level learning hierarchy are given in Table 2.

Fig. 2. Performance comparison of one- and two-level learning hierarchy approach (recall, precision, and F-measure vs. the number of recommended tags, 1-5; plot omitted)
Table 2. Performance of the two-level learning hierarchy of the multi-concept based keyword extraction method for each number of recommended tags, using the optimal parameter α = 0.05

Num. of Tags   Recall   Precision   F-Measure
1              0.0538   0.1832      0.0832
2              0.0908   0.1621      0.1164
3              0.1187   0.1498      0.1324
4              0.1386   0.1406      0.1396
5              0.1533   0.1345      0.1433

6 Summary

In this paper, we proposed a two-level learning hierarchy of concept based keyword extraction for Task 1 of the ECML PKDD Discovery Challenge 2009, i.e., content-based tag recommendation. The tag recommendation method explores tags from the textual contents of resources using concepts existing in those contents. A multi-concept approach, which considers more than one concept for each resource, improves the performance of a single-concept approach, which only considers the most relevant concept. Moreover, our experiments demonstrate that the proposed two-level learning hierarchy method outperforms the common one-level learning approach on all performance measures, i.e., recall, precision, and F-measure.

7 Acknowledgments

The authors would like to thank Nicolas Neubauer for useful discussions, comments, and suggestions in writing this paper. The first author was funded in part by a scholarship from the DAAD.

References

1. K. Barker and N. Cornacchia. Using noun phrase heads to extract document keyphrases. In Proceedings of the 13th Canadian Conference on Artificial Intelligence, pages 417–426, Montreal, Canada, 2000.
2. M. W. Berry, M. Browne, A. N. Langville, P. Pauca, and R. J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis, 52(1):155–173, 2007.
3. M. W. Berry, S. T. Dumais, and G. W. O'Brien. Using linear algebra for intelligent information retrieval. SIAM Review, 37:573–595, 1995.
4. C. Boutsidis and E. Gallopoulos. SVD-based initialization: A head start on nonnegative matrix factorization. Pattern Recognition, 41(4):1350–1362, 2008.
5. C. Carpineto and G. Romano. Using concept lattices for text retrieval and mining. In B. Ganter et al., editors, Formal Concept Analysis, pages 161–179. Springer-Verlag, 2005.
6. M. Chu, F. Diele, R. Plemmons, and S. Ragni. Optimality, computation, and interpretations of nonnegative matrix factorization. Technical report, Department of Mathematics, North Carolina State University, 2005.
7. S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407, 1990.
8. E. Frank, G. W. Paynter, I. H. Witten, C. Gutwin, and C. G. Nevill-Manning. Domain-specific keyphrase extraction. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 668–673, 1999.
9. S. Golder and B. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, 2006.
10. P. Heymann, D. Ramage, and H. Garcia-Molina. Social tag prediction. In Proceedings of SIGIR, 2008.
11. A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Information retrieval in folksonomies: Search and ranking. In Proceedings of the 3rd European Semantic Web Conference, pages 411–426, 2006.
12. P. Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5:1457–1469, 2004.
13. A. Hulth. Combining Machine Learning and Natural Language Processing for Automatic Keyword Extraction. PhD thesis, Stockholm University, 2004.
14. R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in folksonomies. In J. N. Kok, J. Koronacki, R. L. de Mántaras, S. Matwin, D. Mladenic, and A. Skowron, editors, Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 506–514, 2007.
15. I. Katakis, G. Tsoumakas, and I. Vlahavas. Multilabel text classification for automated tag suggestion. In Proceedings of the ECML/PKDD Discovery Challenge, 2008.
16. F. W. Lancaster. Indexing and Abstracting in Theory and Practice. Library Association Publishing, 1998.
17. D. D. Lee and H. S. Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401:788–791, 1999.
18. D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, pages 556–562, 2001.
19. C. J. Lin. On the convergence of multiplicative update algorithms for non-negative matrix factorization. IEEE Transactions on Neural Networks, 18:1589–1596, 2007.
20. C. J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19:2756–2779, 2007.
21. D. Lin and P. Pantel. Concept discovery from text. In Proceedings of the International Conference on Computational Linguistics, pages 577–583, 2002.
22. O. Medelyan and I. H. Witten. Domain-independent automatic keyphrase indexing with small training sets. Journal of the American Society for Information Science and Technology, 59(7):1026–1040, 2008.
23. G. Mishne. AutoTag: A collaborative approach to automated tag assignment for weblog posts. In Proceedings of the 15th International Conference on World Wide Web, pages 953–954, 2006.
24. P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111–126, 1994.
25. C. Paice and W. Black. A three-pronged approach to the extraction of key terms and semantic roles. In Proceedings of the International Conference on Recent Advances in NLP, pages 357–363, Borovets, Bulgaria, 2003.
26. U. Priss. Formal concept analysis in information science. Annual Review of Information Science and Technology, 40:521–543, 2005.
27. M. L. Reinberger and P. Spyns. Unsupervised text mining for the learning of DOGMA-inspired ontologies. In P. Buitelaar, P. Cimiano, and B. Magnini, editors, Ontology Learning from Text: Methods, Evaluation and Applications, pages 29–43. IOS Press, 2005.
28. S. C. Sood, S. H. Owsley, K. J. Hammond, and L. Birnbaum. TagAssist: Automatic tag suggestion for blog posts. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2007.
29. M. Tatu, M. Srikanth, and T. D'Silva. RSDC'08: Tag recommendation using bookmark content. In Proceedings of the ECML/PKDD Discovery Challenge, 2008.
30. P. D. Turney. Learning algorithms for keyphrase extraction. Information Retrieval, 2(4):303–336, 2000.
31. Z. Xu, Y. Fu, J. Mao, and D. Su. Towards the semantic web: Collaborative tag suggestions. In Proceedings of the Collaborative Web Tagging Workshop, 2006.