=Paper=
{{Paper
|id=None
|storemode=property
|title=Semantic Annotation from Social Data
|pdfUrl=https://ceur-ws.org/Vol-830/sdow2011_paper_3.pdf
|volume=Vol-830
|dblpUrl=https://dblp.org/rec/conf/semweb/SolskinnsbakkG11
}}
==Semantic Annotation from Social Data==
Geir Solskinnsbakk and Jon Atle Gulla
Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
{geirsols, jag}@idi.ntnu.no

Abstract. Folksonomies can be viewed as large sources of informal semantics. Folksonomy tags can be interpreted as concepts that can be extracted from the social data and used as a basis for creating semantic structures. In the folksonomy the connection between these concepts and the tagged resources is explicit. However, to effectively use the extracted conceptual structures it is important to be able to connect the concepts not only to the already tagged documents, but also to new documents that have not previously been seen. We therefore present in this paper an automatic approach for annotating documents with concepts extracted from social data. The approach represents each tag's semantics with a tag signature, which is then used to generate the annotations of documents. We present an evaluation of the approach which shows promising results towards automatic annotation of textual documents.

1 Introduction

In recent years we have seen a growing number of social services on the web. Among these are a wide range of collaborative services that offer users the possibility of tagging a multitude of resources. These resources can be anything on the web, ranging from images and videos to documents. The services aid users in organizing information by letting them attach tags to resources for easy access at a later time. In addition, the social aspect lets users share resources and tags, so that others can also take advantage of the effort each individual user puts into tagging.
There exist many tagging systems: Flickr (http://www.flickr.com), which lets users share and tag images; Delicious (http://www.delicious.com), which lets users tag and share any resource specified by a URL; and Bibsonomy (http://www.bibsonomy.org), which lets users tag and share literature references. Users are free to choose which tags to apply to resources, with no centralized control of the vocabulary. The networked data structure resulting from such systems is often referred to as a folksonomy [1]. Tags in folksonomies can serve as a basis for concept extraction for semantic data structures, as several recent publications show [2, 3]. The conceptual structures are only one side of the story, however; it is also an interesting problem to connect the concepts (tags) with documents on the web. This is especially interesting for applications that require search and browsing over both the structure and the documents. On the one hand, we already have a mass of manual annotators (the users of the folksonomy) who generate annotations. Unfortunately, the users have not tagged every single document, which means that a huge number of documents have not yet been annotated by folksonomy users. Although these documents have not been tagged by users, they may still be interesting for a browsing facility. Automatically determining the correct annotation of a document is thus the problem we target in this paper. As a solution we propose an approach towards fully automatic annotation of documents that have never been seen by the system (i.e., documents that have not yet been tagged by any user). Since we are working on folksonomy data, we will use the terms tag and tagging rather than concept and annotation, respectively, for the remainder of the paper. Tags on their own carry only limited semantics.
However, we can exploit the fact that the folksonomy can be seen as a large repository of informal semantics to extend the semantics of the tags. This is done by associating each tag with a tag signature. The signature takes the form of a vector of semantically related terms, weighted to describe the strength of the relation between the tag and each term in its vector. The tag signature is constructed from the (textual) resources that have previously been tagged by the users of the folksonomy. By utilizing the tag signatures for suggesting tags, we use the content (or topic) of the document together with the tag signature. Thus our approach is able to suggest tags not only for resources that have been tagged before, but also for resources that are new to the system. The approach is evaluated (using training and test data) on a data set crawled from Delicious. The results of the evaluation are promising in terms of automatically assigning tags to documents.

The remainder of the paper is organized as follows: Section 2 gives an overview of related work, while Section 3 gives an overview of tag signatures and the approach for automatic tag suggestion. Section 4 describes the evaluation and results, followed by a discussion of our findings in Section 5. Finally, the paper is concluded in Section 6.

2 Related Work

The related work for this paper is directed at tag recommender systems, since these systems essentially provide some of the same functionality that we are targeting. Mishne [4] presents an approach for suggesting tags for weblog posts. This is done by first finding similar weblog posts using information retrieval techniques. The tags used on the most similar posts are retrieved and ranked before being presented to the user. Another system for tagging blog posts is described by Qu et al. [5]. The system uses key phrase extraction applied to the blog content to find tags that can be applied to the blog post.
The system described by Baruzzo et al. [6] also uses key phrase extraction for generating tag recommendations to the user. The key phrases are extracted from the text and mapped to a domain ontology. Spreading activation is employed in the ontology to locate common ancestors, which are presented to the user as new tag recommendations. In [7], Lipczak et al. present an approach based on a combination of extracting candidate tags from the resource and using information found in the folksonomy. Candidate tags are found from the title and the URL of the resource, tags related to the resource, and tags related to the user. Musto et al. [8] apply a combination of content-based and collaborative approaches to generate tag recommendations. The content-based approach analyzes the resource to tag, and extracts candidate tags from the URL, the HTML title, and meta tags. The candidates are scored by taking into account the type of source (URL, title, etc.) and the occurrence frequency within each source type. The collaborative approach searches an underlying corpus of users, resources, and tags to find candidate tags. Finally, the user is presented with tags from one or both of the candidate tag sets, based on some strategy. Jäschke et al. [9] present two different algorithms for tag recommendation based on folksonomy data: the first is based on collaborative filtering, the second on the FolkRank algorithm. Gemmell et al. [10] describe an approach for tag recommendation based on adapting the k-nearest neighbor algorithm to folksonomy data. Most current methods use either the content of the resource (key phrase extraction) or the data found in the folksonomy as the source of tags to recommend to the user. Our approach to automatic tagging is based on a combination of the two (even though we do not extract tags from the content).
We use the information in the folksonomy (the mapping from tag to resource) and the content of the resource to build a semantic representation of each tag. In this way our approach is able to suggest tags (that are used in the folksonomy) for documents that have not been seen before. Systems that purely use the graph structure of the folksonomy to recommend tags will struggle when trying to recommend tags for a resource not previously seen. On the other hand, systems that purely rely on extracting tags from the content may lead to an increase in the tag vocabulary. Reusing tags that already exist in the folksonomy thus ensures that the vocabulary of the folksonomy stays consolidated.

3 Tag Signatures

Users that contribute within a community to tag and share resources on the web generate what is often referred to as a folksonomy [1]. Folksonomies consist mainly of three entities: (1) users, (2) tags, and (3) resources. Bookmarking is the action of a user attaching one or more tags to a specific resource, and the combined data is called a bookmark. Heymann et al. [11] view this data as {user, tag, URL} triples. The interpretation of the triple is that user has applied tag to the resource identified by URL. As the user has actively engaged in applying the tag(s) to the resource, we make the basic assumption that the tag(s) make up a description of the document's content. From the user's perspective, the applied tag(s) signal the semantics of the resource and should be representative of the resource's content, so that the resource is later easy to find (both for the user himself and for others in the community). This assumption is used as a basis for generating an extended semantic representation of the tags using the contents of the documents to which a tag has been applied. The representation associates each tag in the folksonomy with a vector of semantically related terms. Each term is given a weight that reflects the importance of the term with respect to the tag.
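The {user, tag, URL} triple view of bookmarks can be sketched in a few lines. This is an illustrative data model only; the names Bookmark and group_by_tag, and the example URLs, are not from the paper.

```python
from collections import namedtuple, defaultdict

# Minimal model of a bookmark as a {user, tag, URL} triple [11].
Bookmark = namedtuple("Bookmark", ["user", "tag", "url"])

def group_by_tag(bookmarks):
    """Map each tag to the set of URLs it has been applied to."""
    tag_to_urls = defaultdict(set)
    for b in bookmarks:
        tag_to_urls[b.tag].add(b.url)
    return tag_to_urls

# A user may apply several tags to one resource, and several users
# may tag the same resource with the same tag.
bookmarks = [
    Bookmark("alice", "python", "http://example.org/a"),
    Bookmark("bob", "python", "http://example.org/b"),
    Bookmark("bob", "web", "http://example.org/a"),
]
```

Grouping by tag like this yields, for each tag, the set of resources from whose content its signature can later be built.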
This means that a term can be connected to several different tags, but with different weights, signaling that the term has a different importance with respect to each tag. We refer to our semantic representation as a Tag Signature. Two different considerations are made when deciding how to weight the terms in each tag signature. The first is that the weight should reflect the internal semantics of the tag: we want to give a high weight to terms that are good at characterizing important aspects of the tag. The second is that the weight should reflect the external semantics of the tag: the term should be good at discriminating this tag from others. We therefore apply the tf · idf measure [12] for weighting the terms in the signatures. The terms and their weights collectively represent the semantic content of the tag, and we thus refer to the tag signature as an extended semantic representation of the tag, which greatly extends the purely syntactic representation of the tag itself. The tag signature materializes as a vector. The definition is given in [13] (where we use the term Tag Vector), but we repeat it here for convenience as Definition 1. Details of the construction of the tag signature can be found in [13].

Definition 1 (Tag Signature). Let V be the set of n terms (the vocabulary) in the collection of tagged resources, and let ti ∈ V denote term i in this set. The tag signature for tag j is defined as the vector Tj = [w1, w2, ..., wn], where each wi denotes the semantic relatedness weight of term ti with respect to tag j.

3.1 Unsupervised Tagging Approach

Unsupervised tagging can be used in many application areas such as tag recommendation, automatic tagging of a set of documents, document classification, etc. Our approach to automatic tagging takes as input an untagged document and returns a ranked list of tags.
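The construction of tag signatures is detailed in [13]; as a rough illustration of tf · idf-weighted signatures, one can merge all documents carrying a tag into one pseudo-document per tag and weight its terms. The pseudo-document strategy, the normalization choices, and the function name below are assumptions for this sketch, not the paper's exact procedure.

```python
import math
from collections import Counter, defaultdict

def build_tag_signatures(tagged_docs):
    """tagged_docs: iterable of (tag, tokens) pairs.

    Forms one pseudo-document per tag by merging all token lists for
    that tag, then weights each term by tf * idf, where idf is computed
    over the tag pseudo-documents (an assumption of this sketch).
    """
    tag_tf = defaultdict(Counter)
    for tag, tokens in tagged_docs:
        tag_tf[tag].update(tokens)
    n_tags = len(tag_tf)
    # Document frequency: in how many tag pseudo-documents a term occurs.
    df = Counter()
    for tf in tag_tf.values():
        df.update(tf.keys())
    signatures = {}
    for tag, tf in tag_tf.items():
        max_tf = max(tf.values())
        signatures[tag] = {
            term: (count / max_tf) * math.log(n_tags / df[term])
            for term, count in tf.items()
        }
    return signatures
```

Terms occurring in every tag's pseudo-document get zero weight (idf = 0), which matches the external-semantics goal above: such terms cannot discriminate one tag from another.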
The similarity between the document content and the tag is based on the tag signature. Since the tag signature is represented as a vector of weighted terms, and the document can similarly be viewed as a vector of weighted terms, we propose to use the cosine measure to calculate the similarity between the two. The calculation is shown in Equation 1 [12], where wi,d is the weight of term ti in the document, wi,j is the weight of term ti in Tj, and n is the number of terms:

    sim(d, Tj) = ( Σ_{i=1..n} wi,d · wi,j ) / ( √(Σ_{i=1..n} wi,d²) · √(Σ_{i=1..n} wi,j²) )    (1)

In our implementation, we have stored all tag signatures in a tag signature index, and use the document as a large query into this index. The list of tags returned can be cut off at the top m tags, or at a threshold on the similarity score. Our approach does not increase the tag vocabulary (as, for instance, keyword extraction techniques might do by proposing new tags). This is a benefit, since the document will be tagged according to the tags already in use. This means that we can classify documents according to tags that are already used and are found in the semantic structure. However, if the coverage of the tags is not sufficient, new tags may have to be introduced. In such cases the system could fall back on one of the content-based tag suggestion algorithms found in the literature. Another benefit is that the extended semantic representation of the tags allows us to adapt the semantics of a tag to the way it has been used by the users. This implies that a tag may have a different tag signature in different communities, since tags may be used in slightly different contexts. However, this also means that there are domain restrictions on the approach: for automatic tagging of good quality we rely on good coverage of the domain.

4 Evaluation

The experiment is performed on a data set from Delicious that we crawled between December 2009 and January 2010.
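The ranking step of Section 3.1 (cosine similarity of Equation 1 between a document vector and each tag signature, cut off at the top m tags) can be sketched as follows. Dense term-weight dicts stand in for the sparse vectors, and the function names are illustrative; the paper's implementation uses a Lucene-style tag signature index instead.

```python
import math

def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse term-weight dicts (Eq. 1)."""
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_tags(doc_vec, signatures, m=10):
    """Score every tag signature against the document vector
    and return the top m (tag, score) pairs."""
    scored = [(tag, cosine(doc_vec, sig)) for tag, sig in signatures.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:m]
```

A threshold-based cutoff, as mentioned above, would simply filter `scored` on the score instead of slicing the top m.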
We only kept bookmarks pointing at resources under “http://en.wikipedia.org/wiki/”, the English section of Wikipedia. The crawl resulted in 228536 bookmarks created by 51296 users, with 72420 unique tags and 65922 unique URLs. We kept only English Wikipedia documents so that we could map the documents to a dump of Wikipedia (from June 2008) which has been cleaned and part-of-speech (POS) tagged [14]. We performed some simple filtering of the crawled data, removing bookmarks pointing at certain document classes: all bookmarks pointing at documents prefixed with category:, user:, image:, etc. were removed from the Delicious data set. This filtered out 14162 bookmarks. We were able to map the URLs in 91.2% of the remaining bookmarks to the Wikipedia dump, leaving us with a total of 195471 bookmarks. Mapping failures may have been due to encoding problems, articles that had moved, etc. Next, we filtered the bookmarks based on tags. This was done by lowercasing the tags and removing all tags that had not been used by at least 5 users and in at least 25 bookmarks. This ensures that we remove some of the noisy tags found in folksonomies, and that the remaining tags have been sufficiently used. The final tag set consisted of 2988 tags (used to tag 59610 documents). The data set has been randomly split into two parts based on the documents: one for generating the tag signatures (training set) and one for the evaluation (test set). The training set consists of 29845 documents, while the test set consists of 29765 documents. The tag signatures have been constructed according to the description given in Section 3. Further, we have performed the evaluation using both standard preprocessing and preprocessing based on the POS tags in the Wikipedia collection. The POS-based preprocessing extracts only noun phrases from the text, splits the phrases, and stems the individual terms.
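The tag filtering step above (lowercasing tags, then keeping only tags used by at least 5 users and in at least 25 bookmarks) can be sketched like this. The triple layout and function name are assumptions; the thresholds follow the text.

```python
from collections import defaultdict

def filter_tags(bookmarks, min_users=5, min_bookmarks=25):
    """bookmarks: iterable of (user, tag, url) triples.

    Lowercases tags, then keeps only bookmarks whose tag has been used
    by at least min_users distinct users and in at least min_bookmarks
    bookmarks, as described in the evaluation setup.
    """
    normalized = [(u, t.lower(), r) for u, t, r in bookmarks]
    users_per_tag = defaultdict(set)
    bookmarks_per_tag = defaultdict(int)
    for u, t, r in normalized:
        users_per_tag[t].add(u)
        bookmarks_per_tag[t] += 1
    keep = {t for t in bookmarks_per_tag
            if len(users_per_tag[t]) >= min_users
            and bookmarks_per_tag[t] >= min_bookmarks}
    return [(u, t, r) for u, t, r in normalized if t in keep]
```

Note that lowercasing before counting merges variants such as "Web" and "web", so their usage counts are pooled when applying the thresholds.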
The first part of our evaluation is designed to find out how well the tag assignments made by our approach correspond with the tags assigned to the documents by the folksonomy users. This is done by constructing the tag signatures based on the training set and comparing the tag assignments generated by our approach on the test set with the original tag assignments in the bookmarks of the test set. As a simple baseline, we have chosen to use keyword search (named KW Tags). The keyword search is performed by using each tag in the folksonomy (the same tag set as we use for tag signatures) as a keyword query matched against the document, generating for each document a ranked list of tags against which we compare our method. All indexing and search has been implemented using Lucene (http://lucene.apache.org).

The second part of the evaluation, the user evaluation, has been performed by presenting a group of 6 persons (including one of the authors) with 15 randomly selected documents. For each document the user was presented with the top 10 ranked KW Tags and the top 10 tag signature based tags (in random order). Tags that were used to tag each document in the original folksonomy data set were removed from the evaluation set; thus we evaluate only new tag assignments. This is done to learn more about the quality of the tags that are suggested but that have not previously been used to describe the documents. In case of overlap between the two result sets, the list of tags was padded with extra tags so that the user was always presented with 20 tags. The evaluators used a 5-point scale, in which 1 meant that the tag was not appropriate for describing the whole or parts of the document content, while 5 meant that the tag was highly descriptive of the whole or parts of the document.

4.1 Results

In the first part of the evaluation, we investigate how well our results compare to the tag assignments made by users in the folksonomy.
We have used the training set to generate the tag signatures and the test set for evaluation. This means that the text of the documents we evaluate is not incorporated in the training phase; the set of bookmarks has been split in two, one part for the training set and one for the test set. We have calculated two different measures: R-precision and Precision at 10 (P@10). The R-precision for the tag assignments of a single document is calculated by taking all tags assigned to the document by the users (of the folksonomy; the original tag assignments) in the test set as the relevant set of tags R, with |R| elements. We then take the top |R| results from KW Tags and from our method and calculate the precision within these sets. We also check the precision in the top 10 tags (ranked by the cosine measure), as these tags are the most interesting to suggest to users. We have grouped the results according to the number of unique tags assigned to the documents (Figure 1), the number of times a user has tagged the document (Figure 2(a)), and the size of the documents after preprocessing (Figure 2(b)). The average R-precision calculated over all documents in the test set is 0.224 for our approach and 0.155 for the keyword based approach. The average P@10 is 0.238 for our approach and 0.168 for the keyword based approach.

Fig. 1. Results grouped by the number of unique tags assigned to each document: (a) standard preprocessing; (b) POS-based preprocessing.

Figures 1(a) and 1(b) show the results of KW Tags and our method based on standard preprocessing and POS tag based preprocessing, respectively. The results show that the quality of the two approaches is quite comparable; using the POS information does not improve the quality of the results significantly. We can also note from the figure that our results are consistently and significantly better than the pure keyword based approach across all groups.
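The two measures used above can be stated compactly: P@10 is the fraction of the top 10 suggested tags that the users actually assigned, and R-precision is the precision at rank |R|, where R is the set of user-assigned tags. A minimal sketch:

```python
def precision_at_k(ranked_tags, relevant, k):
    """Fraction of the top-k suggested tags that appear in the relevant set."""
    if k == 0:
        return 0.0
    hits = sum(1 for t in ranked_tags[:k] if t in relevant)
    return hits / k

def r_precision(ranked_tags, relevant):
    """Precision at rank |R|, where R is the set of user-assigned tags."""
    return precision_at_k(ranked_tags, relevant, len(relevant))
```

Averaging these per-document values over the test set gives the aggregate figures reported above (e.g. average R-precision 0.224 for the tag signature approach).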
Manual examination of the results also shows that our approach is able to find tags that are not present in the document text. In Figure 2(a) we have grouped the results according to the number of tag assignments to each document. These results show the same trends as the previous graph, as should be expected, since there is a correlation between the number of unique tags assigned to a document and the total number of tag assignments to a document. As the number of tags assigned by users to a document increases, so does the probability of being able to suggest one of these tags. The increase in the experiment metrics with an increasing number of unique tags/tag assignments should thus account for at least part of this effect.

Figure 2(b) shows the results grouped by document size (after preprocessing). From the graph we can see that our approach scores consistently higher than KW Tags for both measures. The results for KW Tags seem quite stable, with only small changes as the size of the documents increases. The approach based on the tag signature, on the other hand, seems to increase, but at a lower rate as the document size grows, like a logarithmic function. This is a quite interesting result. Since the tag signatures have the form of a vector, we should expect the number of tags that match a given document to increase as the document size increases (the number of potential keywords to match increases). This should also be visible for the KW Tags case. However, the results do not show this kind of effect, but rather a decrease in the evaluation metric as the number of document terms passes 5000.

Fig. 2. Results grouped by (a) the number of user tag assignments and (b) the document size in number of terms.
We interpret this as a result pointing towards the added semantics of the vectors being able to generate better suggestions.

Figure 3 shows the results from the user evaluation. The data series named Tag Signatures is based on the top 10 tags suggested by our approach, while the data series KW Tags is based on the top 10 tags suggested by using the existing tags in the system as keyword queries into the documents. Tags that have been used to tag these documents in the folksonomy data set have not been evaluated; the tags evaluated are thus “new” to each of these documents. The evaluation is performed to check the quality of the remaining tags from the first part of the evaluation, i.e., tag assignments from our system that are not present in the form of bookmarks in the collected data set. The graphs show that the tags in the Tag Signature data series were assessed by the evaluators to be, on average, of higher quality in 10 out of 15 documents. The average score was 3.18 for tags suggested by our approach and 2.91 for tags suggested by the keyword based approach. Although the results are not statistically significant, we see this as a positive tendency. Manual examination of the documents and tag evaluations showed some disagreement among the evaluators (as can also be seen from Table 1, which shows the standard deviation of the user evaluation scores). This seems to indicate that the mechanisms behind tagging are hard to understand: one tag may be valuable to one user while not that valuable to others. The user's intention when tagging (or, in our case, evaluating a tag) seems to be very important. Some users like to tag based on the general topic of the document, while others may want to tag based on certain details in the document. This makes it hard to evaluate tagging on single documents, and our approach seems more appropriate when we take a large sample of documents into consideration.
Two types of tags our approach does not seem to handle satisfactorily are subjective tags and very general tags (like interesting, history, etc.). Subjective tags are hard to handle in general and are discussed further in the next section. Very general or broad terms may cover a very wide topic (like history, which can be used to tag documents about World War II as well as music history in the 60’s). This can, however, be viewed as a variation on tag ambiguity, which we address in the next section.

Fig. 3. The results from the user evaluation, based on 15 randomly chosen documents.

Table 1. The standard deviation of the results from the user evaluation.

Exp./Doc.    D#1   D#2   D#3   D#4   D#5   D#6   D#7   D#8   D#9   D#10  D#11  D#12  D#13  D#14  D#15
σ TagSign.   1.550 1.379 1.546 1.544 1.601 1.502 1.334 1.198 1.385 1.469 1.160 1.380 1.395 1.544 1.476
σ KW Tags    1.525 1.427 1.358 1.280 1.703 1.379 1.455 1.388 1.527 1.266 1.443 1.479 1.481 1.469 1.510

Table 2 shows the tag assignments given by our approach and by the keyword based approach for the document “Comparison of layout engines (HTML5)” (based on the 2008 Wikipedia dump). The results show that the two approaches have an overlap of two tags, firefox and xhtml. If we polarize tag suggestions as being either good or bad and define good tag suggestions as those with an average score above 3, we see that for our approach 9 out of 10 suggested tags qualify, while for the keyword based approach only 5 out of 10 qualify.

Table 2. Example set of tags for the document “Comparison of layout engines (HTML5)” (2008 version).
Tag signatures              KW Tags
Tag            Score        Tag            Score
ie             4.5          firefox        4.2
firefox        4.2          xhtml          4.2
xhtml          4.2          engine         3.3
compare        3.7          emulation      2
mozilla        3.8          values         1.8
xforms         3.8          xml            4
webstandards   4.2          input          2
png            1.8          property       1.7
css            4            experimental   1.2
xslt           3.3          internet       3.7

5 Discussion

The results described in the previous section show that our approach, using tag signatures for automatic assignment of tags to documents previously not seen by the system, performs quite well. However, looking at P@10 (average 0.238), we see that we are not able to find all tags applied to the documents by users of the folksonomy. What about the quality of the remaining suggested tags? The second part of the evaluation was supposed to answer this question, but due to disagreement among the users it is hard to give a conclusive answer. In fact, the disagreement among the evaluators highlights the problem of evaluating tag assignments: the intention of a user is highly relevant, as discussed in the previous section. The average score given to tags suggested by our system (3.18) seems to indicate that these tags have some positive aspects. Thus, although we do not have any conclusive evidence, P@10 would most likely be higher, since our system suggests tags that, even though not applied to the document by users in the folksonomy, seem to make sense to the evaluators. For a definite answer to the question of the overall quality of the tag suggestions, we would need to perform a larger evaluation. One of the strengths of our approach is, in our opinion, that it is able to assign tags to documents that have not previously been seen by the system. We are thus not as constrained as methods that strictly adhere to approaches based on collaborative filtering. The suggested tags are based on the content of the documents previously tagged with each tag, and the terms are weighted by balancing the internal and external representation of the tag.
Thus we might say that our approach is a combination of content based and folksonomy based tagging. Further, the positive results we have achieved tell us that the quality of the tag signatures seems reasonable: they are able to describe the characteristics of the tags in terms of a weighted vector of terms.

Tag disambiguation is a concern that we have not addressed in the current phase of our research. Tags have a tendency to be ambiguous (polysemy, homonymy, etc.), which is also described in the literature (e.g. in Heymann et al. [15]). Take for instance the tag apple: it can be used in the computer company sense or in the fruit sense. In our case, tag ambiguity may cause the tag signatures to be imprecise, meaning that they span two or more specific topics (causing drift of the signature). This may have affected our results negatively by suggesting inappropriate tags for documents. Tag ambiguity can, however, be reduced by applying one of several measures for tag disambiguation found in the literature (e.g. in Garcia-Silva et al. [16] or Angeletou et al. [17]). In our approach tag disambiguation could be applied during tag signature construction, generating several tag signatures (one for each sense) for ambiguous tags. Subjective tags also give rise to some degree of ambiguity: how do you quantify what cool or interesting means? These types of tags are hard to deal with in automatic systems, since what one person finds interesting may be uninteresting to another. Such tags are thus rather useless in automatic systems, which should instead focus on the topic of the document.

6 Conclusions

In this paper we have presented an approach for automatically annotating documents with folksonomy tags using tag signatures. The signatures materialize as a vector of weighted terms, in which the weights reflect the semantic relatedness of each term with respect to the tag.
Our evaluation shows that our approach beats naive tagging using a direct match between tag and document. We found that we are able to annotate documents in which the tag is not present, using the tag signature as a semantic connection. Further, the annotations are not made purely based on what a document has been tagged with in the folksonomy, but take into account the content of the document as well. The evaluation is based on presenting annotations for documents that have not been seen by the system before, and we interpret the results as evidence that our tag signatures carry more semantics than the tag on its own.

Acknowledgment. This research was carried out as part of the IS A project, project no. 176755, funded by the Norwegian Research Council under the VERDIKT program.

References

1. Thomas Vander Wal. Folksonomy coinage and definition. http://vanderwal.net/folksonomy.html, accessed February 8, 2011.
2. Peter Mika. Ontologies are us: A unified model of social networks and semantics. In The Semantic Web - ISWC 2005, volume 3729 of Lecture Notes in Computer Science, pages 522–536. Springer Berlin / Heidelberg, 2005.
3. Dominik Benz, Andreas Hotho, and Gerd Stumme. Semantics made by you and me: Self-emerging ontologies can capture the diversity of shared knowledge. In Proceedings of the 2nd Web Science Conference (WebSci10), Raleigh, NC, USA, 2010.
4. Gilad Mishne. Autotag: A collaborative approach to automated tag assignment for weblog posts. In Proceedings of the 15th International Conference on World Wide Web (WWW), pages 953–954. ACM Press, 2006.
5. Lizhen Qu, Christof Müller, and Iryna Gurevych. Using tag semantic network for keyphrase extraction in blogs. In Proceedings of the 17th Conference on Information and Knowledge Management, pages 1381–1382. ACM, 2008.
6. Andrea Baruzzo, Antonina Dattolo, Nirmal Pudota, and Carlo Tasso. Recommending new tags using domain-ontologies.
In WI-IAT ’09: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, pages 409–412. IEEE, 2009.
7. Marek Lipczak, Yeming Hu, Yael Kollet, and Evangelos Milios. Tag sources for recommendation in collaborative tagging systems. In Proceedings of the ECML/PKDD 2009 Discovery Challenge Workshop, pages 157–172, 2009.
8. Cataldo Musto, Fedelucio Narducci, Pasquale Lops, and Marco de Gemmis. Combining collaborative and content-based techniques for tag recommendation. In Proceedings of the 11th International Conference on E-Commerce and Web Technologies (EC-Web), volume 61 of LNBIP, pages 13–23. Springer, 2010.
9. Robert Jäschke, Leandro Marinho, Andreas Hotho, Lars Schmidt-Thieme, and Gerd Stumme. Tag recommendations in folksonomies. In Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), volume 4702 of LNAI, pages 506–514. Springer, 2007.
10. Jonathan Gemmell, Thomas Schimoler, Maryam Ramezani, and Bamshad Mobasher. Adapting k-nearest neighbor for tag recommendation in folksonomies. In Proceedings of the 7th Workshop on Intelligent Techniques for Web Personalization and Recommender Systems, 2009.
11. P. Heymann, G. Koutrika, and H. Garcia-Molina. Can social bookmarking improve web search? In First ACM International Conference on Web Search and Data Mining (WSDM’08), 2008.
12. R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, New York, 1999.
13. Geir Solskinnsbakk and Jon Atle Gulla. A hybrid approach to constructing tag hierarchies. In On the Move to Meaningful Internet Systems, OTM 2010, volume 6427 of Lecture Notes in Computer Science, pages 975–982. Springer, 2010.
14. J. Artiles and S. Sekine. Tagged and Cleaned Wikipedia. Available from http://nlp.cs.nyu.edu/wikipedia-data/, accessed December 2009.
15. Paul Heymann, Daniel Ramage, and Hector Garcia-Molina. Social tag prediction.
In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2008.
16. A. Garcia-Silva, M. Szomszor, H. Alani, and O. Corcho. Preliminary results in tag disambiguation using DBpedia. In CKCaR’09: Proceedings of the 1st International Workshop on Collective Knowledge Capturing and Representation at K-CAP 2009, 2009.
17. Sofia Angeletou, Marta Sabou, and Enrico Motta. Semantically enriching folksonomies with FLOR. In 1st International Workshop on Collective Semantics: Collective Intelligence & the Semantic Web (CISWeb 2008) at ESWC, 2008.