Tagpedia: a semantic reference to describe and search for Web resources

Francesco Ronzano, Andrea Marchetti, Maurizio Tesconi
Institute for Informatics and Telematics (IIT) - CNR
Via Moruzzi, 1
Pisa, Italy
francesco.ronzano@iit.cnr.it
andrea.marchetti@cnr.it
maurizio.tesconi@iit.cnr.it

ABSTRACT
Nowadays the Web is a growing collection of an enormous amount of content. Better ways to find and organize the available data are becoming a fundamental issue in dealing with information overload. Keyword based Web searches are currently the preferred means to seek content related to a specific topic. Search engines and collaborative tagging systems make searching possible by associating descriptive keywords with Web resources. All of them suffer from inconsistency and a consequent reduction of the recall and precision of searches. The causes are polysemy, synonymy and, in general, all the different lexical forms that can be used to refer to a particular meaning. One way to solve, or at least reduce, these problems is to introduce semantics to characterize the contents of Web resources: each resource is described by one or more concepts instead of simple and often ambiguous keywords. To support this task, a global semantic resource of reference is fundamental. On the basis of our past experience with the semantic tagging of Web resources and the SemKey Project, we are developing Tagpedia. Tagpedia is a general-domain "encyclopedia" of tags, created by mining Wikipedia and semantically structured to generate semantic descriptions of contents over the Web. In this paper we start from an analysis of the weak points of non-semantic keyword based Web searches. We then introduce our idea of semantic characterization of Web resources and describe the structure and organization of Tagpedia. Finally, we present our first realization of Tagpedia and suggest the improvements that can be carried out to exploit its full potential.

Categories and Subject Descriptors
H.4.m [Information Systems]: Miscellaneous; D.2 [Software]: Software Engineering

General Terms
semantic resource, knowledge organization, semantic web

Keywords
semantics, web, social, wikipedia, data mining

Copyright is held by the Authors. Copyright transferred for publishing online and on a conference CD-ROM.
SWKM'2008: Workshop on Social Web and Knowledge Management @ WWW 2008, April 22, 2008, Beijing, China.

1. INTRODUCTION: KEYWORD BASED SEARCHES
Currently, keyword based Web searches are the preferred way to seek resources of interest over the Web. Each resource, usually identified by its URL, can be accessed through one or more keywords describing its content. The two most widespread ways to explore the links between Web resources and keywords are using a search engine and accessing a collaborative tagging service (see Figure 1).

Search engines like Google, Yahoo, Ask and so on are examples of automated information extraction systems. They analyze the data and the structure of Web contents, the search behaviour of users and the frequency of usage of different search strings. From this they collect the most appropriate keywords that can be used to access a Web resource (see the lower portion of Figure 1).

On the other side, collaborative tagging systems like del.icio.us, Flickr, YouTube and Technorati rely upon user contribution. They are examples of social classification systems. Each member of the community of users of a collaborative tagging system describes Web resources of interest by means of one or more freely chosen keywords, called tags. All the tags associated with Web resources are collected, and every user can exploit them to find resources of interest. A popularity value is usually associated with each tag describing a Web resource. It counts the number of times the tag has been chosen to characterize that resource, and so indicates the importance of the tag among those related to that resource (see the upper portion of Figure 1).

Figure 1: Two ways to associate keywords to resources

Even if they are very popular, keyword based Web search approaches show many weak points in managing language expressivity. Many keywords can identify distinct concepts (polysemy), and as a consequence the precision of search results decreases. Moreover, if we are not looking for the common sense of a keyword, it is often very difficult to explore the space of search results and find the Web resources of interest among those retrieved. For example, suppose that we want to find all the resources dealing with 'ajax' intended as the Greek hero. If we choose 'ajax' as the search string, there are no links related to mythology among the first 30 search results of Google. If we refine the search string to solve the problem, we partition the space of relevant search results according to the particular word added to 'ajax' to disambiguate its meaning. For instance, we can add either the word 'hero' or the word 'mythology' to 'ajax' in the search string. Among the first 10 search results shown by Google, only two appear in both cases.

Besides polysemy, synonymy also affects the precision and recall of keyword based Web searches. When a specific meaning can be accessed through two or more keywords, the set of search results differs depending on the particular keyword chosen. Moreover, a resource can be described at different levels of precision and from many possible user points of view, and this often causes a considerable loss of quality in Web searches. For a deeper analysis of the factors that affect the efficiency and effectiveness of keyword based Web search systems see [4] [10] [12] [25].

Many distinct methods have been applied to face the drawbacks of the systems just analyzed. One increasingly widespread approach aggregates the results of different search engines and post-processes them. Systems like Vivisimo [15], Grokker [18] and Kartoo [19] are meta search engines. They collect search results from other search engines and group them. Grokker, for example, exploits the category hierarchy of Yahoo and Wikipedia. Vivisimo creates clusters of similar search results and characterizes each cluster by one or more additional keywords. Kartoo displays search results cartographically, through very expressive maps that connect the most relevant resources to the most used keywords.

Among tagging systems, too, we can find many proposals to better organize search results so as to improve their quality and the effectiveness of the search. FolkRank [5] is an algorithm created to rank search results in a tagging system. It calculates a ranking value for each result and so evaluates its relevance. The user profile is also exploited to adapt the ranking calculation to the information needs of every single user.

The rest of this paper is organized as follows. In Section 2 we describe our idea of semantic characterization of Web resources. There we underline the need for a general-domain semantic resource of reference to support this task, also taking into account our past experience with semantic tagging and SemKey. In Section 3 we introduce Tagpedia, the semantic resource of reference we have created by mining Wikipedia, and explain its organization and structure (Subsection 3.1). In Section 4 we describe how Tagpedia can be used, presenting the Tagpedia Web API and the improvements that would let Tagpedia reach its full potential. Conclusions are drawn in Section 5.

2. FROM KEYWORDS TO CONCEPTS: SEMANTIC CHARACTERIZATION
We can solve, or at least substantially reduce, the problems of organizing and classifying Web resources by adding a further level to their characterization: semantics. Instead of relying on the post-processing of search results, we can describe resources semantically from the start, by associating them with one or more properly chosen concepts. This extends the characterization of resources with a semantic level. Each resource (R) is described by one or more concepts (C) and, in turn, each concept can be accessed through one or more keywords (K) (see Figure 2). When we search for information of interest, we can then specify our information needs more precisely. We can also access relevant results easily and effectively, thanks to the collection of concepts used to describe Web resources. In what follows we refer to this collection as the semantic resource.

Figure 2: Relations between resources, keywords and concepts

This way of improving the organization of Web contents is an attempt to realize the semantic description of information that lies at the basis of the Semantic Web vision.

At present there are many proposals of semantic classification methods for Web contents. FolksAnnotation [13], for instance, extracts the tags that describe a Web resource from a collaborative tagging system and automatically maps them to the corresponding concepts of a predefined domain ontology. Systems of this kind usually require a strong, well organized ontological frame of reference that is difficult to build. So far they have not provided significant improvements over the classical keyword based methodologies. Systems like Semantic Halo [3] take a different approach: they improve tag based search systems by adding semantic information without relying on ontologies. The Semantic Halo algorithm analyzes the co-occurrences and frequencies of tags. From these it extracts groups of tags that help specify and drive user searches, such as more general or more specific tags, or groups of keywords defining a particular naming of the selected tag. Not enough experimental data is currently available on how effective and useful this method is for improving tag based searches. In summary, a strong and widespread infrastructure that organizes and provides access to Web resources on the basis of semantic classificatory information is still absent.

During the first half of 2007, we tried to make the semantic description of Web resources possible by developing SemKey [4], a semantic collaborative tagging system. SemKey extends current tagging systems by allowing users to characterize resources with concepts. Each user can point out and describe Web resources of interest. Starting from a freely chosen tag, the user can disambiguate it with the support of Wikipedia [14] and WordNet [17] and identify one or more defined concepts. The result is a semantic assertion, that is, the description of a specific feature of a Web resource through one or more chosen concepts. In this way we can potentially overcome the limits that the complexity of language imposes on the description of Web resources. We do so by exploiting the semantic characterization of resources as well as the semantic relations between concepts present in WordNet and Wikipedia.

We have implemented a working prototype of SemKey. By analyzing its usage patterns and the semantic classification support it provides, we have identified two key factors that must be improved to make the semantic characterization of Web resources described above really possible.

First, both Wikipedia and WordNet show important features for supporting the semantic description of Web resources, but both have relevant gaps. WordNet covers a rich set of parts of speech and provides a strongly structured set of relations between them. However, it lacks much of the data needed to disambiguate proper names, and it is not collaboratively edited. Wikipedia is an encyclopedia, so its content consists mainly of a very rich set of names along with their extended descriptions. Wikipedia therefore has strong proper name coverage, and it has been proposed as a named entity disambiguation resource in [7] and [8]. It is also continuously updated. However, it lacks a structured set of relations between the concepts it describes, even though its documents are interconnected by a huge number of links and loosely classified through categories. The two semantic resources are thus in some way complementary, but both were built and structured for purposes other than semantic characterization over the Web. To better support this task we need a semantic resource built and structured ad hoc, and such a resource is still absent. It must combine the advantages of the resources just analyzed while removing content that is irrelevant to this task.

Second, a great limit to the usability of SemKey, and to the easy definition of new semantic metadata, is the number of steps users must carry out to compose a semantic assertion. This often discourages them from creating semantic metadata. Some form of automation is necessary, either to speed up the tag disambiguation process or to carry it out through automated procedures.

3. TAGPEDIA: A GENERAL DOMAIN SEMANTIC RESOURCE OF REFERENCE
Describing Web contents requires a global semantic resource of reference, which must therefore be comprehensive and up to date. To meet this need we have designed and built Tagpedia. Tagpedia is a semantic organization and classification of tags, understood as the words or, more generally, the brief textual expressions that people may use to describe Web resources. It is based on the model of term-concept networks [11], structured ad hoc to support the semantic characterization of Web contents, and it is initially populated with Wikipedia data. In particular, we have developed a new way of mining Wikipedia to extract the information needed to build Tagpedia. The goal is to support concept based descriptions of Web resources, including through tag disambiguation.

We have chosen Wikipedia as the starting point because it is the richest and most constantly updated encyclopedic reference on the Web. It contains a huge amount of semantic content, even though this content is not explicitly exposed or easily accessible. During the last few years many studies have found new ways to extract useful semantic data from the great amount of information contained in Wikipedia. Organizational patterns such as infoboxes, internal and external links, and redirect and disambiguation pages have been analyzed to extract valuable data. The DBpedia Project [16], for instance, is a relevant attempt to extract semantic data from Wikipedia and make it available over the Web in compliance with Semantic Web standards [6]. DBpedia is a global knowledge base derived from Wikipedia; unlike Tagpedia, it is not specifically intended for the description of Web resources. [24] describes KYLIN, a system that autonomously semantifies Wikipedia and automatically suggests data inconsistencies, gaps and incomplete entries. Wikipedia has also been successfully exploited to compute semantic relatedness between words [21] and between natural language texts [9]. It has also been used to tune new named entity disambiguation methodologies [7] [8]. Semantic relationships between Wikipedia categories have been studied to make searching for information easier and to give article editors relevant suggestions [20]. Other research has studied how Wikipedia articles are created and how their contents mature [22]. Still other work has analyzed statistics about the growth of the data that make up Wikipedia, such as the types of articles, the editors, and the link and category structure [23].

3.1 The structure of Tagpedia
The main aim of Tagpedia is the semantic characterization of data over the Web. In particular, it must allow a Web resource to be described through its association with one or more uniquely referenced concepts. The concept is therefore the main building block of Tagpedia. Each concept must be unambiguously identified but also easily accessed. The main way to point out a concept is through the words that refer to it; in the following, such words are also called tags. Each concept is therefore identified by a set of synonymous tags, or syntag set. This set contains all the words or, more generally, all the alphanumeric expressions of any kind that a community of users may adopt to refer to the concept. Syntag sets are the molecules that form Tagpedia.

The first necessary step in building our semantic resource is to create an initial rich collection of syntag sets. Wikipedia has many features that can be exploited to create such a collection. In particular, a Wikipedia article usually defines a specific concept, so we bootstrap Tagpedia by creating syntag sets from the articles of Wikipedia. Figure 3 shows three examples of syntag sets made up of tags collected by mining Wikipedia.

Figure 3: Three syntag sets

More precisely, Wikipedia pages can be divided into three groups:
• article pages: each describes a particular concept, identified by the title of the page;
• redirect pages: each links an alternate literal expression, which is the title of the redirect page, to the corresponding concept, usually identified by the title of an article page;
• disambiguation pages: each lists all the possible concepts, usually identified through the titles of article or redirect pages, that the literal expression in the title of the disambiguation page can refer to.

The redirect and disambiguation page mechanisms are two important organizational features of Wikipedia that can be exploited to build and enrich syntag sets. Once we have identified a concept corresponding to a particular article page, we create an initial version of its syntag set, identified by a unique identifier. This initial set contains only the tag corresponding to the title of the page (in syntag set 1 of Figure 3, the tag 'jaguar' is the title of an article page). We then collect all the words and expressions that may be used to refer to that concept.

As mentioned above, Wikipedia uses the redirect mechanism to link alternate literal expressions to the original encyclopedic article that describes a specific entity. Redirects usually handle synonyms, abbreviations, acronyms, misspellings, alternative spellings, different punctuation, particular capitalization rules and so on. To build Tagpedia we mine Wikipedia content and extract all the redirect information by analyzing redirect pages. For each redirect page, we enrich the syntag set of the referenced concept by adding the title of the redirect page as a new tag (in syntag set 1 of Figure 3, the tag 'leo onca' is extracted from a redirect page).

Wikipedia usually handles polysemy through disambiguation pages. As described above, each disambiguation page is a collection of links to all the article pages that identify the distinct meanings of the page title (a textual string). For example, the word 'ajax' is highly polysemous and has 49 different meanings in Wikipedia. Its disambiguation page contains links to 49 distinct article pages, each identifying a particular concept. We analyze Wikipedia disambiguation pages as a further source of words that refer to a defined meaning, and use them to enrich the syntag sets of Tagpedia. In particular, for every disambiguation page we find each syntag set related to a concept referenced in the page text. We then add the title of the disambiguation page to those syntag sets as a new tag that can be used to access them (in syntag set 1 of Figure 3, the tag 'panther' is extracted from a disambiguation page).

In summary, let C_i be a concept derived from a specific Wikipedia article page P_i. To populate the syntag set of C_i with tags, we extract:
• the title of P_i;
• the title of every redirect page to P_i;
• the title of every disambiguation page containing a link to P_i.

Figure 4: The structure of Tagpedia

Starting from a dump of the English version of Wikipedia, we have developed a set of C++ routines that automatically analyze the text of Wikipedia articles.
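The C++ routines are not detailed here. As an illustration only, the following minimal Python sketch applies the three population rules listed above, under our own simplifying assumptions. Each page is a hypothetical `Page` record holding a title and its raw wikitext. Redirect pages are recognized by a `#REDIRECT [[Target]]` marker, and disambiguation pages by a `{{disambig}}` or `{{disambiguation}}` template. Links are matched with a plain `[[...]]` pattern. Real dumps contain many more template variants and edge cases. A real system would also assign a numeric identifier to each concept rather than keying syntag sets by article title.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical simplified dump record: page title plus raw wikitext."""
    title: str
    text: str


# Assumed markup patterns; real Wikipedia dumps contain many more variants.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)
DISAMBIG_RE = re.compile(r"\{\{\s*(?:disambig|disambiguation)\s*\}\}", re.IGNORECASE)
LINK_RE = re.compile(r"\[\[([^\]|#]+)")
QUALIFIER_RE = re.compile(r"\s*\(disambiguation\)$")


def normalize(title: str) -> str:
    """Turn a page title into a tag: underscores become spaces, lower case."""
    return title.replace("_", " ").strip().lower()


def build_syntag_sets(pages):
    """Map each article (by normalized title) to its syntag set: the article
    title, the titles of redirects to it and the titles of the
    disambiguation pages that link to it."""
    articles, redirects, disambigs = [], [], []
    for page in pages:
        match = REDIRECT_RE.match(page.text)
        if match:
            redirects.append((normalize(page.title), normalize(match.group(1))))
        elif DISAMBIG_RE.search(page.text):
            disambigs.append(page)
        else:
            articles.append(page)

    # Rule 1: every article page starts a syntag set with its own title.
    syntag_sets = {normalize(p.title): {normalize(p.title)} for p in articles}

    # Rule 2: add the title of every redirect page pointing to an article.
    redirect_target = dict(redirects)
    for source, target in redirects:
        if target in syntag_sets:
            syntag_sets[target].add(source)

    # Rule 3: add the disambiguation title (minus a "(disambiguation)"
    # qualifier) to every article it links to, following one redirect hop
    # when the link points to a redirect title.
    for page in disambigs:
        tag = QUALIFIER_RE.sub("", normalize(page.title))
        for link in LINK_RE.findall(page.text):
            target = normalize(link)
            target = redirect_target.get(target, target)
            if target in syntag_sets:
                syntag_sets[target].add(tag)
    return syntag_sets


# Toy input modelled on syntag set 1 of Figure 3 (page texts are invented).
pages = [
    Page("Jaguar", "The '''jaguar''' is a big cat."),
    Page("Leo_onca", "#REDIRECT [[Jaguar]]"),
    Page("Panther", "'''Panther''' may refer to:\n* [[Jaguar]]\n* [[Leopard]]\n{{disambig}}"),
    Page("Leopard", "The '''leopard''' is a big cat."),
]
syntag_sets = build_syntag_sets(pages)
print(sorted(syntag_sets["jaguar"]))  # ['jaguar', 'leo onca', 'panther']
```

The disambiguation links are resolved through the redirect map because, as noted in the list of page groups, a disambiguation page may point to a redirect title rather than directly to an article title.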
By mining the structural elements of Wikipedia syntax, taking text punctuation into account and exploiting pattern matching techniques based mainly on regular expressions and string analysis, our routines gather all the concepts. They also gather all the tags that can be used to refer to each single meaning, producing a huge collection of syntag sets. The meaning of each concept, identified by a syntag set, is further specified by a pointer to the corresponding Wikipedia article.

All these data are stored in a relational database designed and optimized for fast access. It consists of two basic collections: the concept table and the tag table. The concept table gathers all the concepts of Tagpedia. It assigns each concept a unique identifier, the Concept ID, and a brief definition extracted from the English version of Wikipedia. For every concept it also stores the URL of the corresponding Wikipedia article. The tag table links each concept, referenced through its identifier, to all the tags used to access it.

By mining the September 2007 dump of the English version of Wikipedia, we have obtained more than 1.9 million syntag sets. They contain more than 4 million tags used to point out the intended concepts, each concept referencing a specific Wikipedia article (see Figure 4).

Figure 5 shows how the 4,230,740 tags of Tagpedia are distributed across their sources. The number of tags extracted from article pages (P) equals the number of syntag sets, that is, 1,927,378. Of the remaining 2,303,362 tags, 481,250 were extracted from disambiguation pages and 1,822,112 from redirect pages.

Figure 5: Sources of the tags in Tagpedia

This collection of syntag sets is the basis of Tagpedia. It provides a way to unambiguously access and refer to concepts when users must semantically describe or search for Web resources.

4. EXPLOITING AND IMPROVING TAGPEDIA
Tagpedia is meant to support the generation of semantic descriptions of Web resources and semantic searches for Web contents. For both purposes, its information should be easy to access by querying the whole collection of syntag sets. We have therefore developed the Tagpedia Web API, a simple set of procedures that can be invoked via the Web to exploit the semantic support offered by Tagpedia. These procedures carry out a few fundamental tasks and can be composed into more complex functions. Other external Web applications can easily invoke them to integrate semantic features.

The main tasks supported by the Tagpedia Web API are:
• retrieving all the possible meanings of a given tag, i.e. all the syntag sets that contain the tag;
• collecting all the tags belonging to a specific syntag set, i.e. all the words or expressions that can be used to access that particular meaning;
• retrieving the short textual description of a specific syntag set.

Using the Tagpedia Web API, we have integrated this semantic resource into SemKey, our semantic collaborative tagging system, where it replaces WordNet and Wikipedia for disambiguating the meaning of tags. Once the user has chosen one or more tags, he specifies the right meaning of each of them by choosing a particular syntag set among those that include the tag. An early prototype of a Web interface for exploring and interacting with Tagpedia is accessible at the URL www.tagpedia.org.

To evaluate the coverage of Tagpedia, and to gather suggestions for improving it, we manually identified the right meaning of the tags associated with the 100 most popular Web resources on del.icio.us, tagged by more than 25,000 users. Using the Tagpedia Web API, we developed a Web based procedure that takes the URL of a Web resource and retrieves all the related tags from del.icio.us. It then retrieves all the possible meanings of each tag from Tagpedia, along with their short descriptions, and the user manually verifies whether the right concept is present. Collecting the results of these user based tests gave us a first evaluation of the disambiguation effectiveness of our semantic resource. The results are shown in Table 1.

Table 1: Tagpedia tag disambiguation support: preliminary evaluation results

Number of del.icio.us URLs                   100
Number of distinct tags                      1087
Percentage of successful disambiguations     84%

Tagpedia provides valid support to the disambiguation process for 84% of the tags considered. Nevertheless, we have identified several ways to improve its contents and, as a consequence, its semantic coverage and usefulness. We describe these proposals for future work in the rest of this section.

Despite this good disambiguation coverage, Tagpedia does not handle some particular tags, such as 'sem web', 'inplaceedit', 'web dev' and similar ones. These are non-conventional words, often created by one user to describe a particular concept and then adopted and used by many others. One possible solution to this problem is to introduce collaborative Web editing for Tagpedia contents. For a resource of this kind, it is fundamental to let users create new syntag sets, or merge or extend existing ones with new tags. The effectiveness of Tagpedia in describing Web resources depends on how well this semantic resource can be adapted and enriched as user descriptive needs vary. In this context, collaborative collection and management of data, following a Wiki-like paradigm, is a key factor of the current Web and a crucial issue for Tagpedia.

A second aspect of Tagpedia that can be substantially improved is its semantic content, by adding semantic relations between syntag sets. Such relations help to identify concepts better and to search for them easily. Each syntag set, representing a meaning, may be connected to other syntag sets through relationships like specialization, generalization, relatedTo and similar ones. Relevant relations between syntag sets could be mined by analyzing the internal links between Wikipedia article pages and by exploiting the hierarchy of Wikipedia categories. For instance, when we specify the concept to search for, or choose a concept to semantically characterize a resource, the system could use these relations to show more general or more specific concepts and so simplify the choice. Similarly, during a semantic search, being able to browse all the syntag sets related to a given one would let us refine our search needs and easily retrieve the desired information.

A third way to improve and enrich Tagpedia is to define semi-automated procedures that extend its data by importing the contents of other resources. Other relevant free Web thesauri, dictionaries or other language tools can be valid sources of information. For instance, the Dictionary of Automotive Terms [1] and the Free Online Medical Dictionary [2] are two domain specific resources that could be integrated into Tagpedia. Moreover, mapping rules could be defined between Tagpedia syntag sets and other Web semantic resources, integrating different sources of information through the common ground provided by Tagpedia itself.

Another aspect that must be further addressed in Tagpedia is support for multilingualism. In Tagpedia each syntag set is language independent, while the tags that make up a syntag set belong to a particular language. If a syntag set can collect tags from different languages, Tagpedia can deal with multiple languages. Once one or more concepts have been identified, we can then perform language independent semantic searches. We think this possibility should be explored and defined further, in particular by identifying specific semantic search patterns.

As already mentioned at the end of Section 2, a further important issue is the definition and tuning of automated or semi-automated procedures to create semantic descriptions. Users should be able to describe Web resources semantically in an easy way. They must be supported in turning simple keywords into concepts and in browsing the collection of syntag sets that make up Tagpedia, without complicating their usual interaction patterns or compromising the usability of the systems they use. Automated methodologies could also be tuned to derive semantic descriptions of Web resources from simple keyword based ones. This would create an initial, solid collection of semantic metadata and bootstrap this new way of characterizing resources on the Web.

5. CONCLUSIONS
In this paper we have presented Tagpedia, a collection of semantically structured tags built ad hoc to describe Web contents.

We started from a brief analysis of the weak points of keyword based methodologies for organizing and searching information, and of the current approaches to these issues. On this basis we introduced the possibility of describing Web resources semantically through concepts. To make this possible, we have developed an initial version of Tagpedia, a general domain semantic resource of reference created by mining Wikipedia. We described its structure and organization and gave an overview of the Tagpedia Web API, which provides easy access to the information collected in Tagpedia. We then focused on the possible improvements to this semantic resource. These include collaborative wiki authoring, enrichment with relations between syntag sets, automated procedures for extracting content from external sources, support for multilingualism and automated generation of semantic descriptions of Web resources. They are only some of the many possible improvements and show how broadly Tagpedia can be enhanced.

On the basis of all these considerations, we believe that Tagpedia, despite its initial stage of development, is an important attempt to support the introduction of semantics on the Web. It tries to put the principles of the Semantic Web into practice on a global scale and to better structure and manage the huge amount of data that makes up the current Web.

6. REFERENCES
[1] Dictionary of automotive terms. http://www.motorera.com/dictionary/.
[2] Free online medical dictionary. http://cancerweb.ncl.ac.uk/omd/.
[3] Alan Dix, Stefano Levialdi and Alessio Malizia. Semantic halo for collaboration tagging systems. In the Social Navigation and Community-Based Adaptation Technologies Workshop - June 20th, 2006, Dublin, Ireland.
[4] Andrea Marchetti, Maurizio Tesconi, Francesco Ronzano, Marco Rosella and Salvatore Minutoli. SemKey: A semantic collaborative tagging system. In the Tagging and Metadata for Social Information Organization Workshop at the World Wide Web Conference 2007 - May 8, 2007, Banff, Alberta, Canada.
[5] Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme. FolkRank: A ranking algorithm for folksonomies. In the Lernen - Wissensentdeckung - Adaptivität Workshop - October 9-11, 2006, Hildesheim, Germany. http://www.kde.cs.uni-kassel.de.
[6] Sören Auer and Jens Lehmann. What have Innsbruck and Leipzig in common? Extracting semantics from wiki content. In the 4th European Semantic Web Conference - June 5th, 2007, Innsbruck, Austria.
[7] Razvan Bunescu and Marius Pasca. Using encyclopedic knowledge for named entity disambiguation. In the Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics - April 9-16, 2006, Trento, Italy.
[8] Silviu Cucerzan. Large-scale named entity disambiguation based on Wikipedia data. In the Empirical Methods in Natural Language Processing Conference - June 28-30, 2007, Prague, Czech Republic.
[9] Evgeniy Gabrilovich and Shaul Markovitch. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In the Proceedings of the 20th International Joint Conference on Artificial Intelligence - January 6-12, 2007, Hyderabad, India.
[10] Scott A. Golder and Bernardo A. Huberman. The structure of collaborative tagging systems. In the Journal of Information Sciences, vol. 32, April, pag. 198-208, 2006.
[11] Andrew Gregorowicz and Mark A. Kramer. Mining a large-scale term-concept network from Wikipedia. Mitre Technical Report, October 2006.
[12] Marieke Guy and Emma Tonkin. Tidying up tags? D-Lib Magazine, 12, January 2006.
[13] Hend S. Al-Khalifa, Hugh C. Davis and Lester Gilbert. Creating structure from disorder: using folksonomies to create semantic metadata. In the 3rd International Conference on Web Information Systems and Technologies - March 3-6, 2007, Barcelona, Spain.
[14] The English version of Wikipedia. http://en.wikipedia.org/wiki/.
[15] Vivisimo, search done right! http://vivisimo.com/.
[16] DBpedia. http://wiki.dbpedia.org.
[17] Princeton WordNet. http://wordnet.princeton.edu/.
[18] Grokker enterprise search management. http://www.grokker.com/.
[19] Kartoo meta-search engine. http://www.kartoo.com/.
[20] Sergey Chernov, Tereza Iofciu, Wolfgang Nejdl and Xuan Zhou. Extracting semantic relationships between Wikipedia categories. In the 1st Workshop on Semantic Wikis at the 3rd European Semantic Web Conference - June 11-14, 2006, Budva, Montenegro.
[21] Michael Strube and Simone Paolo Ponzetto. WikiRelate! Computing semantic relatedness using Wikipedia. In the Proceedings of the 45th Annual Southeast Regional Conference, pag. 106-110 - March 23-24, 2007, Winston-Salem, North Carolina, USA.
[22] Christopher Thomas and Amit P. Sheth. Semantic convergence of Wikipedia articles. In the Proceedings of the Web Intelligence Conference, pag. 600-606 - November 2-5, 2007, Silicon Valley, USA.
[23] Jakob Voß. Measuring Wikipedia. In the Proceedings of the 10th International Conference of the International Society for Scientometrics and Informetrics - July 24-28, Stockholm, Sweden.
[24] Fei Wu and Daniel S. Weld. Autonomously semantifying Wikipedia. In the Proceedings of the 16th ACM Conference on Information and Knowledge Management, pag. 41-50 - November 6-9, 2007, Lisboa, Portugal.
[25] Zhichen Xu, Yun Fu, Jianchang Mao and Difu Su. Towards the semantic web: Collaborative tag suggestions. In the Proceedings of the Collaborative Web Tagging Workshop at the World Wide Web Conference 2006 - May 23-26, 2006, Edinburgh, Scotland.