=Paper=
{{Paper
|id=Vol-356/paper-3
|storemode=property
|title=Tagpedia: a Semantic Reference to Describe and Search for Web Resources
|pdfUrl=https://ceur-ws.org/Vol-356/paper3.pdf
|volume=Vol-356
|dblpUrl=https://dblp.org/rec/conf/www/RonzanoMT08
}}
==Tagpedia: a Semantic Reference to Describe and Search for Web Resources==
Francesco Ronzano, Institute for Informatics and Telematics (IIT) - CNR, Via Moruzzi, 1, Pisa, Italy, francesco.ronzano@iit.cnr.it
Andrea Marchetti, Institute for Informatics and Telematics (IIT) - CNR, Via Moruzzi, 1, Pisa, Italy, andrea.marchetti@cnr.it
Maurizio Tesconi, Institute for Informatics and Telematics (IIT) - CNR, Via Moruzzi, 1, Pisa, Italy, maurizio.tesconi@iit.cnr.it
ABSTRACT
Today the Web is a growing collection of an enormous amount of content. Finding better ways to search and organize the available data is becoming fundamental to coping with information overload. Keyword based Web searches are currently the preferred means to look for content related to a specific topic. Search engines and collaborative tagging systems make information searchable by associating descriptive keywords with Web resources. All of them suffer from inconsistencies that reduce the recall and precision of searches. These are caused by polysemy, synonymy and, in general, the many different lexical forms that can be used to refer to a particular meaning. One way to solve, or at least reduce, these problems is to introduce semantics into the characterization of Web content: each resource is described by one or more concepts instead of simple and often ambiguous keywords. To support this task, a global semantic resource of reference is fundamental. Building on our past experience with the semantic tagging of Web resources and the SemKey Project, we are developing Tagpedia. Tagpedia is a general-domain "encyclopedia" of tags, created by mining Wikipedia and semantically structured to generate semantic descriptions of content over the Web. In this paper we start from an analysis of the weak points of non-semantic keyword based Web searches. We then introduce our idea of semantic characterization of Web resources and describe the structure and organization of Tagpedia. Finally, we present our first realization of Tagpedia and suggest the improvements needed to exploit its full potential.
Categories and Subject Descriptors
H.4.m [Information Systems]: Miscellaneous; D.2 [Software]: Software Engineering
General Terms
semantic resource, knowledge organization, semantic web
Keywords
semantics, web, social, wikipedia, data mining
Copyright is held by the Authors. Copyright transferred for publishing online and a conference CD ROM.
SWKM'2008: Workshop on Social Web and Knowledge Management @ WWW 2008, April 22, 2008, Beijing, China.
1. INTRODUCTION: KEYWORD BASED SEARCHES
Currently, keyword based Web searches are the preferred way to look for resources of interest over the Web. Each resource, usually identified by its URL, can be accessed through one or more keywords describing its content. The most widespread ways to explore the links between Web resources and keywords are to use a search engine or a collaborative tagging service (see Figure 1).
Search engines like Google, Yahoo, Ask and so on are examples of automated information extraction systems. They analyze the data and structure of Web content, as well as users' search behaviour and how often different search strings are used. From this they collect the most appropriate keywords for accessing a Web resource (see the lower portion of Figure 1).
On the other side, collaborative tagging systems like del.icio.us, Flickr, YouTube and Technorati rely upon user contribution. They are examples of social classification systems: each member of the user community of a collaborative tagging system describes Web resources of interest by means of one or more freely chosen keywords, called tags. All the tags associated with Web resources are collected, and every user can exploit them to find further resources of interest. A popularity value is usually associated with each tag describing a Web resource. It counts the number of times the tag has been chosen to characterize that resource, and so indicates the importance of the tag among those related to the resource (see the upper portion of Figure 1).
Even though they are very popular, keyword based Web search approaches show many weak points in managing the expressivity of language. Many keywords can identify distinct concepts (polysemy); as a consequence, the precision of search results decreases. Moreover, if we are not searching for the most common sense of a keyword, it is often very difficult to explore the space of search results and find the Web resources of interest among those retrieved. For example, suppose that we want to find all the resources dealing with 'ajax' intended as the Greek hero. With 'ajax' as the search string, there are no links related to mythology among the first 30 search results of Google. If we refine the search string to solve the problem, we partition the space of relevant search results according to the particular word added to 'ajax' to disambiguate its meaning. For instance, suppose we add the word 'hero' or, alternatively, the word 'mythology' to 'ajax' in the search string. Then, considering
the first 10 search results shown by Google, only two of them are present in both cases. Besides polysemy, synonymy also affects the precision and recall of keyword based Web searches: when a specific meaning can be accessed through two or more keywords, the set of search results differs depending on the particular keyword chosen. Moreover, users describe a particular resource with different levels of precision and from many possible points of view, and this often causes a considerable loss of quality in Web searches. For a deeper analysis of the factors that affect the efficiency and effectiveness of keyword based Web search systems see [4] [10] [12] [25].
Figure 1: Two ways to associate keywords to resources
Many distinct methods have been applied to address the drawbacks of the systems just analyzed. Aggregating and post-processing search results from different search engines is becoming increasingly widespread. Systems like Vivisimo [15], Grokker [18] and Kartoo [19] are meta search engines. They collect search results from other search engines and group them in different ways. Grokker, for example, exploits the category hierarchies of Yahoo and Wikipedia. Vivisimo creates clusters of similar search results and characterizes each cluster with one or more additional keywords. Kartoo displays search results cartographically, through very expressive maps that connect the most relevant resources to the most used keywords.
In the field of tagging systems there are also many proposals to better organize search results, improving their quality and the effectiveness of search. FolkRank [5] is an algorithm created to rank search results in a tagging system: it calculates a ranking value for each result, thus estimating its relevance. The user profile is also exploited to adapt the ranking calculation to the information needs of every single user.
The rest of this paper is organized as follows. In Section 2 we describe our idea of semantic characterization of Web resources. We underline the need for a general-domain semantic resource of reference to support this task, drawing also on our past experience with semantic tagging and SemKey. In Section 3 we introduce Tagpedia, the semantic resource of reference we have created by mining Wikipedia, and explain its organization and structure (Subsection 3.1). In Section 4 we describe how Tagpedia can be used, present the Tagpedia Web API and discuss the improvements needed to exploit its full potential. Conclusions are drawn in Section 5.
2. FROM KEYWORDS TO CONCEPTS: SEMANTIC CHARACTERIZATION
We can solve, or at least substantially reduce, the problems of organizing and classifying Web resources by adding a further level to their characterization: semantics. Instead of relying on the post-processing of search results, we can describe resources semantically and directly, by associating them with one or more properly chosen concepts. In this way we extend the characterization of resources with a semantic level: each resource (R) is described by one or more concepts (C) and, in turn, each concept can be accessed through one or more keywords (K) (see Figure 2). When we search for information of interest, we can then specify our informative needs more precisely. We can also access relevant results easily and effectively, thanks to the collection of concepts used to describe Web resources, referred to as the semantic resource in what follows.
Figure 2: Relations between resources, keywords and concepts
This way of organizing Web content is an attempt to realize the semantic description of information that lies at the basis of the Semantic Web vision.
At present there are many proposals of semantic classification methods for Web content. FolksAnnotation [13], for instance, tries to extract the tags that describe a Web resource from a collaborative tagging system, automatically mapping them to the corresponding concepts of a predefined domain ontology. Systems of this kind usually require a strong, well organized ontological frame of reference that is difficult to realize. They have not provided significant improvements over the classical keyword based methodologies. A different approach is the one adopted by systems like Semantic Halo [3], which improves tag based search systems by adding semantic information without relying on ontologies. Analyzing co-occurrences and frequencies of
tags, the Semantic Halo algorithm extracts groups of tags that help users refine and drive their searches. Examples are more general or more specific tags, or groups of keywords that define a particular naming of the selected tag. Not enough experimental data on the effectiveness and usefulness of this method for improving tag based searches is currently available. In summary, a strong and widespread infrastructure that organizes and provides access to Web resources on the basis of semantic classificatory information is still missing.
During the first half of 2007, we tried to make the semantic description of Web resources possible by developing SemKey [4], a semantic collaborative tagging system. SemKey extends current tagging systems by allowing users to characterize resources by referring to concepts. Each user can point out and describe Web resources of interest. Starting from a freely chosen tag, the user can disambiguate it, with the support of Wikipedia [14] and WordNet [17], to identify one or more defined concepts. In this way the user produces a semantic assertion, that is, the description of a specific feature of a Web resource through one or more chosen concepts. Thus we can potentially overcome the limits that the complexity of language imposes on the description of Web resources. We do so by exploiting their semantic characterization as well as the semantic relations between concepts present in WordNet and Wikipedia.
We have implemented a working prototype of SemKey. By analyzing the usage patterns and the semantic classification support provided by our system, we have identified two key factors that must be improved to really make possible the semantic characterization of Web resources described earlier in this Section.
Both Wikipedia and WordNet show important features supporting the semantic description of Web resources, but both have relevant shortcomings. WordNet offers a rich set of parts of speech and a strongly structured set of relations between them. However, it lacks much of the data needed to disambiguate proper names, and it is not collaboratively edited. Wikipedia is an encyclopedia, so its content consists mainly of a very rich set of names along with their extended descriptions. Thus Wikipedia has strong coverage of proper names, and it has been proposed as a named entity disambiguation resource in [7] and [8]. It is also continuously updated. However, it lacks a structured set of relations between the concepts described, even though its documents are interconnected by a huge number of links and loosely classified through categories. As a consequence, these two semantic resources are in some ways complementary. Yet both were built and structured for purposes other than the semantic characterization of Web content. To better support this task we need a semantic resource built and structured ad hoc, which is still missing: it must combine the advantages of the resources just analyzed while removing irrelevant informative content.
Moreover, a major limit to the usability of SemKey, and to the easy definition of new semantic metadata, is the number of steps users must carry out to compose a semantic assertion. This often discourages them from creating semantic metadata. Some sort of automation is necessary to speed up the tag disambiguation process or to carry it out through automated procedures.
3. TAGPEDIA: A GENERAL DOMAIN SEMANTIC RESOURCE OF REFERENCE
We start from the need for a global semantic resource that can be exploited as a reference to describe Web content, and that is therefore comprehensive and up to date. As a possible solution to this demand, we have designed and built Tagpedia. Tagpedia is a semantic organization and classification of tags, intended as words or, in general, brief textual expressions that people may use to describe Web resources. Tagpedia is based on the model of term-concept networks [11], structured ad hoc to support the semantic characterization of Web content, and it is initially populated with Wikipedia data. In particular, we have tuned a new way of mining Wikipedia to extract the information needed to build Tagpedia. This supports concept based descriptions of Web resources, including through tag disambiguation.
We have chosen Wikipedia as the starting point because it is the richest and most constantly updated encyclopedic reference on the Web. It also includes a huge set of semantic content, even if this content is not explicitly exposed or easily accessible. During the last few years many studies have found new ways to extract useful semantic data from the great amount of information contained in Wikipedia. Organizational patterns such as infoboxes, internal and external links, redirect pages and disambiguation pages have been analyzed in order to extract valuable data. The DBpedia Project [16], for instance, is a relevant attempt to extract semantic data from Wikipedia and make them available over the Web in compliance with Semantic Web standards [6]. DBpedia is a global knowledge base derived from Wikipedia; unlike Tagpedia, it is not specifically intended for the description of Web resources. In [24] there is a description of Kylin, a system that autonomously semantifies Wikipedia, automatically suggesting data inconsistencies, gaps or incompleteness. Wikipedia has also been successfully exploited to compute semantic relatedness between words [21] and between natural language texts [9]. It has likewise been used to tune new named entity disambiguation methodologies [7] [8]. Semantic relationships between Wikipedia categories have been studied in order to make the search for information easier and to give article editors relevant suggestions [20]. Moreover, some research has examined how Wikipedia articles are created and how their content matures [22]. Other work analyzes statistics about the growth of the data that constitute Wikipedia: the types of articles, the editors, the link and category structure and so on [23].
3.1 The structure of Tagpedia
The main aim of Tagpedia is the semantic characterization of data over the Web. In particular, it must allow a Web resource to be described by associating it with one or more univocally referenced concepts. Thus, the main constitutional unit of Tagpedia is the concept. Each concept must be unequivocally identified but also easily accessed. The main way to point out a concept is through the words that refer to it; in the following, such words will also be called tags. As a consequence, each concept is identified by the set of all the words or, more generally, all the alphanumeric expressions of any kind that a community of users can adopt to refer to it, thus constituting a
set of synonymous tags, or syntag set. Syntag sets are the molecules that form Tagpedia.
Figure 3: Three syntag sets
The creation of an initial, rich collection of syntag sets is the first necessary step in building our semantic resource. Wikipedia shows many features that can be exploited to create such a collection. In particular, a Wikipedia article usually defines a specific concept, so to bootstrap Tagpedia we create syntag sets from the articles of Wikipedia. Figure 3 shows three examples of syntag sets made up of tags collected by mining Wikipedia.
To be more precise, Wikipedia pages can be divided into three main groups:
• article pages: each describes a particular concept, identified by the title of the page itself;
• redirect pages: each links an alternate literal expression, which constitutes the title of the redirect page, to the corresponding concept, usually identified by the title of an article page;
• disambiguation pages: each lists all the possible concepts, usually identified through the titles of article or redirect pages, that can be referred to by the literal expression constituting the title of the disambiguation page.
The redirect and disambiguation page mechanisms are two important Wikipedia organizational solutions that can be exploited to build and enrich syntag sets.
Once a concept corresponding to a particular article page has been identified, we create an initial version of its syntag set. This set is identified by a unique identifier and includes only the tag corresponding to the title of the page (in Figure 3, syntag set 1, the tag 'jaguar' is the title of an article page). Then we collect all the words and expressions that may be used to refer to that concept.
As previously mentioned, in Wikipedia the redirect mechanism links alternate literal expressions to the original encyclopedic article that describes a specific entity. It is typically used to manage synonyms, abbreviations, acronyms, misspellings, alternative spellings, different punctuation, particular capitalization rules and so on. In Tagpedia we mine Wikipedia content and extract all the redirect information by analyzing redirect pages. For each redirect page, we enrich the syntag set of the referred concept by adding the title of the redirect page as a new tag (in Figure 3, syntag set 1, the tag 'leo onca' is extracted from a redirect page).
Moreover, Wikipedia usually manages polysemy through disambiguation pages. As said, each disambiguation page is a collection of links to all the different article pages that identify the distinct meanings of the page title (a textual string). For example, the word 'ajax' is highly polysemous and has 49 different meanings in Wikipedia: its disambiguation page contains links to 49 distinct article pages, each identifying a particular concept. We analyze Wikipedia disambiguation pages as a further source of information, enriching the syntag sets of Tagpedia with new words that refer to a defined meaning. In particular, for every disambiguation page, we identify each syntag set related to the concepts referenced in its Wikipedia text. We then add the title of that disambiguation page as a new tag that can be used to access the selected syntag sets (in Figure 3, syntag set 1, the tag 'panther' is extracted from a disambiguation page).
Summarizing, let Ci be a concept derived from a specific Wikipedia article page Pi. To populate the syntag set of Ci with tags we extract:
• the title of Pi;
• the title of every redirect page to Pi;
• the title of every disambiguation page containing a link to Pi.
Figure 4: The structure of Tagpedia
Table 1: Tagpedia tag disambiguation support: preliminary evaluation results
Number of del.icio.us URLs: 100
Number of distinct tags: 1087
Percentage of successful disambiguations: 84%
Starting from a dump of the English version of Wikipedia, we have developed a set of C++ routines that automatically analyze the text of Wikipedia articles. These routines mine structural elements of Wikipedia syntax, consider text punctuation and exploit pattern matching techniques mainly based on regular expressions and string analysis. In this way they gather all the concepts, as well as all the possible tags used to refer to each meaning, thus defining a huge collection of syntag sets. The meaning of each concept, identified by a syntag set, is further specified by a pointer to the corresponding Wikipedia article.
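The three extraction rules summarized earlier in this subsection can be illustrated with a short Python sketch. This is not the authors' C++ implementation, and it omits the regular-expression mining of raw wiki markup. It assumes the Wikipedia pages have already been parsed into simple records. The `Page` type, its fields and the functions `normalize`, `build_syntag_sets` and `meanings` are hypothetical names introduced here. The sample pages are invented, loosely modelled on syntag set 1 of Figure 3.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    """Hypothetical pre-parsed Wikipedia page record (not a real library type)."""
    title: str
    kind: str                       # "article", "redirect" or "disambiguation"
    target: Optional[str] = None    # redirect pages: title of the target article
    links: list[str] = field(default_factory=list)  # disambiguation pages: linked titles


def normalize(title: str) -> str:
    # Assumption: tags are lower-cased and underscores become spaces;
    # the paper does not state how titles are normalized.
    return title.replace("_", " ").strip().lower()


def build_syntag_sets(pages: list[Page]) -> dict[int, set[str]]:
    """Map each concept ID to its syntag set using the three extraction rules."""
    concept_id: dict[str, int] = {}
    syntags: dict[int, set[str]] = {}
    # Rule 1: each article page Pi yields a concept Ci tagged with the title of Pi.
    for page in pages:
        if page.kind == "article":
            cid = len(concept_id) + 1
            concept_id[normalize(page.title)] = cid
            syntags[cid] = {normalize(page.title)}
    # Redirect titles resolve to the article title they point to.
    redirect_to = {normalize(p.title): normalize(p.target)
                   for p in pages if p.kind == "redirect" and p.target}
    for page in pages:
        if page.kind == "redirect" and page.target:
            # Rule 2: the redirect title becomes a tag of the target concept.
            cid = concept_id.get(normalize(page.target))
            if cid is not None:
                syntags[cid].add(normalize(page.title))
        elif page.kind == "disambiguation":
            # Rule 3: the disambiguation title becomes a tag of every linked
            # concept, following a link through a redirect when needed.
            for link in page.links:
                target = redirect_to.get(normalize(link), normalize(link))
                cid = concept_id.get(target)
                if cid is not None:
                    syntags[cid].add(normalize(page.title))
    return syntags


def meanings(syntags: dict[int, set[str]], tag: str) -> list[int]:
    """IDs of all syntag sets containing the tag, i.e. its possible meanings."""
    return sorted(cid for cid, tags in syntags.items() if normalize(tag) in tags)


# Invented sample pages, loosely modelled on syntag set 1 of Figure 3.
pages = [
    Page("Jaguar", "article"),
    Page("Cougar", "article"),
    Page("Leo onca", "redirect", target="Jaguar"),
    Page("Panther", "disambiguation", links=["Jaguar", "Cougar"]),
]
syntag_sets = build_syntag_sets(pages)
print({cid: sorted(tags) for cid, tags in syntag_sets.items()})
# prints {1: ['jaguar', 'leo onca', 'panther'], 2: ['cougar', 'panther']}
print(meanings(syntag_sets, "panther"))
# prints [1, 2]
```

The tag 'panther', taken from a disambiguation page, ends up in two syntag sets. This is the polysemy a user resolves by choosing one syntag set. A real implementation would also have to handle redirect chains and further title variants, which this sketch ignores.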
All these data are collected in a relational database designed and optimized for fast access. It consists of two basic collections: the concept table and the tag table. The concept table gathers all the concepts of Tagpedia, assigning to each of them a unique identifier (the Concept ID) and a brief definition extracted from the English version of Wikipedia. For every concept we also store the URL of the corresponding Wikipedia article. The tag table, on the other hand, links each concept, referenced through its identifier, to all the tags that can be used to access it.
By mining the September 2007 dump of the English version of Wikipedia, we have obtained more than 1.9 million syntag sets and more than 4 million tags used to point out the intended concepts, each concept referencing a specific Wikipedia article (see Figure 4).
Figure 5 shows the weight of the different sources of the 4,230,740 tags of Tagpedia. The number of tags extracted from article pages (P) equals the number of syntag sets, that is 1,927,378. Of the 2,303,362 remaining tags, 481,250 have been generated by mining disambiguation pages and 1,822,112 by analyzing redirect pages.
Figure 5: Sources of the tags in Tagpedia
This collection of syntag sets constitutes the basis of Tagpedia. It provides a way to unequivocally access and refer to concepts when users must semantically describe or search for Web resources.
4. EXPLOITING AND IMPROVING TAGPEDIA
The information contained in Tagpedia should be easy to access by querying the whole collection of syntag sets. This is needed both to generate semantic descriptions of Web resources and to search for Web content semantically. For this purpose we have developed the Tagpedia Web API, a simple set of procedures that can be invoked via the Web to exploit the semantic support offered by Tagpedia. These procedures carry out a few fundamental tasks and can be composed to realize more complex functions. Other external Web applications can easily request their execution in order to integrate semantic features.
The main tasks that the Tagpedia Web API supports are:
• retrieving all the possible meanings of a given tag, i.e. all the syntag sets that contain the tag;
• collecting all the tags belonging to a specific syntag set, i.e. all the words or expressions that can be used to access that particular meaning;
• retrieving the short textual description of a specific syntag set.
Using the Tagpedia Web API, we have integrated this semantic resource into SemKey, our semantic collaborative tagging system. It replaces WordNet and Wikipedia in supporting the disambiguation of the meaning of tags. Once one or more tags have been chosen, the user specifies the right meaning of each of them by choosing a particular syntag set among those including the intended tag. An early prototype of a Web-based interface for exploring and interacting with Tagpedia is accessible at the URL www.tagpedia.org.
To evaluate the coverage of Tagpedia, and to obtain suggestions for improving this semantic resource, we tried to manually identify the right meaning of the tags associated with the 100 most popular Web resources on del.icio.us, tagged by more than 25,000 users. Relying upon the Tagpedia Web API, we developed a Web-based procedure that, starting from the URL of a Web resource, retrieves all the related tags in del.icio.us. All the possible meanings of each tag are retrieved from Tagpedia along with their short descriptions, and the user manually verifies whether the right concept is present. By collecting all the results of these user-based tests, we obtained a first evaluation of the disambiguation effectiveness of our semantic resource. The results are shown in Table 1.
Tagpedia provides valid support to the disambiguation process for 84% of the tags considered. Nevertheless, we have identified several ways to improve its content and, as a consequence, its semantic coverage and usefulness. In the rest of this section we describe these proposals for future work.
Despite its good disambiguation coverage, there are particular tags like 'sem web', 'inplaceedit', 'web dev'
and similar ones that are not handled by Tagpedia. These are unconventional words, often coined by a user to describe a particular concept and then accepted and exploited by many others. One possible solution to this problem is the introduction of collaborative Web editing techniques for Tagpedia content. For such a resource, it is fundamental to let users create new syntag sets, or merge or extend existing ones with new tags. Indeed, the effectiveness of Tagpedia in describing Web resources depends on how far this semantic resource can be adapted and enriched as users' descriptive needs vary. In this context, the ability to collect and manage data collaboratively, following a Wiki-like paradigm, is a key factor of the current Web and a crucial issue for Tagpedia.
Another aspect of Tagpedia that can be substantially improved is the enrichment of its semantic content with semantic relations between syntag sets. These relations are useful to better identify concepts or to easily search for them. Each syntag set, representing a meaning, may be connected to others through relationships like specialization, generalization, relatedTo and similar ones. Possible ways to mine relevant relations between syntag sets are the analysis of the internal links between Wikipedia article pages and the exploitation of the hierarchy of Wikipedia categories. For instance, relying upon these relations, the system can show more general or more specific concepts to simplify two tasks: specifying the concept to search for, and choosing a specific concept to semantically characterize a resource. Similarly, during a semantic search, if we can browse all the syntag sets related to a specific one, we can better specify our search needs and thus easily retrieve the desired information.
A third way to improve and enrich Tagpedia is the definition of semi-automated procedures to extend its data by exploiting other resources and importing their content into Tagpedia. Other relevant free Web thesauri, dictionaries or language tools can be valid sources of information. For instance, the Dictionary of Automotive Terms [1] and the Free Online Medical Dictionary [2] are two domain-specific resources that could be integrated into Tagpedia. Moreover, mapping rules between Tagpedia syntag sets and other Web semantic resources can be defined. These rules would integrate different sources of information through the common ground represented by Tagpedia itself.
Another aspect that must be further addressed in Tagpedia is the support for multilinguism. In Tagpedia each syntag set is language independent, while the tags constituting a particular syntag set are specific to a particular language. If tags belonging to different languages can be collected into one syntag set, we can deal with different languages. Once one or more concepts have been identified, we can then perform language independent semantic searches. We think that this possibility should be better explored and defined, trying to determine specific semantic search patterns.
As already mentioned in the concluding part of Section 2, the definition and tuning of automated or semi-automated procedures to create semantic descriptions is a further important issue to be faced. Users should be able to semantically describe Web resources easily. They must be supported in turning simple keywords into concepts and in browsing the collection of syntag sets constituting Tagpedia. This support must not complicate their usual interaction patterns or compromise the usability of the systems they interact with. Moreover, automated methodologies can also be tuned to derive semantic descriptions of Web resources from simple keyword based ones. This would create an initial solid collection of semantic metadata and bootstrap this new way of characterizing resources over the Web.
5. CONCLUSIONS
In this paper we have presented Tagpedia, a collection of semantically structured tags, built ad hoc to describe Web content.
We began with a brief analysis of the weak points of keyword based methodologies for organizing and searching information, and of the current approaches to these issues. We then introduced the possibility of semantically describing Web resources through concepts. To make it possible, we have developed an initial version of Tagpedia, a general domain semantic resource of reference created by mining Wikipedia. We described its structure and organization and gave an overview of the Tagpedia Web API, which makes the information collected in Tagpedia easy to access and exploit. We then focused on the possible improvements to this semantic resource. These include collaborative wiki authoring, the enrichment of syntag set relations, automated procedures for extracting content from external sources, support for multilinguism and the automated generation of semantic descriptions of Web resources. They are only some of the many improvements that can be carried out, and they underline its broad potential for enhancement.
On the basis of all these considerations, we believe that Tagpedia, despite its initial stage of development, is an important attempt to support the introduction of semantics over the Web. It tries to put into practice the principles of the Semantic Web on a global scale and to better structure and manage the huge amount of data that makes up the current Web.
6. REFERENCES
[1] Dictionary of automotive terms. http://www.motorera.com/dictionary/.
[2] Free online medical dictionary. http://cancerweb.ncl.ac.uk/omd/.
[3] Alan Dix, Stefano Levialdi, and Alessio Malizia. Semantic halo for collaboration tagging systems. In the Social Navigation and Community-Based Adaptation Technologies Workshop, June 20, 2006, Dublin, Ireland.
[4] Andrea Marchetti, Maurizio Tesconi, Francesco Ronzano, Marco Rosella, and Salvatore Minutoli. SemKey: A semantic collaborative tagging system. In the Tagging and Metadata for Social Information Organization Workshop at the World Wide Web Conference 2007, May 8, 2007, Banff, Alberta, Canada.
[5] Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme. FolkRank: A ranking algorithm for folksonomies. In the Lernen - Wissensentdeckung - Adaptivität Workshop, October 9-11, 2006, Hildesheim, Germany. http://www.kde.cs.uni-kassel.de.
[6] Sören Auer and Jens Lehmann. What have Innsbruck and Leipzig in common? Extracting semantics from
wiki content. In the 4th European Semantic Web Conference, June 5, 2007, Innsbruck, Austria.
[7] Razvan Bunescu and Marius Pasca. Using encyclopedic knowledge for named entity disambiguation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, April 9-16, 2006, Trento, Italy.
[8] Silviu Cucerzan. Large-scale named entity disambiguation based on Wikipedia data. In the Empirical Methods in Natural Language Processing Conference, June 28-30, 2007, Prague, Czech Republic.
[9] Evgeniy Gabrilovich and Shaul Markovitch. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, January 6-12, 2007, Hyderabad, India.
[10] Scott A. Golder and Bernardo A. Huberman. The structure of collaborative tagging systems. Journal of Information Science, vol. 32, April 2006, pp. 198-208.
[11] Andrew Gregorowicz and Mark A. Kramer. Mining a large-scale term-concept network from Wikipedia. MITRE Technical Report, October 2006.
[12] Marieke Guy and Emma Tonkin. Tidying up tags? D-Lib Magazine, 12, January 2006.
[13] Hend S. Al-Khalifa, Hugh C. Davis, and Lester Gilbert. Creating structure from disorder: using folksonomies to create semantic metadata. In the 3rd International Conference on Web Information Systems and Technologies, March 3-6, 2007, Barcelona, Spain.
[14] The English version of Wikipedia. http://en.wikipedia.org/wiki/.
[15] Vivisimo, search done right! http://vivisimo.com/.
[16] DBpedia. http://wiki.dbpedia.org.
[17] Princeton WordNet. http://wordnet.princeton.edu/.
[18] Grokker enterprise search management. http://www.grokker.com/.
[19] Kartoo meta-search engine. http://www.kartoo.com/.
[20] Sergey Chernov, Tereza Iofciu, Wolfgang Nejdl, and Xuan Zhou. Extracting semantic relationships between Wikipedia categories. In the 1st Workshop on Semantic Wikis at the 3rd European Semantic Web Conference, June 11-14, 2006, Budva, Montenegro.
[21] Michael Strube and Simone Paolo Ponzetto. WikiRelate! Computing semantic relatedness using Wikipedia. In Proceedings of the 45th Annual Southeast Regional Conference, pp. 106-110, March 23-24, 2007, Winston-Salem, North Carolina, USA.
[22] Christopher Thomas and Amit P. Sheth. Semantic convergence of Wikipedia articles. In Proceedings of the Web Intelligence Conference, pp. 600-606, November 2-5, 2007, Silicon Valley.
[23] Jakob Voß. Measuring Wikipedia. In Proceedings of the 10th International Conference of the International Society for Scientometrics and Informetrics, July 24-28, Stockholm, Sweden.
[24] Fei Wu and Daniel S. Weld. Autonomously semantifying Wikipedia. In Proceedings of the 16th ACM Conference on Information and Knowledge Management, pp. 41-50, November 6-9, 2007, Lisboa, Portugal.
[25] Zhichen Xu, Yun Fu, Jianchang Mao, and Difu Su. Towards the semantic web: Collaborative tag suggestions. In Proceedings of the Collaborative Web Tagging Workshop at the World Wide Web Conference 2006, May 23-26, 2006, Edinburgh, Scotland.