=Paper= {{Paper |id=Vol-356/paper-3 |storemode=property |title=Tagpedia: a Semantic Reference to Describe and Search for Web Resources |pdfUrl=https://ceur-ws.org/Vol-356/paper3.pdf |volume=Vol-356 |dblpUrl=https://dblp.org/rec/conf/www/RonzanoMT08 }} ==Tagpedia: a Semantic Reference to Describe and Search for Web Resources== https://ceur-ws.org/Vol-356/paper3.pdf
    Tagpedia: a semantic reference to describe and search for
                        Web resources

              Francesco Ronzano                         Andrea Marchetti                        Maurizio Tesconi
            Institute for Informatics and           Institute for Informatics and           Institute for Informatics and
              Telematics (IIT) - CNR                  Telematics (IIT) - CNR                  Telematics (IIT) - CNR
                    Via Moruzzi, 1                          Via Moruzzi, 1                          Via Moruzzi, 1
                      Pisa, Italy                             Pisa, Italy                             Pisa, Italy
          francesco.ronzano@iit.cnr.it andrea.marchetti@cnr.it                           maurizio.tesconi@iit.cnr.it

ABSTRACT                                                               1.    INTRODUCTION: KEYWORD BASED
Nowadays the Web represents a growing collection of an                       SEARCHES
enormous amount of contents where the need for better                        Currently, keyword based Web searches are the preferred
ways to find and organize the available data is becoming                  way to search for resources of interest over the Web. Each
a fundamental issue, in order to deal with information over-              resource, usually identified by its URL, can be accessed by
load. Keyword based Web searches are currently the pre-                   one or more keywords describing its content. The most wide-
ferred means of searching for content related to a specific topic.        spread methods to explore links between Web resources and
Search engines and collaborative tagging systems make pos-                keywords are the exploitation of a search engine or the
sible the search for information thanks to the association of             access to a collaborative tagging service (see Figure 1).
descriptive keywords to Web resources. All of them show                      Search engines like Google, Yahoo, Ask and so on are ex-
problems of inconsistency and consequent reduction of re-                 amples of automated information extraction systems: they
call and precision of searches, due to polysemy, synonymy                 analyze the data and the structure of Web contents as well
and in general all the different lexical forms that can be used           as the search behaviour of users and the frequency of usage
to refer to a particular meaning. A possible way to face or               of different search strings to collect the most appropriate
at least reduce these problems is represented by the intro-               keywords that can be used to access a Web resource (see the
duction of semantics to characterize the contents of Web re-              lower portion of Figure 1).
sources: each resource is described by one or more concepts                  On the other hand, collaborative tagging systems like deli-
instead of simple and often ambiguous keywords. To support                cious, Flickr, YouTube and Technorati rely upon user con-
this task, the availability of a global semantic resource of              tribution. They are examples of social classification systems:
reference is fundamental. On the basis of our past experience             each person who belongs to the community of users of a col-
with the semantic tagging of Web resources and the SemKey                 laborative tagging system describes Web resources of inter-
Project, we are developing Tagpedia, a general-domain ”en-                est by means of one or more freely chosen keywords, called
cyclopedia” of tags, semantically structured for generating               tags. All the tags associated to Web resources are collected
semantic descriptions of contents over the Web, created by                and exploitable by every user in order to find many resources
mining Wikipedia. In this paper, starting from an analy-                  of interest. A popularity value is usually associated to each
sis of the weak points of non-semantic keyword based Web                  tag describing a Web resource to point out the number of
searches, we introduce our idea of semantic characterization              times it has been chosen to characterize that resource and
of Web resources describing the structure and organization                consequently the importance of the tag itself among those
of Tagpedia. We introduce our first realization of Tagpedia,              related to the specific resource (see the upper portion of
suggesting all the possible improvements that can be carried              Figure 1).
out in order to exploit its full potential.                                  Even if they are very popular, keyword based Web
                                                                          search approaches show many weak points in manag-
Categories and Subject Descriptors                                        ing language expressivity. Many keywords can identify
                                                                          distinct concepts (polysemy): as a consequence the precision
H.4.m [Information Systems]: Miscellaneous; D.2 [Software]: of search results decreases. Moreover if we don’t search for a
Software Engineering                                                      common sense of that keyword, it is often very difficult to ex-
                                                                          plore the search results space so as to find Web resources of
General Terms                                                             interest among those retrieved. For example, let us suppose
                                                                          that we want to find all the resources dealing with ’ajax’
semantic resource, knowledge organization, semantic web
                                                                          intended as the Greek hero: choosing ’ajax’ as search text
                                                                          string, there are no links related to mythology among the
Keywords                                                                  first 30 search results of Google. If we better specify the
semantics, web, social, wikipedia, data mining                            search string in order to solve the problem, we partition the
                                                                          space of relevant search results depending on the particular
Copyright is held by the Authors. Copyright transferred for publishing on- word added to ’ajax’ to disambiguate its meaning. For in-
line and a conference CD ROM.
SWKM’2008: Workshop on Social Web and Knowledge                           stance, depending on the addition of the word ’hero’ or the
Management @ WWW 2008, April 22, 2008, Beijing, China.                    word ’mythology’ to ’ajax’ in the search string, considering
                                                                  resources, underlining the need for a general-domain seman-
                                                                  tic resource of reference in order to support this task, taking
                                                                  into account also our past experience with the semantic tag-
                                                                  ging and SemKey. In Section 3 we introduce Tagpedia, the
                                                                  semantic resource of reference we have created by mining
                                                                  Wikipedia, explaining its organization and structure (Sub-
                                                                  section 3.1). In Section 4 we describe how Tagpedia can
                                                                  be utilized, describing the Tagpedia Web API and showing
                                                                  all the possible improvements to Tagpedia to exploit its full
                                                                  potential. Conclusions are described in Section 5.

                                                                  2.   FROM KEYWORDS TO CONCEPTS:
                                                                       SEMANTIC CHARACTERIZATION
                                                                     We can solve, or at least substantially reduce, Web re-
                                                                  sources organization and classification problems by adding
                                                                  a further level of completeness in their characterization: the
                                                                  semantics. Instead of relying on post processing of search
                                                                  results, we can directly semantically describe resources thanks
                                                                  to their association with one or more properly chosen con-
                                                                  cepts. In this way we extend the characterization of re-
                                                                  sources introducing the semantic level: each resource (R) is
                                                                  described by one or more concepts (C) and in turn each con-
                                                                  cept can be accessed through one or more keywords (K) (see
Figure 1: Two ways to associate keywords to re-                   Figure 2). When we search for some information of interest,
sources                                                           we can better specify our information needs and we can easily
                                                                  and effectively access relevant results thanks to the support
                                                                  and the exploitation of the collection of concepts used to
the first 10 search results shown by Google, only two of them     describe Web resources, referred to as semantic resource in
are present in both cases. Besides polysemy, also synonymy        what follows.
affects precision and recall of keyword based Web searches.
In fact, when a specific meaning can be accessed through
two or more keywords, the set of search results is differ-
ent depending on the particular keyword chosen. Moreover,
the different levels of precision and the many possible user
points of view that can be considered when describing a particular
resource often cause a considerable loss of quality in Web
searches. For a deeper analysis of all the factors that affect
efficiency and effectiveness of keyword based Web search sys-
tems see [4] [10] [12] [25].
   In order to address the different drawbacks of the systems just
analyzed, many distinct methods have been applied. The
aggregation of search results from different search
engines and their post-processing is becoming increasingly
widespread. Systems like Vivisimo [15], Grokker
[18] and Kartoo [19] are meta search engines. They collect        Figure 2: Relations between resources, keywords
search results from other search engines and group them ex-       and concepts
ploiting, for example, the category hierarchy of Yahoo and
Wikipedia (Grokker) or creating clusters of similar search           This way of improving Web contents organization repre-
results and characterizing each of them by one or more addi-      sents an attempt to realize the semantic description of infor-
tional keywords (Vivisimo). They also display search results      mation that stands at the basis of the Semantic Web vision.
cartographically through very expressive maps that connect           At present there are many proposals of semantic classifi-
the most relevant resources to the most used keywords (Kar-       cation methods for Web contents. FolksAnnotation [13],
too).                                                             for instance, tries to extract the tags that describe a Web
   Also considering tagging systems, we can find many pro-        resource from a collaborative tagging system, automatically
posals to better organize search results to improve their qual-   mapping them to the corresponding concepts of a prede-
ity and the effectiveness of the search. FolkRank [5] is an       fined domain ontology. Systems of this kind usually require
algorithm created to rank search results in a tagging sys-        a strong and well organized ontological frame of reference
tem, calculating a ranking value for each of them and thus        that is difficult to realize; they have not provided signifi-
evaluating their relevance. User profiles are also exploited in   cant improvements in comparison with the classical keyword
order to adapt ranking calculation to the information needs       based methodologies. A different approach is the one adopted
of every single user.                                             by systems like Semantic Halo [3]: it improves tag based
   The rest of this paper is organized as follows. In Section     search systems adding semantic information without relying
2 we describe our idea of semantic characterization of Web        on ontologies. Analyzing co-occurrences and frequencies of
tags, the Semantic Halo algorithm extracts groups of tags useful 3.    TAGPEDIA: A GENERAL DOMAIN SE-
to better specify and drive user search, such as more general or       MANTIC RESOURCE OF REFERENCE
more specific ones or groups of keywords defining a partic-
ular naming of the selected tag. Not enough experimental            Starting from the need for a global semantic resource ex-
data on the effectiveness and usefulness of this method to       ploitable as a reference to describe Web contents and there-
improve tag based searches is currently available. Summa-        fore comprehensive and updated, we have proposed a possi-
rizing, a strong and widespread infrastructure that organizes    ble solution to this need, designing and building Tagpe-
and provides access to Web resources on the basis of seman-      dia. It is a semantic organization and classification of
tic classification information is still absent.                  tags, intended as words or in general brief textual
                                                                 expressions, that people may use to describe Web
                                                                 resources. Tagpedia is based on the model of term-
   During the first half of 2007, we explored the                concept networks [11], structured ad hoc to support
possibility of semantically describing Web resources by devel-   the semantic characterization of Web contents and
oping SemKey [4], a semantic collaborative tagging system.       initially populated by exploiting Wikipedia data. In par-
It extends current tagging systems by allowing users to character-   ticular we have tuned a new way of mining Wikipedia to
ize resources by referring to concepts. Each user can point      extract the information needed to build Tagpedia so as to
out and describe Web resources of interest: starting from a      support concept based descriptions of Web resources also
freely chosen tag, he can disambiguate it thanks to the sup-     through tag disambiguation.
port of Wikipedia [14] and WordNet [17] in order to identify        We have chosen Wikipedia as the starting point because
one or more defined concepts. In this way he produces a          it represents the richest and most constantly updated
semantic assertion, that is, the description of a specific fea-  encyclopedic reference on the Web, with a huge
ture of Web resources through one or more chosen concepts.       set of semantic contents included, even if not ex-
Thus we can potentially overcome the limits in the descrip-      plicitly exposed or easily accessible. During the last
tion of Web resources related to the complexity of language,     few years many studies have been carried out to find new
exploiting their semantic characterization as well as the se-    ways to extract useful semantic data by exploiting the great
mantic relations between concepts present in WordNet and         amount of information contained in Wikipedia. Information
Wikipedia.                                                       organizational patterns like infoboxes, internal and external
   We have implemented a working prototype of SemKey;            links, redirect and disambiguation pages have been analyzed
by analyzing the usage patterns and the semantic classifi-       in order to extract valuable data. The DBPedia Project
cation support provided by our system, we have identified        [16], for instance, is a relevant attempt to extract semantic
two key factors that need to be improved in order                data from Wikipedia, making them available over the Web
to really make possible semantic characterization of             complying with Semantic Web standards [6]. DBPedia is a
Web resources, as described in the previous part of this         global knowledge base derived from Wikipedia, not specif-
Section.                                                         ically intended for Web resources description as Tagpedia
   Both Wikipedia and WordNet, even if they show impor-          is. In [24] there is a description of KYLIN, a system that au-
tant features to support the semantic description of Web re-     tonomously semantifies Wikipedia, automatically suggesting
sources, are weakened by significant gaps. WordNet presents      data inconsistencies, gaps or incompleteness. Wikipedia
a rich set of parts of speech and a strongly structured set      has been also successfully exploited to compute semantic re-
of relations between them, but it lacks many data useful to      latedness between words [21] and natural language texts [9],
support proper names disambiguation and it is not collab-        but also to tune new named entities disambiguation method-
oratively edited. Wikipedia is an encyclopedia so its con-       ologies [7] [8]. Semantic relationships between Wikipedia
tent is composed mainly by a very rich set of names along        categories have been studied in order to make the search
with their extended descriptions. Thus Wikipedia has strong      of information easier and to give articles editors relevant
proper names coverage and it has been proposed as a named        suggestions [20]. Moreover some research has been done to
entity disambiguation resource in [7] and [8]; it is also con-   understand and measure the way Wikipedia articles are cre-
tinuously updated, but lacks a structured set of relations       ated and their contents become mature [22] or to analyze
between the concepts described, even if its documents are        statistical information about the growth of the data that
interconnected by a huge number of links and loosely clas-       constitute Wikipedia, the types of articles, the editors, the
sified through categories. As a consequence the semantic         link and category structure and so on [23].
resources considered are in some way complementary, but
they have been built and structured for purposes different
from the semantic characterization over the Web. In order        3.1   The structure of Tagpedia
to better support this task we need a semantic resource             The main aim of Tagpedia is the semantic characteriza-
built and structured ad hoc, which is still absent: it           tion of data over the Web. In particular it must allow users to
must feature all the advantages of those just analyzed, re-      describe a Web resource through the association with one
moving pointless informative contents.                           or more univocally referenced concepts. Thus, the main
   Moreover, a great limit to the usability of SemKey and to     constituent unit of Tagpedia is the concept. Each
an easy definition of new semantic metadata is represented       concept must be unequivocally identified but also easily ac-
by the different steps users must carry out to compose a se-     cessed. The main way to point out a concept is through
mantic assertion. This often discourages them from creating      the words that refer to it. Such words will also be called
semantic metadata. Some sort of automation is neces-             tags in the following. As a consequence, each concept is
sary in order to speed up the tag disambiguation                 identified by the set of all the words or, more generally, all
process or to execute it through automated proce-                the alphanumeric expressions of any kind that can be adopted
dures.                                                           by a community of users to refer to it, thus constituting a
set of synonymous tags or syntag set. Syntag sets are the           usually used to manage synonyms, abbreviations, acronyms,
molecules which form Tagpedia.                                      misspellings, other spellings, different punctuations, partic-
                                                                    ular capitalization rules and so on. In Tagpedia we mine
                                                                    Wikipedia content and extract all the redirect information
                                                                    analyzing redirect pages; for each of them we enrich the syn-
                                                                    tag set related to the referred concept by adding the title of
                                                                    the page as a new tag (in Figure 3, considering the syntag
                                                                    set 1, the tag ’leo onca’ is extracted from a redirect page).
                                                                       Moreover, Wikipedia usually manages polysemy through
                                                                    the disambiguation pages. As said, each disambiguation
                                                                    page represents a collection of links to all the different ar-
                                                                    ticle pages that identify the distinct meanings pointed out
                                                                    by the page title (textual string). For example, the word
                                                                    ’ajax’ is highly polysemous and has 49 different meanings in
                                                                    Wikipedia: its disambiguation page contains links to 49 dis-
                                                                    tinct article pages; each one identifies a particular concept.
                                                                    We analyze Wikipedia disambiguation pages as a further
                                                                    source of information to enrich the syntag sets of Tagpedia
                                                                    through the addition of new words that refer to a defined
                                                                    meaning. In particular, for every disambiguation page, we
                                                                    point out each syntag set related to the concepts referenced
                                                                    inside its Wikipedia text and we add the title of the same
                                                                    disambiguation page as a new tag exploitable to access
                                                                    the selected syntag sets (in Figure 3, considering the syntag
              Figure 3: Three syntag sets                           set 1, the tag ’panther’ is extracted from a disambiguation
                                                                    page).
  The creation of an initial rich collection of syntag sets is         In summary, let Ci be a concept derived from a
the first necessary step that must be carried out to build          specific Wikipedia article page Pi. To populate the syntag
our semantic resource. Wikipedia shows many features ex-            set for Ci with tags, we extract:
ploitable to create such a collection of syntag sets. In partic-
ular, in Wikipedia an article usually defines a specific con-          • the title of Pi;
cept. As a consequence in order to bootstrap Tagpedia, we              • the title of every redirect page to Pi;
create syntag sets from the articles of Wikipedia. In Figure
3 we show three examples of syntag sets made up of tags                • the title of every disambiguation page containing a link
collected by mining Wikipedia.                                           to Pi.
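
The three extraction rules listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' routines: the `Page` record, its `kind` and `links` fields, and the `build_syntag_sets` function are hypothetical names, and the pages are assumed to have been parsed from a Wikipedia dump already. The tags of the 'jaguar' set follow syntag set 1 of Figure 3; the 'leopard' article is an invented second target. As a simplification, links that point to redirect pages rather than directly to articles are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A pre-parsed Wikipedia page (hypothetical record layout)."""
    title: str
    kind: str  # "article", "redirect" or "disambiguation"
    links: list = field(default_factory=list)  # titles this page points to


def build_syntag_sets(pages):
    """Return {article title: set of tags}, one syntag set per article."""
    # Rule 1: every article page seeds a syntag set with its own title.
    syntag_sets = {p.title: {p.title} for p in pages if p.kind == "article"}
    for p in pages:
        if p.kind not in ("redirect", "disambiguation"):
            continue
        # Rules 2 and 3: the title of a redirect or disambiguation page
        # becomes a tag of every article that the page points to.
        for target in p.links:
            if target in syntag_sets:
                syntag_sets[target].add(p.title)
    return syntag_sets


# Toy input: 'leopard' is an invented example, not taken from the paper.
pages = [
    Page("jaguar", "article"),
    Page("leo onca", "redirect", ["jaguar"]),
    Page("panther", "disambiguation", ["jaguar", "leopard"]),
    Page("leopard", "article"),
]
sets = build_syntag_sets(pages)
print(sorted(sets["jaguar"]))  # ['jaguar', 'leo onca', 'panther']
```

Note that the tag 'panther' ends up in two syntag sets, which is exactly how a polysemous tag is represented in this model.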
  To be more precise, Wikipedia pages can be substantially
divided into three groups:
   • article pages: each describes a particular concept,
     identified by the title of the same page;
   • redirect pages: each links an alternate literal expres-
     sion, that constitutes the title of the redirect page, to
     the corresponding concept, usually identified by the
     title of an article page;
   • disambiguation pages: each lists all the possible
     concepts, usually identified through the titles of arti-
     cle or redirect pages, that can be referred to by the literal
     expression constituting the title of the disambiguation
     page.
  The redirect and the disambiguation page mechanisms are
two important Wikipedia organizational solutions that can
be exploited to build and enrich syntag sets.
  Once a concept corresponding to a particular article page
has been identified, we create an initial version of a syntag
set, designated by a unique identifier, including only the tag corresponding
to the title of the page (in Figure 3, considering the syntag
set 1, the tag ’jaguar’ is the title of an article page). Then
we collect all the words and expressions that may be used                    Figure 4: The structure of Tagpedia
to refer to that concept.
  As previously mentioned, in Wikipedia the redirect mech-            Starting from a dump of the English version of Wikipedia,
anism is used to link alternate literal expressions to the orig-    we have developed a set of C++ routines that automatically
inal encyclopedic article that describes a specific entity. It is   analyze the text of Wikipedia articles. By mining structural
elements of Wikipedia syntax as well as by considering text
punctuation and by exploiting pattern matching techniques
mainly based on regular expressions and string analysis, our
routines gather all the concepts as well as all the possible
tags used to refer to each single meaning, thus defining a
huge collection of syntag sets. The meaning of each concept,
identified by a syntag set, is also better specified by pointing
to the corresponding article in Wikipedia.
   All these data are collected in a relational database
properly designed and optimized for fast access. It consists
of two basic collections: the concept table and the tag
table. The first one gathers all the concepts of Tagpedia,
assigning to each of them a unique identifier, the Concept ID,
and a brief definition, extracted from the English version of
Wikipedia. For every concept we also collect the URL of
the corresponding Wikipedia article. The tag table, in turn,
contains links between each concept, referenced
through its identifier, and all the tags used to access it.
   By mining the September 2007 dump of the English version
of Wikipedia, we have obtained more than 1.9 million
syntag sets and more than 4 million tags used
to point out the intended concepts, each one referencing a
specific Wikipedia article (see Figure 4).
   Considering Figure 5, we can visualize the weight of the
different sources of the 4,230,740 tags of Tagpedia. The
number of tags extracted from article pages (P) is equal
to the number of syntag sets, that is 1,927,378. Among
the 2,303,362 remaining tags, 481,250 have been generated
by mining disambiguation pages and 1,822,112 by analyzing
redirect ones.

                 Number of del.icio.us URLs:    100
                    Number of distinct tags:   1087
  Percentage of successful disambiguations:     84 %

Table 1: Tagpedia tag disambiguation support: preliminary
evaluation results

4.   EXPLOITING AND IMPROVING TAGPEDIA
   In order to support the generation of semantic descriptions
of Web resources or to semantically search for Web
contents, the information contained in Tagpedia should be
easily accessed, querying the whole collection of syntag sets.
For this purpose we have developed the Tagpedia Web
API, a simple set of procedures that may be invoked
via the Web to exploit the semantic support offered by
Tagpedia. These procedures carry out a few fundamental tasks
and may be composed to realize more complex functions; their
execution can be easily requested by other external Web
applications so as to integrate semantic features.
   The main tasks that the Tagpedia Web API supports are:

   • the definition of all the possible meanings for a given
     tag, i.e. all the syntag sets that contain the tag;
   • the collection of all the tags belonging to a specific
     syntag set, i.e. all the words or expressions exploitable to
     access that particular meaning;
   • the retrieval of the short textual description of a specific
     syntag set.
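
The three tasks above map naturally onto the two-table layout (concept table and tag table) described in Section 3.1. The following Python sketch uses an in-memory SQLite database to illustrate that mapping. It is a sketch under stated assumptions: the table names, column names and the three function names are hypothetical, since the paper gives neither the actual schema nor the signatures of the Web API procedures, and the rows are toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- concept table: one row per syntag set (hypothetical column names)
    CREATE TABLE concept (
        concept_id    INTEGER PRIMARY KEY,
        definition    TEXT NOT NULL,
        wikipedia_url TEXT NOT NULL
    );
    -- tag table: links each tag to the concepts it can refer to
    CREATE TABLE tag (
        label      TEXT NOT NULL,
        concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
        PRIMARY KEY (label, concept_id)
    );
""")
# Toy rows for illustration only; not taken from the real Tagpedia data.
conn.executemany("INSERT INTO concept VALUES (?, ?, ?)", [
    (1, "large cat native to the Americas",
     "https://en.wikipedia.org/wiki/Jaguar"),
    (2, "hero of Greek mythology",
     "https://en.wikipedia.org/wiki/Ajax_the_Great"),
    (3, "Web development technique",
     "https://en.wikipedia.org/wiki/Ajax_(programming)"),
])
conn.executemany("INSERT INTO tag VALUES (?, ?)", [
    ("jaguar", 1), ("leo onca", 1), ("panther", 1),
    ("ajax", 2), ("ajax", 3),
])


def meanings(label):
    """Task 1: identifiers of all syntag sets that contain the tag."""
    rows = conn.execute(
        "SELECT concept_id FROM tag WHERE label = ? ORDER BY concept_id",
        (label,))
    return [r[0] for r in rows]


def tags_of(concept_id):
    """Task 2: all the tags belonging to one syntag set."""
    rows = conn.execute(
        "SELECT label FROM tag WHERE concept_id = ? ORDER BY label",
        (concept_id,))
    return [r[0] for r in rows]


def description(concept_id):
    """Task 3: short textual description of one syntag set, or None."""
    row = conn.execute(
        "SELECT definition FROM concept WHERE concept_id = ?",
        (concept_id,)).fetchone()
    return row[0] if row else None


print(meanings("ajax"))   # [2, 3]
print(tags_of(1))         # ['jaguar', 'leo onca', 'panther']
print(description(2))     # hero of Greek mythology
```

Composing the calls, for example `[description(c) for c in meanings("ajax")]`, yields the list of candidate meanings that a tagging interface could show to the user for disambiguation.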

   Exploiting the Tagpedia Web API, we have integrated this semantic
resource into SemKey, our semantic collaborative tagging system, where
it replaces WordNet and Wikipedia in supporting the disambiguation of
the meaning of tags. Once one or more tags have been chosen, the user
specifies the right meaning for each of them by choosing a particular
syntag set among those including the intended tag. An early prototype
Web-based interface for exploring and interacting with Tagpedia is
accessible at the URL www.tagpedia.org.
   In order to evaluate the coverage of Tagpedia and also to obtain
suggestions to improve this semantic resource, we have tried to
manually identify the right meaning of the tags associated with the
100 most popular Web resources on del.icio.us, tagged by more than
25000 users. Relying upon the Tagpedia Web API, we have developed a
Web-based procedure that, starting from the URL of a Web resource,
retrieves all the related tags from del.icio.us. All the possible
meanings of each tag are retrieved from Tagpedia along with their
short descriptions, and the user manually verifies whether the right
concept is present. In this way, collecting all the results of our
user-based tests, we have obtained a first evaluation of the
disambiguation effectiveness of our semantic resource. The results are
shown in Table 1.
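
   The evaluation procedure described above can be sketched as
follows. This is a hedged illustration, not the code used for the
tests: the function name evaluate_coverage and its three callbacks
(fetch_tags, fetch_meanings, judge) are hypothetical, the del.icio.us
and Tagpedia lookups are replaced by stub dictionaries, and counting
every (resource, tag) pair once is an assumption.

```python
def evaluate_coverage(urls, fetch_tags, fetch_meanings, judge):
    """Return the fraction of tags whose intended meaning is available.

    fetch_tags(url)      -> tags attached to the resource (stub here)
    fetch_meanings(tag)  -> list of (syntag_set_id, short_description)
    judge(url, tag, candidates) -> True if a human evaluator finds the
                                   right concept among the candidates
    """
    covered = 0
    total = 0
    for url in urls:
        for tag in fetch_tags(url):
            total += 1
            candidates = fetch_meanings(tag)
            # A tag with no candidate meaning can never be disambiguated.
            if candidates and judge(url, tag, candidates):
                covered += 1
    return covered / total if total else 0.0


# Stub data standing in for the real services (illustrative only).
sample_tags = {"http://example.org/a": ["python", "sem web"]}
sample_meanings = {
    "python": [("Python_(programming_language)", "A programming language")],
}

coverage = evaluate_coverage(
    sample_tags,
    fetch_tags=lambda url: sample_tags[url],
    fetch_meanings=lambda tag: sample_meanings.get(tag, []),
    judge=lambda url, tag, candidates: True,  # stand-in for the manual check
)
print(f"coverage: {coverage:.0%}")
```

   In the real tests the judge step is the manual verification by the
user; the stub above simply accepts any non-empty candidate list.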
      Figure 5: Sources of the tags in Tagpedia

   This group of syntag sets constitutes the basis of Tagpedia,
providing a way to unequivocally access and refer to concepts when
users must semantically describe or search for Web resources.
   Tagpedia provides valid support to the disambiguation process for
84% of the total number of tags considered.
   Nevertheless, we have identified several ways to improve its
contents and, as a consequence, its semantic coverage and its
usefulness. In the remainder of this section we describe these
proposals for future work.
   Despite its good disambiguation coverage, there are some
particular tags like ’sem web’, ’inplaceedit’, ’web dev’
and similar ones that are not managed by Tagpedia, because they are
non-conventional words, often created by one user to describe a
particular concept and then accepted and exploited by many others.
One possible solution to this problem is the introduction of
collaborative Web editing techniques for Tagpedia contents. Giving
users the possibility to create new syntag sets or to merge or extend
existing ones through new tags is fundamental for this kind of
resource. Indeed the effectiveness of Tagpedia in the description of
Web resources is proportional to the possibility to adapt and enrich
this semantic resource with respect to the variability of users'
descriptive needs. In this context, the possibility to
collaboratively collect and manage data, following a Wiki-like
paradigm, is a key factor of the current Web and a crucial issue for
Tagpedia.
   Another aspect of Tagpedia that can be substantially improved is
the enrichment of its semantic contents with the addition of semantic
relations between syntag sets; they are useful to better identify
concepts or to easily search for them. Each syntag set, representing
a meaning, may be connected to other ones through relationships like
specialization, generalization, relatedTo and similar ones. Possible
ways to mine relevant relations between syntag sets are the analysis
of the internal links between Wikipedia article pages as well as the
exploitation of the hierarchy of Wikipedia categories. For instance,
relying upon relations, when we specify the concept to search for or
when we must choose a specific concept to semantically characterize a
resource, the system can show the most general or the most specific
ones to simplify this task. Similarly, during a semantic search,
starting from a specific syntag set, if we can browse all the related
ones, we can better specify our search needs and thus easily retrieve
the desired information.
   A third way to improve and enrich Tagpedia is the definition of
semi-automated procedures to extend its data, exploiting other
resources and importing their contents into Tagpedia. Other relevant
free Web thesauri, dictionaries or other language tools can be valid
sources of information. For instance the Dictionary of Automotive
Terms [1] or the Free Online Medical Dictionary [2] are two
domain-specific resources that can be integrated into Tagpedia.
Moreover, mapping rules between Tagpedia syntag sets and other Web
semantic resources can be defined to integrate different sources of
information thanks to the common ground represented by Tagpedia
itself.
   Another aspect that must be further addressed in Tagpedia is the
support for multilingualism. In Tagpedia, each syntag set is language
independent, while the tags constituting that particular syntag set
are specific to a particular language. By collecting tags belonging
to different languages into the same syntag set, we can deal with
different languages and, once one or more particular concepts have
been identified, we can make language-independent semantic searches.
We think that this possibility should be better explored and defined,
trying to determine specific semantic search patterns.
   As already mentioned in the concluding part of Section 2, the
definition and tuning of automated or semi-automated procedures to
create semantic descriptions is a further important issue to be
faced. Users should be allowed to semantically describe Web resources
in an easy way; they must be supported in the task of turning simple
keywords into concepts or browsing the collection of syntag sets
constituting Tagpedia without complicating their usual interaction
patterns or compromising the usability of the systems they interact
with. Moreover, automated methodologies to derive semantic
descriptions of Web resources from simple keyword-based ones can also
be tuned, so as to create an initial solid collection of semantic
metadata and bootstrap this new way to characterize resources over
the Web.

5.   CONCLUSIONS
   In this paper we have presented Tagpedia, a semantically
structured collection of tags, built ad hoc to describe Web contents.
   Starting from a brief analysis of the weak points of keyword-based
methodologies for information organization and searching, and
considering also the current approaches to face these issues, we have
introduced the possibility to semantically describe Web resources
through concepts. To make this possible, we have developed an initial
version of Tagpedia, a general-domain semantic resource of reference,
created by mining Wikipedia. After a description of its structure and
organization and an overview of the Tagpedia Web API, useful to
easily access and exploit the information collected in Tagpedia, we
have focused our attention on the possible improvements to this
semantic resource. Collaborative wiki authoring, enrichment of the
relations between syntag sets, automated procedures for content
extraction from external sources, support for multilingualism and
automated generation of semantic descriptions of Web resources are
some of the many improvements considered, underlining its broad
enhancement possibilities.
   On the basis of all these considerations, we believe that
Tagpedia, despite its initial stage of development, represents an
important attempt to support the introduction of semantics over the
Web, trying to put into practice the principles of the Semantic Web
on a global scale and to better structure and manage the huge amount
of data constituting the current Web.

6.   REFERENCES
 [1] Dictionary of automotive terms.
     http://www.motorera.com/dictionary/.
 [2] Free online medical dictionary.
     http://cancerweb.ncl.ac.uk/omd/.
 [3] Alan Dix, Stefano Levialdi, and Alessio Malizia. Semantic
     halo for collaboration tagging systems. In the Social
     Navigation and Community-Based Adaptation
     Technologies Workshop - June 20th, 2006, Dublin,
     Ireland.
 [4] Andrea Marchetti, Maurizio Tesconi, Francesco Ronzano,
     Marco Rosella, and Salvatore Minutoli. SemKey: A
     semantic collaborative tagging system. In the Tagging
     and Metadata for Social Information Organization
     Workshop at the World Wide Web Conference 2007 -
     May 8, 2007, Banff, Alberta, Canada.
 [5] Andreas Hotho, Robert Jäschke, Christoph Schmitz, and
     Gerd Stumme. FolkRank: A ranking algorithm for
     folksonomies. http://www.kde.cs.uni-kassel.de. In the
     Lernen - Wissensentdeckung - Adaptivität Workshop -
     October 9-11, 2006, Hildesheim, Germany.
 [6] Sören Auer and Jens Lehmann. What have Innsbruck
     and Leipzig in common? Extracting semantics from
     wiki content. In the 4th European Semantic Web
     Conference - June 5th, 2007, Innsbruck, Austria.
 [7] Razvan Bunescu and Marius Pasca. Using
     encyclopedic knowledge for named entity
     disambiguation. In the Proceedings of the 11th
     Conference of the European Chapter of the Association
     for Computational Linguistics - April 9-16, 2006,
     Trento, Italy.
 [8] Silviu Cucerzan. Large-scale named entity
     disambiguation based on Wikipedia data. In the
     Empirical Methods in Natural Language Processing
     Conference - June 28-30, 2007, Prague, Czech
     Republic.
 [9] Evgeniy Gabrilovich and Shaul Markovitch.
     Computing semantic relatedness using Wikipedia-based
     explicit semantic analysis. In the Proceedings of the
     20th International Joint Conference on Artificial
     Intelligence - January 6-12, 2007, Hyderabad, India.
[10] Scott A. Golder and Bernardo A. Huberman. The
     structure of collaborative tagging systems. In the
     Journal of Information Sciences, vol. 32, April, pag.
     198-208, 2006.
[11] Andrew Gregorowicz and Mark A. Kramer. Mining a
     large-scale term-concept network from Wikipedia.
     Mitre Technical Report, October 2006.
[12] Marieke Guy and Emma Tonkin. Tidying up tags?
     D-Lib Magazine, 12, January 2006.
[13] Hend S. Al-Khalifa, Hugh C. Davis, and Lester Gilbert.
     Creating structure from disorder: using folksonomies
     to create semantic metadata. In 3rd International
     Conference on Web Information Systems and
     Technologies - 3-6 March, 2007, Barcelona, Spain.
[14] http://en.wikipedia.org/wiki/. The English version of
     Wikipedia.
[15] http://vivisimo.com/. Vivisimo, search done right!
[16] http://wiki.dbpedia.org. DBpedia.
[17] http://wordnet.princeton.edu/. Princeton WordNet.
[18] http://www.grokker.com/. Grokker enterprise search
     management.
[19] http://www.kartoo.com/. Kartoo meta-search engine.
[20] Sergey Chernov, Tereza Iofciu, Wolfgang Nejdl, and
     Xuan Zhou. Extracting semantic relationships
     between Wikipedia categories. In the 1st Workshop on
     Semantic Wikis at the 3rd European Semantic Web
     Conference - June 11-14, 2006, Budva, Montenegro.
[21] Michael Strube and Simone Paolo Ponzetto.
     WikiRelate! Computing semantic relatedness using
     Wikipedia. In the Proceedings of the 45th Annual
     Southeast Regional Conference, pag. 106-110 - March
     23-24, 2007, Winston-Salem, North Carolina, USA.
[22] Christopher Thomas and Amit P. Sheth. Semantic
     convergence of Wikipedia articles. In the Proceedings of
     Web Intelligence Conference, pag. 600-606 - Silicon
     Valley, November 2-5, 2007.
[23] Jakob Voß. Measuring Wikipedia. In the Proceedings
     of the 10th International Conference of the
     International Society for Scientometrics and
     Informetrics - July 24-28, Stockholm, Sweden.
[24] Fei Wu and Daniel S. Weld. Autonomously
     semantifying Wikipedia. In the Proceedings of the 16th
     ACM Conference on Information and
     Knowledge Management, pag. 41-50 - November 6-9,
     2007, Lisboa, Portugal.
[25] Zhichen Xu, Yun Fu, Jianchang Mao, and Difu Su.
     Towards the semantic web: Collaborative tag
     suggestions. In the Proceedings of the Collaborative
     Web Tagging Workshop at the World Wide Web
     Conference 2006 - May 23-26, 2006, Edinburgh,
     Scotland.