A Web 3.0 Approach for Improving Tagging Systems

Alexander Kreiser
IBM Research and Development
Schönaicher Straße 220
71032 Böblingen, Germany
akreiser@googlemail.com

Andreas Nauerz
IBM Research and Development
Schönaicher Straße 220
71032 Böblingen, Germany
andreas.nauerz@de.ibm.com

Fedor Bakalov
Friedrich-Schiller University of Jena
Ernst-Abbe-Platz 1-4
07743 Jena, Germany
fedor.bakalov@uni-jena.de

Birgitta König-Ries
Friedrich-Schiller University of Jena
Ernst-Abbe-Platz 1-4
07743 Jena, Germany
birgitta.koenig-ries@uni-jena.de

Martin Welsch
IBM Research and Development
Schönaicher Straße 220
71032 Böblingen, Germany
martin.welsch@de.ibm.com

ABSTRACT
In Web 2.0 systems, tagging has become one of the most popular techniques for allowing users (and entire user communities) to categorize content autonomously. However, current tagging systems have their downsides. Synonyms and polysemes lead to littered tag spaces, which makes it difficult for users to find relevant content. When exploring the tag space, users retrieve content that is not actually of interest. Vice versa, they fail to retrieve content that would be of interest. Moreover, current tagging systems do not model relations between tags. Thus, recommending related tags (or content) is not possible.

In this paper we present an approach that allows users, i.e. the community, to collaboratively model relations between tags. We provide UI components for modeling these relations. The relations are then stored in a SKOS-based ontology that can be leveraged for content recommendations. Our Web 3.0 approach to solving tag space littering problems and to issuing tag-based recommendations has two parts. First, we give the community the power to consolidate tags and to relate tags to each other. Second, we store these relations in ontologies.

The concepts presented are being prototypically implemented within IBM's WebSphere Portal and can be presented in a live demo at the workshop.

Categories and Subject Descriptors
I.2.4 [Knowledge Representation Formalisms and Methods]: Predicate logic, Relation systems, Semantic networks; H.3.3 [Information Search and Retrieval]

Keywords
Web 2.0, Web 3.0, Semantic Web, Tagging, Recommender

1. INTRODUCTION
Collaboration techniques on the Internet, particularly tagging and rating, have recently become popular. They provide new means both for semantically describing web content and for reasoning about users' interests, preferences and contexts. These techniques can add valuable meta information and even lightweight semantics to web resources. Tagging allows non-expert users to develop folksonomies that categorize the content available in the system.

Unfortunately, most current tagging systems have two main drawbacks:
First, synonyms and polysemes cannot easily be detected automatically and thus litter the tag space. Synonyms lead to multiple tags that have the same meaning. They may be only morphological variations (apple vs. apples) or semantically similar terms (baby vs. infant). Polysemes lead to single tags that have multiple meanings (apple can refer to the fruit or to the company Apple).

From a user perspective, the two problems might manifest themselves as follows. Users Alice and Bob might both apply the tag apple to some resources. However, Alice might refer to the fruit, whereas Bob might refer to the computer manufacturer. When Bob later retrieves information by selecting the tag apple, he receives a lot of "irrelevant noise", because he is also presented with resources about the fruit. This problem is referred to as the low precision problem.

In the second scenario, both users might want to tag resources providing information about the United States. Alice might tag these resources with USA, Bob with United States. When retrieving information, Alice might miss the resources tagged by Bob, and Bob might miss the resources tagged by Alice. This happens simply because the semantic relatedness of these tags remains invisible to the system. This problem is referred to as the low recall problem.

Current approaches to these problems include two techniques. Normalization and stemming algorithms (cf. e.g. [8]) are applied to prevent littering due to synonyms. Multiple tags are applied to single resources to prevent littering due to polysemes. But both approaches have their limits.
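To make these limits concrete, the following Python sketch applies a deliberately naive normalization. It lowercases the tag, collapses whitespace and strips one trailing "s". It is an illustrative stand-in, not the Porter stemmer cited in [8]. The function names are hypothetical.

```python
def normalize(tag: str) -> str:
    """Naive tag normalization, for illustration only (not Porter's stemmer).

    Lowercases, collapses internal whitespace and strips one trailing "s"
    (but not "ss") from keys longer than three characters.
    """
    key = " ".join(tag.lower().split())
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def group_by_key(tags: list[str]) -> dict[str, list[str]]:
    """Group raw tags whose normalized forms are identical."""
    groups: dict[str, list[str]] = {}
    for tag in tags:
        groups.setdefault(normalize(tag), []).append(tag)
    return groups


groups = group_by_key(["apple", "Apples", "baby", "infant", "USA", "United States"])
# Morphological variants merge: groups["apple"] == ["apple", "Apples"]
# Synonyms do not: "baby"/"infant" and "USA"/"United States" keep separate keys.
```

Morphological variants such as apple and apples collapse to one key. Semantically similar tags (baby vs. infant, USA vs. United States) stay apart, which leaves the low recall problem untouched. Both senses of apple also still share a single key, so normalization cannot address the low precision problem either.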
Second, tags are flat lists of words from uncontrolled vocabularies and have no relations to each other. Thus, current tagging systems can hardly recommend related content to users. The systems we deal with are large, and three kinds of recommendation are highly desirable in them:
- more general concepts (e.g. a user interested in making Spaghetti might be interested in making Pasta in general, too),
- more specific concepts (e.g. recommending information about Spaghetti, Farfalle, etc. when a user is searching for information about Pasta),
- simply related concepts (e.g. recommending information about cat food when a user reads material about cats in general).

In this paper we present a Web 3.0 approach for solving the problems just mentioned. First, we allow users, i.e. the community, to augment tags to make them less ambiguous. Second, we allow users to collaboratively model relations between tags, which can then be leveraged for content recommendations. We provide the community with UI components for modeling these relations, which are then stored in a SKOS-based ontology. Our Web 3.0 approach to solving tag space littering problems and to issuing tag-based recommendations has two parts. First, we give the community the power to consolidate tags and to relate tags to each other. Second, we store these relations in ontologies.

2. RELATED WORK
Several approaches aim to improve tagging systems by using semantically rich tags instead of flat keywords. We will refer to these systems as semantic tagging systems. These approaches can be split into two groups based on the strategies they use:

     1. Semantifying already existing tags of a tagging system
     2. Enabling a community to annotate resources with semantically rich tags instead of flat keywords

With the first strategy, flat keywords of existing tagging systems (like del.icio.us) are enriched with semantics. This is often an automatic process in which tags are mapped to concepts from an ontology, or relations between tags are derived from the folksonomy structure. The system TagOnto proposed in [10] automatically maps tags from a social tagging system to entities in domain ontologies. In [1], tags are also mapped to ontology concepts in order to improve the retrieval process and recommend related content. Heymann and Garcia-Molina introduced an algorithm in [7] to transform a set of flat tags into a hierarchical taxonomy. Angeletou et al. ([11]) create semantic relations between tags by leveraging the semantics stored in public ontologies.

There have also been efforts to conceptualize and implement a new kind of tagging system, in which users define the meaning of a tag when it is applied. Hence, tags are no longer keywords but references to entities in semantic repositories. In many of these systems, Semantic Web technology is used to store the meaning of a tag and the semantic relations between tags. The following two paragraphs describe two types of semantic tagging systems and their related work. With the first type, existing public ontologies are used. With the second type, the tagging community creates the tag ontology itself.

SemKey, a semantic collaborative tagging system described in [5], utilizes Wikipedia and WordNet¹ to disambiguate keywords used for tagging, while the actual tags then refer to Wikipedia pages. The social bookmarking website Faviki enables the annotation of bookmarks with DBpedia² concepts. The Entity Describer³ is an add-on to the tagging system of Connotea⁴ in which tags refer to entities from the Freebase⁵ ontology. In [2], Passant and Laublet propose MOAT, a client-server framework. Its server provides services for term disambiguation and for finding matching concepts in several public knowledge bases (e.g. DBpedia, Yahoo! Geonames).

With the second type, the tag ontology is created by the same community that uses it for tagging. That way, the semantic data is tailored to a tagging community and kept as small as necessary. In [3], Braun et al. describe two prototypes of semantic tagging systems in which the creation of semantic data is merged into the tagging process: SOBOLEO and the IMAGINATION project. RichTags ([4]) is a social bookmarking system that resulted from the master's thesis of Fountopoulos; in it, users build and extend a SKOS ontology collaboratively. In Fuzzzy ([9]), a community creates its own ontology using Topic Maps⁶ and is then able to bookmark sites with the created topics.

¹ http://wordnet.princeton.edu/
² http://dbpedia.org
³ http://www.entitydescriber.org/
⁴ Online reference management for clinicians and scientists, http://www.connotea.org/
⁵ http://www.freebase.com/
⁶ http://topicmaps.org/

3. OUR APPROACH
In this section we describe our approach to solving the problems of tagging systems by combining Semantic Web technology with Web 2.0 interface components. Section 3.1 describes our ontology design and how users can provide semantic relations between tags. In Section 3.2 we give a brief overview of the web interface and its components. Section 3.3 presents the current system architecture of our prototype.

3.1 Linked Knowledge Islands
We rely on Semantic Web technology for storing tags and relations. SKOS (Simple Knowledge Organization System, [6]) is a W3C candidate recommendation for modeling thesauri and loose taxonomies in RDF, and it builds upon OWL⁷. Like [3] and [4], we chose the SKOS vocabulary to store our tag ontology. In our system, tags are instances of skos:Concept. Each tag has multiple labels and a definition that helps to disambiguate tags with identical labels. Users are able to apply a small set of semantic relations:
- skos:broader and skos:narrower to model a loose taxonomy,
- skos:related to create associative links between tags.

Broader and narrower relations can help to solve the abstraction level problem: a search for resources annotated with Animal should also return resources annotated with Cat, Dog, Bird, etc. The related property can be leveraged for tag and content recommendation. This is useful for extending search parameters or reformulating a search query. It is also useful for suggesting additional tags during the tagging process.

⁷ Web Ontology Language, http://www.w3.org/2004/OWL/

We chose SKOS to model the tag ontology because, in our opinion, its small predefined set of semantic relations is suitable for a big community. Users are able to create loose taxonomies and general associative links between tags without having to be experts in ontology engineering. Also, our ontology is compatible with other SKOS-based knowledge systems. This enables us to use existing SKOS ontologies as a base or to cover certain knowledge domains.
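The abstraction level example in Section 3.1 amounts to expanding a query tag along skos:narrower links. In that example, a search for Animal also returns resources tagged Cat, Dog or Bird. Related tags, in contrast, serve as recommendations. The Python sketch below assumes a hypothetical in-memory set of (subject, predicate, object) triples and a hypothetical resource index. It is not the prototype's repository code. It reads (A skos:broader B) as (B skos:narrower A) and looks up skos:related in both directions, since the SKOS Reference defines narrower as the inverse of broader and related as symmetric.

```python
from collections import deque

# Hypothetical tag model as (subject, predicate, object) triples that reuse the
# SKOS property names; the "tag:" identifiers are illustrative, not real URIs.
TRIPLES = {
    ("tag:Animal", "skos:narrower", "tag:Cat"),
    ("tag:Animal", "skos:narrower", "tag:Dog"),
    ("tag:Animal", "skos:narrower", "tag:Bird"),
    ("tag:Kitten", "skos:broader", "tag:Cat"),  # stated from the other side
    ("tag:Cat", "skos:related", "tag:CatFood"),
}

# Hypothetical resource index: resource id -> set of tags applied to it.
RESOURCES = {"doc1": {"tag:Cat"}, "doc2": {"tag:Kitten"}, "doc3": {"tag:CatFood"}}

Triples = set[tuple[str, str, str]]


def narrower_of(tag: str, triples: Triples) -> list[str]:
    """Direct narrower tags; (A broader B) is read as (B narrower A)."""
    down = {o for s, p, o in triples if s == tag and p == "skos:narrower"}
    up = {s for s, p, o in triples if o == tag and p == "skos:broader"}
    return sorted(down | up)


def related_of(tag: str, triples: Triples) -> list[str]:
    """Related tags in both directions, because skos:related is symmetric."""
    out = {o for s, p, o in triples if s == tag and p == "skos:related"}
    back = {s for s, p, o in triples if o == tag and p == "skos:related"}
    return sorted(out | back)


def expand_query(tag: str, triples: Triples) -> list[str]:
    """The tag itself plus every tag reachable via narrower links (breadth-first)."""
    seen, order, queue = {tag}, [tag], deque([tag])
    while queue:
        for child in narrower_of(queue.popleft(), triples):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def search(tag: str, triples: Triples, resources: dict[str, set[str]]) -> list[str]:
    """Resources annotated with the tag or with any of its narrower tags."""
    wanted = set(expand_query(tag, triples))
    return sorted(r for r, tags in resources.items() if tags & wanted)
```

With this data, search("tag:Animal", TRIPLES, RESOURCES) returns ["doc1", "doc2"]. The narrower links are followed recursively, so Kitten is reached through Cat. Meanwhile, related_of("tag:Cat", TRIPLES) yields ["tag:CatFood"], a candidate for recommending doc3 rather than a search hit.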

Roy Lachica states in [9] that "users already have a mental representation of the world and have no need to externalize this view by entering their world view into the system." He draws this conclusion from the lack of contribution he observed while testing the Fuzzzy social bookmarking system, in which users collaboratively build an ontology. Another observation made by Lachica was that in "...several cases where users do not agree with tag-relations that have been created by others but they take no action to correct it by voting or other means" ([9]). Achieving one collective knowledge model is difficult, since the mental models of individual users will always diverge to some degree. Disagreement on the labels and relations used within the model can be a major issue, lowering user experience and the motivation to participate. In our linked knowledge islands approach, users are able to model their private tag space. They draw immediate benefits from the semantic relations, e.g. by exploiting the taxonomy and associations to automate steps in their retrieval process.

This is achieved by giving each user the opportunity to structure and label his tags the way he wants. Additionally, a user can map any of his tags to tags from other private models if both tags refer to the same real-world concept. This mapping is stored as a skos:exactMatch property in the ontology. Everyone in the community can benefit from the collective knowledge model if he wants, but is also allowed to stay within his own model. Every user has a concept scheme (skos:ConceptScheme) that contains all tags created by him. Concept schemes in SKOS were designed to aggregate concepts when dealing with data from different knowledge organization systems. In our case, these are the different mental models of the users. Since the skos:exactMatch property is symmetric and transitive, the separate knowledge islands quickly become connected into a big network. For example, suppose Bob links his tag Semantic Web in his model to Alice's tag Web 3.0 in her model. He is then able to exploit parts of her knowledge model as well. If Alice has previously linked her tag Web 3.0 to a tag in Carl's model, Bob can exploit Carl's model as well, and vice versa (Figure 1).

Figure 1: Narrower tags are retrieved from a mapped tag

3.2 Web 2.0 Graphical User Interface
The simplicity and flexibility of folksonomies is one of the main reasons for the success of social tagging systems today. Specifying the semantics of a tag can solve some of the problems of tagging systems and therefore improve user experience. But it also means extending the tagging process, which creates extra work for the user. In order to keep this overhead small, we tried to design a user interface that allows the community to perform the necessary tasks quickly and easily. One aspect of Web 2.0 is the improved usability provided by Rich Internet Applications (RIA) using AJAX technology and a more desktop-like look and feel. Our web interface consists of several widgets built with the Dojo Toolkit⁸, which interact with the web services of our system.

⁸ http://www.dojotoolkit.org/

Locating and Disambiguating Tags
Tags can be found by typing a term into a type-ahead enabled combo box, which in turn displays a list of tags with matching labels. The user can then navigate through that list and obtain additional information, such as the tag definition. This helps to disambiguate tags with similar or identical labels. For even faster disambiguation at first sight, one of the broader tags is displayed in brackets (Figure 2). This process helps the user to locate a suitable tag for her intent and lets her specify the exact meaning of the tag she applies to a resource. If no suitable tag can be found, the user is free to create a tag in her private model by providing a preferred label and a definition. Optionally, she can declare other tags that are broader than, narrower than or related to the new tag. This integrates the new tag directly into the taxonomy.

Figure 2: Type-ahead combo box to look up tags

Providing Semantic Relations
Within the user interface, the semantic tags always have the same visual representation and can be dragged and dropped to perform certain actions. If one tag is dropped onto another tag, a context menu appears and the user can add a semantic relation between these two tags. For instance, a user may find a tag from another user that refers to exactly the same concept as her tag but has a slightly different label. She can then map these two tags using the context menu (see Figure 3). After a resource has been annotated, the semantic tagging application confirms the tagging. It also displays semantic tags that other users have previously applied to this resource. The user may notice that a tag another user has applied is related to, or about the same thing as, one of her tags. She can then quickly provide this semantic relation using drag and drop and the context menu. This was one of our main objectives: modeling semantics should be embedded in the tagging process. Furthermore, tags can be dropped into tag bags, which are visual representations of containers that hold a set of tags (e.g. a user can put his favorite tags into one single tag bag).
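The linked knowledge islands of Section 3.1 rely on skos:exactMatch being symmetric and transitive. Once Bob maps his Semantic Web tag to Alice's Web 3.0 tag, and Alice has mapped Web 3.0 to a tag in Carl's model, all three tags denote the same concept. Bob can then retrieve narrower tags from Carl's model (Figure 1). The Python sketch below illustrates this effect with a union-find structure over mapping pairs. The class, functions and tag identifiers are hypothetical, including Carl's tag and its narrower tags. The prototype enables OWL inferencing in its repository for the SKOS properties it uses (Section 3.3), so this code only illustrates what that inference yields.

```python
class ExactMatchIndex:
    """Union-find over tags linked by skos:exactMatch (hypothetical helper).

    Merging two classes makes each mapping symmetric and transitive.
    """

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, tag: str) -> str:
        """Representative of the tag's equivalence class, with path compression."""
        self.parent.setdefault(tag, tag)
        root = tag
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[tag] != root:
            self.parent[tag], tag = root, self.parent[tag]
        return root

    def add_exact_match(self, a: str, b: str) -> None:
        """Record one user's mapping of tag a to tag b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def equivalents(self, tag: str) -> set[str]:
        """All known tags denoting the same concept as `tag`, across private models."""
        root = self.find(tag)
        return {t for t in list(self.parent) if self.find(t) == root}


def narrower_via_mappings(tag: str, index: ExactMatchIndex,
                          narrower: dict[str, list[str]]) -> list[str]:
    """Narrower tags found on any exactly matching tag (cf. Figure 1)."""
    found: set[str] = set()
    for equivalent in index.equivalents(tag):
        found.update(narrower.get(equivalent, []))
    return sorted(found)


# Hypothetical data: Carl's tag and its narrower tags are invented for illustration.
NARROWER = {"carl:SemanticTechnologies": ["carl:RDF", "carl:SPARQL"]}

index = ExactMatchIndex()
index.add_exact_match("alice:Web3.0", "carl:SemanticTechnologies")  # Alice's earlier link
index.add_exact_match("bob:SemanticWeb", "alice:Web3.0")           # Bob's new link
```

After the two mappings, index.equivalents("bob:SemanticWeb") contains Bob's, Alice's and Carl's tags. narrower_via_mappings("bob:SemanticWeb", index, NARROWER) returns Carl's narrower tags, even though Bob never linked to Carl's model directly.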


Figure 3: Adding semantic relations via drag and drop
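Figure 3 shows a relation being added by dropping one tag onto another. The SKOS Reference defines skos:narrower as the inverse of skos:broader and declares skos:related and skos:exactMatch symmetric. A single drop therefore determines the relation in both directions. The Python sketch below shows one hypothetical way a handler behind the context menu could produce both triples. The menu entry strings and the function name are assumptions, not the prototype's UI or web service API.

```python
# Hypothetical context-menu entries, phrased from the dropped tag's point of view.
# Each maps to (property stored for dragged -> target,
#               property that then holds for target -> dragged).
MENU_RELATIONS = {
    "target is broader": ("skos:broader", "skos:narrower"),   # inverse pair
    "target is narrower": ("skos:narrower", "skos:broader"),  # inverse pair
    "related": ("skos:related", "skos:related"),              # symmetric
    "same concept": ("skos:exactMatch", "skos:exactMatch"),   # symmetric mapping
}


def apply_drop(dragged: str, target: str, choice: str) -> set[tuple[str, str, str]]:
    """Return both directions of the relation chosen after a drag and drop."""
    if dragged == target:
        raise ValueError("a tag cannot be related to itself")
    if choice not in MENU_RELATIONS:
        raise ValueError(f"unknown menu entry: {choice!r}")
    forward, backward = MENU_RELATIONS[choice]
    return {(dragged, forward, target), (target, backward, dragged)}
```

Dropping tag:Kitten onto tag:Cat and choosing "target is broader" yields (tag:Kitten, skos:broader, tag:Cat) together with (tag:Cat, skos:narrower, tag:Kitten). Storing both directions explicitly is only one design option. A store with OWL inferencing can instead derive the reverse triple from the inverse and symmetry declarations.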

Browsing the Taxonomy and Related Tags
In several scenarios it would be valuable to browse the taxonomy and the relations defined in the tag ontology. For instance, when selecting semantic tags from the ontology in order to tag a resource, the most specific tag should be used instead of the most general one. By exploring the tag model taxonomy and drilling down into narrower tags, the user is able to select the most specific semantic tags for a resource. With the help of our browsing widget, users can explore a visualization of the tag model. In the current prototype, the hierarchical relations of a tag can be explored with a tree view (Figure 4). Narrower tags are shown as child nodes in the tree, while broader tags are visualized as parent nodes. Unfortunately, the tree visualization has obvious drawbacks: multiple broader tags cannot be visualized directly, and related tags are not displayed at all. In our current implementation, this information can be looked up by right-clicking on a tree node, which lists the related and broader tags in a context menu.

Figure 4: Browsing widget with the tree view

Therefore, we are experimenting with different methods of tag model visualization. A conceivable replacement for the tree view could be a simple graph visualization (Figure 5). Tags are displayed as nodes, and straight lines between the nodes depict semantic relations; each type of semantic relation has its own color. By clicking on a node, all semantic relations to directly related tags could be dynamically loaded and visualized. In that way, a user can explore the tag space visually until she finds the information she needs. Since this is a rather uncommon type of interface, user experience and acceptance are still to be evaluated.

Figure 5: Graph visualization of the tag model

Finally, by leveraging the knowledge about relations between tags (and, implicitly, about resources), we can also automatically recommend related content to users, as outlined before.

3.3 Architecture
The current system prototype is being implemented on IBM WebSphere Portal technology, with a separate component containing the knowledge base. The whole system architecture is loosely coupled, using web services to access the individual components. For the semantic repository we chose openRDF Sesame⁹, since we considered it to be one of the most mature non-commercial RDF repositories. OWL inferencing is necessary for the SKOS properties we use. To enable it, swiftOWLIM¹⁰ was integrated with Sesame as the Storage And Inference Layer (SAIL). A web service layer on top of the RDF repository provides convenient and simplified access to the semantic data used by the user interface widgets. Additional, customized SPARQL queries can be run against the repository, making it possible for other applications to use the stored semantics.

Tags and their semantic relations are kept in the repository, while the tagging system just stores URI references to the tags. The tagging system or tagging API takes care of annotating resources and retrieving resources by tag reference. The loose coupling makes it possible to swap it for another one.

Figure 6: Abstract System Architecture

⁹ http://openrdf.org
¹⁰ http://ontotext.com/owlim/

4. CONCLUSION AND FUTURE WORK
In this paper we have described a Web 3.0 approach to collaboratively modeling semantic relations among tags. Our approach solves a number of problems associated with the traditional tagging approach that leverages flat lists of unrelated tags. First, users are enabled to create tag hierarchies that fit their mental models and hence better organize content of their interest. Second, the semantic relations among tags can be used for improving the information retrieval process and recommending related content. Finally, our approach solves the tag disambiguation and tag space littering problems.

The ideas presented in this paper are being prototypically implemented in IBM's WebSphere Portal. Once the implementation is complete, we plan two evaluations. First, we will evaluate the usability of the proposed tagging process. Second, we will assess how much the collaboratively created tag ontologies improve content recommendation. In the future, we also plan to extend our approach to let users collaboratively create semantically richer OWL ontologies.

IBM and WebSphere are trademarks of International Business
Machines Corporation in the United States, other countries or
both. Other company, product and service names may be trade-
marks or service marks of others.


5.   REFERENCES
 [1] Alexandre Passant. Using ontologies to strengthen
     folksonomies and enrich information retrieval in weblogs. In
     Proceedings of the First International Conference on
     Weblogs and Social Media (ICWSM), Boulder, Colorado,
     2007.
 [2] Alexandre Passant and Philippe Laublet. Meaning of a tag:
     A collaborative approach to bridge the gap between tagging
     and linked data. In Proceedings of the WWW 2008
     Workshop Linked Data on the Web (LDOW2008), Beijing,
     China, 2008.
 [3] S. Braun, A. Schmidt, and A. Walter. Ontology maturing:
     A collaborative Web 2.0 approach to ontology engineering,
     08.05.2007.
 [4] G. Fountopoulos. RichTags: A Social Semantic Tagging
     System: Dissertation for MSc Web Technology. MSc
     thesis, University of Southampton, Southampton, 20
     December 2007.
 [5] A. Marchetti, M. Tesconi, F. Ronzano, M. Rosella, and
     S. Minutoli. SemKey: A semantic collaborative tagging
     system.
 [6] A. Miles and S. Bechhofer. SKOS Simple Knowledge
     Organization System Reference, 29.08.2008.
 [7] P. Heymann and H. Garcia-Molina. Collaborative creation
     of communal hierarchical taxonomies in social tagging
     systems, 2006.
 [8] M. F. Porter. An algorithm for suffix stripping. pages
     313–316, 1997.
 [9] R. Lachica and D. Karabeg. Metadata creation in
     socio-semantic tagging systems: Towards holistic knowledge
     creation and interchange. In L.M. Garshol and L. Maicher,
     editors, Scaling Topic Maps. Topic Maps Research and
     Applications 2007, Lecture Notes in Computer Science.
     Springer, 2007.
[10] Silvia Bindelli, Claudio Criscione, Carlo Curino, Mauro L.
     Drago, Davide Eynard, and Giorgio Orsi. Improving search
     and navigation by combining ontologies and social tags. In
     Robert Meersman, Zahir Tari, and Pilar Herrero, editors,
     OTM Workshops, volume 5333 of Lecture Notes in
     Computer Science, pages 76–85. Springer, 2008.
[11] Sofia Angeletou, Marta Sabou, Lucia Specia, and Enrico
     Motta. Bridging the gap between folksonomies and the
     semantic web: An experience report. In Bridging the Gap
     between Semantic Web and Web 2.0 (SemNet 2007), pages
     30–43, 2007.