A Web 3.0 Approach for Improving Tagging Systems

Alexander Kreiser
IBM Research and Development
Schönaicher Straße 220, 71032 Böblingen, Germany
akreiser@googlemail.com

Andreas Nauerz
IBM Research and Development
Schönaicher Straße 220, 71032 Böblingen, Germany
andreas.nauerz@de.ibm.com

Fedor Bakalov
Friedrich-Schiller University of Jena
Ernst-Abbe-Platz 1-4, 07743 Jena, Germany
fedor.bakalov@uni-jena.de

Birgitta König-Ries
Friedrich-Schiller University of Jena
Ernst-Abbe-Platz 1-4, 07743 Jena, Germany
birgitta.koenig-ries@uni-jena.de

Martin Welsch
IBM Research and Development
Schönaicher Straße 220, 71032 Böblingen, Germany
martin.welsch@de.ibm.com

ABSTRACT
In Web 2.0 systems, tagging has become one of the most popular techniques for letting users (and entire user communities) categorize content autonomously. Current tagging systems have their downsides, though: synonyms and polysemes lead to littered tag spaces, making it difficult for users to find relevant content. When exploring the tag space, users retrieve content that is not of interest or, vice versa, miss content that would be of interest. Moreover, current tagging systems do not model relations between tags, so recommending related tags (or content) is not possible.

In this paper we present an approach that allows users, i.e. the community, to collaboratively model relations between tags. We provide UI components for modeling these relations, which are then stored in a SKOS-based ontology that can be leveraged for content recommendations. Giving the community the power to consolidate tags and to relate tags to each other and, at the same time, storing these relations in ontologies is our Web 3.0 approach to solving tag space littering problems and issuing tag-based recommendations.

The concepts presented are being prototypically implemented within IBM's WebSphere Portal and can be presented in a live demo at the workshop.

Categories and Subject Descriptors
I.2.4 [Knowledge Representation Formalisms and Methods]: Predicate logic, Relation systems, Semantic networks; H.3.3 [Information Search and Retrieval]:

Keywords
Web 2.0, Web 3.0, Semantic Web, Tagging, Recommender

1. INTRODUCTION
The recent popularity of collaboration techniques on the Internet, particularly tagging and rating, provides new means both for semantically describing web content and for reasoning about users' interests, preferences and contexts. These techniques can add valuable meta information and even lightweight semantics to web resources. Tagging allows non-expert users to develop folksonomies that categorize the content available in the system.

Unfortunately, most current tagging systems have two main drawbacks:

First, synonyms and polysemes cannot easily be detected automatically and thus litter the tag space. Synonyms lead to multiple tags that all have the same meaning, either because they are only morphological variations (apple vs. apples) or because they are semantically similar (baby vs. infant). Polysemes lead to single tags that have multiple meanings (apple can refer to the fruit or to the company Apple). From a user perspective the two problems might manifest themselves as follows: A user Alice and a user Bob might both apply the tag apple to some resources, but Alice might refer to the fruit whereas Bob might refer to the computer manufacturer. When Bob later retrieves information by selecting the tag apple, he receives a lot of "irrelevant noise" because he is also presented with resources that have to do with the fruit. This problem is referred to as the low precision problem. In the second scenario both users might want to tag resources providing information about the United States. Alice might tag these resources with USA, Bob with United States. When retrieving information, Alice might miss the resources tagged by Bob and Bob might miss the resources tagged by Alice, just because the semantic relatedness of these tags remains invisible to the system. This problem is referred to as the low recall problem. Current approaches to solving these problems include applying normalization and stemming algorithms (cf. e.g. [8]) to prevent littering due to synonyms, and applying multiple tags to single resources to prevent littering due to polysemes. But both approaches have their limits.

Second, tags are flat lists of words from uncontrolled vocabularies without any relations between them. Thus, current tagging systems can hardly recommend related content to users. Being able to recommend more general concepts (e.g. a user interested in making Spaghetti might be interested in making Pasta in general, too), more specific concepts (i.e. recommending information about Spaghetti, Farfalle, etc. when a user is searching for information about Pasta) or just related concepts (i.e. recommending information about cat food when a user reads material about cats in general) is a highly desirable feature in large systems such as the ones we deal with.

In this paper we present a Web 3.0 approach for solving the problems just mentioned. First, we allow users, i.e. the community, to augment tags to make them less ambiguous. Second, we allow users to collaboratively model relations between tags, which can then be leveraged for content recommendations. We provide the community with UI components to model these relations, which are then stored in a SKOS-based ontology. Giving the community the power to consolidate tags and to relate tags to each other and, at the same time, storing these relations in ontologies is our Web 3.0 approach to solving tag space littering problems and issuing tag-based recommendations.

2. RELATED WORK
Several approaches aim to improve tagging systems by using semantically rich tags instead of flat keywords. We will refer to these systems as semantic tagging systems. These approaches can be split into two groups based on the strategies they use:

1. Semantifying already existing tags of a tagging system
2. Enabling a community to annotate resources with semantically rich tags instead of flat keywords

With the first strategy, flat keywords of existing tagging systems (like del.icio.us) are enriched with semantics. This is often an automatic process in which tags are mapped to concepts from an ontology or relations between tags are derived from the folksonomy structure. The system TagOnto proposed in [10] automatically maps tags from a social tagging system to entities in domain ontologies. In [1] tags are also mapped to ontology concepts in order to improve the retrieval process and recommend related content. Heymann and Garcia-Molina introduced an algorithm in [7] to transform a set of flat tags into a hierarchical taxonomy. Angeletou et al. ([11]) create semantic relations between tags by leveraging the semantics stored in public ontologies.

There have also been efforts to conceptualize and implement a new kind of tagging system, where users define the meaning of a tag when it is applied. Hence, tags are no longer keywords but references to entities in semantic repositories. In many of these systems Semantic Web technology is used to store the meaning of a tag and the semantic relations between tags. The following two paragraphs describe two types of such semantic tagging systems and the related work for each. With the first type, existing public ontologies are used, while with the second one the tagging community creates the tag ontology itself.

SemKey, a semantic collaborative tagging system described in [5], utilizes Wikipedia and WordNet (http://wordnet.princeton.edu/) to disambiguate keywords used for tagging, while the actual tags then refer to Wikipedia pages. The social bookmarking website Faviki enables the annotation of bookmarks with DBpedia (http://dbpedia.org) concepts. The Entity Describer (http://www.entitydescriber.org/) is an add-on to the tagging system of Connotea (online reference management for clinicians and scientists, http://www.connotea.org/) where tags refer to entities from the Freebase (http://www.freebase.com/) ontology. In [2] Passant and Laublet propose MOAT, a client-server framework where the server provides services for term disambiguation and for finding matching concepts in several public knowledge bases (e.g. DBpedia, Yahoo! Geonames).

With the second type, the tag ontology is created by the same community that uses it for tagging. That way, the semantic data is tailored to a tagging community and kept as small as necessary. In [3] Braun et al. describe two prototypes of semantic tagging systems where the creation of semantic data is merged into the tagging process: SOBOLEO and the IMAGINATION project. RichTags ([4]) is a social bookmarking system resulting from the master's thesis of Fountopoulos, in which users build and extend a SKOS ontology collaboratively. In Fuzzzy ([9]) a community creates its own ontology using Topic Maps (http://topicmaps.org/) and is then able to bookmark sites with the created topics.

3. OUR APPROACH
In this section we describe our approach for solving the problems of tagging systems by combining Semantic Web technology with Web 2.0 interface components. Section 3.1 describes our ontology design and how users can provide semantic relations between tags. In Section 3.2 we give a brief overview of the web interface and its components. Section 3.3 shows the current system architecture of our prototype.

3.1 Linked Knowledge Islands
We rely on Semantic Web technology for the storage of tags and relations. SKOS (Simple Knowledge Organization System, [6]) is a W3C candidate recommendation for modeling thesauri and loose taxonomies in RDF format and builds upon OWL (Web Ontology Language, http://www.w3.org/2004/OWL/). Like [3] and [4] we chose the SKOS vocabulary to store our tag ontology. In our system tags are instances of skos:Concept, have multiple labels and a definition that helps to disambiguate tags with identical labels. Users are able to apply a small set of semantic relations: skos:broader and skos:narrower to model a loose taxonomy and skos:related to create associative links between tags. Broader and narrower relations can help to solve the abstraction level problem: A search for resources annotated with Animal should also return resources annotated with Cat, Dog, Bird, etc. The related property can be leveraged for tag and content recommendation. This is useful for extending search parameters or reformulating a search query, but also for suggesting additional tags during the tagging process.

We chose SKOS to model the tag ontology since in our opinion the small predefined set of semantic relations is suitable for a big community. Users are able to create loose taxonomies and general associative links between tags without having to be experts in ontology engineering. Also, our ontology is compatible with other SKOS-based knowledge systems, which enables us to use existing SKOS ontologies as a base or to cover certain knowledge domains.

Roy Lachica states in [9] that "users already have a mental representation of the world and have no need to externalize this view by entering their world view into the system." This conclusion is drawn from the lack of contribution he experienced when testing the Fuzzzy social bookmarking system, where users collaboratively build an ontology.
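As an illustration of the broader/narrower modeling described in this section, the following minimal Python sketch shows how a search for a tag could be expanded to its narrower tags. It is not the prototype's implementation: the class TagModel, its method names and the example resources are hypothetical, and plain in-memory dictionaries stand in for the SKOS repository. The sketch also assumes that the expansion follows narrower links transitively, which SKOS itself expresses through skos:broaderTransitive and skos:narrowerTransitive rather than through skos:broader.

```python
from collections import defaultdict


class TagModel:
    """Hypothetical in-memory stand-in for a SKOS tag ontology."""

    def __init__(self):
        self.narrower = defaultdict(set)     # tag -> directly narrower tags
        self.related = defaultdict(set)      # tag -> associatively related tags
        self.annotations = defaultdict(set)  # tag -> resources annotated with it

    def add_broader(self, tag, broader_tag):
        # skos:broader and skos:narrower are inverses of each other,
        # so storing the narrower direction is enough for this sketch.
        self.narrower[broader_tag].add(tag)

    def add_related(self, tag_a, tag_b):
        # skos:related is symmetric, so the link is stored in both directions.
        self.related[tag_a].add(tag_b)
        self.related[tag_b].add(tag_a)

    def annotate(self, resource, tag):
        self.annotations[tag].add(resource)

    def expand(self, tag):
        """Return the tag together with all transitively narrower tags."""
        seen, stack = {tag}, [tag]
        while stack:
            for child in self.narrower[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def search(self, tag):
        """Return resources annotated with the tag or any narrower tag."""
        return set().union(*(self.annotations[t] for t in self.expand(tag)))


# Example data (all names are made up for illustration).
model = TagModel()
for narrow in ("Cat", "Dog", "Bird"):
    model.add_broader(narrow, "Animal")
model.add_broader("Siamese", "Cat")
model.add_related("Cat", "Cat food")

model.annotate("cat-care.html", "Cat")
model.annotate("siamese.jpg", "Siamese")
model.annotate("dog-training.html", "Dog")

print(sorted(model.search("Animal")))
print(sorted(model.related["Cat"]))
```

In this sketch a search for Animal also returns the resources annotated with Cat, Siamese and Dog, while the related set of Cat could serve as a source of tag suggestions during the tagging process.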
Another observation made by Lachica concerns "...several cases where users do not agree with tag-relations that have been created by others but they take no action to correct it by voting or other means" ([9]). Arriving at one collective knowledge model is difficult, since the mental models of individual users will always diverge to some degree. Disagreement on the labels and relations used within the model can be a major issue that lowers user experience and the motivation to participate. In our linked knowledge islands approach, users are able to model their private tag space and draw immediate benefits from the semantic relations, e.g. by exploiting the taxonomy and associations to automate steps in their retrieval process.

This is achieved by giving each user the opportunity to structure and label his tags the way he wants. Additionally, a user can map any of his tags to tags from other private models if both tags refer to the same real-world concept. This mapping is stored as a skos:exactMatch property in the ontology. Everyone in the community can benefit from the collective knowledge model if he wants to, but is also allowed to stay within his own model. Every user has a concept scheme (skos:ConceptScheme) which contains all tags created by him. Concept schemes in SKOS were designed to aggregate concepts when dealing with data from different knowledge organization systems; in our case these are the different mental models of the users. Since the skos:exactMatch property is symmetric and transitive, the separate knowledge islands are quickly connected into a big network. If, for example, Bob links his tag Semantic Web in his model to Alice's tag Web 3.0 in her model, he is able to exploit parts of her knowledge model as well. If Alice has previously linked her tag Web 3.0 to a tag in Carl's model, Bob can exploit Carl's model as well and vice versa (Figure 1).

Figure 1: Narrower tags are retrieved from a mapped tag

3.2 Web 2.0 Graphical User Interface
The simplicity and flexibility of folksonomies is one of the main reasons for the success of social tagging systems today. Specifying the semantics of a tag can solve some of the problems of tagging systems and therefore improve user experience. But it also means extending the tagging process, which results in additional work for the user. In order to keep this overhead small, we tried to design a user interface that allows the community to perform the necessary tasks quickly and easily. One aspect of Web 2.0 is the improved usability provided by Rich Internet Applications (RIA) using AJAX technology and a more desktop-like look and feel. Our web interface consists of several widgets built with the Dojo Toolkit (http://www.dojotoolkit.org/) which interact with the web services of our system.

Locating and Disambiguating Tags
Tags can be found by typing a term into a type-ahead enabled combo box, which in turn displays a list of tags with matching labels. The user can then navigate through that list and obtain additional information, such as the tag definition, to disambiguate tags with similar or identical labels. For even faster disambiguation at first sight, one of the broader tags is displayed in brackets (Figure 2). This process helps the user to locate a suitable tag for her intent and lets her specify the exact meaning of the tag she applies to a resource. If no suitable tag can be found, the user is free to create a tag in her private model by providing a preferred label and a definition. Optionally, other tags that are broader or narrower than or related to the new tag can be declared, integrating the new tag directly into the taxonomy.

Figure 2: Type-ahead combo box to look up tags

Providing Semantic Relations
Within the user interface the semantic tags always have the same visual representation and can be dragged and dropped to perform certain actions. If one tag is dropped onto another tag, a context menu appears and the user can add a semantic relation between these two tags. For instance, if a user finds a tag from another user that refers to exactly the same concept as her tag but has a slightly different label, she can map these two tags using the context menu (see Figure 3). After a resource has been annotated, the semantic tagging application confirms the tagging and displays semantic tags that other users have previously applied to this resource. If the user notices that a tag another user has applied is related to or about the same thing as one of her tags, she can quickly provide this semantic relation using drag and drop and the context menu. This was one of our main objectives: Modeling semantics should be embedded in the tagging process. Furthermore, tags can be dropped into tag bags, which are visual representations of containers that hold a set of tags (e.g. a user can put his favorite tags into one single tag bag).

Figure 3: Adding semantic relations via drag and drop

Browsing the Taxonomy and Related Tags
In several scenarios it is valuable to browse the taxonomy and relations defined in the tag ontology. For instance, when selecting semantic tags from the ontology in order to tag a resource, the most specific tag should be used instead of the most general. By exploring the tag model taxonomy and drilling down into narrower tags, the user is able to select the most specific semantic tags for a resource. With the help of our browsing widget users can explore a visualization of the tag model. In the current prototype, the hierarchical relations of a tag can be explored with a tree view (Figure 4). Narrower tags are shown as child nodes in the tree while broader tags are visualized as parent nodes. Unfortunately, the tree visualization has obvious drawbacks: multiple broader tags cannot be visualized directly and related tags are not displayed at all. In our current implementation this information can be looked up by right-clicking on a tree node, where related and broader tags are listed in a context menu.

Figure 4: Browsing widget with the tree view

Therefore, we are experimenting with different methods of tag model visualization. A conceivable replacement for the tree view could be a simple graph visualization (Figure 5). Tags are displayed as nodes, and straight lines between the nodes depict semantic relations, where each type of semantic relation has its own color. By clicking on a node, all semantic relations to directly related tags could be dynamically loaded and visualized. In that way a user can explore the tag space visually until she finds the information she needs. Since this is a rather uncommon type of interface, user experience and acceptance are still to be evaluated.

Figure 5: Graph visualization of the tag model

Finally, leveraging the knowledge about relations between tags (and, implicitly, about resources), we can also automatically recommend related content to users as outlined before.

3.3 Architecture
The current system prototype is being implemented on IBM WebSphere Portal technology with a separate component containing the knowledge base. The whole system architecture is loosely coupled, using web services to access the individual components. Our choice regarding the semantic repository fell on OpenRDF Sesame (http://openrdf.org) since we considered it to be one of the most mature non-commercial RDF repositories. To enable OWL inferencing (which is necessary for the SKOS properties we use), SwiftOWLIM (http://ontotext.com/owlim/) was integrated with Sesame as the Storage And Inference Layer (SAIL). A web service layer on top of the RDF repository adds convenient and simplified access to the semantic data, which is used by the user interface widgets. Additionally, custom SPARQL queries can be run against the repository, making it possible for other applications to use the stored semantics.

Tags and their semantic relations are kept in the repository while the tagging system just stores URI references to the tags. The loose coupling makes it possible to swap the underlying tagging system or tagging API, which takes care of annotating resources and retrieving resources by tag reference.

Figure 6: Abstract System Architecture

4. CONCLUSION AND FUTURE WORK
In this paper we have described a Web 3.0 approach to collaboratively modeling semantic relations among tags. Our approach addresses a number of problems associated with the traditional tagging approach, which leverages flat lists of unrelated tags. First, users are enabled to create tag hierarchies that fit their mental models and hence to better organize the content they are interested in. Second, the semantic relations among tags can be used for improving the information retrieval process and recommending related content. Finally, our approach addresses the tag disambiguation and tag space littering problems.

The ideas presented in this paper are being prototypically implemented in IBM's WebSphere Portal. Upon completion of the implementation, we plan to evaluate the usability of the proposed tagging process as well as the value of the improvements that the collaboratively created tag ontologies provide for content recommendation. In the future we also plan to extend our approach to let users collaboratively create semantically richer OWL ontologies.

IBM and WebSphere are trademarks of International Business Machines Corporation in the United States, other countries or both. Other company, product and service names may be trademarks or service marks of others.

5. REFERENCES
[1] Alexandre Passant. Using ontologies to strengthen folksonomies and enrich information retrieval in weblogs.
In Proceedings of the First International Conference on Weblogs and Social Media (ICWSM), Boulder, Colorado, 2007.
[2] Alexandre Passant and Philippe Laublet. Meaning of a tag: A collaborative approach to bridge the gap between tagging and linked data. In Proceedings of the WWW 2008 Workshop Linked Data on the Web (LDOW2008), Beijing, China, 2008.
[3] S. Braun, A. Schmidt, and A. Walter. Ontology maturing: A collaborative web 2.0 approach to ontology engineering, 08.05.2007.
[4] G. Fountopoulos. RichTags: A Social Semantic Tagging System. Dissertation for MSc Web Technology, University of Southampton, Southampton, 20 December 2007.
[5] A. Marchetti, M. Tesconi, F. Ronzano, M. Rosella, and S. Minutoli. SemKey: A semantic collaborative tagging system.
[6] A. Miles and S. Bechhofer. SKOS Simple Knowledge Organization System Reference, 29.08.2008.
[7] P. Heymann and H. Garcia-Molina. Collaborative creation of communal hierarchical taxonomies in social tagging systems, 2006.
[8] M. F. Porter. An algorithm for suffix stripping. pages 313–316, 1997.
[9] R. Lachica and D. Karabeg. Metadata creation in socio-semantic tagging systems: Towards holistic knowledge creation and interchange. In L.M. Garshol and L. Maicher, editors, Scaling Topic Maps. Topic Maps Research and Applications 2007, Lecture Notes in Computer Science. Springer, 2007.
[10] Silvia Bindelli, Claudio Criscione, Carlo Curino, Mauro L. Drago, Davide Eynard, and Giorgio Orsi. Improving search and navigation by combining ontologies and social tags. In Robert Meersman, Zahir Tari, and Pilar Herrero, editors, OTM Workshops, volume 5333 of Lecture Notes in Computer Science, pages 76–85. Springer, 2008.
[11] Sofia Angeletou, Marta Sabou, Lucia Specia, and Enrico Motta. Bridging the gap between folksonomies and the semantic web: An experience report. In Bridging the Gap between Semantic Web and Web 2.0 (SemNet 2007), pages 30–43, 2007.
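
As a closing illustration of the linked knowledge islands idea from Section 3.1, the following minimal Python sketch shows how symmetric and transitive skos:exactMatch links could connect tags from separate private models. It is only a sketch under stated assumptions, not the prototype's implementation (which delegates inferencing to the RDF repository): the class KnowledgeIslands and its methods are hypothetical, tags are represented as (user, label) pairs, Carl's tag label is made up, and a union-find structure is swapped in for OWL inferencing.

```python
class KnowledgeIslands:
    """Hypothetical union-find over tags linked by skos:exactMatch."""

    def __init__(self):
        self.parent = {}  # tag -> representative of its equivalence class

    def _find(self, tag):
        # Register unseen tags as their own class, then walk to the root.
        self.parent.setdefault(tag, tag)
        while self.parent[tag] != tag:
            self.parent[tag] = self.parent[self.parent[tag]]  # path halving
            tag = self.parent[tag]
        return tag

    def exact_match(self, tag_a, tag_b):
        """Record a mapping; merging classes makes it symmetric and transitive."""
        root_a, root_b = self._find(tag_a), self._find(tag_b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def equivalents(self, tag):
        """Return every known tag that refers to the same concept as `tag`."""
        root = self._find(tag)
        return {t for t in list(self.parent) if self._find(t) == root}


# Tags are (user, label) pairs; Carl's label is an assumption for illustration.
bob_tag = ("bob", "Semantic Web")
alice_tag = ("alice", "Web 3.0")
carl_tag = ("carl", "Web of Data")
unlinked_tag = ("alice", "Cooking")

islands = KnowledgeIslands()
islands.exact_match(alice_tag, carl_tag)  # Alice linked her tag to Carl's earlier
islands.exact_match(bob_tag, alice_tag)   # Bob now links his tag to Alice's

print(sorted(islands.equivalents(bob_tag)))
print(sorted(islands.equivalents(unlinked_tag)))
```

Once Bob has linked his tag to Alice's, the equivalence class of his tag also contains Carl's tag, so relations and annotations attached to any of the three tags could be reused from Bob's private model, while a tag that has not been mapped stays on its own island.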