Collaboratively building structured knowledge with DBin: from del.icio.us tags to an “RDFS Folksonomy”

Giovanni Tummarello
DERI Galway
National University of Ireland
+(353) 091 495285
g.tummarello@gmail.com

Christian Morbidoni
SEMEDIA
Università Politecnica delle Marche, Ancona, Italy
+(39) 071 2204841
c.morbidoni@deit.univpm.it

ABSTRACT
DBin is a Semantic Web application that enables groups of users with a common interest to cooperatively create semantically structured knowledge bases. These user groups, which we call “Semantic Web Communities”, are made possible by creating customized user environments called “Brainlets”. Brainlets provide user interfaces and domain specific tools (e.g. querying, viewing and editing facilities) which enable community participants to interact with the data of interest. Brainlets are directly created by domain experts using an XML description language. DBin clients communicate and exchange annotations using a P2P infrastructure. Access control and digital signatures, which DBin embeds in the authored RDF, enable trust and information filtering. In this paper we show a specific use case where a “Semantic Web Community” is created to enable a group of users to share their del.icio.us tags and organize them into a cooperatively built RDFS ontology.

Keywords
Semantic Web, Tags, Ontology creation, DBin, peer-to-peer.

1. DBIN PLATFORM OVERVIEW
The DBin project is an integrated, end-user oriented Semantic Web platform. In more detail, it is a Semantic Personal Knowledge Manager (Semantic PKM) with the following main features:

• It is based on the Semantic Web languages stack;
• It is topic independent, yet customizable to be domain specific;
• It uses ontology based reasoning whenever possible to assist the user (e.g. automatic rich user interface creation) in visualizing, editing and browsing data;
• It works as a personal information manager and runs in a local desktop environment;
• Using a P2P algorithm, it can synchronize aspects of the local knowledge with that of other online DBin users;
• It is not a programmer toolkit. Most customizations can be done using XML scripting languages and ontologies;
• It is rich client, multiplatform software. Being based on the Eclipse RCP, it benefits from the Eclipse plug-in system.

2. SEMANTIC WEB COMMUNITIES: THE USER EXPERIENCE
In this section we present the overall user interaction model as implemented by the DBin platform. Users might simply want to participate in Semantic Web communities (hereafter referred to as “regular” users) or might want to start up and/or maintain them (power users). To participate means to be able to cooperatively build the community's shared semantic knowledge. The power user starts up a new community by first creating a customized user environment for the editing and exploitation of semantically structured annotations. These environments are called Brainlets.

2.1 Brainlets
Brainlets [1] are plug-ins for the DBin platform (therefore, technically, Eclipse plug-ins) and can be thought of as “configuration packages” preparing the client application to operate on a specific domain (e.g. Wine lovers, Italian Opera fans etc.). From the user perspective, the relationship between Brainlets and the DBin platform is similar to that between HTML and a Web browser. Much like HTML web sites, Brainlets are created in XML and RDF and do not require any programming skills. They customize aspects such as:

• The ontologies to be used for supporting knowledge creation and presentation of data;
• GUI layout and coordination. Widgets are first “instantiated” from a rich set of predefined ones and then configured for the domain of interest, e.g., an ontology navigator might be configured to show certain classes or instances and to hide others. The components are then interlinked with each other; this means that chains of reactions to actions such as a focus change can be defined;
• Templates for domain specific annotations (e.g. a “Movie Brainlet” might have a “Review” template, with associated slots, that users can fill);
• Templates for readily available “pre-cooked” domain queries, which are structurally complex domain queries with only a few simple free parameters (e.g. “give me the name of the cinemas where a movie of genre X is being shown tonight”);
• A trust model and information filtering rules for the domain (e.g. public keys of well known “founding members” or authorities, preset “browsing levels”);
• Scripts for guiding the user in creating new URIs for domain resources (e.g. adding a new "paper" to the knowledge base);
• Scripts connected to Brainlet specific menus or buttons that implement domain specific functions;
• Support material, customized icons, help files etc.;
• Optionally, Java code and libraries for add-on capabilities beyond those provided by the standard Brainlet widgets;
• A basic RDF knowledge package.

Figure 2. A Brainlet as experienced by an end user. The semantic-aware widgets are positioned and made to interoperate by the Brainlet configuration.

To the end user, most of the above aspects are simply hidden behind the integrated Brainlet UI which presents itself, for example, as shown in Figure 1 (ESWC Budva Brainlet).

It is important to notice that the Brainlet UI is not simply a mash-up of visualizers. As the components are coordinated with each other, a Brainlet guides the user into a meaningful and domain specific workflow interaction with the structured data. At any time, the domain ontologies are used as much as possible for assisting users in editing and browsing knowledge, for example to suggest which kinds of annotations are possible for a given resource.

2.2 The overall scenario
Once Brainlets have been created by power users, they are installed by the regular users into their local DBin client. Brainlets are downloadable files and as such they can be made available at a Web site by their creator. DBin itself, however, provides a mechanism for discovering new Brainlets as the user is browsing the P2P channels; when a user joins a channel which was created for the users of a specific Brainlet, DBin will optionally guide the user to the Brainlet download and installation. The overall scenario is depicted in Figure 3.

Figure 3. DBin and its relationship with different actors in the "Semantic Web Communities"

In addition to what has been illustrated in the previous section, Brainlets also play a role in how a user can connect to the others. In particular, a Brainlet contains pointers to P2P channels which are either known to contain information pertaining to the domain of interest or that the power user has previously created for this purpose. Creating a P2P channel for a specific topic is a simple operation performed on the configuration of an RDFGrowth server. RDFGrowth servers act as “meeting points” for the DBin clients but do not themselves carry metadata or binary attachments.

Binary attachments are stored by DBin automatically in a web accessible space. This is done by DBin interfacing with a web publishing system similar to WebDAV¹ which we call “Data Publishing Service” (DPS). Unlike WebDAV, a DBin publishing service is a simple PHP script and, as such, it can be deployed with ease in most low cost commercial web hosting environments. For the end user's convenience, the DBin platform comes with a default DPS setting². The same Data Publishing mechanism provides the DBin users with the ability to create and publish RSS feeds and RDF dumps derived from the internal knowledge.

¹ http://www.webdav.org/
² Which uses our installation of the Data Publishing Service located at http://public.dbin.org

The Brainlet provides a domain specific user interface as it instantiates and positions RDF-aware widgets which are connected together to create an application workflow. It is important however to notice that Brainlets do not “take over” the individual installations; many Brainlets can coexist as needed.

2.3 The RDFGrowth P2P algorithm
In this section we quickly overview the basic ideas and principles behind the RDFGrowth P2P metadata exchange algorithm; see [2] for a complete description of the algorithm.

Unlike previous approaches, which have explored P2P interactions among peers based on distributed queries, collecting and returning results, as in works like [3], [4], [5] and [6], RDFGrowth operates in a “greedy” and uncommitted scenario where cooperation between peers is minimal. It operates by direct queries that are in general of fixed computational cost. Without going into details, the algorithm provides synchronization of RDF knowledge among the users' DBin installations. Such synchronization is not performed in full, but along “aspects” of knowledge; it is restricted to those RDF triples which are very closely connected with a set of URIs defined as “interesting” by a community “banner”. The P2P community creator, usually the same person who created the Brainlet, defines a “URI interest banner”, which we call Group URIs Exposing Definition (GUED), usually by means of queries which have as a result a list of URIs. An example of a GUED can be “select all resources of type Papers which have topic Semantic Web”. Upon joining a community, a peer runs such queries to select the local set of resources about which knowledge will be synchronized with that of the other participants.

At the user interaction level, DBin shows an interface that is somewhat similar to that of popular file-sharing software. A list of servers is presented and, upon selecting one, the list of semantic P2P channels is displayed for the user to join. Furthermore, an access control mechanism allows for restricted P2P groups.

3. INTERACTION AMONG COMMUNITIES
It is interesting to see how multiple Semantic Web Communities relate both to each other and to the individual user.

Figure 4 shows a possible use case where each user participates in one or more communities with different topics of interest.

Figure 4. An example of users participating in multiple communities

Users in groups such as that of Alice (marked A in the illustration) are Web developers. Within their community the resources of interest are, for example, available web technologies and tools (such as PHP, Ajax, JSP, etc.). Participants in the community annotate such resources, for example expressing opinions about the tools, or pointing at web tutorials or at web sites that use specific technologies. On top of pure metadata annotations, they can also point at rich media posted on the web (e.g. pictures, documents, long texts, etc.). Other users who receive such annotations in the group can then reply or further annotate each of these for their personal use or as public knowledge.

As mentioned earlier, the operator that selects which resources a client shares with the others is the GUED. A GUED for the Web development community might contain queries such as “all the resources of type WebTechnology”, with respect to a specific ontology, chosen or developed by the community's creator, where the class WebTechnology is defined. Only the metadata involving resources that fit this definition of 'common interest' are made available by a peer to the others in the community. In this case such metadata would be for example statements like “Web site X uses web technology Y” or “Web page X deals with issues in using technology Y”.

Users like Bob (marked B) have interests which go beyond those of a single community. In this example Bob is interested in developing a collaborative tagging application, so he joins both the ‘Web development’ community and the ‘collaborative systems’ one, thus being able to import into his own DBin metadata coming from the two sources. At this point Bob is able to make joint queries across the two domains, e.g. “which are the technologies on which existing collaborative systems are based”. Finally, Carole (marked C) is a Semantic Web researcher, so she might decide to join all the communities as they all contain information which might be useful for her research activity.

The interconnection between Semantic Web Communities can also be seen from a second, novel point of view. If communities share identifiers (e.g. their own URLs for available web applications, URLs of their specification for web technologies) then an annotation (e.g. web site X is based on technology Y) originally posted in one community is automatically cross-posted to the other community, since the URI is of interest to both (it belongs to the GUED of both groups). This aspect, in our opinion, represents a particularly novel feature of Semantic Web Communities as a means of communication. Information flows across group boundaries when it is in fact relevant to the users participating in the different communities. This is opposed to what happens with traditional means such as mailing lists, web forums or newsgroups where information, arguably, has to be manually cross-posted.

4. THE DEL.ICIO.US BRAINLET
The tagging paradigm is increasingly being adopted by people for organizing the web resources they visit. Systems like del.icio.us³ allow users to associate simple keywords with web resources while navigating the Web. However, such applications only allow annotations to be a flat list of terms, while it would be useful to organize them in taxonomies or establish relations among them and possibly with existing ontologies. In this section we illustrate the del.icio.us Brainlet, which deals with this issue.

³ http://del.icio.us

As a specific use case, let us consider a group of colleagues, each one using del.icio.us to tag web articles and resources of interest for their work. They also use a knowledge management application (such as DBin) for cooperating and organizing internal documents. It is likely that a subset of the tags they created in their del.icio.us accounts will be conceptually related to or equivalent to some concepts present in the domain ontology. Using the DBin del.icio.us Brainlet it is possible to import lists of tags into the local RDF store, transform them into ontology classes and insert them in the class hierarchy.

Figure 5. Upon selecting a tag the related bookmarks are listed and each of them can be visualized in the embedded browser.

Figure 6. The J2EE tag has been identified as a sub-class of the Technology class, which automatically inherits the relation with the web resources tagged with J2EE.

By using the DBin P2P capabilities, this process is cooperative across the team. If necessary, the DBin digital signature infrastructure would enable each team member to apply filters to see only contributions from certain members.

The screenshots show this Brainlet in action. In Figure 5, the ontology view visualizes the taxonomy of the classes and provides functionalities to add new classes and subclasses as items of the tree, while the tags view shows the flat list of tags and gives the capability to update such a list from a del.icio.us account. Once a tag has been selected, Web pages which have been marked with that tag are listed in the related bookmarks view and their content can be displayed in the browser view.

Upon selecting a tag (e.g. J2EE), a “transform into a sub-class” action is available to state that a tag is a sub-concept of a class in the ontology (e.g. Technology). This results in a new class being added to the ontology. As shown in Figure 6, when the user selects the class Technology, the web pages tagged with ‘J2EE’ are displayed in the right view, as such a tag has been stated to be a specialization of the concept of technology.

The tags, as well as the pages and the other ontological terms, can then be annotated like any RDF resource in DBin. This enables annotations with comments, binary attachments, votes and any kind of structured annotation as defined by the ontologies.

5. REFERENCES
[1] Tummarello, G., Morbidoni, C., Nucci, M. and Panzarino, O. Brainlets: "instant" Semantic Web applications. In Proceedings of the 2nd Workshop on Scripting for the Semantic Web at the European Semantic Web Conference (Budva, Montenegro, 2006)
[2] Tummarello, G., Morbidoni, C., Petersson, J., Puliti, P., Piazza, F. RDFGrowth, a P2P annotation exchange algorithm for scalable Semantic Web applications. 1st International Workshop on Peer-to-Peer and Knowledge Management (Boston, USA, 2004)
[3] Nejdl, W., Wolf, B., Qu, C., Decker, S., Sintek, M., Naeve, A., Nilsson, M., Palmer, M. and Risch, T. EDUTELLA: A P2P Networking Infrastructure Based on RDF. In Proceedings of the International World Wide Web Conference (Honolulu, Hawaii, 2002)
[4] Cai, M. and Frank, M. RDFPeers: A Scalable Distributed RDF Repository based on A Structured Peer-to-Peer Network. In Proceedings of the 13th International World Wide Web Conference (New York, USA, 2004)
[5] Nejdl, W., Siberski, W., Wolpers, M., Löser, A. and Brunkhorst, I. SuperPeer Based Routing and Clustering Strategies for RDF Based Peer-To-Peer Networks. In Proceedings of the 12th International World Wide Web Conference (Budapest, Hungary, 2003)
[6] Chirita, P. A., Idreos, S., Koubarakis, M. and Nejdl, W. Publish/Subscribe for RDF-based P2P Networks. In Proceedings of the 1st European Semantic Web Symposium (Heraklion, Greece, 2004)
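
As an illustration of the “transform into a sub-class” action described in Section 4, the following minimal sketch shows how stating that a tag is an rdfs:subClassOf an ontology class lets the class “inherit” the pages carrying that tag. It is not DBin code: DBin is an Eclipse/Java application whose internal API is not shown in this paper. The in-memory triple set, the property name ex:taggedWith, the example URLs and all function names are assumptions made for this example only.

```python
# Illustrative sketch only; every name here is hypothetical, not DBin's API.
# An RDF store is modelled as a plain Python set of (subject, predicate, object).

RDF_TYPE = "rdf:type"
RDFS_CLASS = "rdfs:Class"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"
TAGGED_WITH = "ex:taggedWith"  # assumed property linking a page to a tag


def transform_into_subclass(store, tag, parent_class):
    """State that `tag` is a class and a sub-class of `parent_class`."""
    store.add((tag, RDF_TYPE, RDFS_CLASS))
    store.add((tag, RDFS_SUBCLASS_OF, parent_class))


def subclasses_of(store, cls):
    """Return `cls` plus every class reachable through rdfs:subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in store:
            if p == RDFS_SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def pages_for_class(store, cls):
    """Pages tagged with `cls` itself or with any of its sub-classes."""
    classes = subclasses_of(store, cls)
    return sorted(s for s, p, o in store if p == TAGGED_WITH and o in classes)


# A tiny local store: one ontology class and two imported bookmarks.
store = {
    ("ex:Technology", RDF_TYPE, RDFS_CLASS),
    ("http://example.org/j2ee-tutorial", TAGGED_WITH, "tag:J2EE"),
    ("http://example.org/cooking", TAGGED_WITH, "tag:recipes"),
}

# The user selects the J2EE tag and places it under Technology.
transform_into_subclass(store, "tag:J2EE", "ex:Technology")

# Selecting Technology now also lists the pages tagged with J2EE.
print(pages_for_class(store, "ex:Technology"))
```

In this sketch the sub-class closure is computed on demand by a simple traversal; a real RDF store would more likely delegate this to an RDFS reasoner or to a query over rdfs:subClassOf.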