=Paper=
{{Paper
|id=Vol-273/paper-17
|storemode=property
|title=Collaboratively building structured knowledge with DBin: from del.icio.us tags to an “RDFS Folksonomy"
|pdfUrl=https://ceur-ws.org/Vol-273/paper_99.pdf
|volume=Vol-273
|dblpUrl=https://dblp.org/rec/conf/www/TummarelloM07
}}
==Collaboratively building structured knowledge with DBin: from del.icio.us tags to an “RDFS Folksonomy"==
Giovanni Tummarello
DERI Galway, National University of Ireland
+(353) 091 495285
g.tummarello@gmail.com

Christian Morbidoni
SEMEDIA, Università Politecnica delle Marche, Ancona, Italy
+(39) 071 2204841
c.morbidoni@deit.univpm.it

ABSTRACT
DBin is a Semantic Web application that enables groups of users with a common interest to cooperatively create semantically structured knowledge bases. These user groups, which we call “Semantic Web Communities”, are made possible by creating customized user environments called “Brainlets”. Brainlets provide user interfaces and domain specific tools (e.g. querying, viewing and editing facilities) which enable community participants to interact with the data of interest. Brainlets are created directly by domain experts using an XML description language. DBin clients communicate and exchange annotations using a P2P infrastructure. Access control and digital signatures embedded by DBin in the authored RDF enable trust and information filtering. In this paper we show a specific use case where a “Semantic Web Community” is created to enable a group of users to share their del.icio.us tags and organize them into a cooperatively built RDFS ontology.

Keywords
Semantic Web, Tags, Ontology creation, DBin, peer-to-peer.

1. DBIN PLATFORM OVERVIEW
The DBin project is an integrated, end-user oriented Semantic Web platform. In more detail, it is a Semantic Personal Knowledge Manager (Semantic PKM) with the following main features:
* It is based on the Semantic Web languages stack;
* It is topic independent, yet customizable to be domain specific;
* Ontology-based reasoning is used whenever possible to assist the user (e.g. automatic rich user interface creation) in visualizing, editing and browsing data;
* It works as a personal information manager and runs in a local desktop environment;
* Using a P2P algorithm, it can synchronize aspects of the local knowledge with that of other online DBin users;
* It is not a programmer toolkit: most customizations can be done using XML scripting languages and ontologies;
* It is rich-client, multiplatform software: being based on the Eclipse RCP, it benefits from its plug-in system.

2. SEMANTIC WEB COMMUNITIES: THE USER EXPERIENCE
In this section we present the overall user interaction model as implemented by the DBin platform. Users might simply want to participate in Semantic Web communities (from here on referred to as “regular” users) or might want to start up and/or maintain them (power users). To participate means to be able to cooperatively build the community's shared semantic knowledge. The power user starts up a new community by first creating a customized user environment for the editing and exploitation of semantically structured annotations. These environments are called Brainlets.

2.1 Brainlets
Brainlets [1] are plug-ins for the DBin platform (therefore, technically, Eclipse plug-ins) and can be thought of as “configuration packages” preparing the client application to operate on a specific domain (e.g. wine lovers, Italian opera fans etc.). From the user perspective, the relationship between Brainlets and the DBin platform is similar to that between HTML and a Web browser. Much like HTML web sites, Brainlets are created in XML and RDF and do not require any programming skills. They customize aspects such as:
* The ontologies to be used for supporting knowledge creation and presentation of data;
* GUI layout and coordination. Widgets are first “instantiated” from a rich set of predefined ones and then configured for the domain of interest, e.g. an ontology navigator might be configured to show certain classes or instances and to hide others. The components are then interlinked with each other; this means that chains of reactions to actions such as a focus change can be defined;
* Templates for domain specific annotations (e.g. a “Movie Brainlet” might have a “Review” template, with associated slots, that users can fill);
* Templates for readily available “pre-cooked” domain queries, which are structurally complex domain queries with only a few simple free parameters (e.g. “give me the name of the cinemas where a movie of genre X is being shown tonight”);
* A trust model and information filtering rules for the domain (e.g. public keys of well known “founding members” or authorities, preset “browsing levels”);
* Scripts for guiding the user in creating new URIs for domain resources (e.g. adding a new "paper" to the knowledge base);
* Scripts connected to Brainlet specific menus or buttons that implement domain specific functions;
* Support material, customized icons, help files etc.;
* Optionally, Brainlets might include Java code and libraries providing add-on capabilities beyond those offered by the standard Brainlet widgets;
* A basic RDF knowledge package.

Figure 2. A Brainlet as experienced by an end user. The semantic-aware widgets are positioned and made to interoperate by the Brainlet configuration.

To the end user, most of the above aspects are simply hidden behind the integrated Brainlet UI which presents itself, for example, as shown in Figure 1 (ESWC Budva Brainlet).

It is important to notice that the Brainlet UI is not simply a mash-up of visualizers. As the components are coordinated with each other, a Brainlet guides the user into a meaningful and domain specific workflow for interacting with the structured data. At any time, the domain ontologies are used as much as possible for assisting users in editing and browsing knowledge, for example to suggest which kinds of annotations are possible for a given resource.

2.2 The overall scenario
Once Brainlets have been created by power users, they are installed by the regular users into their local DBin client. Brainlets are downloadable files and as such they can be made available at a Web site by their creator. DBin itself, however, provides a mechanism for discovering new Brainlets as the user is browsing the P2P channels; when a user joins a channel which was created for the users of a specific Brainlet, DBin will optionally guide the user to the Brainlet download and installation.

The overall scenario is depicted in Figure 3. On top of what has been illustrated in the previous section, Brainlets also play a role in how a user can connect to the others. In particular, a Brainlet contains pointers to P2P channels which are either known to contain information pertaining to the domain of interest or which the power user has previously created for this purpose. Creating a P2P channel for a specific topic is a simple operation that has to be performed on the configuration of an RDFGrowth server. RDFGrowth servers act as a “meeting point” for the DBin clients but do not themselves carry metadata or binary attachments. Binary attachments are stored by DBin automatically in a web accessible space. This is done by DBin interfacing with a web publishing system, much like WebDAV (footnote 1), which we call “Data Publishing Service” (DPS). Unlike WebDAV, a DBin publishing service is a simple PHP script and, as such, it can be deployed with ease in most low cost commercial web hosting environments. For the end user's convenience, the DBin platform comes with a default DPS setting (footnote 2). The same Data Publishing mechanism provides the DBin users with the ability to create and publish RSS feeds and RDF dumps derived from the internal knowledge.

The Brainlet provides a domain specific user interface as it instantiates and positions RDF-aware widgets which are connected together to create an application workflow. It is important however to notice that Brainlets do not “take over” the individual installations; many Brainlets can coexist as needed.
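
To make the idea of a “pre-cooked” domain query from Section 2.1 more concrete, the following minimal Python sketch treats a knowledge base as a set of (subject, predicate, object) triples and exposes a structurally fixed query with a single free parameter. The property names (`ex:hasGenre`, `ex:showsTonight`, `ex:name`), the sample data and the function `cinemas_showing_genre` are hypothetical and chosen only for illustration; in DBin such templates are part of the XML/RDF Brainlet configuration rather than Python code.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# All identifiers below are hypothetical illustration data.
TRIPLES = {
    ("ex:movie1", "ex:hasGenre", "comedy"),
    ("ex:movie2", "ex:hasGenre", "thriller"),
    ("ex:cinemaA", "ex:showsTonight", "ex:movie1"),
    ("ex:cinemaB", "ex:showsTonight", "ex:movie2"),
    ("ex:cinemaC", "ex:showsTonight", "ex:movie1"),
    ("ex:cinemaA", "ex:name", "Odeon"),
    ("ex:cinemaB", "ex:name", "Rex"),
    ("ex:cinemaC", "ex:name", "Lux"),
}


def objects(triples, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the store."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def cinemas_showing_genre(triples, genre):
    """Pre-cooked query: names of cinemas showing a movie of `genre` tonight.

    The join structure is fixed; `genre` is the only free parameter.
    """
    movies = {s for (s, p, o) in triples if p == "ex:hasGenre" and o == genre}
    cinemas = {s for (s, p, o) in triples if p == "ex:showsTonight" and o in movies}
    names = set()
    for cinema in cinemas:
        names |= objects(triples, cinema, "ex:name")
    return sorted(names)


print(cinemas_showing_genre(TRIPLES, "comedy"))
```

The user only supplies the genre; the two joins (movie to genre, cinema to movie) stay hidden inside the template, which is the point of shipping such queries with a Brainlet.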

Figure 3. DBin and its relationship with different actors in the "Semantic Web Communities"

1 http://www.webdav.org/
2 Which uses our installation of the Data Publishing Service located at http://public.dbin.org

2.3 The RDFGrowth P2P algorithm
In this section we briefly overview the basic ideas and principles behind the RDFGrowth P2P metadata exchange algorithm; refer to [2] for a complete description of the algorithm.

Unlike previous approaches, which have explored P2P interactions among peers based on distributed queries, collecting and returning results, as in works like [3], [4], [5] and [6], RDFGrowth operates in a “greedy” and uncommitted scenario where cooperation between peers is minimal. It operates by direct queries that are in general of fixed computational cost. Without going into details, the algorithm provides synchronization of RDF knowledge among the users' DBin installations. Such synchronization is not performed in full, but along “aspects” of knowledge; it is restricted to those RDF triples which are very closely connected with a set of URIs defined as “interesting” by a community “banner”. The P2P community creator, usually the same person who created the Brainlet, defines a “URI interest banner”, which we call Group URIs Exposing Definition (GUED), usually as a set of queries which have as a result a list of URIs. An example of a GUED can be “select all resources of type Papers which have topic Semantic Web”. Upon joining a community, a peer runs such queries to select the local set of resources about which knowledge will be synchronized with that of the other participants.

At the user interaction level, DBin shows an interface that is somewhat similar to that of popular file-sharing software. A list of servers is presented and, upon selecting one, the list of semantic P2P channels is displayed for the user to join. Furthermore, an access control mechanism allows for restricted P2P groups.

3. INTERACTION AMONG COMMUNITIES
It is interesting to see how multiple Semantic Web Communities relate both to each other and to the individual user.

Figure 4 shows a possible use case where each user participates in one or more communities with different topics of interest.

Figure 4. An example of users participating in multiple communities

Users in groups such as that of Alice (marked A in the illustration) are Web developers. Within their community the resources of interest are, for example, available web technologies and tools (such as PHP, Ajax, JSP, etc.). Participants in the community annotate such resources, for example expressing opinions about the tools, pointing at web tutorials or at web sites that use specific technologies. On top of pure metadata annotations, they can also point at rich media posted on the web (e.g. pictures, documents, long texts, etc.). Other users who receive such annotations in the group can then reply or further annotate each of these for their personal use or as public knowledge.

As mentioned earlier, the operator that selects which resources a client shares with the others is the GUED. A GUED for the Web development community might contain queries such as “all the resources of type WebTechnology”, with respect to a specific ontology, chosen or developed by the community’s creator, where the class WebTechnology is defined. Only the metadata involving resources that fit this definition of 'common interest' are made available by a peer to the others in the community. In this case such metadata would be, for example, statements like “Web site X uses web technology Y” or “Web page X deals with issues in using technology Y”.

Users like Bob (marked B) have interests which go beyond those of a single community. In this example Bob is interested in developing a collaborative tagging application, so he joins both the ‘Web development’ community and the ‘collaborative systems’ one, and is thus able to import into his own DBin metadata coming from the two sources. At this point Bob is able to make joint queries across the two domains, e.g. “which are the technologies on which existing collaborative systems are based”. Finally, Carole (marked C) is a Semantic Web researcher, so she might decide to join all the communities as they all contain information which might be useful for her research activity.

The interconnection between Semantic Web Communities can also be seen from a second, novel point of view. If communities share identifiers (e.g. their own URLs for available web applications, URLs of their specification for web technologies) then an annotation (e.g. web site X is based on technology Y) originally posted in one community is automatically cross-posted to the other community, since the URI is of interest to both (it belongs to the GUED of both groups). This aspect, in our opinion, represents a particularly novel feature of Semantic Web Communities as a means of communication. Information flows across group boundaries when it is in fact relevant to the users participating in the different communities. This is opposed to what happens with traditional means such as mailing lists, web forums or newsgroups where information, arguably, has to be manually cross-posted.

4. THE DEL.ICIO.US BRAINLET
The tagging paradigm is increasingly being adopted by people for organizing the web resources they visit. Systems like del.icio.us (http://del.icio.us) allow users to associate simple keywords with web resources while navigating the Web. However, such applications only allow annotations to be a flat list of terms, while it would be useful to organize them in taxonomies or establish relations among them and possibly with existing ontologies. In this section we illustrate the del.icio.us Brainlet, which deals with this issue.

To think of a specific use case, let us consider a group of colleagues, each one using del.icio.us to tag web articles and resources of interest for their work. They also use a knowledge
management application (such as DBin) for cooperating and organizing internal documents. It is likely that a subset of the tags they created in their del.icio.us accounts will be conceptually related to or equivalent to some concepts present in the domain ontology. Using the DBin del.icio.us Brainlet it is possible to import lists of tags into the local RDF store, transform them into ontology classes and insert them in the class hierarchy.

The screenshots show this Brainlet in action. In Figure 5, the ontology view visualizes the taxonomy of the classes and provides functionality to add new classes and subclasses as items of the tree, while the tags view shows the flat list of tags and allows the list to be updated from a del.icio.us account. Once a tag has been selected, the Web pages which have been marked with that tag are listed in the related bookmarks view and their content can be displayed in the browser view.

Upon selecting a tag (e.g. J2EE), a “transform into a sub-class” action is available to state that a tag is a sub-concept of a class in the ontology (e.g. Technology). This results in a new class being added to the ontology. As shown in Figure 6, when the user selects the class Technology, the web pages tagged with ‘J2EE’ are displayed in the right view, as such a tag has been stated to be a specialization of the concept of technology.

The tags, as well as the pages and the other ontological terms, can then be annotated like any RDF resource in DBin. This enables annotations with comments, binary attachments, votes and any kind of structured annotation as defined by the ontologies.
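
The effect of the “transform into a sub-class” action can be sketched in a few lines of Python. The sketch below is an assumption about one way to model it, not DBin's actual implementation: tags and classes are plain strings, `ex:taggedWith` is a hypothetical property linking a bookmark to a tag, the example URLs are invented, and the only RDFS notion used is the transitivity of `rdfs:subClassOf`.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TAGGED_WITH = "ex:taggedWith"  # hypothetical property, not taken from the paper


def transform_into_subclass(triples, tag, parent_class):
    """Return a new triple set in which `tag` is a sub-class of `parent_class`."""
    return triples | {(tag, SUBCLASS_OF, parent_class)}


def subclasses(triples, cls):
    """Return `cls` plus every class reachable through rdfs:subClassOf (transitively)."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in triples:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def pages_for_class(triples, cls):
    """Pages tagged with `cls` itself or with any tag stated to be its sub-class."""
    wanted = subclasses(triples, cls)
    return sorted(s for (s, p, o) in triples if p == TAGGED_WITH and o in wanted)


# Illustrative data: two bookmarks imported from a del.icio.us account.
store = {
    ("http://example.org/j2ee-tutorial", TAGGED_WITH, "J2EE"),
    ("http://example.org/holiday-photos", TAGGED_WITH, "travel"),
}

before = pages_for_class(store, "Technology")
store = transform_into_subclass(store, "J2EE", "Technology")
after = pages_for_class(store, "Technology")
print(before, after)
```

Before the transformation, selecting Technology yields no pages; afterwards the J2EE bookmark appears under Technology, which mirrors the behaviour described for Figure 6.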

Figure 5. Upon selecting a tag the related bookmarks are listed and each of them can be visualized in the embedded browser.

Figure 6. The J2EE tag has been identified as a sub-class of the Technology class, which automatically inherits the relation with the web resources tagged with J2EE.

By using the DBin P2P capabilities, this process becomes cooperative across the team. If necessary, the DBin digital signature infrastructure enables each team member to apply filters to see only contributions from certain members.

5. REFERENCES
[1] Tummarello, G., Morbidoni, C., Nucci, M. and Panzarino, O. Brainlets: "instant" Semantic Web applications. In Proceedings of the 2nd Workshop on Scripting for the Semantic Web at the European Semantic Web Conference (Budva, Montenegro, 2006)
[2] Tummarello, G., Morbidoni, C., Petersson, J., Puliti, P. and Piazza, F. RDFGrowth, a P2P annotation exchange algorithm for scalable Semantic Web applications. 1st International Workshop on Peer-to-Peer and Knowledge Management (Boston, USA, 2004)
[3] Nejdl, W., Wolf, B., Qu, C., Decker, S., Sintek, M., Naeve, A., Nilsson, M., Palmer, M. and Risch, T. EDUTELLA: A P2P Networking Infrastructure Based on RDF. In Proceedings of the International World Wide Web Conference (Honolulu, Hawaii, 2002)
[4] Cai, M. and Frank, M. RDFPeers: A Scalable Distributed RDF Repository based on A Structured Peer-to-Peer Network. In Proceedings of the 13th International World Wide Web Conference (New York, USA, 2004)
[5] Nejdl, W., Siberski, W., Wolpers, M., Löser, A. and Bruckhorst, I. SuperPeer Based Routing and Clustering Strategies for RDF Based Peer-To-Peer Networks. In Proceedings of the 12th International World Wide Web Conference (Budapest, Hungary, 2003)
[6] Chirita, P. A., Idreos, S., Koubarakis, M. and Nejdl, W. Publish/Subscribe for RDF-based P2P Networks. In Proceedings of the 1st European Semantic Web Symposium (Heraklion, Greece, 2004)