Easing Participation in the Semantic Web

Stefan Haustein, Jörg Pleumann
Computer Science VIII, X
University of Dortmund, Baroper Str. 301
D-44221 Dortmund, Germany
{stefan.haustein, joerg.pleumann}@udo.edu

ABSTRACT

Although a promising idea, the Semantic Web currently seems to have a problem duplicating the success story of its predecessor, the World Wide Web. The number of people actively participating in the Semantic Web has been very limited until now, because people can't see the benefits originating from the extra effort they invest into semantically rich web pages. Unfortunately, this advantage is barely visible at all until a critical mass of RDF-annotated pages is available on the net, thus making it difficult to recruit new participants for the Semantic Web. The article tries to break this vicious circle by showing that the use of appropriate tools may both ease participation in the Semantic Web and provide a number of additional advantages not directly related to the Semantic Web. The latter, in particular, may convince a larger number of people to participate, and thus bring the Semantic Web nearer its critical mass.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission by the authors.
Semantic Web Workshop 2002 Hawaii, USA
Copyright by the authors.

1. INTRODUCTION

The Semantic Web is a great idea. Yet, it has not quite taken off so far. Why is this the case? Some argue that RDF [19], the language for adding the semantic information to existing web pages, is the problem. These critics see RDF as being too complicated or under-specified [11, 6]. While RDF truly has its problems in some areas, we don't think that the language itself is the main obstacle that hinders people from participating in the Semantic Web. But to find out where the problem actually lies, we first need to take a step back and look at what made the original web such a tremendous success.

In our opinion, there were four important reasons for the success of the World Wide Web:

• Simplicity. HTML was easily understood and quickly written down. Even novices could design a few basic web pages with little effort, put them in a matching directory structure and start an HTTP daemon to deliver the content to clients.

• Immediate feedback. After an HTML page had been designed in a text editor, the result could be displayed in any HTML client to get an impression of the results. Thus, the user had immediate feedback on his or her work.

• Additional benefits. Even though their original purpose was to present information to other people, HTML pages could be used as a means of discussion or documentation for people participating in a project or even for personal use. Thus, there was an additional gain users got from participating in the World Wide Web, which made the system even more attractive to them.

• Low critical mass. As a networked effort, the World Wide Web required a minimum (but large enough) number of participants to raise the interest of outside people, convincing them to become involved. Yet, since the World Wide Web was the first system of its kind, and there was no similar system to compete with, this critical mass was relatively low.

When we compare these points to the Semantic Web in its current form, we notice that most of them are not fulfilled:

• Simplicity is only partially given. The mixture of RDF and DAML+OIL is understood in all its details only by people who have a background in AI or related fields. Novices will only be able to use basic concepts of RDF and might thus have trouble seeing the real advantages of the Semantic Web.

• Immediate feedback is not given. Unfortunately, there is no specific client software for the Semantic Web that gives users an impression of their RDF fact base. One could argue that it doesn't even make sense to ask for such software, because the clients of the Semantic Web are programs rather than human beings.

• There are no additional benefits, at least none that are obvious to "ordinary end-users". While human-readable HTML pages primarily designed for other people can also be used for personal purposes, this is not true for RDF facts, which are meant to be read by programs.

• The critical mass is considerably higher. Why is this the case? This time, there already is an existing system, the original World Wide Web, and most people nowadays tend to use the "brute force" method to find a specific piece of information in it, namely Google or some other search engine. Thus, it is more difficult to convince people to take part in another system, even if it is an extension to the existing one.

As long as the first three points are true, the critical mass of users needed to make the Semantic Web "take off" will be hard to reach. Unfortunately, seen the other way round, the Semantic Web hardly has any real benefit unless there is a large enough number of participants who make RDF-specified information available to others, that is, until the critical mass is reached. The current situation could be seen as a vicious circle that has to be broken before the Semantic Web has a chance to succeed.

2. TOOLS TO BREAK THE CIRCLE

To break the circle, we have to get rid of as many as possible of the four problems shown in the previous section. Since we cannot lower the critical mass for mainstream acceptance of the Semantic Web (other than, possibly, by forcing people into it), we have to focus on the other three: simplicity, immediate feedback, and additional benefits. A very promising way to achieve this seems to be the use of appropriate tools. These tools would have to ease participation in the Semantic Web, but would also have to provide some "added value" that makes them attractive to end-users. When using the tools, people will then also be likely to participate in the Semantic Web, even if that is not their original motivation. The following sections try to show what features these tools might offer.

2.1 Generative approach

Looking at existing tools developed for or related to the Semantic Web, for example Protege-2000 [23] or Ontobroker [15], one notices that these are primarily designed to support ontology and fact management. Information is stored in a knowledge base providing fine-grained access. The ontology is utilized to make sure that the content corresponds to the desired structure. The two systems mentioned above are able to export their fact base to an RDF representation.

While these tools aim in the right direction, they still have a problem: As long as one wants a machine-readable RDF version of the facts as well as a human-readable HTML version, duplicate effort is required to maintain both. Take, for example, a typical web site for a university department containing information about the department's staff, research topics, projects, and publications. A highly structured site like this is suitable for participating in the Semantic Web, and it can easily be modelled using a corresponding domain ontology. Yet, a change as simple as a telephone number has to be propagated to the RDF version as well as the HTML version.

If a Semantic Web tool followed a generative approach, the situation would be easier: Assume this tool were able to incorporate regular HTML for the unstructured part of the web site, and these pages could contain placeholders for insertion of information contained in the fact base. The tool would then be able to generate the actual HTML pages automatically from the existing RDF information (or even both from a common fact base), thus requiring the user to maintain this fact base only, at least as far as structured information is concerned. If the generation of pages takes place at run-time, we arrive at a tool that could be seen as a "Semantic Web-enabled HTTP server".

While the avoidance of redundancy already is a big advantage addressing simplicity, the generative approach provides other advantages that fall into the area of "added value":

• In contrast to editing HTML directly, a unique look and feel can easily be established for the whole site, given an appropriate template mechanism.

• In addition to HTML and RDF, other target formats like WML and cHTML can be generated from the same fact base, lowering redundancy even further.

• In contrast to plain HTML files, ontology-based consistency checks can be performed automatically while entering data, e.g. avoiding dangling links inside the system.

2.2 Incorporation of database features

To broaden the possible target audience of our Semantic Web server, we might try to incorporate database-like features and thus position it as an alternative to a "heavy-weight" database solution.

While relational databases with HTML-generating front-ends are quite common these days (e.g. Cold Fusion [8], PHP [2], Enhydra [1] etc.), these solutions are mainly used for sites with a simple, low-dimensional structure, such as guest books or news pages (e.g. Slashdot.org). More complex domains such as university departments often still use plain HTML files for their web presentation, or make only limited use of database tables.

Here, the reason may be that a high number of tables would be required for modelling even simple ontologies, mainly because associations are not first-class members of relational database systems. Revisiting the university department scenario, we need at least tables for persons, research topics, projects, and publications. Figure 1 shows a possible UML class diagram of the database's conceptual model. Since all n:n associations require separate association tables, this results in quite a lot of normalised tables (more than 10), each of which potentially contains only a very small subset of all the possible instances.

Figure 1: Simple UML Diagram for university department's web site

In this case, the benefit for the creator, that is, the dynamic generation of HTML or, in our case, RDF from a single set of data, does not outweigh the extra effort inherent in maintaining the tables.

Using Semantic Web tools, the picture may change significantly. For a low number of instances, the internal knowledge base provided by a Semantic Web tool may be sufficient. Associations are directly supported, and the ontology language also allows integrity constraints to be specified for them at an appropriate level. Since Semantic Web tools usually come with a generic user interface, the need to create HTML forms for editing the tables is avoided.

2.3 Incorporation of Content Management Features

Another area that a Semantic Web tool might address is content management. Content management systems, such as Hyperwave [3], Zope [5] or OpenCMS [4], provide user, version and metadata management for a set of HTML pages or binary documents in other formats such as PDF or Word. Their set of metadata, however, is usually fixed and tailored to the most common needs. Here, ontology-based Semantic Web tools provide much more flexibility, and may be superior to general content management systems in domains where the metadata requirements differ significantly from the standard set provided by content management systems.

2.4 Openness to Alternative Schema Languages

In the introduction, we claimed that besides providing no gain that becomes immediately obvious, RDF annotation is complex.

In its current form, the Semantic Web requires users to learn yet another formal description language. Users having a background in AI may be expected to be familiar with description logics and corresponding ontology modelling tools. For mainstream acceptance, though, integration of recognised standards like UML [20] may help to improve acceptance of Semantic Web tools and thus lower the entrance barrier [13]. Most students of computer science or related engineering disciplines can be assumed to be familiar with UML and modelling tools like Together or Rational Rose. These students could easily apply their modelling knowledge to the Semantic Web and thus contribute to its group of early adopters.

3. THE INFORMATION LAYER

In order to demonstrate that participation in the Semantic Web actually can be simple, and that using a server based on a fine-grained fact base instead of HTML or RDF files can provide immediate gains, we have started to model our own unit's web pages accordingly. For this purpose, we used our Information Layer system, which stores data in a simple XML format that is determined by a given ontology. The Information Layer uses an object-oriented model for data representation. Objects consist of atomic attributes and relations to other objects. The consistency of relations in both directions is ensured automatically, avoiding inconsistencies inside the system. The concepts and relations are defined per application in an external ontology definition file. All files used by the Information Layer are stored as XML documents.

The InfoLayer system was originally designed as an integrated information platform for software agents and human users in a conference scenario. The system was used in the COMRIS project [21] in order to make conference information available in appropriate formats to human users as well as software agents, utilizing the same underlying knowledge base. Access to the content is possible via a generic HTML interface as well as a FIPA [16] based XML interface [18]. Obviously, when information is machine-readable for software agents, it is not a big leap to make this information available for the Semantic Web as well.

In the process of modelling our unit web pages, we made several improvements to our system, simplifying its use as a replacement for a "regular" web server. While there may be alternative paths appropriate for other systems, our main purpose was to show that using Semantic Web systems may provide direct advantages over regular web servers, even without relying on advanced features such as knowledge integration from different sources (e.g. KAON-REVERSE [17]).

3.1 XMI Import

The original version of the Information Layer system used its own proprietary XML-based ontology description language. In order to simplify the initial step of generating the application ontology, we have replaced the internal format by XMI [20], the XML-based exchange format for UML diagrams. Figure 1 shows a simplified version of the UML model currently used as a basis for our unit web pages.

We have chosen UML as ontology modelling language [13] instead of RDFS [7] because it is difficult to avoid contact with UML when working in computer science or in the IT industry in general. For most computer scientists, a UML editor like Rational Rose or Together is part of their standard tool box. Thus, the extra effort of installing and getting familiar with an RDFS editor, possibly preventing people from getting in touch with the Semantic Web, is avoided.

Compared to other languages suitable for ontology modelling, UML currently still lacks clearly defined semantics. However, there are significant efforts to solve this problem [22, 10].

This aspect may be less important for systems providing their own comfortable ontology editor.

3.2 HTML Generation

The most important capability required for being able to replace existing web servers is, of course, the generation of HTML pages.

The Information Layer contains a module that provides built-in web-server functionality. The server is able to generate HTML dynamically: For any object, the attributes are simply displayed, and the associations to other objects are converted to sets of hyperlinks to the related objects. Concepts are displayed as a clickable list of instances corresponding to the concept. The HTML interface can also be used to edit the content of the system using forms generated dynamically based on the ontology. In the COMRIS project, the HTML interface was used for interaction with the end user as well as for debugging and inspection purposes.

In addition to generic HTML generation, templates can be used in order to generate HTML pages conforming to a given look and feel. In the COMRIS project, we have also used the template mechanism to generate the input structure required by the text generation system TG/2 [9], which was used to generate natural language output for a wearable device. The template mechanism is described in some more detail in the next section.

3.3 SWRC and RDF Integration

The Semantic Web Research Community (SWRC) Ontology [24] is an ontology designed in order to describe the structure of the Semantic Web Research Community, namely the members, events, topics and projects, in a machine-readable manner. It is available in DAML+OIL and FLogic formats. Figure 2 shows a subset of the inheritance hierarchy of the SWRC ontology.

Figure 2: A Subset of the Semantic Web Research Community ontology concept Hierarchy

Since our "local" research unit ontology was primarily designed to fit the needs of our "regular" web presentation, it does not match the "shared" SWRC ontology exactly. However, using the template mechanism of our system, we are able to generate RDF pages corresponding to the SWRC ontology on the fly. Figure 3 shows a simplified example template that is used to generate SWRC-compliant RDF content for instances of the class "Member". In the templates, elements in a special namespace, denoted by the t prefix in the example, are replaced by content queried from the Information Layer with respect to the current instance, which is determined from the page URL.

Thus, it is possible to participate in the Semantic Web without needing to extend a predefined shared ontology, which may be bloated and still not fulfill all local requirements. Instead, the domain of interest can be modelled using a lean, domain-specific local ontology. The SWRC person name slot illustrates the advantage of this approach: SWRC contains only one person name slot that is not split into first and last name. If the local application requires having both parts available separately, it would be necessary to duplicate the corresponding information when building the local ontology on top of the SWRC ontology. Also, SWRC concepts like "Organization" may not be required in a local ontology covering a single organization. Information about the local organization can be stored in a single static RDF file, not bloating the local ontology.

In addition to template-based RDF generation, it would be possible to generate RDF directly corresponding to the local ontology automatically [12]. However, this feature is not implemented yet.

3.4 Infrastructure Integration

For simpler integration with the existing Web server infrastructure, we changed the Information Layer implementation to become a Java Servlet instead of a stand-alone program. Running the Information Layer as a Java Servlet allows smooth integration with existing Web presentations, without any hard switch. The service can simply be added where it makes most sense, and then later be extended to

[Figure: architecture diagram; recoverable labels: Templates, Servlet container (e.g. Tomcat), InfoLayer, template-based XML generation, XHTML, RDF]
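The template mechanism of Section 3.3, combined with the generative approach of Section 2.1, can be illustrated with a minimal Python sketch. It is not the InfoLayer implementation (which runs as a Java Servlet). The namespace URIs, the t:value element with its attr attribute, the swrc:* element names, the URL layout and the sample data are all assumptions made purely for illustration. The sketch picks the instance from the page URL, chooses a template by file extension, and replaces every element in the t namespace with a value from one shared fact base.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from urllib.parse import urlparse

# All namespaces, element names and data below are hypothetical placeholders.
T_NS = "urn:example:template"
SWRC_NS = "urn:example:swrc#"
ET.register_namespace("swrc", SWRC_NS)

# One fact base; the local ontology keeps first and last name separately.
FACT_BASE = {
    "jdoe": {"firstName": "Jane", "lastName": "Doe", "phone": "0231 000000"},
}

# Two target formats generated from the same facts.
TEMPLATES = {
    ".html": (
        '<html xmlns:t="urn:example:template"><body>'
        '<h1><t:value attr="firstName"/> <t:value attr="lastName"/></h1>'
        '<p>Phone: <t:value attr="phone"/></p>'
        "</body></html>"
    ),
    ".rdf": (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
        ' xmlns:swrc="urn:example:swrc#" xmlns:t="urn:example:template">'
        "<swrc:Member>"
        '<swrc:name><t:value attr="firstName"/> '
        '<t:value attr="lastName"/></swrc:name>'
        '<swrc:phone><t:value attr="phone"/></swrc:phone>'
        "</swrc:Member></rdf:RDF>"
    ),
}


def expand(parent, instance):
    """Replace each t:value child by the text of the named attribute."""
    previous = None  # last child that was kept, if any
    for child in list(parent):
        if child.tag == f"{{{T_NS}}}value":
            text = str(instance[child.get("attr")]) + (child.tail or "")
            if previous is None:
                parent.text = (parent.text or "") + text
            else:
                previous.tail = (previous.tail or "") + text
            parent.remove(child)
        else:
            expand(child, instance)
            previous = child


def render(url):
    """Generate a page for the instance and format named by the URL."""
    path = PurePosixPath(urlparse(url).path)
    instance = FACT_BASE[path.stem]            # e.g. "jdoe"
    root = ET.fromstring(TEMPLATES[path.suffix])  # e.g. ".rdf"
    expand(root, instance)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render("/members/jdoe.html"))
    print(render("/members/jdoe.rdf"))
```

In this sketch the single SWRC-style name slot is filled by concatenating the two local name attributes inside the template, so the local fact base does not need to store the name twice. Changing the phone number in the fact base changes both generated formats at once.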