A Browser-based Tool for Collaborative Distributed Annotation for the Semantic Web

Onni Valkeapää
Helsinki University of Technology (TKK)
Semantic Computing Research Group (SeCo)
P.O. Box 5500, Otaniementie 17
FI-02015 TKK
http://www.seco.tkk.fi/
onni.valkeapaa@tkk.fi

Eero Hyvönen
Helsinki University of Technology (TKK) and University of Helsinki
Semantic Computing Research Group (SeCo)
P.O. Box 5500, Otaniementie 17
FI-02015 TKK
http://www.seco.tkk.fi/
eero.hyvonen@tkk.fi

ABSTRACT
This paper presents a prototype of an ontology-based semantic annotation tool, Saha. The tool eases the process of creating ontological descriptions of documents by providing a simple user interface that hides the complexity of ontologies from annotators. Saha is used with a web browser, and it supports collaborative distributed creation of metadata by centrally storing annotations, which can be viewed and edited by different annotators. Concepts defined in external ontologies can be imported and used in annotations by connecting Saha to ontology servers. The tool is being tested in practical semantic portal projects.

Categories and Subject Descriptors
H.3.1 [Information storage and retrieval]: Content Analysis and Indexing – Indexing methods.

General Terms
Design, Experimentation.

Keywords
Semantic Annotation, Ontologies, Annotation Schema.

1. INTRODUCTION
Provision of semantically rich, ontology-based metadata is one of the major challenges in developing the Semantic Web. In recent years, various annotation systems have been developed to face this challenge [14]. There is, however, a lack of systems that 1) can be easily used by annotators unfamiliar with the technical side of the Semantic Web, and that 2) are able to support distributed creation of semantic metadata based on complex metadata annotation schemas (ontologies). In this paper, we present an annotation tool, Saha¹ [15], aiming to satisfy these needs. Saha is browser-based in order to support wide and distributed usage. It has a simple user interface that hides the complexity of ontologies from the annotator and adapts automatically to different metadata schemas. Saha supports collaborative annotation of web documents, and it can utilize ontology services for sharing URIs and importing concepts defined in various external ontologies. The tool is targeted especially at creating metadata of web resources in semantic web portals. It is being applied to various applications within the National Semantic Web Ontology Project in Finland (FinnONTO)².

¹ http://www.seco.tkk.fi/applications/saha/
² http://www.seco.tkk.fi/projects/finnonto/

2. SAHA ANNOTATION SYSTEM

2.1 Design rationale and implementation
In order to develop an annotation system that would be easy to use and that would support the creation of semantically rich metadata, we identified four basic requirements for our system. These were also features that we felt were not supported well enough in many of the current annotation platforms:

1) The system should, as a rule, hide technical concepts related to markup languages and ontologies from its user. Typically, this means e.g. hiding URIs and complex class hierarchies from the annotators. This should not, however, be done at the expense of expressiveness of the annotations.

2) Annotations should be based on annotation schemas, which are ontologies that define the structure of annotations and guide annotators in their task. The system should form its user interface automatically according to the annotation schema loaded in it.

3) The system should support collaborative distributed annotation, where the annotation process can be shared among different annotators at different locations.

4) In order to make the system platform-independent from the annotator’s point of view, it should be implemented as a web application.

It has been widely argued that automation is needed in annotation for the Semantic Web [1],[12],[14]. There are, however, limitations to what can be done automatically. These limitations usually lead to either missing or incorrect annotations (low recall/precision) [14]. For example, it is difficult for an automated system to recognize semantic relations between entities it has extracted from a document [3]. Due to the limitations related to automation, most of the current (semi)automatic annotation systems still need human intervention at some point in the annotation process [12]. In Saha, our primary goal has not been the automation of the annotation process, but rather to support the creation of annotations that cannot be produced automatically. Although requiring a lot of work, such annotation can be seen as a collaborative effort, comparable to the creation of different kinds of Wikis³.

³ http://en.wikipedia.org/wiki/Wiki

The basic architecture of Saha is depicted in figure 1. It consists of the following functional parts:

1) Saha application, which is run on a web server. It stores and distributes annotations and creates the web pages which form the user interface used in creating annotations.

2) PostgreSQL database, which is used to store the Jena ontology model containing the schema and annotations.

3) Annotators using web browsers to interact with the system.

4) The ONKI ontology service, which is used to fetch concepts defined in external ontologies and to share instances created by the annotators.

5) Applications using the annotations created with Saha. Annotations can be retrieved in RDF/XML using HTTP GET.

Figure 1. Architecture of Saha

Saha is a web application implemented using the Apache Cocoon⁴ and Jena⁵ frameworks. It is designed as a web application in order to impose as few requirements as possible on the end user’s computational environment. To use Saha, all an annotator needs is an appropriate web browser with an Internet connection. Saha makes extensive use of techniques such as JavaScript and Ajax⁶ in order to provide the annotator with a simple and versatile user interface.

⁴ http://cocoon.apache.org/
⁵ http://jena.sourceforge.net/
⁶ http://www.w3.org/2006/webapi/

2.2 User interface
The user interface of Saha consists of two main pages. In the first page, depicted partly in figure 2, the annotator sees the annotation classes of an annotation schema and is able to search and open existing annotations or create a new one. The annotator can also create a new subclass for an annotation schema class in order to specify the class hierarchy. This is an optional feature and can be disabled. In order to keep the user interface as simple as possible, more elaborate ontology-editing features, such as creating new properties, are excluded. A new annotation is created by choosing a class and typing the URL of the document to be annotated. Existing annotations can be searched by their labels or by the documents they annotate.

Figure 2. User interface of Saha: the class selection page

In the second page of Saha’s user interface, depicted in figure 3, an annotator can edit annotations and view the documents to be annotated. With Saha, any kind of document (HTML page, PDF document etc.) that can be referenced by a URI can be annotated. However, the web browser being used sets the limits to the kinds of documents that can be viewed in Saha’s user interface⁷. The annotation page shows the properties of the selected annotation class in a simple form, which can be used to supply values for the properties. In figure 3, an annotation belonging to the class “Article” and annotating the document with the URL “http://www.seco.tkk.fi” is shown. The annotation has one literal value defined for the property “Title”. The annotator can edit the value by clicking on it. Defining an object value, such as “Creator” shown in figure 3, is explained in detail in subsection 2.6.

⁷ The document being annotated is viewed in a frame. If the web browser cannot view the document due to an unsupported format, a blank page is displayed.

Figure 3. User interface of Saha: the annotation page

2.3 Annotations in Saha
Annotations created in Saha are based on different metadata annotation schemas, which are defined in OWL. In a typical use scenario, a schema describes some specific area of interest and is created by an administrator of the annotation project. The purpose of an annotation schema is to define a description template for annotation construction [13]. The schema helps annotators to describe resources in a consistent way, and it can be effectively used to construct a generic user interface for the application.

In Saha, annotations are instantiated classes and properties of an annotation schema, which are linked to the document being described. The linking plays an essential role in annotations, because 1) an annotation is separate from the document it describes, and 2) the way linking is done affects the meaning of the relation between the annotation and the document. There are two ways to associate an annotation with a document in Saha. The first one makes the assertion that the document is an instance of one specific class defined in the schema. It can thus be used efficiently to classify documents. An example of this kind of annotation is expressed in RDF/XML below:

[RDF/XML listing not recoverable; surviving literal value: "Semantic Computing Research Group"]

The second method associates an annotation with the document by using a named property, which is defined in the annotation schema. This idea is similar, e.g., to the usage of the property “annotates” in Annotea [7]:

[RDF/XML listing not recoverable; surviving literal value: "Semantic Computing Research Group"]
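The two linking methods described above can be sketched in code. The following is a minimal illustration in Python, not Saha's implementation (which is built on Jena): triples are plain tuples, and the namespaces `http://example.org/schema#` and `http://example.org/annotations/`, the annotation identifier `a1`, and the property names `title` and `annotates` are assumptions made for this example. Only the document URL, the class name “Article” and the literal value come from the text.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespaces, chosen only for this illustration.
SCHEMA = "http://example.org/schema#"
ANNOTATIONS = "http://example.org/annotations/"


class Literal(str):
    """Marks an object value as a plain literal rather than a URI."""


def classify_document(doc_uri, class_name, title):
    """Method 1: assert that the document itself is an instance of a schema class."""
    return [
        (doc_uri, RDF_TYPE, SCHEMA + class_name),
        (doc_uri, SCHEMA + "title", Literal(title)),
    ]


def annotate_with_property(annotation_id, doc_uri, class_name, title,
                           link_property="annotates"):
    """Method 2: a separate annotation resource refers to the document
    through a named property (here the assumed property "annotates")."""
    annotation_uri = ANNOTATIONS + annotation_id
    return [
        (annotation_uri, RDF_TYPE, SCHEMA + class_name),
        (annotation_uri, SCHEMA + "title", Literal(title)),
        (annotation_uri, SCHEMA + link_property, doc_uri),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            # Escape backslashes and double quotes inside the literal.
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


if __name__ == "__main__":
    doc = "http://www.seco.tkk.fi"
    title = "Semantic Computing Research Group"
    print(to_ntriples(classify_document(doc, "Article", title)))
    print()
    print(to_ntriples(annotate_with_property("a1", doc, "Article", title)))
```

In the first case the document URI is the subject of the `rdf:type` statement, so the annotation classifies the document directly. In the second case the annotation has its own URI, and the meaning of its relation to the document is carried by the chosen property.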
An annotation created with Saha describes a document as a whole, in the sense that it is not associated with any particular section, sentence or word inside the document. If we wanted to annotate certain parts of the document, we would have to use some fragment identifiers, such as XPointers⁸, and could only annotate documents which support fragment identification. The reason we do not use fragment identifiers is that in our approach the main goal of annotations is to put a document in relation to other documents and to serve as a semantically described index, which can be used e.g. to categorize documents or to search for a particular document among a group of documents. This is similar to the idea proposed in [9]. Our way of using annotations differs from the approach where annotations are used as additional pieces of information associated with some specific parts of a document and are shown to the reader. This kind of approach is used e.g. in Annotea.

⁸ http://www.w3.org/XML/Linking

As stated earlier, an annotation in Saha is an instance of a schema’s class that describes some document and is linked to it using the document’s URI. We make a distinction between the annotation of a document and the description of some other resource or concept that is somehow related to the document being annotated. In addition to containing classes and properties used to annotate documents, an annotation schema used with Saha can also contain classes and properties for describing resources that are not documents. In other words, an annotation schema can form a basis for the local knowledge base (KB) that contains descriptions of different kinds of resources that may or may not exist on the web. These descriptions, or KB instances, can be used as values of properties in annotations. The KB is refined and extended when new annotations are produced, as they require new KB instances to be created.

2.4 Annotation projects
Saha supports collaborative annotation and sharing of annotations through annotation projects. Each annotation schema loaded into Saha forms an annotation project, which can have multiple users as annotators. In practice, an annotation project is a Jena ontology model stored in a database.

Each user of an annotation project sees all the annotations and KB instances of the project and can edit them, as well as create new ones. An annotation project can belong e.g. to an organization or some other group of people that is producing annotations using a certain annotation schema. Even though the annotations and KB instances created in one project cannot be directly edited and used in other projects, KB instances can be exchanged between projects to allow semantic relations between them (see subsection 2.7).

2.5 Meta-schema
In addition to describing the structure of annotations in an annotation schema, we also need a way to define how the schema is actually used during the annotation process. Although rules defining the use of a schema could in most cases be expressed using the schema itself, it is often useful to separate the schema design from its use [2]. In Saha, this is done using the meta-schema, which is a simple RDF file that describes how a certain annotation schema is used in a particular annotation project. The meta-schema of Saha is used to define, among others, the following settings for an annotation project:

• The classes of the annotation schema which are shown on the class selection page (see figure 2) and which can thus be used to create new annotations.

• The property (or properties) of a class whose value is used as the rdfs:label of an instance of the class. These are typically literal properties, like the property “name” of the class “Person”.

• The order in which the properties of a class are shown on the annotation page (see figure 3).

• The classes of the annotation schema whose instances are exported to an ontology service. This feature will be explained later in subsection 2.7.

The meta-schema plays an important role in improving the usability of the system. For example, when an annotator creates a new annotation, the rdfs:label can be automatically constructed for the annotation instance according to the meta-schema, saving the annotator from defining it manually. The meta-schema improves the usability of the system, but it may also enable the use of the same annotation schema in different annotation projects by allowing the schema to be used differently in each project. For example, there may be a different set of annotation classes shown in the class selection page of each project. The use of the same annotation schema in different projects enhances the interoperability of annotations, as it decreases the need to develop distinct annotation schemas for different projects. In other words, when the same schema is used in different annotation projects, there is no need to specify mappings between annotations created in them.

2.6 Creating and using KB instances
An annotation schema’s object properties may have two different kinds of values. They can be either concepts defined in an external ontology or, alternatively, KB instances. The rdfs:range can be used to define the types of instances that are allowed as values of a property. If rdfs:range is not defined, an annotator may choose any type of resource he or she wants to use. When a concept of an external ontology is to be used as a value of a certain property, the project’s meta-schema must contain a mapping between the property and the ontology server hosting the external ontology. According to this mapping, Saha can send the query used to find a concept to the right ontology server.

When defining an instance value for a property, the annotator must first check if the annotation project’s KB already contains an applicable instance to be used as a value. This is to prevent annotators from creating multiple KB instances that all refer to the same resource. If an appropriate instance cannot be found, the annotator can create a new one. Figure 4 illustrates the input field of an object property named “Creator”, which has a class named “Person” in its range. An annotator has typed “tom” in the input field. The system has searched the KB for the instances of the class “Person” with an rdfs:label value that contains the (sub)string “tom”. The search is done in the background using a technique similar to semantic autocompletion [5], a semantic extension of the idea of completing input search keywords online, proposed in Google Suggest⁹. The results of the KB search are shown in a menu appearing below the input field after the search. The annotator may choose to pick one of the instances returned by the query, list all instances of the class “Person” or create a new one.

⁹ http://labs.google.com/suggest/

Figure 4. Instance-search using autocompletion

If the annotator chooses to create a new instance, a dialog box is opened providing fields to supply values for the instance’s properties. Figure 5 illustrates a dialog box which is used to create a new instance of the class “Person”. The input form of the dialog box has the same functionality as the annotation page of Saha. The instance is created and stored in the KB by closing the dialog box. After that, the instance will be available to be used in annotations.

Figure 5. Dialog box for creating new KB-instance

The previous example showed how new instances can be created on the fly when annotating with Saha. The classes used to create new instances need not be as simple as the class “Person” presented here. Instead, they may well have some object properties that require other instances as values and so forth. Saha can thus be used to describe diverse semantic relations and structures.

2.7 Utilizing ontology services
An important feature of Saha is its ability to connect to the ONKI¹⁰ ontology service framework developed in the FinnONTO project [10]. It allows annotators to import concepts defined in external ontologies and also to share KB instances with other annotators that work on different annotation projects. This kind of instance sharing and use of shared ontologies is vital when pursuing semantic interoperability between different Semantic Web systems.

¹⁰ http://www.seco.tkk.fi/applications/onki/

Figure 6 illustrates an example of how the ONKI service can be used when creating annotations in Saha. An annotator is defining a value for an object property and uses the ONKI browser to find a concept defined in the ontology of an ONKI service. In this case, the annotator is browsing the Finnish Upper Ontology YSO. When the annotator finds a concept he or she wants to use, it can be imported to Saha by clicking the link named “Fetch Concept”. This will set the URI of the selected concept as the property’s value in the annotation being created. Another way of finding and importing a concept from the ONKI service is identical to the KB instance search presented in subsection 2.6. When using it, an annotator does not have to use the ONKI browser to locate the concept, but is able to find it with semantic autocompletion.

Figure 6. Using the ONKI-browser

In addition to providing a way to browse and use concepts defined in external ontologies such as YSO, the ONKI service can be used to share the instances created in different annotation projects. This is done by declaring an annotation schema’s class “public” in the project’s meta-schema. When the annotator creates an instance of a public class, the instance’s data will be sent to the ONKI service. After this, the instance’s URI can be used in other annotation projects using the same ONKI service. The mechanism described here is practical, as it allows the creation and use of both public and private instances in projects.

3. DISCUSSION
Ontology-based semantic annotations are needed when building the Semantic Web. Although various annotation systems and methods have been developed, the question of how to effectively produce quality metadata still remains largely unanswered. Automated systems are a necessity when masses of documents are being annotated. On the other hand, when there is a need to express more complicated semantic structures and when the precision and quality of annotations are important factors, manual systems are still needed. This shows that different approaches to semantic annotation should not be seen as mutually exclusive, but rather as complementing each other. We have tried to tackle the problem of creating semantically rich annotations by developing an annotation system, Saha, that supports the distributed creation of metadata and that can be easily used by non-experts in the field of the Semantic Web.

3.1 Contributions and related work
A number of semantic annotation systems and tools exist today [12],[14]. These systems are primarily used to create and maintain semantic metadata descriptions of web pages.

Annotea [7] supports collaborative, RDF-based markup of web pages and distribution of annotations using annotation servers. Annotations created with Annotea can be regarded as semi-formal, since it does not support the use of ontological concepts in annotations. Instead, they are textual notes which are associated with certain sections of the documents they describe.

The Semantic Markup Tool [9] has a user interface that is generated according to an annotation schema in a similar way as is done in Saha. It uses Information Extraction techniques to find different kinds of entities in documents and proposes them as values of the annotation’s properties. The schemas it supports are relatively simple, and it thus cannot be used to describe more complex semantic relations. The Ont-O-Mat [2], in turn, can be used to describe diverse semantic structures as well as to edit ontologies. It also has support for automated annotation. The user interface of the Ont-O-Mat is not, however, very well suited for annotators unfamiliar with concepts related to ontologies and semantic annotation in general. Another example of an annotation tool whose user interface requires understanding of Semantic Web concepts can be found in SMORE [8].

Most of the current annotation systems, like the ones mentioned here, are applications that run locally on the annotator’s computer. Because of this, the systems may not necessarily be platform-independent and must always be installed on the user’s system before the annotation can begin. In Saha, these problems are addressed by implementing the system as a web application. By doing so, the system can be installed and maintained centrally, and the requirements for the annotator’s computational environment are minimal. The way Saha is designed and implemented also strongly supports collaboration in annotation, making the sharing of annotations easy.

3.2 Applications
Saha is currently a working prototype. It is in trial use for the distributed content creation of the semantic health promotion portal Tervesuomi.fi¹¹ [4]. Here much of the content and metadata for the portal will be provided by health experts working at various health organizations in Finland. Saha has also been tested in metadata creation for the Opintie¹² portal, a follow-up version of the educational semantic portal Orava¹³ [11] using Learning Object Metadata (LOM).

Initial feedback from end users not involved in developing the software has been promising, but further experimentation is still needed.

¹¹ http://www.seco.tkk.fi/applications/tervesuomi/
¹² http://www.seco.tkk.fi/applications/opintie/
¹³ Operational at http://demo.seco.tkk.fi/orava/

3.3 Future work
Future plans include using Saha to provide metadata for additional semantic portals such as CultureSampo¹⁴, the next generation version of the MuseumFinland¹⁵ portal [6]. We also aim to research the integration of the semiautomatic annotation framework Poka¹⁶ with Saha.

¹⁴ http://www.seco.tkk.fi/applications/kulttuurisampo/
¹⁵ Operational at http://www.museosuomi.fi/
¹⁶ http://www.seco.tkk.fi/applications/poka/

ACKNOWLEDGEMENTS
This research is a part of the FinnONTO project funded mainly by the Finnish Funding Agency for Technology and Innovation (Tekes).

REFERENCES
[1] Corcho, O. (2006) Ontology based document annotation: trends and open research problems. International Journal of Metadata, Semantics and Ontologies, Vol. 1, No. 1, pp. 47-57.

[2] Handschuh, S. and Staab, S. (2002) Authoring and annotation of web pages in CREAM. Proceedings of the 11th international conference on World Wide Web, Honolulu, USA.

[3] Handschuh, S., Staab, S. and Ciravegna, F. (2002) S-CREAM – Semi-automatic CREAtion of Metadata. Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2002), Siguenza, Spain.

[4] Holi, M., Lindgren, P., Suominen, O., Viljanen, K. and Hyvönen, E. (2006) TerveSuomi.fi – A Semantic Health Portal for Citizens. Proceedings of the 1st Asian Semantic Web Conference (ASWC2006), Beijing, China, poster papers.

[5] Hyvönen, E. and Mäkelä, E. Semantic autocompletion. Proceedings of the 1st Asian Semantic Web Conference (ASWC2006), Beijing, China. Springer-Verlag, forthcoming.

[6] Hyvönen, E., Mäkelä, E., Salminen, M., Valo, A., Viljanen, K., Saarela, S., Junnila, M. and Kettula, S. (2005) MuseumFinland - Finnish Museums on the Semantic Web. Journal of Web Semantics, vol. 3, no. 2.

[7] Kahan, J., Koivunen, M.R., Prud'Hommeaux, E. and Swick, R.R. (2001) Annotea: An Open RDF Infrastructure for Shared Web Annotations. Proceedings of the 10th International World Wide Web Conference (WWW10), Hong Kong, China.

[8] Kalyanpur, A., Hendler, J., Parsia, B. and Golbeck, J. (2005) SMORE – Semantic Markup, Ontology, and RDF Editor. Available at: http://www.mindswap.org/papers/SMORE.pdf

[9] Kettler, B., Starz, J., Miller, W. and Haglich, P. (2005) A Template-based Markup Tool for Semantic Web Content. 4th International Semantic Web Conference ISWC2005, Galway, Ireland. Lecture Notes in Computer Science 3729, Springer. pp. 446-460.

[10] Komulainen, V., Valo, A. and Hyvönen, E. (2005) A Tool for Collaborative Ontology Development for the Semantic Web. Proceedings of International Conference on Dublin Core and Metadata Applications (DC 2005), Madrid, Spain.

[11] Känsälä, T. and Hyvönen, E. (2006) A Semantic View-Based Portal Utilizing Learning Object Metadata. Proceedings of the Workshop on Semantic Web Applications and Tools, the 1st Asian Semantic Web Conference (ASWC2006), Beijing, China. Forthcoming.

[12] Reeve, L. and Han, H. (2005) Survey of Semantic Annotation Platforms. Proceedings of the 2005 ACM Symposium on Applied Computing, Santa Fe, USA. ACM Press.

[13] Schreiber, G., Dubbeldam, B., Wielemaker, J. and Wielinga, B. (2001) Ontology-Based Photo Annotation. IEEE Intelligent Systems, 16, 3, pp. 66-74.

[14] Uren, V., Cimiano, P., Iria, J., Handschuh, S., Vargas-Vera, M., Motta, E. and Ciravegna, F. (2006) Semantic annotation for knowledge management: Requirements and a survey of the state of the art. Journal of Web Semantics, 4(1):14–28, January 2006.

[15] Valkeapää, O. and Hyvönen, E. (2006) A Browser-based Semantic Annotation Tool for Distributed Content Creation. Proceedings of the 1st Asian Semantic Web Conference (ASWC2006), Beijing, China, poster papers.