A Browser-based Tool for Collaborative Distributed
Annotation for the Semantic Web

Onni Valkeapää
Helsinki University of Technology (TKK)
Semantic Computing Research Group (SeCo)
P.O. Box 5500, Otaniementie 17
FI-02015 TKK
http://www.seco.tkk.fi/
onni.valkeapaa@tkk.fi

Eero Hyvönen
Helsinki University of Technology (TKK) and
University of Helsinki
Semantic Computing Research Group (SeCo)
P.O. Box 5500, Otaniementie 17
FI-02015 TKK
http://www.seco.tkk.fi/
eero.hyvonen@tkk.fi
ABSTRACT
This paper presents a prototype of an ontology-based semantic
annotation tool, Saha. The tool eases the process of creating
ontological descriptions of documents by providing a simple
user interface that hides the complexity of ontologies from
annotators. Saha is used with a web browser, and it supports
collaborative distributed creation of metadata by centrally
storing annotations, which can be viewed and edited by different
annotators. Concepts defined in external ontologies can be
imported and used in annotations by connecting Saha to
ontology servers. The tool is being tested in practical semantic
portal projects.

Categories and Subject Descriptors
H.3.1 [Information storage and retrieval]: Content Analysis
and Indexing – Indexing methods.

General Terms
Design, Experimentation.

Keywords
Semantic Annotation, Ontologies, Annotation Schema.

1. INTRODUCTION
Provision of semantically rich, ontology-based metadata is one
of the major challenges in developing the Semantic Web. In
recent years, various annotation systems have been developed to
face this challenge [14]. There is, however, a lack of systems
that 1) can be easily used by annotators unfamiliar with the
technical side of the Semantic Web, and that 2) are able to
support distributed creation of semantic metadata based on
complex metadata annotation schemas (ontologies). In this
paper, we present an annotation tool, Saha¹ [15], aiming to
satisfy these needs. Saha is browser-based in order to support
wide and distributed usage. It has a simple user interface that
hides the complexity of ontologies from the annotator and adapts
automatically to different metadata schemas. Saha supports
collaborative annotation of web documents, and it can utilize
ontology services for sharing URIs and importing concepts
defined in various external ontologies. The tool is targeted
especially at creating metadata of web resources in semantic
web portals. It is being applied to various applications within the
National Semantic Web Ontology Project in Finland
(FinnONTO)².

¹ http://www.seco.tkk.fi/applications/saha/
² http://www.seco.tkk.fi/projects/finnonto/

2. SAHA ANNOTATION SYSTEM

2.1 Design rationale and implementation
In order to develop an annotation system that would be easy to
use and that would support the creation of semantically rich
metadata, we identified four basic requirements for our system.
These were also features that we felt were not supported well
enough in many of the current annotation platforms:
1) The system should, as a rule, hide technical concepts
related to markup languages and ontologies from its user.
Typically, this means e.g. hiding URIs and complex class
hierarchies from the annotators. This should not, however,
be done at the expense of the expressiveness of the
annotations.
2) Annotations should be based on annotation schemas,
which are ontologies that define the structure of
annotations and guide annotators in their task. The system
should form its user interface automatically according to
the annotation schema loaded in it.
3) The system should support collaborative distributed
annotation, where the annotation process can be shared
among different annotators at different locations.
4) In order to make the system platform-independent from
the annotator’s point of view, it should be implemented as
a web application.

It has been widely argued that automation is needed in
annotation for the Semantic Web [1],[12],[14]. There are,
however, limitations to what can be done automatically. These
limitations usually lead to either missing or incorrect
annotations (low recall/precision) [14]. For example, it is
difficult for an automated system to recognize semantic relations
between entities it has extracted from a document [3]. Due to the
limitations related to automation, most of the current
(semi)automatic annotation systems still need human
intervention at some point in the annotation process [12]. In
Saha, our primary goal has not been the automation of the
annotation process, but rather to support the creation of
annotations that cannot be produced automatically. Although
requiring a lot of work, such annotation can be seen as a
collaborative effort, comparable to the creation of different kinds
of Wikis³.

The basic architecture of Saha is depicted in figure 1. It consists
of the following functional parts:
1) The Saha application, which runs on a web server. It stores
and distributes annotations and creates the web pages that
form the user interface used in creating annotations.
2) A PostgreSQL database, which is used to store the Jena
ontology model containing the schema and the annotations.
3) Annotators, who use web browsers to interact with the
system.
4) The ONKI ontology service, which is used to fetch
concepts defined in external ontologies and to share
instances created by the annotators.
5) Applications using the annotations created with Saha.
Annotations can be retrieved in RDF/XML using HTTP GET.

Figure 1. Architecture of Saha

Saha is a web application implemented using the Apache
Cocoon⁴ and Jena⁵ frameworks. It is designed as a web
application in order to impose as few requirements as possible
on the end user’s computational environment. To use Saha, all an
annotator needs is an appropriate web browser with an Internet
connection. Saha makes extensive use of techniques such as
JavaScript and Ajax⁶ in order to provide annotators with a simple
and versatile user interface.

³ http://en.wikipedia.org/wiki/Wiki
⁴ http://cocoon.apache.org/
⁵ http://jena.sourceforge.net/
⁶ http://www.w3.org/2006/webapi/

Figure 2. User interface of Saha: the class selection page

2.2 User interface
The user interface of Saha consists of two main pages. In
the first page, depicted partly in figure 2, the annotator sees the
annotation classes of an annotation schema and is able to search
and open existing annotations or create a new one. The
annotator can also create a new subclass of an annotation
schema class in order to refine the class hierarchy. This is an
optional feature and can be disabled. In order to keep the user
interface as simple as possible, more elaborate ontology-editing
features, such as creating new properties, are excluded. A new
annotation is created by choosing a class and typing the URL of
the document to be annotated. Existing annotations can be
searched by their labels or by the documents they annotate.

Figure 3. User interface of Saha: the annotation page

In the second page of Saha’s user interface, depicted in figure 3,
an annotator can edit annotations and view the documents to be
annotated. With Saha, any kind of document (HTML page,
PDF document, etc.) that can be referenced by a URI can be
annotated. However, the web browser being used limits
the kinds of documents that can be viewed in Saha’s user
interface⁷. The annotation page shows the properties of the selected
annotation class in a simple form, which can be used to supply
values for the properties. In figure 3, an annotation belonging to
the class “Article” and annotating the document with the URL
“http://www.seco.tkk.fi” is shown. The annotation has one literal
value defined for the property “Title”. The annotator can edit the
value by clicking on it. Defining an object value, such as
“Creator” shown in figure 3, is explained in detail in subsection
2.6.

⁷ The document being annotated is viewed in a frame. If the web
browser cannot view the document due to an unsupported format,
a blank page is displayed.

2.3 Annotations in Saha
Annotations created in Saha are based on different metadata
annotation schemas, which are defined in OWL. In a typical use
scenario, a schema describes some specific area of interest and is
created by the administrator of an annotation project. The purpose
of an annotation schema is to define a description template for
annotation construction [13]. The schema helps annotators to
describe resources in a consistent way, and it can be effectively
used to construct a generic user interface for the application.

In Saha, annotations are instantiated classes and properties of an
annotation schema, which are linked to the document being
described. The linking plays an essential role in annotations,
because 1) an annotation is separate from the document it
describes, and 2) the way linking is done affects the meaning of
the relation between the annotation and the document. There are
two ways to associate an annotation with a document in Saha.
The first one makes the assertion that the document is an
instance of one specific class defined in the schema. It can
thus be used efficiently to classify documents. An example of this
kind of annotation is expressed in RDF/XML below:

[RDF/XML example not preserved; only its title value, “Semantic Computing Research Group”, survives.]

The second method associates an annotation with the document
by using a named property, which is defined in the annotation
schema. This idea is similar, e.g., to the usage of the property
“annotates” in Annotea [7]:

[RDF/XML example not preserved; only its title value, “Semantic Computing Research Group”, survives.]
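The two linking styles described above can be sketched with plain RDF triples. This is a minimal illustration under stated assumptions and not Saha's Jena-based implementation. The schema namespace `http://example.org/schema#`, the annotation URI, and the property names `title` and `annotates` are hypothetical placeholders; `annotates` only mirrors the Annotea idea. The output is N-Triples rather than the RDF/XML used in the paper, purely for brevity. The helper `documents_of_class` shows that an application can recover the classified documents from either style.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://example.org/schema#"  # hypothetical annotation-schema namespace


@dataclass(frozen=True)
class Lit:
    """A plain RDF literal; bare strings in a triple are treated as URIs."""
    value: str


Triple = Tuple[str, str, Union[str, Lit]]


def typed_document(doc: str, cls: str, title: str) -> List[Triple]:
    """Style 1: assert that the document itself is an instance of a schema class."""
    return [(doc, RDF_TYPE, SCHEMA + cls), (doc, SCHEMA + "title", Lit(title))]


def linked_annotation(ann: str, doc: str, cls: str, title: str) -> List[Triple]:
    """Style 2: a separate annotation resource points at the document."""
    return [
        (ann, RDF_TYPE, SCHEMA + cls),
        (ann, SCHEMA + "annotates", doc),  # hypothetical linking property
        (ann, SCHEMA + "title", Lit(title)),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize as N-Triples, escaping characters a literal may not contain raw."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Lit):
            esc = (o.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
            obj = f'"{esc}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def documents_of_class(triples: List[Triple], cls: str) -> Set[str]:
    """Documents classified under `cls`, whichever linking style was used."""
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == SCHEMA + cls}
    # Style-2 subjects are annotation resources: map each to the document it
    # annotates (simplification: assumes one target document per annotation).
    links = {s: o for s, p, o in triples if p == SCHEMA + "annotates"}
    return {links.get(s, s) for s in typed}


DOC = "http://www.seco.tkk.fi"
TITLE = "Semantic Computing Research Group"
style1 = typed_document(DOC, "Article", TITLE)
# The annotation URI below is a made-up placeholder.
style2 = linked_annotation("http://example.org/annotation/1", DOC, "Article", TITLE)
print(to_ntriples(style1))
print(to_ntriples(style2))
```

Style 1 lets a plain `rdf:type` lookup classify documents directly. Style 2 gives the annotation its own URI, which keeps statements about the annotation distinct from statements about the document, at the cost of one extra hop when querying.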
An annotation created with Saha describes a document as a
whole, in the sense that it is not associated with any
particular section, sentence or word inside the document. If we
wanted to annotate certain parts of the document, we would
have to use fragment identifiers, such as XPointers⁸, and
could only annotate documents that support fragment
identification. We do not use such identifiers because, in our
approach, the main goal of annotations is to put a
document in relation to other documents and to serve as a
semantically described index, which can be used e.g. to
categorize documents or to search for a particular document among
a group of documents. This is similar to the idea proposed in
[9]. Our way of using annotations differs from the approach
where annotations are used as additional pieces of information
that are associated with specific parts of a document and shown
to the reader. This kind of approach is used e.g. in Annotea.

⁸ http://www.w3.org/XML/Linking

As stated earlier, an annotation in Saha is an instance of a
schema’s class that describes some document and is linked
to it using the document’s URI. We make a distinction between
the annotation of a document and the description of some other
resource or concept that is somehow related to the document
being annotated. In addition to containing classes and properties
used to annotate documents, an annotation schema used with
Saha can also contain classes and properties for describing
resources that are not documents. In other words, an annotation
schema can form the basis for a local knowledge base (KB) that
contains descriptions of different kinds of resources that may or
may not exist on the web. These descriptions, or KB instances,
can be used as values of properties in annotations. The KB is
refined and extended as new annotations are produced, since
they require new KB instances to be created.

2.4 Annotation projects
Saha supports collaborative annotation and sharing of
annotations through annotation projects. Each annotation
schema loaded into Saha forms an annotation project, which can
have multiple users as annotators. In practice, an annotation
project is a Jena ontology model stored in a database.

Each user of an annotation project sees all the annotations and
KB instances of the project and can edit them, as well as create
new ones. An annotation project can belong e.g. to an
organization or some other group of people that produce
annotations using a certain annotation schema. Even though the
annotations and KB instances created in one project cannot be
directly edited and used in other projects, KB instances can be
exchanged between projects to allow semantic relations between
them (see subsection 2.7).

2.5 Meta-schema
In addition to describing the annotations’ structure in an
annotation schema, we also need a way to define how the schema
is actually used during the annotation process. Although rules
defining the use of a schema could in most cases be expressed
using the schema itself, it is often useful to separate the schema
design from its use [2]. In Saha, this is done using the meta-schema,
which is a simple RDF file that describes how a certain
annotation schema is used in a particular annotation project. The
meta-schema of Saha is used to define, among others, the
following settings for an annotation project:

• The classes of the annotation schema that are shown on
the class selection page (see figure 2) and that can
thus be used to create new annotations.
• The property (or properties) of a class whose value is
used as the rdfs:label of an instance of the class. These are
typically literal properties, like the property “name” of
the class “Person”.
• The order in which the properties of a class are
shown on the annotation page (see figure 3).
• The classes of the annotation schema whose instances
are exported to an ontology service. This feature is
explained in subsection 2.7.

The meta-schema plays an important role in improving the
usability of the system. For example, when an annotator creates
a new annotation, the rdfs:label can be automatically constructed
for the annotation instance according to the meta-schema, saving
the annotator from defining it manually. Besides improving
usability, the meta-schema may also enable the
use of the same annotation schema in different annotation projects
by allowing the schema to be used differently in each project.
For example, a different set of annotation classes may be
shown on the class selection page of each project. Using the
same annotation schema in different projects enhances the
interoperability of annotations, as it decreases the need to
develop distinct annotation schemas for different projects. In
other words, when the same schema is used in different
annotation projects, there is no need to specify mappings
between the annotations created in them.

2.6 Creating and using KB instances
An annotation schema’s object properties may have two
different kinds of values. They can be either concepts defined in
an external ontology or, alternatively, KB instances. The
rdfs:range can be used to define the types of instances that are
allowed as values of a property. If rdfs:range is not defined, an
annotator may choose any type of resource he or she wants to
use. When a concept of an external ontology is to be used as a
value of a certain property, the project’s meta-schema must
contain a mapping between the property and the ontology server
hosting the external ontology. According to this mapping, Saha
can send the query used to find a concept to the right ontology
server.

When defining an instance value for a property, the annotator
must first check whether the annotation project’s KB already
contains an applicable instance to be used as a value. This
prevents annotators from creating multiple KB instances that all
refer to the same resource. If an appropriate instance cannot be
found, the annotator can create a new one. Figure 4 illustrates the
input field of the object property named “Creator”, which has the
class “Person” as its range. An annotator has typed “tom” in
the input field. The system has searched the KB for the instances
of the class “Person” with an rdfs:label value that contains the
(sub)string “tom”. The search is done in the background using a
technique similar to semantic autocompletion [5], a semantic
extension of the idea of completing input search keywords
online, proposed in Google Suggest⁹. The results of the KB
search are shown in a menu appearing below the input field after
the search. The annotator may choose to pick one of the
instances returned by the query, list all instances of the class
“Person”, or create a new one.

Figure 4. Instance-search using autocompletion

If the annotator chooses to create a new instance, a dialog box is
opened, providing fields to supply values for the instance’s
properties. Figure 5 illustrates a dialog box used to
create a new instance of the class “Person”. The input form of
the dialog box has the same functionality as the annotation page
of Saha. The instance is created and stored in the KB by
closing the dialog box. After that, the instance is available
for use in annotations.

Figure 5. Dialog box for creating new KB-instance

The previous example showed how new instances can be created
on the fly when annotating with Saha. The classes used to create
new instances need not be as simple as the class “Person”
presented here. Instead, they may well have object
properties that require other instances as values, and so forth.
Saha can thus be used to describe diverse semantic relations and
structures.

⁹ http://labs.google.com/suggest/

2.7 Utilizing ontology services
An important feature of Saha is its ability to connect to the
ONKI¹⁰ ontology service framework developed in the
FinnONTO project [10]. It allows annotators to import concepts
defined in external ontologies and also to share KB instances
with other annotators who work on different annotation projects.
This kind of instance sharing and use of shared ontologies is
vital for achieving semantic interoperability between
different Semantic Web systems.

¹⁰ http://www.seco.tkk.fi/applications/onki/

Figure 6 illustrates an example of how the ONKI-service can be
used when creating annotations in Saha. An annotator is
defining a value for an object property and uses the ONKI-browser
to find a concept defined in the ontology of an ONKI-service.
In this case, the annotator is browsing the Finnish Upper
Ontology YSO. When the annotator finds a concept he or she
wants to use, it can be imported to Saha by clicking the link
named “Fetch Concept”. This sets the URI of the selected
concept as the property’s value in the annotation being created.
The other way of finding and importing a concept from the
ONKI-service is identical to the KB instance search presented in
section 2.6. When using it, an annotator does not have to use the
ONKI-browser to locate the concept, but can find it with
semantic autocompletion.

Figure 6. Using the ONKI-browser

In addition to providing a way to browse and use concepts
defined in external ontologies such as YSO, the ONKI-service can
be used to share the instances created in different annotation
projects. This is done by declaring an annotation schema’s class
“public” in the project’s meta-schema. When the annotator
creates an instance of a public class, the instance’s data is
sent to the ONKI-service. After this, the instance’s URI can be
used in other annotation projects using the same ONKI-service.
This mechanism is practical, as it allows the
creation and use of both public and private instances in projects.
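The KB instance search described in Section 2.6 can be sketched in memory. It performs a substring match on rdfs:label, restricted to the property's rdfs:range, with no restriction when the range is undefined. This is an illustration under assumptions, not Saha's Ajax/Jena implementation:

* `Instance` and `KnowledgeBase` are hypothetical names.
* Each instance is given a single class.
* Matching is case-insensitive.
* Instances of subclasses of the range class are also returned. This goes beyond what the paper states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Instance:
    uri: str
    label: str  # value of rdfs:label
    cls: str    # rdf:type; one class per instance, for simplicity


@dataclass
class KnowledgeBase:
    instances: List[Instance] = field(default_factory=list)
    # Hypothetical rdfs:subClassOf links: subclass -> set of direct superclasses.
    superclasses: Dict[str, Set[str]] = field(default_factory=dict)

    def is_a(self, cls: str, target: str) -> bool:
        """True if cls is target or a (transitive) subclass of it; tolerates cycles."""
        seen: Set[str] = set()
        stack = [cls]
        while stack:
            current = stack.pop()
            if current == target:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.superclasses.get(current, ()))
        return False

    def search(self, typed: str, range_cls: Optional[str]) -> List[Instance]:
        """Autocompletion candidates: labels containing `typed`, filtered by range.

        range_cls=None models a property without rdfs:range: any type is allowed.
        """
        needle = typed.casefold()
        hits = [
            inst for inst in self.instances
            if needle in inst.label.casefold()
            and (range_cls is None or self.is_a(inst.cls, range_cls))
        ]
        return sorted(hits, key=lambda inst: inst.label)

    def create(self, uri: str, label: str, cls: str) -> Instance:
        """Store a new instance, e.g. after the annotator found no suitable match."""
        inst = Instance(uri, label, cls)
        self.instances.append(inst)
        return inst


# Illustrative data; the URIs, labels, and the Researcher class are made up.
kb = KnowledgeBase(superclasses={"Researcher": {"Person"}})
kb.create("urn:example:kb/1", "Tom Smith", "Person")
kb.create("urn:example:kb/2", "Tomas Virtanen", "Researcher")
kb.create("urn:example:kb/3", "Tomorrow Ltd", "Organization")

# The annotator types "tom" into the "Creator" field, whose range is Person.
print([inst.label for inst in kb.search("tom", "Person")])
# prints ['Tom Smith', 'Tomas Virtanen']
```

As in the paper, avoiding duplicates remains the annotator's responsibility. In this sketch, `create` would be called only after the candidate list has shown no suitable match.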
3. DISCUSSION
Ontology-based semantic annotations are needed when building
the Semantic Web. Although various annotation systems and
methods have been developed, the question of how to effectively
produce quality metadata still remains largely unanswered.
Automated systems are necessary when masses of documents
are being annotated. On the other hand, when there is a need to
express more complicated semantic structures and when the
precision and quality of annotations are important factors,
manual systems are still needed. This shows that different
approaches to semantic annotation should not be seen as
mutually exclusive, but rather as complementing each other. We
have tried to tackle the problem of creating semantically rich
annotations by developing the annotation system Saha, which
supports the distributed creation of metadata and can be
easily used by non-experts in the field of the Semantic Web.

3.1 Contributions and related work
A number of semantic annotation systems and tools exist today
[12],[14]. These systems are primarily used to create and
maintain semantic metadata descriptions of web pages.

Annotea [7] supports collaborative, RDF-based markup of web
pages and distribution of annotations using annotation servers.
Annotations created with Annotea can be regarded as semi-formal,
since Annotea does not support the use of ontological concepts
in annotations. Instead, they are textual notes that are
associated with certain sections of the documents they describe.

The Semantic Markup Tool [9] has a user interface that is
generated according to an annotation schema in a similar way as
is done in Saha. It uses Information Extraction techniques to
find different kinds of entities in documents and proposes them
as values of the annotation’s properties. The schemas it
supports are relatively simple, and it thus cannot be used to
describe more complex semantic relations. Ont-O-Mat [2],
in turn, can be used to describe diverse semantic structures as
well as to edit ontologies. It also supports automated
annotation. The user interface of Ont-O-Mat is not, however,
very well suited for annotators unfamiliar with concepts
related to ontologies and semantic annotation in general.
Another example of an annotation tool whose user interface
requires an understanding of Semantic Web concepts can be
found in SMORE [8].

Most of the current annotation systems, like the ones mentioned
here, are applications that run locally on the annotator’s
computer. Because of this, the systems are not necessarily
platform-independent and must always be installed on the user’s
system before annotation can begin. In Saha, these problems
are addressed by implementing the system as a web application.
By doing so, the system can be installed and maintained
centrally, and the requirements for the annotator’s computational
environment are minimal. The way Saha is designed and
implemented also strongly supports collaboration in
annotation, making the sharing of annotations easy.

3.2 Applications
Saha is currently a working prototype. It is in trial use for the
distributed content creation of the semantic health promotion
portal Tervesuomi.fi¹¹ [4]. Here much of the content and
metadata for the portal will be provided by health experts
working at various health organizations in Finland. Saha has
also been tested in metadata creation for the Opintie¹² portal, a
follow-up version of the educational semantic portal Orava¹³
[11] using Learning Object Metadata (LOM).
Initial feedback from end-users not involved in developing the
software has been promising, but further experimentation is still
needed.

3.3 Future work
Future plans include using Saha to provide metadata for
additional semantic portals such as CultureSampo¹⁴, the next
generation version of the MuseumFinland¹⁵ portal [6]. We also
aim to research the integration of the semiautomatic annotation
framework Poka¹⁶ with Saha.

¹¹ http://www.seco.tkk.fi/applications/tervesuomi/
¹² http://www.seco.tkk.fi/applications/opintie/
¹³ Operational at http://demo.seco.tkk.fi/orava/
¹⁴ http://www.seco.tkk.fi/applications/kulttuurisampo/
¹⁵ Operational at http://www.museosuomi.fi/
¹⁶ http://www.seco.tkk.fi/applications/poka/

ACKNOWLEDGEMENTS
This research is a part of the FinnONTO project funded mainly
by the Finnish Funding Agency for Technology and Innovation
(Tekes).

REFERENCES
[1] Corcho, O. (2006) Ontology based document annotation:
trends and open research problems. International Journal
of Metadata, Semantics and Ontologies, Vol. 1, No. 1,
pp. 47-57.
[2] Handschuh, S. and Staab, S. (2002) Authoring and
annotation of web pages in CREAM. Proceedings of the
11th International Conference on World Wide Web,
Honolulu, USA.
[3] Handschuh, S., Staab, S. and Ciravegna, F. (2002)
S-CREAM – Semi-automatic CREAtion of Metadata.
Proceedings of the 13th International Conference on
Knowledge Engineering and Knowledge Management
(EKAW 2002), Siguenza, Spain.
[4] Holi, M., Lindgren, P., Suominen, O., Viljanen, K. and
Hyvönen, E. (2006) TerveSuomi.fi – A Semantic Health
Portal for Citizens. Proceedings of the 1st Asian Semantic
Web Conference (ASWC2006), Beijing, China, poster
papers.
[5] Hyvönen, E. and Mäkelä, E. (2006) Semantic autocompletion.
Proceedings of the 1st Asian Semantic Web Conference
(ASWC2006), Beijing, China. Springer-Verlag, forthcoming.
[6] Hyvönen, E., Mäkelä, E., Salminen, M., Valo, A., Viljanen,
K., Saarela, S., Junnila, M. and Kettula, S. (2005)
MuseumFinland – Finnish Museums on the Semantic Web.
Journal of Web Semantics, Vol. 3, No. 2.
[7] Kahan, J., Koivunen, M.R., Prud'Hommeaux, E. and Swick,
R.R. (2001) Annotea: An Open RDF Infrastructure for
Shared Web Annotations. Proceedings of the 10th
International World Wide Web Conference (WWW10),
Hong Kong, China.
[8] Kalyanpur, A., Hendler, J., Parsia, B. and Golbeck, J.
(2005) SMORE – Semantic Markup, Ontology, and RDF
Editor. Available at:
http://www.mindswap.org/papers/SMORE.pdf
[9] Kettler, B., Starz, J., Miller, W. and Haglich, P. (2005) A
Template-based Markup Tool for Semantic Web Content.
4th International Semantic Web Conference (ISWC2005),
Galway, Ireland. Lecture Notes in Computer Science 3729,
Springer, pp. 446-460.
[10] Komulainen, V., Valo, A. and Hyvönen, E. (2005) A Tool
for Collaborative Ontology Development for the Semantic
Web. Proceedings of the International Conference on Dublin
Core and Metadata Applications (DC 2005), Madrid,
Spain.
[11] Känsälä, T. and Hyvönen, E. (2006) A Semantic View-Based
Portal Utilizing Learning Object Metadata.
Proceedings of the Workshop on Semantic Web
Applications and Tools, the 1st Asian Semantic Web
Conference (ASWC2006), Beijing, China. Forthcoming.
[12] Reeve, L. and Han, H. (2005) Survey of Semantic
Annotation Platforms. Proceedings of the 2005 ACM
Symposium on Applied Computing, Santa Fe, USA. ACM
Press.
[13] Schreiber, G., Dubbeldam, B., Wielemaker, J. and Wielinga,
B. (2001) Ontology-Based Photo Annotation. IEEE
Intelligent Systems, Vol. 16, No. 3, pp. 66-74.
[14] Uren, V., Cimiano, P., Iria, J., Handschuh, S., Vargas-Vera,
M., Motta, E. and Ciravegna, F. (2006) Semantic
annotation for knowledge management: Requirements and
a survey of the state of the art. Journal of Web Semantics,
Vol. 4, No. 1, pp. 14-28.
[15] Valkeapää, O. and Hyvönen, E. (2006) A Browser-based
Semantic Annotation Tool for Distributed Content
Creation. Proceedings of the 1st Asian Semantic Web
Conference (ASWC2006), Beijing, China, poster papers.