=Paper=
{{Paper
|id=Vol-273/paper-11
|storemode=property
|title=BOWiki - a collaborative annotation and ontology curation framework
|pdfUrl=https://ceur-ws.org/Vol-273/paper_46.pdf
|volume=Vol-273
|dblpUrl=https://dblp.org/rec/conf/www/BackhausKBHHLV07
}}
==BOWiki - a collaborative annotation and ontology curation framework==
Michael Backhaus
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
04103 Leipzig, Germany
michael backhaus@eva.mpg.de

Janet Kelso
Max Planck Institute for Evolutionary Anthropology
Deutscher Platz 6
04103 Leipzig, Germany
kelso@eva.mpg.de
ABSTRACT
As the amount of data being generated in biology has increased, a major challenge has been how to store and represent this data in a way that makes it easily accessible to researchers from diverse domains. Understanding the relationship between genotype and phenotype is a major focus of biological research. Various approaches to providing the link between genes and their functions have been undertaken; most require significant and dedicated manual curation. Advances in web technologies make possible an alternative route for the construction of such knowledge bases: large-scale community collaboration. We describe here a system, the BOWiki, for the collaborative annotation of gene information. We argue that a semantic wiki provides the functionality required for this project since it can capitalize on the existing representations in biological ontologies. We describe our implementation and show how formal ontologies could be used to increase the usability of the software through type-checking and automatic reasoning.

Categories and Subject Descriptors
H.3.5 [On-line Information Services]: Web-based services; H.5.3 [Group and Organization Interfaces]: Collaborative computing; K.4.3 [Organizational Impacts]: Computer-supported collaborative work; I.2.4 [Knowledge Representation Formalisms and Methods]: Representations

General Terms
Ontology of Functions, GFO-Bio

Keywords
semantic wiki, ontology curation

Copyright is held by the author/owner(s).
WWW2007, May 8–12, 2007, Banff, Canada.

1. INTRODUCTION
There has been recent interest in the development of a public knowledge base for information about genes and gene functions [11]. Biomedical ontologies such as the Gene Ontology (GO) [1] provide terms that have been used to annotate genes with information about their function and cellular location. The Ontology of Functions (OF) [2] provides an additional function description layer that can be used to enrich the description of functions in biomedical ontologies. Applying OF introduces new, more specific relations such as hasFunction and isRealization for the annotation of gene products. This framework allows for the preservation of the original ontological structure while providing a more detailed description of the function and the genes involved.

To date the majority of gene annotation to biomedical ontologies has been performed by expert curators who use the public literature to determine the appropriate ontological terms for the annotation of genes and gene products. This method provides a highly accurate annotation, but is manually intensive and time-consuming. To overcome this bottleneck, automatic electronic annotation was also implemented, but this is generally less reliable than manual curation. The ability to tap the expertise of the large biological community potentially provides a stable, long-term strategy for maintaining and extending the information captured about gene functions. To this end we have developed the BOWiki, which provides a vehicle for the collaborative annotation of genes with concepts in the biomedical ontologies, as well as a suitable framework for the detailed description of functions as specified by OF.

The aim of the BOWiki is to capitalize on community knowledge to build an accurate and easy-to-use knowledge base about genes and gene functions [5]. Since the target audience of the BOWiki is largely made up of molecular biologists with little experience of ontologies or formal logics, the primary design principle was usability. To this end both data entry and retrieval make extensive use of a simple syntax, graphical representations, and web forms where appropriate. Additionally, BOWiki will provide automated consistency checking, which uses a type system based on a biological core ontology and an upper level ontology in order to ensure the consistency of the knowledge base during editing.

We discuss here in some detail how information capture, storage, searching and quality assurance are managed inside the BOWiki.

2. INFORMATION CAPTURE

2.1 Wiki
We evaluated the suitability of various technologies to serve as a framework for curating biomedical knowledge bases. Due to our need for facilitating intuitive collaborative editing, we chose a wiki as the base system for BOWiki [6]. The MediaWiki software is used by Wikipedia, probably making
it the most well-known wiki software. MediaWiki further supports a well-documented system for creating extensions, making it easy for developers to customize a wiki to meet their specific needs. The BOWiki is based on the Semantic MediaWiki [10] v0.4, an extension of MediaWiki that employs binary relations and attributes, together with a formal semantics for their use. We extended the Semantic MediaWiki to satisfy our requirements, and describe these extensions in the following sections.

2.2 Function Edit Form
The ability to edit biological functions in the BOWiki is realized through an implementation of the framework provided by OF. The OF treats functions as goal-oriented entities, and provides a formal way to represent functions in a function structure scheme. The scheme consists of a set of labels¹, a set of requirements², a goal³ and a functional item⁴, which together form the function determinants. Via the function edit form (see figure 1), the function determinants are presented for editing. When the information concerning the function determinants is stored, concepts with function-, requirement- and goal-name are created or modified, together with the appropriate semantic links and structure. The complete content of the function determinants, as well as a link to the function edit form for further editing, is presented inside an information box on the function page.

Figure 1: The Function Edit Form together with an example function, "Sugar transporter activity", and the corresponding function determinants.

2.3 N-ary relations
The OF framework requires the use of n-ary relations. In order to use OF and to allow for intuitive knowledge acquisition, the BOWiki extends the Semantic MediaWiki to allow n-ary relations between wiki pages. In order to enable the use of n-ary relations, the BOWiki structures arguments of relations into roles, or named argument slots. This allows n arguments to be used in an arbitrary order. We believe that this provides an intuitive way to describe complex relations between entities in the BOWiki. This feature is not present in most semantic wikis. For the semantics of n-ary relations in the BOWiki, we use the proposal from the W3C Semantic Web Working Group [8].

An example of a three place relation in BOWiki is the following function ascription (assumed to appear on the MAL21 wiki page describing the sugar transporting protein MAL21):

    [[realizes::function=sugar transporter activity;
    realization=sugar transport;
    context=human body]]

This means that the protein MAL21 realizes the function "sugar transporter activity" (taken from the GO) by means of a process "sugar transport" (taken from the GO) in the context of a human body.

2.4 Database layout
In order to store the semantic content of the BOWiki, we extended the Semantic MediaWiki's database model. The database model must accommodate n-ary relations, roles and types from BOWiki's type system (see section 4.1). Therefore, a database layout that differs from the triple-representation used in the Semantic MediaWiki [10] is required. In the BOWiki, we introduce separate tables for storing subjects, relations, roles, types and objects.

The database model, as shown in figure 2, is subdivided into two parts, one focusing on classes (relations, roles and types), the other on instances (instances of relations, of roles and objects). Relations are instantiated each time they appear with a different set of arguments. These instances acquire an identifier from the relation instance table. The arguments table comprises the identifiers of the roles and objects used in a relation-instance. It is, therefore, a table of instances of roles. Using the relation-instance and argument tables, the database layout allows storage of n-ary relations, and most operations can be completed by integer queries, which ensures rapid content retrieval from a large knowledge base.

For the incorporation of automated consistency checks, a table with available types will be added to the current database layout, and the types will be linked to the roles that may be used in a relation. This allows for the classification of objects occurring as arguments to relations in the wiki.

Figure 2: The general database layout used to store semantic data in BOWiki.

3. DATA ACCESS
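As a concrete illustration of how role-based n-ary annotations could be stored and looked up, the following minimal Python sketch parses the syntax from Section 2.3 and stores the result in a table layout loosely modelled on Section 2.4. The table and column names, the regular expression, and the helper functions `parse_relation`, `get_id`, `store` and `find_subjects` are illustrative assumptions; none of them is taken from the BOWiki source code.

```python
import re
import sqlite3

# Hypothetical pattern for the role-based annotation syntax of Section 2.3.
RELATION_RE = re.compile(r"\[\[(\w+)::(.*?)\]\]", re.DOTALL)

def parse_relation(markup):
    """Return (relation name, {role: object}) for one n-ary annotation."""
    match = RELATION_RE.search(markup)
    if match is None:
        raise ValueError("no relation annotation found")
    name, body = match.groups()
    roles = {}
    for part in body.split(";"):
        if part.strip():
            role, _, obj = part.partition("=")
            roles[role.strip()] = obj.strip()
    return name, roles

# Assumed schema: name tables for classes, plus relation instances and arguments.
SCHEMA = """
CREATE TABLE relation (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE object (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE relation_instance (
    id INTEGER PRIMARY KEY, relation_id INTEGER, subject_id INTEGER);
CREATE TABLE argument (instance_id INTEGER, role_id INTEGER, object_id INTEGER);
"""

def get_id(db, table, name):
    """Map a name to its integer identifier, creating the row if needed."""
    db.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return db.execute(
        f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()[0]

def store(db, subject, markup):
    """Store one annotation found on the wiki page `subject`."""
    name, roles = parse_relation(markup)
    cur = db.execute(
        "INSERT INTO relation_instance (relation_id, subject_id) VALUES (?, ?)",
        (get_id(db, "relation", name), get_id(db, "object", subject)))
    instance_id = cur.lastrowid
    for role, obj in roles.items():
        # One argument row per filled role, so argument order does not matter.
        db.execute("INSERT INTO argument VALUES (?, ?, ?)",
                   (instance_id, get_id(db, "role", role),
                    get_id(db, "object", obj)))
    return instance_id

def find_subjects(db, role, obj):
    """Pages whose annotations fill `role` with `obj`; joins use integer keys."""
    rows = db.execute(
        """SELECT s.name FROM argument a
           JOIN relation_instance ri ON ri.id = a.instance_id
           JOIN object s ON s.id = ri.subject_id
           WHERE a.role_id = (SELECT id FROM role WHERE name = ?)
             AND a.object_id = (SELECT id FROM object WHERE name = ?)""",
        (role, obj)).fetchall()
    return [row[0] for row in rows]

MARKUP = """[[realizes::function=sugar transporter activity;
realization=sugar transport;
context=human body]]"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
store(db, "MAL21", MARKUP)
print(find_subjects(db, "realization", "sugar transport"))
```

Names are resolved to integer identifiers once, and the join itself compares only integers, which is the property the text credits for fast retrieval. A real deployment would sit on MediaWiki's own database rather than an in-memory SQLite file.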
¹ labels are used as expressions which name a function
² requirements are necessary preconditions
³ a goal specifies the part of the world directly affected by the function
⁴ the functional item describes the role played by a certain entity in any realization of the function

3.1 Semantic Search
Building a structured knowledge base with semantic annotations allows for improving regular search techniques through content based queries. In particular, in the context of gene annotations, it is important to be able to find correlations between genes, i.e., to see which gene has a specific relation
to another, or which gene plays a certain role in a specific relation. In order to allow for content-based searches within the BOWiki, the Semantic Search Triple from the Semantic MediaWiki v0.4 has been expanded into a Search Tuple (see figure 3). This facilitates searching for an arbitrary number of terms of the following types: subject, relation, object, role and attributes. The search function employs numerical identifiers in order to increase its efficiency. This is particularly significant when querying large databases, such as the GO, which contains more than 50,000 relations between concepts.

Inline Queries, as introduced by the Semantic MediaWiki, are adapted to the more complex representation schema of semantic information within BOWiki. Inline queries allow for the performance of semantic searches within wiki pages, representing the results as part of the rendered wiki page.

Figure 3: A search using the semantic search tuple with the parameters object.1=sugar, object.2=activity, role.2=realization, and the result showing the facts about MAL21 fulfilling this search.

3.2 External Ontologies
For referencing, browsing and as a starting point for local articles, the BOWiki allows access to several ontologies like the Gene Ontology [1] and ontologies about cell types or anatomy. These ontologies are accessible via special pages, which prohibit changing the ontology's concepts in the external ontology directly, but permit editing or updating the local article about this concept. In this way, it is not possible to change the external ontologies, but a user may modify the information contained in these ontologies for use in an extended description of these concepts within the BOWiki. The ontologies are maintained separately from the BOWiki's semantic storage, and are updated regularly to current versions. It is possible to browse these ontologies graphically, represented as a directed acyclic graph. This graph is automatically generated from the relations that hold between the concepts in these ontologies, by extracting this information and using the MediaWiki plugin Graph::Easy⁵ for visualization.

Figure 4: The semantic graph of the concept Epithelial cell differentiation from the Gene Ontology, which is generated on the special pages for the external ontologies.

⁵ http://bloodgate.com/perl/graph/

4. QUALITY & CONSISTENCY

4.1 Type System
Collaborative curation of a knowledge base may quickly lead to semantic inconsistencies. A simple example of such an inconsistency is the use of a part-of relation between a process and an object, which is often prohibited [3]. Another example may be the classification of some entity as both an object and a process, leading to an inconsistency when objects and processes are considered disjoint. In order to ensure the quality of the wiki's content, at least some of these possible inconsistencies must be identified and prevented. Since the wiki contains descriptions of biological concepts and entities, we use an upper biological ontology as a type system.

4.1.1 Core ontology
A core ontology provides definitions for the most basic concepts within a domain. We developed GFO-Bio [4] as a core ontology for biology. GFO-Bio extends the General Formal Ontology (GFO) [3] by adding definitions for biological concepts and relations. GFO-Bio is formalized in the decidable fragment of the Web Ontology Language (OWL) [7]. It contains definitions for concepts such as Cell, Biological process and Protein.

GFO-Bio incorporates higher order categories and employs an explicit instantiation relation, which are particularly relevant for its efficient use in a collaborative knowledge acquisition environment, where both concepts and individuals are described formally. Higher order categories allow biological concepts to be treated as instances instead of classes in OWL. We believe that this will also result in a more rapid and efficient classification.

4.1.2 Reasoning
All BOWiki pages, whether concepts or individuals, are instances of a class in GFO-Bio; we use instantiation instead of subsumption to increase the efficiency of reasoning. In some sense, reasoning is restricted to the core ontology because all wiki pages appear as OWL individuals for the reasoner. Consequently, subsumption links among concepts in the wiki are currently not exploited, which is appropriate for basic type checking and a concession to the performance of current reasoners. Two types of inconsistencies may arise that can be avoided.

First, consider the BOWiki concept Hydrogen and the GFO-Bio concepts Molecule and Atom. In the BOWiki, Hydrogen may become an instance of Molecule. After instantiation, we want to prevent a user from making Hydrogen also an instance of GFO-Bio's Atom concept, because those concepts are disjoint in GFO-Bio. A molecule consists of atoms, but it is not an atom.

Second, concepts taken from GFO-Bio may serve as type-restrictions for roles in n-ary relations. Consider the realizes relation and its roles: the implicit subject role, the function, realization and context roles. These are restricted to the GFO-Bio concepts: Biological object, Biological function, Biological process and Entity. The use of a particular concept "A" to fill the realization role of the realizes relation causes this concept to be classified as a Biological process. If the same concept "A" is used in the same relation, but in the subject role, it is then classified as both Biological Process and Biological Object, resulting in an inconsistency.
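The two kinds of type conflict described in this section can be sketched in a few lines of Python. The sketch replaces the description logic reasoner with a plain set-based lookup, so it only illustrates the idea of rejecting disjoint classifications. The `DISJOINT` and `ROLE_TYPES` tables, the `TypeChecker` class and its method names are illustrative assumptions rather than BOWiki code; only the concept and role names are taken from the text.

```python
# Pairs of GFO-Bio concepts treated as disjoint (only the pairs named in the text).
DISJOINT = {
    frozenset({"Molecule", "Atom"}),
    frozenset({"Biological object", "Biological process"}),
}

# Type restrictions for the roles of the `realizes` relation, as listed in the text.
ROLE_TYPES = {
    "realizes": {
        "subject": "Biological object",
        "function": "Biological function",
        "realization": "Biological process",
        "context": "Entity",
    }
}

class TypeChecker:
    """Hypothetical stand-in for reasoner-based consistency checking."""

    def __init__(self):
        self.types = {}  # wiki page name -> set of asserted concepts

    def conflicts(self, page, concept):
        """Concepts already asserted for `page` that are disjoint with `concept`."""
        return sorted(t for t in self.types.get(page, set())
                      if frozenset({t, concept}) in DISJOINT)

    def assert_type(self, page, concept):
        """Classify `page` under `concept`, refusing disjoint classifications."""
        clash = self.conflicts(page, concept)
        if clash:
            raise ValueError(f"{page}: {concept} is disjoint with {clash[0]}")
        self.types.setdefault(page, set()).add(concept)

    def use_in_role(self, page, relation, role):
        """Filling a role classifies the page under the role's type restriction."""
        self.assert_type(page, ROLE_TYPES[relation][role])

checker = TypeChecker()
checker.assert_type("Hydrogen", "Molecule")          # first example
checker.use_in_role("A", "realizes", "realization")  # second example

for attempt in (lambda: checker.assert_type("Hydrogen", "Atom"),
                lambda: checker.use_in_role("A", "realizes", "subject")):
    try:
        attempt()
    except ValueError as err:
        print("rejected:", err)
```

Both later attempts are rejected because the page already carries a concept that is disjoint with the new one. A reasoner would additionally take the class hierarchy of the core ontology into account, which this flat lookup does not.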
In GFO-Bio, Biological Processes and Biological Objects are disjoint⁶.

Both types of inconsistencies can be detected automatically by a description logic reasoner. Pellet⁷ is capable of performing these operations, and will be used for automated consistency checking in the next version of the BOWiki.

4.2 Quality rating
To further improve the quality of the articles, we are considering the inclusion of a content evaluation system, which may take the form of a direct evaluation of an article, or a cumulative score for an author's body of work. Further research is needed in order to represent these ratings in the semantics of BOWiki's content.

5. OUTLOOK
The BOWiki is still under development. The main task for the future will therefore be to enhance the user interface in order to make the BOWiki more user-friendly and intuitive.

We are considering integrating the BOWiki with a grid platform such as the MediGRID⁸, a portal for applications and web-services for biomedical research and the life sciences. Using web-services, the BOWiki may access biomedical ontologies directly rather than relying on daily updates. Web-services also provide the ability for users to perform queries of the BOWiki content within an integrated web-service platform.

For more restricted domains, a more expressive syntax may be required. For example, logical conjunction, negation or existential and universal quantification may be allowed. This may make it possible to extend the type system itself, instead of classifying every article as an instance of a concept in the type system. This will facilitate the collaborative construction of expressive ontologies. However, the main limitation will be the efficiency of the available reasoners, as well as the integration of a syntax for these constructs in the BOWiki.

To provide a graphical representation of the structured content, we are considering conceptual graphs [9]. Conceptual graphs may provide a more intuitive representation schema for logical formulae than their representation in text format.

6. CONCLUSION
The BOWiki can be used to collaboratively create a knowledge base about biological domains. It allows for the specification and description of functions according to the framework laid out by the Ontology of Functions [2]. By employing the core ontology GFO-Bio as a type system, the BOWiki facilitates the creation of a structured knowledge base while preserving consistency. It further extends another semantic wiki, the Semantic MediaWiki, for a customized handling of n-ary relations, making knowledge acquisition more intuitive. Within the biological and biomedical community, the BOWiki provides a framework for constructing a high-quality, large-scale knowledge base to be used for annotating genomic data, as well as for describing and defining both biological functions and other biological concepts⁹.

7. ADDITIONAL AUTHORS
Joshua Bacher, email: bacher@eva.mpg.de, Heinrich Herre, email: heinrich.herre@imise.uni-leipzig.de, Robert Hoehndorf, email: hoehndorf@eva.mpg.de, Frank Loebe, email: frank.loebe@imise.uni-leipzig.de, Johann Visagie, email: visagie@eva.mpg.de

8. REFERENCES
[1] M. Ashburner et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1):25–29, May 2000.
[2] P. Burek, R. Hoehndorf, F. Loebe, J. Visagie, H. Herre, and J. Kelso. A top-level ontology of functions and its application in the Open Biomedical Ontologies. Bioinformatics, 22(14):e66–e73, July 2006.
[3] H. Herre, B. Heller, P. Burek, R. Hoehndorf, F. Loebe, and H. Michalek. General Formal Ontology (GFO) – A foundational ontology integrating objects and processes [Version 1.0]. Onto-Med Report 8, Research Group Ontologies in Medicine, Institute of Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, 2006.
[4] R. Hoehndorf, F. Loebe, R. Poli, J. Kelso, and H. Herre. GFO-Bio: A biological core ontology. a new methodology for integrating ontologies. Applied Ontology, 2007. under review.
[5] R. Hoehndorf, K. Prüfer, M. Backhaus, H. Herre, J. Kelso, F. Loebe, and J. Visagie. A proposal for a gene functions wiki. In R. Meersman, Z. Tari, and P. H. et al., editors, OTM Workshops 2006, number 4277 in LNCS, pages 669–678. Springer-Verlag, 2006.
[6] B. Leuf and W. Cunningham. The wiki way: Quick collaboration on the web. Addison-Wesley, 2001.
[7] D. L. McGuinness and F. van Harmelen. OWL Web Ontology Language overview. W3C recommendation, World Wide Web Consortium (W3C), February 2004. http://www.w3.org/TR/2004/REC-owl-features-20040210/.
[8] N. Noy and A. Rector. Defining n-ary relations on the semantic web. http://www.w3.org/TR/swbp-n-aryRelations/, 2006.
[9] J. Sowa. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, MA, 1984.
[10] M. Völkel, M. Krötzsch, D. Vrandecic, H. Haller, and R. Studer. Semantic wikipedia. In Proceedings of the 15th international conference on World Wide Web, pages 585–594, 2006.
[11] K. Wang. Gene-function wiki would let biologists pool worldwide resources. Nature, 439(7076):534, Feb 2006.

⁶ It is possible to discuss articles and the restrictions of relations in the BOWiki on a discussion page available on each concept page.
⁷ http://pellet.owldl.com/
⁸ http://www.medigrid.de/
⁹ The interested reader can find a running system using the current stable release of BOWiki at http://onto.eva.mpg.de/bowiki