Promotion of Ontological Comprehension: Exposing Terms and Metadata with Web 2.0

Andrew Gibson, Katy Wolstencroft, Robert Stevens
University of Manchester, School of Computer Science, Kilburn Building, Oxford Road, Manchester, UK
+44 161 275 0649
andrew.p.gibson@manchester.ac.uk

Copyright is held by the author/owner(s). WWW 2007, May 8-12, 2007, Banff, Canada.

ABSTRACT

Knowledge artifacts that have been labeled as ontologies have many different qualities and intended outcomes. This is particularly true of bio-ontologies, where high demand has led to a rapid growth in the number of these artifacts. Good communication between the human agents involved in the life cycle of ontologies is essential for the ontologist to encode the right knowledge in the ontology. Not only this, but it should be encoded such that subsequent retrieval of the knowledge from the ontology by any agent can be clear and precise. The ontologist can encode ontological statements, for interpretation by a computer agent, or meta-ontological statements, for interpretation by human agents. We consider how the current communication between agents and ontologies produces drawbacks that add to the considerable overheads associated with ontology development. We describe the processes of communication between human agents and ontologies as Ontology Comprehension. We then suggest how these processes could be augmented, particularly with the use of Web 2.0 ideas. By exposing and enhancing the social interactions involved in ontology comprehension, development overheads are potentially reduced and the prospect of ontology sharing and reuse is improved.

Categories and Subject Descriptors

I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods – representations, representation languages.

General Terms

Design, Human Factors, Standardization, Languages.

Keywords

Ontologies, Semantic Web, Web 2.0, OWL, Ontology Comprehension.

1. INTRODUCTION

The technologies of the Semantic Web [6] have been centrally conceived, specified and designed with recommendations by the W3C (http://www.w3.org/2001/sw/). This next generation Web promises to transform the information Web into a machine computable utopia for semantically described data and information. Despite the development of the technologies, there is, however, little evidence of the materialization of the Semantic Web (or Webs). Simple RDFS vocabularies such as Friend of a Friend have provided small views on the potential of the Semantic Web [9]. Rich ontological views supported by reasoning have appeared in applications [27, 30, 31], but less so in the Web itself, and when they do, they often represent unconnected niche pockets of interest.

In contrast, Web 2.0 is in the here and now, in use by large interconnected user communities, and is ever growing as more people adopt and contribute to various community efforts. To try and specify Web 2.0 would almost be a contradiction in terms, and restricting its users with strong recommendations would be seen as an attempt to unnecessarily limit the creativity of those who have something new to try. Taxonomies give way to folksonomies, letting the user mark up things lightly on the Web rather than specify a typed URI. The technologies of Web 2.0 were not specified; they evolved out of clear and present needs of users to connect with one another. The principles of Web 2.0 grow out of a mixture of hindsight and insight into current practice, and revolve around online community building, quick and easy linking, and unlimited customization in the hands of the masses. In this article we use ‘Web 2.0’ to refer to these principles rather than any specific technology.

It has not gone unnoticed, however, that the artifacts, such as vocabularies and ontologies, that will support the Semantic Web need populating [25, 26], and for this to happen, both the technology and the nature of ontology building need to be accessible to the masses. Similarly, in the computer science view, knowledge artifacts such as ontologies inherently have this community aspect: they are shared conceptualizations that aim to enable both human and computational interoperation of diverse resources at a semantic level.

The simplicity and robustness of HTML fuelled the growth of the current Web, but the highly-specified nature of the technologies in the Semantic Web recommendations suggests that the semantic side of the development, delivered through ontologies, will be driven mostly by experts. It is therefore key that this barrier of complexity is lowered through creating an easier user experience, and that the motivators that are driving Web 2.0 are harnessed to promote uptake of Semantic Web ideas.

In this paper we consider the social and communication-dependent aspects of the ontology development life cycle, and identify problems encountered by people with specific roles of interaction. From this, we suggest that a clear, layered separation is made between statements in ontologies that are logical and those that are linguistic, supporting annotations on the ontology. In doing so, the annotations can be exposed to the collaborative aspects of Web 2.0, promoting light discussion at the level of natural language about the meanings of terms, whilst leaving the heavier encoding of knowledge into OWL as a task for ontologists.

2. ONTOLOGIES AND DEVELOPMENT

The central premise of the Semantic Web is enabling computational processing of Web resources through knowledge artifacts. The W3C have provided the Resource Description Framework (RDF) and the Web Ontology Language (OWL) recommendations. The latter, particularly in its OWL-DL variant, is offered as a means of building robust property based descriptions with a logical underpinning that can be used to provide vocabulary for describing Web content, but also support reasoning across Web content [20]. Such ontologies are to be the semantic backbone for linking resources in the Semantic Web. Additionally, these ontologies are to represent knowledge of domains, and have the virtues of being sharable and reusable. As yet, it is difficult to find an ontology that could be said to have been designed to fit the criteria for enabling a Semantic Web by being domain general and rich in content. One prominent example of an ontology approaching these criteria is the Foundational Model of Anatomy (FMA) [12, 23]. The FMA could be said to be more of a true domain ontology (or reference ontology) than any other in bio-medicine. However, even the FMA has barriers to the Semantic Web goals of sharing and reuse because of its large size, perhaps because it was developed in Frames and later converted to OWL.

In computer science, what are called ontologies cover a broad range of knowledge artifacts. Glossaries, vocabularies, thesauri, informal and formal ontologies (both in language and ontological discrimination) are all used at various points in the Semantic Web. Different levels of expressiveness (sometimes called formality) come from the purpose and demands of the ontology being developed [28]. These demands can be considered with increasing levels of expressiveness from very “light-weight” term lists, thesauri, dictionaries or hierarchies up to “heavy-weight” ontologies with very expressive constraints [10, 25]. OWL-DL offers a formal language and can be used to build rich, logical representations of descriptions of what exists; it can also be used, in various forms, to develop other forms of knowledge artifact while still retaining strict language semantics in the representation, but weakening the ontological distinctions made in the knowledge artifact.

Figure 1: Ontology Comprehension: Current model of interactions between various agents and an ontology, as described in Section 3. The human agents are not necessarily different individuals, but rather are separated here by the roles fulfilled in the development and inspection processes.

Building OWL-DL logic based ontologies is a difficult process [21] and reaching a community consensus is hard, especially in complex domains such as biology, where knowledge for making ontological distinctions can be incomplete. These issues need to be addressed if ontologies are to play their role in the Semantic Web. Here, we are mostly interested in the aspect of reaching a community consensus. Focus is often placed on the aspect of collaborative ontology building, that is, a group of people working directly with one ontology. We do not aim to discuss this type of system, as we see such systems as expert systems for logic-savvy ontologists rather than currently being suitable for “the masses”. Much more work needs to be done on enabling true collaboration in logic based ontologies. Instead, we currently envisage a core of expertise for logic encoding supported by people conceptualizing and gathering linguistic material. We acknowledge that there is a wealth of methodologies that address certain aspects of the ontology development lifecycle [10, 29] and evaluation [8, 24]; good reviews of these fields can be found in the references. For the purposes of this paper, we wish to focus on the social interactions during these processes rather than the processes themselves.

3. ONTOLOGY COMPREHENSION

We learn from the field of software engineering that effective reuse of elements of object oriented frameworks is reliant on many levels of understanding from the point of view of the programmer [4, 15]. In software engineering, improving these levels of understanding is known as “software comprehension”, and we extend the principles to ontology development. We outline ontology comprehension as the interaction between human agents and the knowledge expressed in an ontology.

Figure 1 outlines the interactions between various agents and an ontology that are considered in this section. There are two main modes in which ontology comprehension is important:

1. Development mode. Ontology development requires that there is efficient interaction between experts that represent the knowledge of the domain in the scope of the ontology (domain experts) and the ontologist that is responsible for the construction and continued maintenance of the ontology. Here we assume a model where, for a specific ontology development exercise, there is a limited cohort of domain experts that are involved with an ontologist.

2. Inspection mode. Ontology inspection is a light evaluative process in which an agent goes through the ontology to quickly assess whether or not that ontology is of good quality and whether what it contains is suitable for some specific needs of the inspector.

What follows is an outline of task models that highlight how, currently, the interactions of agents involved with ontologies lead to discrepancies in ontology comprehension.

3.1 Task Model 1: Ontology Development

We consider early ontology development as a process that begins with the lightest possible knowledge structure, essentially a term list, and subsequently moves up through levels of complexity and expressiveness of the types discussed in [10]. This happens socially as well as in the ontology as all those involved in the development become more familiar with the scope. At the beginning of the ontology development life cycle, the ontologist (assuming they have no domain knowledge) will usually rely on the domain expert to provide a core set of terms from the domain of interest as a starting point. The initial scope of the ontology, rather than being rigidly defined, is often roughly determined from the initial term list and this will get refined as things move on. At this early stage it is necessary for the domain expert to be able to quickly assess if the terms are appropriate. As things are, the easiest way to do this is for the domain expert to be able to access the ontology for themselves and browse the hierarchy of terms, whilst checking and adding in textual annotations for the terms, as well as any comments about the specific or contextual use of any of the terms.

The ontologist will be using one of the commonly available ontology development tools such as Protégé-OWL (http://protege.stanford.edu/), Swoop (http://code.google.com/p/swoop/) and OBO-Edit (http://www.oboedit.org/). All of these tools are centered on the user interacting with a class hierarchy view, which the ontologist will be building from the terms given to them by the domain expert. At this stage, the domain expert will primarily be concerned with having the correct term-definition pairs represented in the proto-ontology. Decisions regarding the class hierarchy signal the beginning of a slightly more complex level of expressivity, as the ontologist will be making assertions between classes about subsumption relationships [14]. This is especially true of OWL ontologies, and such decisions do not necessarily need to be considered for simpler controlled structured vocabularies in which hierarchical relationships “broader than” and “narrower than” are possible. The ontologist may also start to guide the domain expert in how to transfer knowledge regarding some of the more fundamental object properties such as part-hood.

At some point, the domain experts need to let the ontologists start to make even more expressive assertions in the ontology, the implications of which they may not necessarily understand for themselves. This signals the next stage of ontology development, in which the balance shifts so that the ontologist starts to refine the assertions in the ontology. Instead of being instructed and guided by the domain expert, the ontologist now needs to ask careful questions of the domain expert. The aim of these questions should be to extract the intrinsic meaning of the terms that the domain expert has provided so that the ontologist can encode these meanings into the ontology using more and more expressive restrictions and axioms. Significantly, unless the domain expert has had training in understanding the meanings of logical assertions of ontologies, they will still primarily rely on the lexical annotations and definitions when evaluating the ontology.

Once the content of the ontology has begun to stabilize (i.e. there are fewer major revisions in the content of the ontology being made) it will be made available to a wider audience. This can signal a whole new critical process of revision for the ontology. In the next section we will consider what sort of interactions may occur between different agents and ontologies when they are first encountered.

Eventually, the increase in the content of the ontology, both lexical and logical, should start to level off as the content converges on the intended scope, at which time further structural modifications may be made, such as modularization, which could happen once the micro-organization of the knowledge in a domain has become clear. A publicly available and relatively stable ontology has a new set of requirements, which the topics of ontology evolution and change management address [19]. Change management of ontologies has been considered in a technological sense for some time, and it should be clear that changes to a publicly available ontology need to be transparent. However, there is a growing trend for including extra hierarchical structures into the ontology that represent deprecated classes (e.g. [30]). The need to do this is obvious; it is less so how to do it neatly and ontologically. Versioning etc. are all parts of the ontology life-cycle that have no real, consistent support.

The following discrepancies in ontology comprehension should be clear from this section.

1. Discrepancies in Early Development
   a. The most convenient means of constructing, looking at and sharing the early term list is, unusually, from within an ontology file, which implies some hierarchical structure.
   b. Early revisions of the ontology are experimental for the ontologist, yet are still subject to inspection and lexical evaluation from the domain expert.
   c. Domain experts, having looked directly at revisions of the ontology file, may be resistant to subsequent major changes in structure and terminology by the ontologist as knowledge is disambiguated.

2. Emerging Discrepancies
   a. Inclusion of information regarding deprecated classes into the class hierarchy of the ontology.

3. Communicative Discrepancies
   a. Discussions between the domain experts about terminology that are potentially crucial for ontology comprehension are lost or are completely separated from the ontology itself.
   b. Discussions between the domain experts and the ontologists about disambiguation of terms are lost or are completely separated from the ontology itself.
   c. Potential for misinterpretation of logical aspects of the ontology by the domain experts through exposure to the logical component.

3.2 Task Model 2: Ontology Inspection

Ontologies are complex entities. If any ontology is going to get used by someone other than the person or group that implemented it, there has to be a way in which it can be decided whether or not it is an appropriate ontology for the task in hand [18]. Currently, this inspection process is difficult because of the paucity of ontologies available, and the fact that many have been designed for a specific purpose. Also, the discrepancies listed in 3.1 result in a general lack of information that can aid effective inspection and overall ontology comprehension.

It is hard not to liken an ontology inspection process to some sort of evaluation. What we describe here is fairly close to ontology selection [24], except that ontology inspection is more of a browsing process, driven by what access there is to comparative information between several ontologies. Selection has much better defined initial parameters for the desired outcome, and can give a more targeted outcome. We do not wish to label this inspection as an evaluation, however, as we do not make the assumption that the inspector will be following any pre-determined criteria, and if they are, that they are rational criteria.

The ontology inspection process is short lived, and for many people’s goals, the choice of beginning a new ontology that they know will satisfy their criteria is more favorable than editing an existing one. However, such inspections can quickly be deemed fruitless when the term searched for turns out not to be defined by logical statements in an ontology. This is a common occurrence, as such ‘classes’ can be placeholders for future development or intrinsically defined terms where no logical definition was thought necessary. Ontologies can be intensely developed in one particular area where immediate goals are important, yet there is no way to effectively discover this other than through thorough browsing. For the goals of the Semantic Web, it is imperative that such information required to carry out this inspection process be made as clear as possible for the inspector, such that we do not see immense reproduction of individual effort and no clear “shared conceptualizations”.

The domain knowledge these ontologies describe can require a considerable amount of understanding for anyone trying to inspect them. There are several ways in which this can be the case.

1. The domain knowledge encoded may be outside the experience of the inspector, or in a different context to what was expected. The inspector may not be able to tell if the knowledge represented is valid because it is not within their expertise, and will need to seek help and advice from a domain expert.

2. The knowledge may be appropriate, but encoded with axioms and restrictions that the inspector may not be able to accurately interpret as real world meaning, such that they have to find the advice of an ontologist.

3. The ontology may have been written for a specific purpose. The inspector may not be able to tell whether this is the case, and could therefore assume that the first or second scenario above is true, unless it is possible to seek advice from the original authors or find a resource containing this information.

The three scenarios above are serious issues for the future of ontologies in the Semantic Web. Most ontologies are developed as part of projects, and projects are usually pragmatic in terms of their goals. Hence, people build these ontologies as application ontologies that serve the immediate needs of the project. There is no perceivable immediate benefit for a project to develop a more general domain ontology in tandem with an application ontology, and so it does not happen. Consequently, the Semantic Web goals of sharing and reuse become much harder, as people will tend to assess these application specific ontologies as too specific for a new purpose, as they see that they will need to invest effort in their re-engineering. Another danger here is that, with so many application ontologies being developed, inspectors start to assume that unusual features of ontologies are the result of the needs of an application, and dismiss the ontology as potentially unusable. What is really needed is for the inspector to be sure what sort of artifact they are looking at by having easy access to certain parameters.

In the Semantic Web vision, the first course of action for an ontologist would be to verify the existence or non-existence of a domain ontology with close or overlapping scope to the ontology they are to develop. This process will be laborious if it relies on the current practice of downloading ontologies and browsing them to see if they are at all reusable. In response to this, technologies such as Swoogle [16] and AKTiveRank [2] are starting to provide access to online ontologies through page ranking and other analytical methods to establish potential target ontologies. However, these technologies have been criticized for ignoring the meaning of concepts and also relations [24]. Furthermore, we note that the results returned from these searches are whole OWL files, free and independent of contextual information. For example, a Swoogle search for “Protein” has in its top hits an ontology used in an educational tutorial (that in this case is evident from its URL), which is by no means intended to be a shared or reusable resource, but nonetheless is discovered and accessible.

Those inspecting ontologies can find themselves in an isolated situation where Web searches and personal inspection of an ontology or its documentation are the only means to ontology comprehension. It has already been recognized that the Web has enormous potential for social organization and engagement. In ontology comprehension, for example, it offers the means of asking those who know. It also, as Wikipedia has shown, offers the means by which elements that aid ontology comprehension can be developed. Having concluded the need for ontology comprehension, we now explore what is necessary for such a facility.

The following discrepancies in ontology comprehension should be clear from this section:

1. Discovery Level Discrepancies
   a. Targeted discovery based on search for terms rather than meanings.
   b. Ontologies are discoverable independently of statements of purpose, scope etc.
   c. Searches may discover anything from tutorial OWL files, programmatic OWL fragments, application ontologies, outdated or unmaintained ontologies etc.

2. Ontology Level Discrepancies
   a. Statements of scope, purpose, expressivity etc. are often missing altogether, or require extra searches to discover them.
   b. Discussions that have affected overall ontology development are not recorded.
   c. Minimal opportunities to interact with the development team.

3. Term Level Discrepancies
   a. Feeding in from Section 3.1, ontologies need exploring in the development environment to assess appropriateness of terms.
   b. No indication without exploration of the level of effort put into different areas of an ontology.

4. DESIDERATA FOR SEMANTIC ONTOLOGY COMPREHENSION

Section 3 highlights the social and communicative discrepancies that prevent an effective amount of ontology comprehension that is required for the uptake of the Semantic Web goals of ontology sharing and reuse. This section cross-analyses these discrepancies to produce some desiderata that can be considered for future systems. Whilst all types of data in and about an ontology may be considered ‘ontological’, we specify ‘Meta-Ontological Data’, ‘Ontological Metadata’ and ‘Logical Statements’ as clearly identifiable parts. For information contained within ontology files that is only for human interpretation of the encoded semantic content, we use the idea of Meta-Ontological data. For data specific to an individual ontology that is necessary for interpretation and inspection across the whole structure and history of development, we use the idea of Ontology Metadata. The ‘logical statements’ in an ontology constitute the remainder of the content.

4.1 Separating the Ontological from the Meta-Ontological

Ontologies come with a considerable amount of meta-ontological information (or should do so) which is used by the human to assess and see the intended use of that ontology. Much of this meta-ontological information is linguistically orientated. These meta-ontological extensions to the ontology itself are meaningless strings to the computer, and in this respect are unnecessary in so far as the computational goals of the Semantic Web are concerned. We know that this meta-ontological information is necessary, but we also see that it is not convenient to access; it lacks the human resources that often make the most of such material, as in Section 3.2 where a lack of a single access point means that secondary information needs to be sought out manually.

In reality, we have a chance to design and build support for the meta-ontological in the light of current experience. OWL has virtually no support, apart from some ad hoc solutions, for carrying meta-ontological knowledge. We would advocate such a separation of the ontological from meta-ontological and this is where a Web 2.0 approach could help.

Our current scenario places too much reliance on assessment through simple linguistic inspection of, for instance, terms. These are labels for concepts and a simple assumption of lexical matching implying conceptual matching is dangerous. For example, in biology, it might seem safe to assume that hepatocyte and liver cells are the same thing. In fact, cells in the liver include hepatocyte cells, but also include adipocyte cells. Hepatocytes make the liver the liver, but there are other cells too. Ontologies are only intuitively discoverable through the identification and inspection of the appropriate individual terms. Even the construction of linguistic definitions can leave ambiguous meanings for those inspecting an ontology, with no real way to find out how those definitions were converged upon. Even with logical definitions, we still rely upon natural language labels. The aim of languages such as OWL-DL is, however, to minimize potential ambiguities through logical descriptions. Overall, there should be a synergy between logical and linguistic definitions.

Non-ontologist domain experts will attach intrinsic meaning to terms by drawing on their internal knowledge and the context in which a term is used. It is possible to restrict the intrinsic meaning of a term using the consensus of a domain, so long as it is stated in the context of the purpose of the controlled vocabulary. Interpretation of meaning in these controlled vocabularies still requires a human agent and the knowledge is logically inaccessible to a computer agent.

Thus, we define the inline linguistic portions of ontology files as meta-ontological data. These include anything that a human agent would use for the translation of specific complex logical statements into meaning (including links to other meanings) but are also intrinsically meaningless to the computer. Primarily, these are:

• Terms
   o The specific string by which the logical meaning is labeled, usually considered as the real meaning.
   o E.g. (from celltype ontology) ‘subsidiary cell’
• Synonyms
   o Any number of labels that refer to the same meaning.
   o E.g. (cont’d) ‘accessory cell’
• Definitions
   o Short, concise description of the meaning, including links to other terms.
   o E.g. (cont’d) ‘An epidermal cell associated with a stoma and at least morphologically distinguishable from the epidermal cells composing the groundmass of the tissue’
• Annotations (examples of)
   o Longer, more verbose descriptions.
   o Examples of how the term is used.
   o Explanations of contextual use for the term.
   o Links to term provenance.
   o E.g. (cont’d) DBXREF - TAIR:0000296

Achieving the separation of this meta-ontological layer allows for the consideration of how to manage this mostly linguistic information. This separation is our major desideratum and from this flows the means by which Web 2.0 can provide a platform to expose meta-ontological information and harness and extend the range of group activities.

4.2 Promoting Social Interaction – A Meta-Ontological Workspace

Explicit logic based ontologies for the Semantic Web are going to need to capture implicit knowledge with axioms and restrictions. Yet, unless the experts with the knowledge all manage to learn how to interpret complex logical statements, there needs to be a workspace in which implicit knowledge can be discussed and defined lexically within expert groups. In other words, terms and term linked information can essentially exist independently of the formal environment of ontologies. This implicit knowledge can be used by ontologists as a resource. With such a resource, development of early stage ontologies will not require the construction of formal hierarchies until a critical amount of implicit knowledge has been collected in these more lightweight resources. Also, multiple hierarchies for different purposes could be constructed from the same resource, reusing the collected knowledge in a way not possible in file-oriented development. The ontologists have a way to interact with the domain experts as a community to perform tasks such as the disambiguation of terms before they have been encoded in an ontology, reducing the chance that major revisions of ontological structure will be required. As this resource is shared and linkable, project and domain contexts for terms can be established. These contexts can be used by both the ontologists and the domain experts to traverse the gap into discussions that involve other groups, and discover overlapping scopes more intuitively. Additionally, these resources would provide ideal testing grounds for lexical research (e.g. [7]) that should lead to future improvement on methodologies for these workspaces.

Figure 2: An augmented form of ontology comprehension. Ontology Metadata and Meta-Ontological statements have been separated from the Logical Statements and have been exposed to a community using Web 2.0 principles.

Generating discussion of implicit meaning may sound a little like cutting the domain expert out of the ontology construction process. It should in fact considerably reduce the overhead of ontology development by shifting the discussions based around intrinsic meanings of terms and which terms are the most appropriate to use away from the attention of the ontologists. It is important not to make the division too wide, as there is a risk that bias could creep in from the ontologists as the domain experts would be unable to assess the implications of certain restrictions and axioms. In terms of feedback to the domain expert from the ontologist, we recognize that there is a need for some sort of consistent translation methodology that can generate accurate textual definitions from logical statements, but we consider this outside of the scope of this article.

It should be clear that such discussion workspaces would be well suited for Web 2.0 style systems. These workspaces should promote the creation of lexicons in which a group of experts can start to add in and inspect lexical information. In this implicit view, it is the terms that are the focus of discussion, not the ontological interpretation, which are two different goals that sometimes get confused during ontology development. Within the workspace, the terms can be discussed, and annotated with textual definitions, comments about usage, links to synonymous terms, requests for clarification etc. Helium was, for instance, discovered in 1894. Of course it was the category of Helium that was discovered, not the instances of the helium atoms (which presumably have existed long before 1894). This is an example of meta-class statements that are part of the ontology. They are class level statements, but ones that are well suited to this linguistic, community style of interaction.

The purpose of targeting Web 2.0 as a base for this meta-ontological data is not to completely remove this type of information from the view of the ontology; we merely seek to relocate it so that the incredibly social nature of the definition of knowledge can be coupled with an environment that is equally socially driven (see Figure 2). Modularization of ontologies is seen to be one of the keys to making ontologies viable for the Semantic Web vision, and as such, import mechanisms exist that support the combination of different sources. Lexicons developed by groups could be given URIs, as could all of the terms described in them. Knowledge held in WordNet [17] style lexical [...]

[...] domain ontologies, this can be marked up and become visible such that ontology level provenance, a history of where everything in an ontology originated from and how it changed over time, can start to be built up.

Introducing a strong community aspect would encourage those developing ontologies to start using tagging, thereby linking up their ontologies to particular domains and projects. Domain ontology construction could be promoted by using ranking systems where inspectors can rate how useful the ontology was in terms of what was expected, assuming that more general ontological models will fit the requirements of more people.

OWL has been made popular for use as an ontology language because of the publicity of the Semantic Web, accessibility of tools for creating OWL ontologies and the fact that it is useful beyond the scope of the Semantic Web. OWL has been used for a lot of purposes, and searching for ontologies based on the content of their files seems like it may be unsustainable as the number of files grows faster than the number of useful ontologies.

Efficient inspection of ontologies can be limited by their large size. The current tendency is to build larger ontologies, as the tool support and methodologies for modularization have been slow off the mark until recently. As we learn more about the implications and methods of modularization [22], ontologies can become more manageable, reducing the amount of evaluation cost per ontology. This of course will require better indexing, along with information about how each ontology has been used/imported, perhaps leading to a ‘shopping cart’ model for highly modular ontology construction.

Perhaps one of the most motivating factors for achieving this desire for more effective inspection is the aspect of learning. Once it becomes easy to empirically see what constitutes a good and useful ontology, then these features get propagated and discussed. As has been noted in [1], the viral spread of understanding how to write HTML was in part because existing HTML could be inspected and copied. Also, the effect of newly written HTML was instantly verifiable in a Web browser.
It is harder to have this resources could be linked using online URIs in a similar way to sort of verification with ontologies, and there are a lot of imported online ontologies taking advantage of well established conflicting styles of ontology development with no consensus of methods for dealing with words and their meanings at the lexical what is ‘right’, If the Ontological Metadata Workspace were to be level. realized, then a hub of comparable, commented and marked-up ontologies could develop in a much quicker and consistent fashion than the solitary efforts that are currently the norm. 4.3 Promoting Ontology Sharing and Reuse: An Ontological Metadata Workspace 5. BIO-ONTOLOGIES: EXPERIENCES The production of ontologies that can be effectively shared and reused is a major step towards achieving the goals of the Semantic AND PERSPECTIVES Web. There are significant barriers to these goals in our current While our discussion in this article is most pertinent to the notion model of ontological comprehension. We have highlighted how of the Semantic Web as a whole, it originates from the discipline ontologists and domain experts alike need to inspect ontologies to of bioinformatics. Biologists were early adopters of the Web as a assess whether they are appropriate for their needs. Currently, the means of disseminating data and the tools for their analysis. These information that would be necessary to effectively conduct this data and tools are developed in a highly autonomous manner and investigation is hard to find, and does not always come in the consequently they are beset by both syntactic and semantic same format. heterogeneities. Bioinformaticians have seen ontologies as a means to create common understandings for human and An ontological metadata workspace would provide access to computers about the meaning of data in their distributed resources whole-ontology level information for ontologies necessary to in a life science Semantic Web [13]. 
The DNA sequences of carry out light evaluative processes. A collaborative Web 2.0 different organisms, for example have a common representation, approach to ontologies would see ‘ontology profiles’ that include but this is not so for the functional knowledge associated with clear statements about the purpose and scope of the ontology and those sequences. So, the sequences can be interpreted by humans information regarding its status. Ontologies would clearly be and computers, but not what is known about those sequences. labeled as domain and application ontologies to help evaluation, and subsequently, when application ontologies are derived from Consequently, biologists have created ontologies to describe, for such as ‘develops_from’. Despite having the full expressivity of instance, the functional attributes of DNA and proteins [3, 11]. OWL available in the OBO 1.2 syntax, there is little evidence to Bioinformatics has, therefore, much Web accessible data suggest that the developers in this community either see the need described by ontologies. The W3C have recognized a nascent or have the will to take on this level of expressivity in their Semantic Web in this domain in the development of the Health knowledge. Care and Life Sciences SIG 5 . It is a significant feature of the Perhaps then, this community can be a model for the future of move towards ontologies in this sector that it is biologists who ontology development on the Web. Quick and easy development build these tools, with some guidance from ontologists. Whilst of terms by engaging the user, employing Web 2.0 design this community has not made great use of OWL, but its own principles to forge more coordinated communities for representation, OBO 6 , it still provides a good representation of development of Semantic Web technologies. Web 2.0 has the Semantic Web activities. 
capability to expose all of the ‘light’ lexical issues and some basic The OBO ontologies have significant standing in biological assertions of linking meaning to terms. ‘Heavier’ more expressive communities, and it is perhaps the community building aspect that assertions in OWL are in the domain of the ontologist, who can be fuels this standing, as it includes: informed by the interactions they can have with domain experts and other ontologists through Web 2.0 communities. • A large number of centrally available OBO ontologies 7 • The OBO-Edit OBO ontology development tool that is 6. DISCUSSION specifically designed by a working group of users. We propose the construction of ontology specific resources, using the Web as a platform, which specifically deals with the • A committee, the OBO-Foundry 8 , that has been set up management of lexical meta-ontological aspects of ontology and has produced a set of principles for new OBO development together with the management of ontology metadata. ontologies to aspire to, including the promise of textual The applications of Web2.0 are geared towards harnessing these definitions for all terms and good documentation for all types of community interaction, which is precisely the sort of ontologies. interaction that is not supported in the current model of ontology • The OBO file format, for which the primary goals development. Dealing with meta-ontological data in include human readability and ease of parsing together downloadable ontology files and disparate descriptions of with a syntax that makes them exportable as OWL. ontology metadata on development sites is prohibitive to a more • Pages on the SourceForge 9 open source software universal appreciation of ontology design and implementation. 
development site, which includes the potential for A centralized resource for sharing OWL resources would act as a project information, forums, downloads and issue hub for community learning, sharing and reusing of ontology tracking by which suggestions for new terms and resources, bringing together ontology users and builders in a way modifications can be submitted. that is currently not possible. Designing ontologies by consensus Contributors to OBO are starting to pull together as a virtual in such workspaces would encourage best practice and speed up community by pooling its resources on the Web. The Gene the uptake of the more complicated Semantic Web technologies, Ontology [3] saw a phenomenal growth in the number of terms it starting with OWL and the knowledge that is to be contained contained through user interaction alone that is well documented within. At the same time the system would provide a measure of [5], such was the demand for the resource to represent so many control, ensuring that the dangers of misinterpreting the powerful researchers. Since then, the trend has continues as more and more semantics of OWL by untrained eyes are avoided. Having the biological domains aim to be represented by OBO. community built lexical resources is the beginning of an opportunity to link up ontologists with a more specific system that The caveat for the relative success of OBO has probably been can refer to the online lexical corpus. similar to that of Web 2.0 over Semantic Web (so far). Formality and methodology have temporarily made way for ease of use and The widespread realization of the Semantic Web will depend on ease of interaction. 
Interestingly, the majority of the OBO the production of ontologies that can be effectively shared and ontologies clearly state that they are “structured controlled reused, but in order to achieve this, the overheads of ontology vocabularies”, which require nothing like the expressive power of development and ontology comprehension have to be OWL, and little in the way of knowledge engineering because the considerably reduced. The OBO community/consortium has statements linking things do not require it. This is not for any effectively demonstrated the advantages of lowering these other reason than nothing more complex than this is required, overheads by engaging a community of domain experts in OBO ontologies are used for marking biological data so that they ontology development. OBO ontologies, however, are for human can be linked if they are annotated in the same way. Primarily, interpretation, so the true Semantic Web vision of human and these ontologies contain a hierarchy of terms denoting ‘is_a’ computational understanding is not addressed. At the same time, relationships. Less often but still common are ‘part_of’ highly expressive OWL-DL ontologies, for both computer and relationships, and occasionally other properties key to biology human interpretation are being produced, but largely in isolation. We propose a solution here which would bridge the gap between 5 these approaches and effectively enable the same type of domain http://www.w3.org/2001/sw/hcls/ expert community engagement for formal ontologies. 6 http://www.geneontology.org/GO.format.obo-1_2.shtml 7 http://obo.sourceforge.net/ 7. ACKNOWLEDGMENTS 8 Funding for this work was through BBSRC grant BBS/B/17156. http://obofoundry.org/ 9 http://sourceforge.net/ 8. REFERENCES and metadata engine for the semantic web Proceedings of the thirteenth ACM international conference on [1] Alani, H., Position paper: ontology construction from Information and knowledge management, ACM Press, online ontologies. 
in Proceedings of the 15th Washington, D.C., USA, 2004. International Conference on World Wide Web [17] Miller, G.A. WordNet: a lexical database for English. (Edinburgh, Scotland, 2006), ACM Press, New York, Communicationd of the ACM, 38 (11). 39-41. NY, 491-495. [18] Noy, N.F., Guha, R. and Musen, M.A., User Rating of [2] Alani, H., Brewster, C. and Shadbolt, N., Ranking ontologies: Who will rate the raters? in AAAI 2005 Ontologies with AKTiveRank. in International Spring Symposium on Knowledge Collection from Semantic Web Conference, (Athens, GA, USA, 2006). Volunteer Contributors, (Stamford, CA, USA, 2005). [3] Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., [19] Noy, N.F. and Klein, M. Ontology Evolution: Not the Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Same as Schema Evolution. Knowledge and Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Information Systems, V6 (4). 428-440. Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., [20] Pulido, J.R.G., Ruiz, M.A.G., Herrera, R., Cabello, E., Richardson, J.E., Ringwald, M., Rubin, G.M. and Legrand, S. and Elliman, D. Ontology languages for the Sherlock, G. Gene Ontology: tool for the unification of semantic web: A never completely updated review. biology. Nat Genet, 25 (1). 25-29. Knowledge-Based Systems, 19 (7). 489-497. [4] Austin, M.A., III and Samadzadeh, M.H., Software [21] Rector, A., Drummond, N., Horridge, M., Rogers, J., comprehension/maintenance: an introductory course. in, Knublauch, H., Stevens, R., Wang, H. and Wroe, C. (2005), 414-419. OWL Pizzas: Practical Experience of Teaching OWL- [5] Bada, M., Stevens, R., Goble, C., Gil, Y., Ashburner, DL: Common Errors & Common Patterns, 2004. M., Blake, J.A., Cherry, J.M., Harris, M. and Lewis, S. [22] Rector, A.L. Modularisation of domain ontologies A short study on the success of the Gene Ontology. 
Web implemented in description logics and related Semantics: Science, Services and Agents on the World formalisms including OWL Proceedings of the 2nd Wide Web, 1 (2). 235-240. international conference on Knowledge capture, ACM [6] Berners-Lee, T., Hendler, J. and Lassila, O. The Press, Sanibel Island, FL, USA, 2003. Semantic Web. Scientific American, 284. 34-43. [23] Rosse, C. and Mejino, J.L.V. A reference ontology for [7] Bodenreider, O., Burgun, A. and Rindflesch, T.C. biomedical informatics: the Foundational Model of Assessing the consistency of a biomedical terminology Anatomy. Journal of Biomedical Informatics, 36 (6). through lexical knowledge. International Journal of 478-500. Medical Informatics, 67 (1-3). 85-95. [24] Sabou, M., Lopez, V., Motta, E. and Uren, V., Ontology [8] Brank, J., Grobelnik, M. and Mladenic, D., A survey of Selection: Ontology Evaluation on the Real Semantic ontological evaluation techniques. in Conference on Web. in WWW2006, (Edinburgh, UK, 2006). Data Mining and Data Warehouses, (Ljubljana, [25] Schaffert, S., Gruber, A. and Westenhaler, R., A Slovenia, 2005). Semantic Wiki for collaborative knowledge formation. [9] Brickley, D. and Miller, L. FOAF vocabulary in Semantics, (Vienna, Austria, 2005). specification, 2005. [26] Shadbolt, N., Hall, W. and Berners-Lee, T. The [10] Corcho, O., Fernandez-Lopez, M. and Gomez-Perez, A. Semantic Web revisited. IEEE intelligent systems, 21 Methodologies, tools and languages for building (3). 96-101. ontologies. Where is their meeting point? Data & [27] Stevens, R., Baker, P., Bechhofer, S., Ng, G., Jacoby, Knowledge Engineering, 46 (1). 41-64. A., Paton, N.W., Goble, C.A. and Brass, A. TAMBIS: [11] Eilbeck, K., Lewis, S., Mungall, C., Yandell, M., Stein, Transparent Access to Multiple Bioinformatics L., Durbin, R. and Ashburner, M. The Sequence Information Sources. Bioinformatics, 16 (2). 184-186. Ontology: a tool for the unification of genome [28] Uschold, M. 
Knowledge level modelling: concepts and annotations. Genome Biology, 6 (5). R44. terminology. The Knowledge Engineering Review, 13 [12] Golbreich, C., Zhang, S. and Bodenreider, O. The (1). 5-29. foundational model of anatomy in OWL: Experience [29] Vrandecic, D., Pinto, S., Tempich, C. and Sure, Y. The and perspectives. Web Semantics: Science, Services and DILIGENT knowledge processes. Journal of Agents on the World Wide Web, 4 (3). 181-195. Knowledge Management, 9 (5). 85-96. [13] Good, B.M. and Wilkinson, M.D. The Life Sciences [30] Whetzel, P.L., Parkinson, H., Causton, H.C., Fan, L., Semantic Web is Full of Creeps! Brief Bioinform, 7 (3). Fostel, J., Fragoso, G., Game, L., Heiskanen, M., 275-286. Morrison, N., Rocca-Serra, P., Sansone, S.-A., Taylor, [14] Guarino, N. and Christopher, W. Evaluating ontological C., White, J. and Stoeckert, C.J., Jr. The MGED decisions with OntoClean. Commun. ACM, 45 (2). 61- Ontology: a resource for semantics-based description of 65. microarray experiments. Bioinformatics, 22 (7). 866- [15] Kirk, D., Roper, M. and Wood, M., Identifying and 873. addressing problems in framework reuse. in, (2005), 77- [31] Wolstencroft, K., Lord, P., Tabernero, L., Brass, A. and 86. Stevens, R. Protein classification using ontology [16] Li, D., Tim, F., Anupam, J., Rong, P., Cost, R.S., Yun, classification. Bioinformatics, 22 (14). e530-538. P., Pavan, R., Vishal, D. and Joel, S. Swoogle: a search
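
As an illustration only, the meta-ontological data enumerated earlier (terms, synonyms, definitions and annotations) can be pictured as a simple record that lives outside the ontology file and carries its own URI. The following is a minimal Python sketch under stated assumptions: the class name TermRecord, its field names, the helper build_label_index and the example.org URI are all hypothetical and do not correspond to any existing tool or to the OBO or OWL tooling discussed in this article. The example values are those of the ‘subsidiary cell’ term used above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermRecord:
    """Meta-ontological data for one term, held apart from any logical axioms."""
    uri: str                      # hypothetical identifier for the lexicon entry
    term: str                     # primary label for the meaning
    synonyms: List[str] = field(default_factory=list)
    definition: str = ""
    annotations: Dict[str, List[str]] = field(default_factory=dict)

    def labels(self) -> List[str]:
        """All strings a human might use for this meaning: term first, then synonyms."""
        return [self.term] + self.synonyms

    def annotate(self, kind: str, value: str) -> None:
        """Attach a free-form annotation (usage note, provenance link, ...)."""
        self.annotations.setdefault(kind, []).append(value)


def build_label_index(records: List[TermRecord]) -> Dict[str, List[str]]:
    """Map each lower-cased label to the URIs of the records that use it."""
    index: Dict[str, List[str]] = {}
    for record in records:
        for label in record.labels():
            index.setdefault(label.lower(), []).append(record.uri)
    return index


subsidiary = TermRecord(
    uri="http://example.org/lexicon/celltype#subsidiary_cell",  # assumed URI
    term="subsidiary cell",
    synonyms=["accessory cell"],
    definition=(
        "An epidermal cell associated with a stoma and at least "
        "morphologically distinguishable from the epidermal cells "
        "composing the groundmass of the tissue"
    ),
)
subsidiary.annotate("dbxref", "TAIR:0000296")

index = build_label_index([subsidiary])
```

Because the record holds no axioms, the same entry could in principle be referenced from several hierarchies built for different purposes, and a shared label index of this kind is one simple way a community workspace might surface synonymous terms for discussion.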