The Ontology Viewer: Facilitating Image Annotation with Ontology Terms in the CSIDx Imaging Database

Amalia Kallergi (akallerg@liacs.nl), Yun Bei (ybei@liacs.nl), Fons J. Verbeek (fverbeek@liacs.nl)
Section Imaging & BioInformatics - Imagery & Media group
Leiden Institute of Advanced Computer Science (LIACS), Leiden University
Niels Bohrweg 1, 2333 CA Leiden, The Netherlands

Workshop on Visual Interfaces to the Social and the Semantic Web (VISSW2009), IUI2009, Feb 8 2009, Sanibel Island, Florida, USA. Copyright is held by the author/owner(s).

ABSTRACT
In the life sciences, data must be described unambiguously. We apply this principle in our multi-modal bio-imaging database, in which images are stored together with comprehensive metadata. We use ontology terms to describe the semantic content of images. Ontologies are obtained from dedicated ontology repositories in the life sciences. For our users, image annotation with ontology terms proved to be a challenging task. Therefore, we have improved both the usability and the speed of annotation. We developed search facilities across our ontology collection and implemented a new graphical ontology viewer. This tool allows for both querying and visualizing ontology terms by means of a 2D-graph representation. Our viewer provides a means to collect ontology terms and at the same time familiarizes users with ontologies and their structure. In making these tools available we succeeded in our goal of reducing the time and effort needed for accurate image annotation.

Author Keywords
Ontology, life sciences, annotation, graph, images

ACM Classification Keywords
H.5.3 Group and Organization Interfaces: Web-based interaction.

INTRODUCTION
The Cyttron Scientific Image Database for Exchange (CSIDx) is a multi-modal imaging database for images produced in the life sciences [2,7]. In CSIDx, image annotation is a fundamental aspect of image submission, and ontology terms, as extracted from life-science ontologies, are used to define the semantic content of an image. These ontologies, with their intrinsic curation and relations between all terms, help to obtain unambiguous annotations and to assess synonyms. Moreover, CSIDx aims to explore the added value of the ontological relations towards integration of images in representing biological concepts. In this section, we briefly introduce the scope and aim of the database and provide a short overview of the image annotation procedure.

CSIDx is built to support a wide range of imaging modalities and techniques and it is the backbone database of the Cyttron project [3], a consortium working towards an integrated infrastructure for bio-imaging and modelling cells down to atomic detail. CSIDx is also a web-based community in which researchers from various institutes can share their image resources. The database is accessible via a web interface and the design is based on rich Internet application practices that allow for dynamic and responsive web applications. The system is developed, maintained and physically hosted by the Imaging and Bio-Informatics group at Leiden University.

In CSIDx, we propose that metadata, as the information that describes an image, is essential to support exchange and linking as well as analysis of images [1,2]. A key feature of CSIDx is the linking of imaging modalities via concepts towards integration of functional concepts and, to that end, an unambiguous annotation is required. Therefore, CSIDx stores both raw pixel data and user generated annotations. A major part of the CSIDx development is dedicated to tools that facilitate the process of an extended annotation by the image owner. The development process and the design of new features are accomplished in close collaboration with users, i.e., biologists, structural biologists and others, whose feedback is registered via observation and informal interviews.

In order to assure a comprehensive annotation that also represents the actual image acquisition conditions, CSIDx maintains metadata about the 'who', the 'what' and the 'how' of an imaging experiment [1,2,7]. In this paper, we particularly address the metadata on what an image is about, i.e. information about the biological phenomenon depicted or the phenomenon the image relates to. This annotation corresponds to the semantic content of the image and captures the interpretation of an image as given by the domain expert or the researcher responsible for the image acquisition. To assure accurate metadata [1,2] and to explore possible relations between the images, the 'what'-part of the annotation is expressed in ontology terms as extracted from life-science ontologies. In comparison with free text or user generated keywords, ontology terms guarantee consistency across the system and prevent ambiguities and spelling mistakes. Moreover, ontologies provide well-defined relations across the concepts which can be further explored in structuring or mining the image data. The use of domain specific ontologies also corresponds with the emerging practices in the field of life-science data repositories towards well-maintained and reusable resources by means of a common semantic representation.

In this paper, we describe our efforts and tools to support and facilitate image annotation with ontology terms. In particular, we describe our ontology viewer, a graphical tool developed to assist image annotation based on ontologies. CSIDx currently incorporates 37 life-science related ontologies, the majority of which are retrieved from the Open Biomedical Ontologies (OBO) Foundry [9]. The OBO Foundry is a platform to share biological ontologies in a common syntax and the maintained ontologies are available in a variety of formats such as OBO [9], OWL [10] and RDF [12]. The biological ontologies available are developed and maintained by researchers in the biomedical field and provide a fair overview of the domain specific knowledge and vocabulary in the field of life sciences. In this manner, about half a million unique terms are available for annotation.

CHALLENGES OF USING ONTOLOGIES IN IMAGE ANNOTATION
In an earlier prototype of CSIDx, users could annotate images by selecting ontology terms from our ontology tree browser. This application depicts the hierarchical relation in the ontologies as a tree view; this view only displays the subClassOf relation. With an interactive tree view (cf. Figure 1), users can navigate by collapsing and expanding terms in the hierarchy and can select terms to be assigned to the image under annotation. The ontology tree browser parses ontologies in the OWL format by means of the Jena framework [5]. This tool provides some control and structure in dealing with the available ontology terms as well as some means to navigate the ontologies. However, this approach was found to be insufficient for the extended annotation and usability requirements of CSIDx.

Figure 1. The ontology tree viewer. Left: Selecting an ontology from a list. Right: Selected ontology with expanding tree view.

In fact, the introduction of ontology terms for image annotation was in itself a significant challenge for our users. Firstly, the majority of our users were not familiar with the exact concept of an ontology. Although all the ontologies used in our system are maintained by the bioscience community, hardly any of our users had prior extensive experience with ontologies. In particular, they had no mental image of the structure of an ontology and they had difficulty comprehending that relations other than the child-parent relation of a hierarchy may occur and that terms can be interconnected. During our sessions with the users, we often found ourselves sketching out ontology graphs on paper to explain an ontology. Secondly, most of our users were insufficiently familiar with the content of the ontologies. Simply put, they did not know which terms are to be found in each separate ontology. We expected that their biological knowledge would help them locate the terms of interest in the hierarchy, but the ontology structure as given in the hierarchy view did not always match the user's expectations. Additionally, the vast number of terms available was difficult to manage. Even when a term was known or previously identified, clicking through the several levels of an ontology hierarchy to locate the term was time consuming and, from the user's point of view, impractical and unacceptable.

FORM OF A SOLUTION
To address the challenges of using ontology terms for annotation, we examined the difficulties faced by our users and the limitations of the hierarchical visualization of the ontologies. Testing with our prototype provided useful knowledge on how users interact with ontologies. Through participatory evaluations, we learned that users need to learn the ontology content, to build a mental model of ontology structure and to extract information from ontologies. The complexity of the annotation task increases due to the lack of search facilities, the overwhelming number of terms and the lack of experience with and understanding of ontologies. Hence, we implement a solution that aims to improve the annotation process both in terms of usability and in terms of ontology comprehension. In particular, we provide search facilities on the ontology terms corpus and implement the ontology viewer, a graphical tool used for both querying and visualizing ontology terms. In combination with these facilities and in order to reduce the annotation effort, the concept of MyTerms was also introduced in the workflow of the annotation process.

Querying Ontologies Using a Database Back End
From the user perspective, quick concept (keyword-like) searches across the ontologies are essential in order to complete an extensive image annotation with ontology terms. Keywords and textual descriptions of images as conceived by the image owner need to be mapped to existing ontology terms. Such a procedure is not easily accomplished without any search facilities, especially when the user is not familiar with the content of the ontologies. Ontology querying mechanisms can involve the use of dedicated RDF query languages such as SPARQL [13]. More elaborate forms of querying, like reasoning, can be accomplished with reasoners such as Pellet [11] and KAON2 [8]. Although powerful, these mechanisms are heavily challenged when large or complex ontologies are involved and do not answer queries quickly [4, Bei internal technical report]. CSIDx is focused on the domain of the life sciences, where ontologies and controlled vocabularies tend to be enormous in size and/or are constantly updated and expanding. In addition, the ontology structure tends to include elaborate relations which result in increased complexity when querying or reasoning with the ontology. However, the web based character of CSIDx places a high priority on speed and reactivity of the system. Being confronted with such a practical limitation, we adopted a solution with a Relational Database Management System (RDBMS) to support fast ontology queries. Specifically, our ontology resources were transformed from their original OWL format to a simplified schema that can be easily stored and queried by means of an RDBMS. Namely, the indirect relationships in OWL-DL are transformed into concise, direct relationships and the complete ontology structure is expressed as a directed graph of concepts and their relations that can be easily stored in a database schema. Examples of the transformations applied are given in Table 1. Such a representation definitely lacks the completeness, complexity and expressive power of the OWL-DL language but allows us to perform queries with high performance. For the purposes of image annotation, we believe that such a representation, although incomplete, is still able to provide a sufficient view on the domain knowledge.

Construct                            OWL-DL     Simplified Relation
SubClassOf                           A⊆B        ∀x [A(x) → B(x)]
Restriction                          A⊆∃P.B     ∃x∃y [A(x)∧B(y)∧P(x,y)]
EquivalenceClass & IntersectionOf    A⊆B∩C      ∀x [A(x) → B(x)∧C(x)]
EquivalenceClass & UnionOf           A⊆B∪C      B⊆A, C⊆A

Table 1. Indirect relations in OWL-DL are transformed into straightforward relations to be stored in a database schema.

By designing and populating the ontology database, which currently consists of 565,600 terms and 825,724 relations, we are able to support querying facilities across the ontologies. Users of CSIDx can search for a corresponding term by keywords using either the ontology viewer (cf. next section) or a simplified web search form.

MyTerms: A User Specific Collection of Terms for Annotation
To reduce the effort required for identifying annotation terms in the ontology collection, we have introduced the concept of MyTerms in the workflow of the annotation process. MyTerms is a collection of user specific ontology terms that are saved under a user's profile and can be reused across annotations. Prior to an actual image annotation, users can browse the ontology collection with the querying tools available, looking for terms that are relevant to their study or field of research. During an image annotation, users assign terms to images by selecting terms from their own relevant subset (MyTerms) instead of the complete corpus of terms available. This process is an attempt to minimize the effort of searching for terms (search once, use in all subsequent annotations) and to reduce the overwhelming number of ontology terms to a subset that is both meaningful to the user and easier to browse and use.

The MyTerms concept can be further elaborated to match the structure of our system. In CSIDx, users are organized in groups that correspond to their actual research institute or group and this organization is often used throughout the interface as a mechanism for exchanging shared resources, such as images or microscopes. Therefore, we also provide the possibility of sharing identified terms among group members, who are likely to work on a similar topic. In the case of group shared ontology terms, the time and effort spent by a group member to locate and identify useful terms across the ontologies benefits all members of the research group. On the whole, MyTerms ensures that the admittedly time-consuming process of mapping metadata to existing ontology terms does not need to be repeated unnecessarily.

THE ONTOLOGY VIEWER
The ontology viewer provides a graphical interface for querying ontology terms and a means to visualize the ontology structure. We believe that a graphical representation can assist our users in building a mental model next to building a collection of terms. In practice, it is a tool to assist building a MyTerms list and an attempt to demystify ontologies to our users by making the relations among ontology terms obvious. The application is developed in Java and deployed as a WebStart application. It can be accessed via the CSIDx web interface or used as a standalone application for registered users.

The ontology viewer (cf. Figure 2) consists of two major panels: a query form and a 2D viewer. In the query form, users search for ontology terms within an ontology by providing one or more keywords and by specifying the level of detail for the search. Queries can be performed on the label, synonym or definition of terms and keywords can be combined in an 'AND' or 'OR' query. Users can choose from the list of results to either visualize particular terms or directly add terms to their MyTerms list.

Figure 2. The ontology viewer with a KKLayout of the graph and the selected results highlighted.

In the 2D viewer, the ontology structure is represented as a graph in which terms are graph nodes and relations are graph edges. Selected ontology terms, as collected from a query, are used to produce a sub-graph of the ontology graph. This sub-graph provides the local context for the selected nodes, which are highlighted green to distinguish them from their connected terms. A short description with information on any given term can be obtained by hovering the mouse over the corresponding graph node. Regular graphical manipulations are supported on the ontology graph, which can be zoomed, panned, rotated and sheared. In this manner, users can adjust the view to better understand the displayed relations. The ontology viewer also provides different graph layouts to support a more suitable or preferred arrangement in space, especially in the case of complex sub-graphs. The supported layouts are given in Table 2. To improve clarity of the presentation, both the text labels of either nodes or relations and nodes other than the selected nodes can be toggled on or off. The graph drawing and manipulation is implemented by means of the Java Universal Network/Graph Framework (JUNG) [6].

Layout           Algorithm
KKLayout         The Kamada-Kawai algorithm
FRLayout         The Fruchterman-Reingold algorithm
CircleLayout     A simple layout which places vertices randomly on a circle
SpringLayout     A simple force-directed spring-embedder
SpringLayout2    Another simple force-directed spring-embedder
ISOMLayout       Meyer's "Self-Organizing Map" layout

Table 2. Graph layouts available in the ontology viewer.

DISCUSSION AND CONCLUSIONS
Overall, the CSIDx ontology viewer provides an informative graphical interface to the collection of ontologies in CSIDx. As most ontologies are derived from the OBO matrix, this viewer is also an alternative graphical entry point to exploring the most popular and acknowledged ontologies in the domain of the life sciences. Importantly, for most CSIDx users, this interface is their first impression of biological ontologies and a first step towards familiarizing themselves with the concept and content of ontologies. Compared with the web based querying facility, which lacks the graph display, users have reported that the connected terms often help clarify ambiguities: when the label of a term can be explained differently depending on the context or when the description of a term is insufficient, the connected terms are often conclusive on the exact meaning of the term. As a result, users feel more confident that they have selected the proper term for precise annotation. The graph representation also seems to assist users in rethinking the way they translate their desired annotation to ontology terms. By exploring the ontology, users often settle on more terms than they initially queried for and they often express the desire to automatically add to their user term list (MyTerms) the whole graph structure as displayed in the 2D viewer. Overall, the ontology viewer contributes towards a more complete and accurate annotation based on ontology terms.

As they familiarize themselves with the ontology structure, users demonstrate the wish to further interact with the ontology. Often, they request to expand the displayed nodes, a requirement that amounts to interactively traversing the complete ontology. While the ontology viewer was primarily intended to provide some context for the queried terms rather than a complete overview of an ontology, we are interested in exploring whether the graph representation can be useful as a querying tool in itself.

While the contributions of a graphical interface for ontology exploration are encouraging, the overall performance remains an issue. Querying an ontology is satisfactorily fast but displaying the graph structure has significant memory requirements and may halt for large ontologies. Also, while mapping familiar keywords to ontology terms, many users reported a difficulty in specifying which ontology to query. In the current prototype, querying for a keyword in the whole collection of ontologies is not supported and needs further attention.

Our results can be summarized in the following conclusions:

1. The ontology viewer provides an intuitive interface for (novice) users; the options are self explanatory and the user is assisted in understanding ontology concepts while at the same time ontologies are queried and terms selected. The mapping to the graphs is very helpful in that respect.

2. The MyTerms list provides a good simplification to the otherwise “oversized” ontologies. Users can now use ontology concepts with ease in their image annotations.

3. To assure visibility in the interface of the ontology viewer, a fast response to queries is required, which can be provided through a transformation of the ontology structure to an RDBMS.

FUTURE DIRECTIONS
The work presented in this paper is the result of a participatory design trajectory; in the design phase we aimed to learn how we could bring the concept of annotation of images with ontologies across to the users of the CSIDx database. The design process also included requirement generation by users. We completed this design phase with an artifact that is a fully working prototype, rather than proposing a final application. Now that we have gained sufficient information on how novice users can work with ontologies, we can take the next steps towards observational evaluations in which different annotation strategies can be tested. Further user evaluations by surveys will render sufficient data for statistical analysis on ontology interaction.

Annotations are the basic components of the semantic structure in CSIDx. Furthermore, the relations included in the ontologies provide additional material to be explored. Initially, we wish to investigate the direct relations as maintained in the RDBMS. Still, mechanisms to profit from the OWL-DL expressiveness can be expanded based on the existing annotation with ontology concepts.

REFERENCES
1. Bei, Y., Belmamoune, M., Verbeek, F.J. (2006) Ontology and image semantics in multimodal imaging: submission and retrieval. Proc. SPIE Internet Imaging VII, Vol. 6061, 60610C1-C12
2. Bei, Y., Dmitrieva, J., Belmamoune, M., Verbeek, F.J. (2007) Ontology Driven Image Search Engine. Proc. SPIE Vol. 6506, MultiMedia Content Access: Algorithms & Systems (Eds. Hanjalic, A., Schettini, R., Sebe, N.), 65060G-1,65060G-10
3. Cyttron Project, http://cyttron.org
4. Gardiner, T., Horrocks, I., Tsarkov, D. (2006) Automated benchmarking of description logic reasoners. In Proc. of the 2006 Description Logic Workshop. Volume 189
5. Jena A Semantic Web Framework for Java, http://jena.sourceforge.net/
6. JUNG Java Universal Network/Graph Framework, http://jung.sourceforge.net/
7. Kallergi, A., Bei, Y., Kok., P., Dijkstra, J., Abrahams, J.P., Verbeek, F.J. (2008) Cyttron: A Virtualized Microscope supporting image integration and knowledge discovery. In: Cell Death and Disease Series, ResearchSignPost (Eds. Backendorf, Noteborn, Tavassoli): Proteins Killing Tumour Cells (In Press)
8. KAON2, http://kaon2.semanticweb.org/
9. OBO Download Matrix, http://www.berkeleybop.org/ontologies/
10. OWL Web Ontology Language Overview, http://www.w3.org/TR/owl-features/
11. Pellet Reasoner, http://pellet.owldl.com/
12. RDF Resource Description Framework, http://www.w3.org/TR/rdf-syntax-grammar
13. SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/
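
The simplified relational representation described under "Querying Ontologies Using a Database Back End", together with the keyword queries and the local-context sub-graph of the ontology viewer, can be illustrated with a minimal sketch. It is not the CSIDx implementation: the table layout, the tuple encoding of axioms, the function names (`flatten`, `build_db`, `search`, `local_context`) and the toy terms are all assumptions made for illustration, and SQLite stands in for whatever RDBMS the system actually uses. Only the four transformations and the searchable fields (label, synonym, definition, combined with 'AND' or 'OR') are taken from the text.

```python
import sqlite3

SEARCHABLE = ("label", "synonym", "definition")  # fields named in the text

def flatten(axiom):
    """Turn one OWL-DL-style axiom (hypothetical tuple encoding) into direct edges."""
    kind = axiom[0]
    if kind == "SubClassOf":              # A ⊆ B     -> A subClassOf B
        _, a, b = axiom
        return [(a, "subClassOf", b)]
    if kind == "Restriction":             # A ⊆ ∃P.B  -> A P B
        _, a, prop, b = axiom
        return [(a, prop, b)]
    if kind == "EquivalentIntersection":  # B ∩ C     -> A subClassOf B, A subClassOf C
        _, a, parts = axiom
        return [(a, "subClassOf", p) for p in parts]
    if kind == "EquivalentUnion":         # B ∪ C     -> B subClassOf A, C subClassOf A
        _, a, parts = axiom
        return [(p, "subClassOf", a) for p in parts]
    raise ValueError(f"unsupported construct: {kind}")

def build_db(terms, axioms):
    """Store terms and flattened relations as a directed graph in two tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE term (name TEXT PRIMARY KEY, ontology TEXT,
                           label TEXT, synonym TEXT, definition TEXT);
        CREATE TABLE relation (source TEXT, predicate TEXT, target TEXT);
    """)
    db.executemany("INSERT INTO term VALUES (?, ?, ?, ?, ?)", terms)
    edges = [edge for axiom in axioms for edge in flatten(axiom)]
    db.executemany("INSERT INTO relation VALUES (?, ?, ?)", edges)
    return db

def search(db, ontology, keywords, fields=("label",), mode="AND"):
    """Keyword search within one ontology; keywords are combined with AND or OR."""
    if mode not in ("AND", "OR") or not keywords or not fields:
        raise ValueError("need a mode of AND/OR, at least one keyword and one field")
    if not set(fields) <= set(SEARCHABLE):
        raise ValueError("unknown field")
    # One parenthesised group per keyword: the keyword may match any chosen field.
    group = "(" + " OR ".join(f"{f} LIKE ?" for f in fields) + ")"
    where = f" {mode} ".join(group for _ in keywords)
    params = [ontology] + [f"%{k}%" for k in keywords for _ in fields]
    sql = f"SELECT name FROM term WHERE ontology = ? AND ({where}) ORDER BY name"
    return [row[0] for row in db.execute(sql, params)]

def local_context(db, selected):
    """Edges touching the selected terms: the sub-graph shown around query results."""
    marks = ",".join("?" for _ in selected)
    sql = (f"SELECT source, predicate, target FROM relation "
           f"WHERE source IN ({marks}) OR target IN ({marks}) "
           f"ORDER BY source, predicate, target")
    return db.execute(sql, list(selected) * 2).fetchall()

# Toy data, invented for this sketch (not taken from any real ontology).
TERMS = [
    ("T:cell", "toy", "cell", "", "basic unit of an organism"),
    ("T:nucleus", "toy", "nucleus", "cell nucleus", "organelle containing DNA"),
    ("T:organelle", "toy", "organelle", "", "structure inside a cell"),
    ("T:membrane", "toy", "membrane", "", "lipid boundary"),
]
AXIOMS = [
    ("SubClassOf", "T:nucleus", "T:organelle"),
    ("Restriction", "T:organelle", "part_of", "T:cell"),
    ("Restriction", "T:membrane", "part_of", "T:cell"),
]

if __name__ == "__main__":
    db = build_db(TERMS, AXIOMS)
    hits = search(db, "toy", ["cell", "DNA"], fields=SEARCHABLE, mode="AND")
    print(hits)
    print(local_context(db, hits))
```

Because every relation is a plain row, both the keyword search and the neighbourhood lookup are single indexed-table queries rather than reasoning tasks, which is the trade-off the paper describes: expressiveness is given up in exchange for response time.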