Protégé Extensions for Scientist-Oriented Modeling of Observation and Measurement Semantics*
Wesley Saunders¹, Shawn Bowers¹, Margaret O’Brien²
¹ Department of Computer Science, Gonzaga University
² Marine Science Institute, UC Santa Barbara
wsaunders@zagmail.gonzaga.edu, bowers@gonzaga.edu, mob@msi.ucsb.edu

Abstract. We present Protégé-OWL extensions designed to help scientists define domain-specific ontologies for describing observational data. The extensions provide high-level forms that users can fill out from within Protégé to specify classes used to describe scientific measurements. As a user fills out a form, the underlying OWL-DL axioms are asserted automatically, allowing users to specify relatively complex OWL-DL constraints without requiring an understanding of the technical details of OWL. The constraints generated by the extensions encode a set of “best practices” for improving the discovery and integration of observational data.

1 Introduction
Earth and environmental scientists often depend on data collected from multiple research efforts to address broad scientific questions. These efforts rely on discovering, interpreting, and integrating diverse and heterogeneous data sets covering a wide range of semantic concepts. Employing domain-specific terms to describe earth and environmental data has the potential to significantly improve discovery and integration; however, only a relatively small number of ontologies have been created within these domains. We see two main barriers to ontology development within these communities: (1) the breadth of (specialized) concepts and phenomena studied requires a large and diverse set of ontological terms, and (2) a high level of expertise is needed to develop ontologies efficiently using current ontology languages and tools.
The aim of this work is to help address these challenges by adding structured, easy-to-use forms to Protégé-OWL that scientists can use to quickly create meaningful domain-specific ontologies. Our approach leverages a generic, core observation and measurement ontology [1] that is designed for describing scientific data sets based on metadata annotations (mappings from data attributes to specialized measurement classes) [2]. These annotations provide a uniform view over otherwise heterogeneous data sets that can be used to enhance data discovery and integration applications (e.g., for improved precision and recall [2] or analysis over an integrated data repository [3]). Our extensions to Protégé-OWL allow users to create sophisticated term definitions using simple “fill-in-the-blank” forms, while automatically generating the underlying DL axioms corresponding to the user’s input. This approach gives domain scientists the option of creating high-quality ontologies without having to understand and work directly with the underlying DL formalisms of OWL. The generated axioms encode a number of OWL-DL “best practices” (e.g., as in [4]) to ensure term definitions are well suited for data discovery. We briefly describe the core observation and measurement ontology, the approaches used within our Protégé extensions, and future work.

[Fig. 1: UML diagram relating the classes Observation, Entity, Measurement, Characteristic, Standard, Protocol, and Relationship via the properties ofEntity, hasMeasurement, hasContext, ofCharacteristic, usesStandard, usesProtocol, and hasValue.]
Fig. 1. The main classes and properties of the observation and measurement ontology (OBOE). While shown in UML, the model is defined using OWL-DL.

* This work was supported in part through NSF grants 0743429 and 0753144.

2 Observation Modeling using Protégé-OWL
Fig. 1 shows the main modeling constructs of OBOE¹ (the Extensible Observation Ontology) [1] used within our extensions.
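Because Fig. 1 survives here only as a caption, the following minimal Python sketch restates its main constructs as plain dataclasses. It is an illustration under our own assumptions, not OBOE's OWL-DL encoding: field names mirror the Fig. 1 property names, cardinalities are approximated (one entity per observation, one characteristic per measurement), the `Context` helper and the string-valued relationship stand in for the hasContext property and the Relationship class, and the Tree/Plot/Height/Meter/12.5 values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    name: str  # e.g., a tree, a geographic location, a body of water


@dataclass
class Characteristic:
    name: str  # e.g., Height


@dataclass
class Standard:
    name: str  # e.g., a unit such as Meter


@dataclass
class Protocol:
    name: str  # how the measurement was collected


@dataclass
class Measurement:
    # Assumption: exactly one characteristic; standard, protocol, and value are optional.
    of_characteristic: Characteristic
    uses_standard: Optional[Standard] = None
    uses_protocol: Optional[Protocol] = None
    has_value: Optional[str] = None


@dataclass
class Context:
    # Hypothetical helper: a context observation plus an optional named relationship.
    observation: "Observation"
    relationship: Optional[str] = None  # e.g., "partOf", "within"


@dataclass
class Observation:
    # Groups the measurements of one observed entity into a single observation event.
    of_entity: Entity
    has_measurement: List[Measurement] = field(default_factory=list)
    has_context: List[Context] = field(default_factory=list)


# Usage: a tree's height, observed within the context of a plot observation.
plot = Observation(of_entity=Entity("Plot"))
tree = Observation(
    of_entity=Entity("Tree"),
    has_measurement=[Measurement(Characteristic("Height"), Standard("Meter"), has_value="12.5")],
    has_context=[Context(plot, relationship="within")],
)
print(tree.has_context[0].relationship, tree.has_context[0].observation.of_entity.name)  # within Plot
```

In this sketch, values hang off measurements that belong to an observation rather than off the entity itself, so two observations of the same entity in different contexts can carry different values, which loosely mirrors the contextualized assertions described next.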
An observation is made of an entity (e.g., biological organisms, geographic locations, environmental features) and serves to group a set of measurements together to form a single “observation event”. A measurement assigns a value to a characteristic of the observed entity (e.g., the height of a tree), and can also include standards (e.g., units) and collection protocols. An observation can occur within the surrounding context of other observations, where context can be viewed as a form of dependency [1], and context often includes a named relationship (e.g., “partOf”, “within”) between observed entities. A key feature of OBOE is that it allows properties (characteristics and relationships) of entities to be asserted without being interpreted as inherently (i.e., always) true of the entity. Depending on the context in which the entity was observed or how the measurements were performed, an entity’s properties may take on different values. OBOE allows RDF-style assertions about entities to be contextualized, so different values can be assigned to the same entity under distinct contexts, which is crucial for modeling scientific data [1,5].

Fig. 2 shows an example of the extensions (i.e., the plug-in) we have developed within Protégé-OWL for defining domain-specific ontologies using OBOE. The extensions were developed as a new Tab within Protégé (version 4.1). When the OBOE Tab is first selected from the Tabs menu of Protégé, the plug-in automatically imports the standard OBOE ontologies (after prompting the user). The OBOE Tab provides the standard class hierarchy view from Protégé (left-hand side of Fig. 2) as well as a new editing panel (right-hand side) with separate subpanels (subtabs) for each of the constructs shown in Fig. 1. Each subpanel contains a different form for specifying the associated OBOE class (the Measurement class form is shown in Fig. 2). Users create new classes via the standard class hierarchy panel. After a class is created or selected, the plug-in displays the appropriate form on the right side of the window. Each form consists of a comment section as well as fields that can (optionally) be filled in. Most of these fields are filled with classes selected using a tree-based class selection widget, which constrains the choice of classes based on the type of class to be selected and, where appropriate, on the values of other fields (e.g., depending on the characteristic chosen, only certain unit types can be selected). The Measurement form (shown in Fig. 2) contains the most fields of all the forms, including fields for an observed entity, characteristic, standard, protocol, and zero or more context observations. Each context observation consists of an optional relationship type and an entity class (e.g., Fig. 2 shows the FreshWater entity and the Within relationship).

Fig. 2. The Measurement form showing the various fields of a “Fresh Water Nitrogen Concentration” measurement type defined within an OBOE extension ontology.
Fig. 3. The OWL-DL axioms created by the form in Fig. 2.

¹ http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl

Fig. 3 shows the standard Protégé view for the class of Fig. 2. As shown, defining this class using the Measurement form results in a non-trivial DL axiom. In this case, we assert Measurement types (such as the one in Fig. 2) using an equivalent-class axiom. A measurement type can be viewed as a combination of a number of other classes, and users can annotate data set attributes either directly via a measurement type or by specifying the individual components (i.e., the entity, characteristic, standard, and so on). Because measurement types are defined with equivalent-class axioms, a reasoner (such as Pellet) can automatically classify attributes into measurement types, which in turn supports data discovery searches based either on measurement types or on the individual components of a measurement.
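The equivalent-class pattern can be illustrated with a small Python sketch. It assembles a Manchester-syntax axiom from filled-in form fields and then mimics, with a simple structural subsumption check rather than a DL reasoner such as Pellet, how an attribute annotated component by component would be classified under the measurement type. Everything here is an assumption for illustration: the axiom shape, the Nitrogen/Concentration/MicromolePerLiter field values, the Lake/FreshWater/Water hierarchy, and the function names are hypothetical, the actual axioms produced by the plug-in (Fig. 3) may differ, and the context relationship (e.g., Within) is not encoded.

```python
from typing import Dict, Optional

# Hypothetical named-class hierarchy (child -> parent), not taken from OBOE.
PARENT: Dict[str, str] = {"Lake": "FreshWater", "FreshWater": "Water"}

# Form fields that become direct restrictions on the Measurement (property names from Fig. 1).
MEASUREMENT_FIELDS = [
    ("characteristic", "ofCharacteristic"),
    ("standard", "usesStandard"),
    ("protocol", "usesProtocol"),
]


def equivalent_class_axiom(name: str, form: Dict[str, str]) -> str:
    """Build a Manchester-syntax equivalent-class axiom from filled-in form fields."""
    conjuncts = ["Measurement"]
    for key, prop in MEASUREMENT_FIELDS:
        if form.get(key):  # empty (optional) fields add no restriction
            conjuncts.append(f"{prop} some {form[key]}")
    # Assumption: entity and context belong to the observation that owns the measurement.
    obs = ["Observation"]
    if form.get("entity"):
        obs.append(f"ofEntity some {form['entity']}")
    if form.get("context_entity"):
        obs.append(f"hasContext some (Observation and ofEntity some {form['context_entity']})")
    if len(obs) > 1:
        conjuncts.append(f"inverse hasMeasurement some ({' and '.join(obs)})")
    return f"Class: {name}\n    EquivalentTo: " + " and ".join(conjuncts)


def is_subclass(c: Optional[str], d: str) -> bool:
    """True if c equals d or d is an ancestor of c in PARENT."""
    while c is not None:
        if c == d:
            return True
        c = PARENT.get(c)
    return False


def classify(annotation: Dict[str, str], form: Dict[str, str]) -> bool:
    """Structural stand-in for reasoning: every filled-in field must subsume the annotated value."""
    return all(
        key in annotation and is_subclass(annotation[key], value)
        for key, value in form.items()
        if value
    )


form = {
    "entity": "Nitrogen",
    "characteristic": "Concentration",
    "standard": "MicromolePerLiter",
    "context_entity": "FreshWater",
}
print(equivalent_class_axiom("FreshWaterNitrogenConcentration", form))

# An attribute annotated by its components, with a more specific context entity.
annotation = dict(form, context_entity="Lake")
print(classify(annotation, form))                          # True
print(classify(dict(form, context_entity="Water"), form))  # False
```

The choice of an equivalence rather than a subclass axiom is what makes this kind of inference possible: the conditions are sufficient as well as necessary, so membership in the measurement type follows from an attribute's components, whereas a subclass axiom would only state necessary conditions.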
We note that most other classes created using the OBOE plug-in are defined using subclass axioms. As shown in Fig. 3, we also control the use of universal and existential property restrictions, largely following the conventions defined in [4].

3 Summary and Future Work
We have presented new extensions to Protégé-OWL that simplify the creation of observation and measurement ontologies. The extensions are being used to develop controlled vocabularies within the Santa Barbara Coastal Long-Term Ecological Research Project and within TraitNet (for managing trait-based ecological and evolutionary research data). These ontologies consist of thousands of terms created using the form-based approach described here. Unlike spreadsheet-based approaches (e.g., [6]), our approach takes advantage of existing features in Protégé (e.g., for displaying and navigating ontologies), provides a variety of quality assurance controls (e.g., ensuring appropriate measurement units are chosen for a given characteristic), and offers a more structured approach to ontology editing (similar to “term templates” [7]). As future work, we are exploring ways to generalize this approach so that developers can automatically generate a set of Protégé forms for a given core ontology model.

References
1. Bowers, S., Madin, J., Schildhauer, M.: A conceptual modeling framework for expressing observational data semantics. In: ER. (2008)
2. Berkley, C., Bowers, S., Jones, M.B., Madin, J.S., Schildhauer, M.: Improving data discovery for metadata repositories through semantic search. In: CISIS. (2009) 1152–1159
3. Bowers, S., Kudo, J., Cao, H., Schildhauer, M.: ObsDB: A system for uniformly storing and querying heterogeneous observational data. In: e-Science. (2010)
4. Rector, A., et al.: OWL Pizzas: Practical experience of teaching OWL-DL: Common errors & common patterns. In: EKAW. (2004) 63–81
5. Mungall, C.: Representing phenotypes in OWL. In: OWLED. (2007)
6. O’Connor, M., Halaschek-Wiener, C., Musen, M.: Mapping Master: A flexible approach for mapping spreadsheets to OWL. In: ISWC. (2010)
7. Rocca-Serra, P., et al.: Overcoming the ontology enrichment bottleneck with quick term templates. Applied Ontology 6 (2011) 13–22