Protégé Extensions for Scientist-Oriented Modeling of
          Observation and Measurement Semantics*

                 Wesley Saunders¹, Shawn Bowers¹, Margaret O’Brien²
                    ¹ Department of Computer Science, Gonzaga University
                          ² Marine Science Institute, UC Santa Barbara

        wsaunders@zagmail.gonzaga.edu, bowers@gonzaga.edu, mob@msi.ucsb.edu


       Abstract. We present Protégé-OWL extensions designed to help scientists define
       domain-specific ontologies for describing observational data. The extensions provide
       high-level forms that users can fill out from within Protégé to specify classes
       used to describe scientific measurements. As a user fills out a form, the underlying
       OWL-DL axioms are automatically asserted, allowing users to specify relatively
       complex OWL-DL constraints without having to understand the technical details
       of OWL. The constraints generated by the extensions encode a set of “best
       practices” for improving the discovery and integration of observational data.


1    Introduction
Earth and environmental scientists often depend on data collected from multiple research
efforts to address broad scientific questions. These efforts rely on discovering,
interpreting, and integrating diverse and heterogeneous data sets covering a wide range
of semantic concepts. Employing domain-specific terms to describe earth and environmental
data has the potential to significantly improve discovery and integration; however,
only a relatively small number of ontologies have been created within these domains.
We see two main barriers to ontology development within these communities: (1) the
breadth of (specialized) concepts and phenomena studied requires a large and diverse
set of ontological terms, and (2) a high level of expertise is needed to efficiently
develop ontologies using current ontology languages and tools.
    The aim of this work is to help address these challenges by adding structured,
easy-to-use forms to Protégé-OWL that scientists can use to quickly create meaningful
domain-specific ontologies. Our approach leverages a generic, core observation and
measurement ontology [1] that is designed for describing scientific data sets based
on metadata annotations (mappings from data attributes to specialized measurement
classes) [2]. These annotations provide a uniform view over otherwise heterogeneous
data sets that can be used to enhance data discovery and integration applications (e.g.,
for improved precision and recall [2] or analysis over an integrated data repository [3]).
    Our extensions to Protégé-OWL allow users to create sophisticated term definitions
using simple “fill-in-the-blank” forms, while automatically generating the underlying
DL axioms corresponding to the user’s input. This approach gives domain scientists the
option of creating high-quality ontologies without having to understand and directly
work with the underlying DL formalisms of OWL. The generated axioms encode a number
of OWL-DL “best practices” (e.g., as in [4]) to ensure term definitions are well-suited
for data discovery. We briefly describe the core observation and measurement ontology,
the approaches used within our Protégé extensions, and future work.

* This work was supported in part through NSF grants 0743429 and 0753144.

[Figure 1: UML class diagram relating Observation, Entity, Measurement, Characteristic,
Standard, Protocol, and Relationship via ofEntity, hasMeasurement, ofCharacteristic,
usesStandard, usesProtocol, hasValue, and hasContext, with 1..1 and * multiplicities.]

Fig. 1. The main classes and properties of the observation and measurement ontology
(OBOE). While shown in UML, the model is defined using OWL-DL.


2   Observation Modeling using Protégé-OWL

Fig. 1 shows the main modeling constructs of OBOE¹ (the Extensible Observation Ontology)
[1] used within our extensions. An observation is made of an entity (e.g., biological
organisms, geographic locations, environmental features) and serves to group a set of
measurements together to form a single “observation event”. A measurement assigns a
value to a characteristic of the observed entity (e.g., the height of a tree), and can also
include standards (e.g., units) and collection protocols. An observation can occur within
the surrounding context of other observations, where context can be viewed as a form of
dependency [1], and context often includes a named relationship (e.g., “partOf”,
“within”) between observed entities. A key feature of OBOE is that it allows properties
(characteristics and relationships) of entities to be asserted without being interpreted as
inherently (i.e., always) true of the entity. Depending on the context in which the entity
was observed or how the measurements were performed, an entity’s properties may take
on different values. OBOE allows RDF-style assertions about entities to be contextualized,
and thus different values can be assigned for the same entity under distinct contexts,
which is crucial for modeling scientific data [1,5].
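To make the structure of Fig. 1 concrete, the following Python sketch models observations, measurements, and context as plain data classes. It is a simplified, hypothetical illustration, not the OWL-DL encoding used by OBOE: the class and field names mirror Fig. 1, but the helper `values_by_context`, the tree and survey entities, and the height values are invented for illustration. It shows how the same entity can carry different values for one characteristic when observed in different contexts, rather than a single value asserted as always true.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """An observed thing (e.g., a tree); compared by name only."""
    name: str


@dataclass
class Measurement:
    characteristic: str             # ofCharacteristic, e.g. "Height"
    value: float                    # hasValue
    standard: str                   # usesStandard, e.g. a unit such as "Meter"
    protocol: Optional[str] = None  # usesProtocol (left optional in this sketch)


@dataclass
class Observation:
    of_entity: Entity                                              # ofEntity
    measurements: List[Measurement] = field(default_factory=list)  # hasMeasurement
    # hasContext: pairs of (optional relationship name, context observation)
    context: List[Tuple[Optional[str], "Observation"]] = field(default_factory=list)


def values_by_context(observations: List[Observation], entity: Entity,
                      characteristic: str) -> Dict[Tuple[str, ...], List[Tuple[float, str]]]:
    """Group the recorded values of one characteristic of one entity by the
    entities of the surrounding context observations, instead of collapsing
    them into a single 'true' value for the entity."""
    grouped: Dict[Tuple[str, ...], List[Tuple[float, str]]] = {}
    for obs in observations:
        if obs.of_entity != entity:
            continue
        key = tuple(ctx.of_entity.name for _, ctx in obs.context)
        for m in obs.measurements:
            if m.characteristic == characteristic:
                grouped.setdefault(key, []).append((m.value, m.standard))
    return grouped


# Hypothetical data: the same tree measured during two sampling events.
tree = Entity("tree-17")
survey_2009 = Observation(Entity("Survey2009"))
survey_2010 = Observation(Entity("Survey2010"))
obs_a = Observation(tree, [Measurement("Height", 4.2, "Meter")], [(None, survey_2009)])
obs_b = Observation(tree, [Measurement("Height", 4.6, "Meter")], [(None, survey_2010)])

heights = values_by_context([survey_2009, survey_2010, obs_a, obs_b], tree, "Height")
print(heights)
# {('Survey2009',): [(4.2, 'Meter')], ('Survey2010',): [(4.6, 'Meter')]}
```

Keying values by context, rather than storing one value per entity and characteristic, is the design choice that mirrors OBOE’s contextualized assertions.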
    Fig. 2 shows an example of the extensions (i.e., the plug-in) we have developed within
Protégé-OWL for defining domain-specific ontologies using OBOE. The extensions were
developed as a new Tab within Protégé (version 4.1). When the OBOE Tab is first
selected from the Tabs menu of Protégé, the plug-in automatically imports the standard
OBOE ontologies (after prompting the user). The OBOE Tab provides the standard
class hierarchy view from Protégé (left-hand side of Fig. 2) as well as a new editing
panel (right-hand side) with separate subpanels (subtabs) for each of the constructs
shown in Fig. 1. Each subpanel contains a different form for specifying the associated
OBOE class (the Measurement class form is shown in Fig. 2). Users create new classes
via the standard class hierarchy panel. After a class is created or selected, the plug-in
displays the appropriate form on the right side of the window. Each form consists of a
comment section as well as fields that can (optionally) be filled in. Most of these fields
are filled with classes selected using a tree-based class selection widget, which
constrains the choice of classes based on the type of class to be selected and, where
appropriate, on the values of other fields (e.g., depending on the characteristic chosen,
only certain unit types can be selected). The Measurement form (shown in Fig. 2) has
the largest number of fields of all the forms, including fields for an observed entity,
characteristic, standard, protocol, and zero or more context observations. Each context
observation consists of an optional relationship type and an entity class (e.g., Fig. 2
shows the FreshWater entity and the Within relationship).

¹ http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl

Fig. 2. The Measurement form showing the various fields of a “Fresh Water Nitrogen
Concentration” measurement type defined within an OBOE extension ontology.

              Fig. 3. The OWL-DL axioms created by the form in Fig. 2.
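The class-selection constraint described for the Measurement form (offering only unit types compatible with the chosen characteristic) can be sketched as a filter over a class hierarchy. This is a hypothetical illustration rather than the plug-in’s implementation: the unit hierarchy, the characteristic-to-unit rule table, and the function names are assumptions, and a real plug-in would query the loaded ontology instead of Python dictionaries.

```python
from typing import Dict, List, Optional

# Hypothetical fragment of a standards (unit) hierarchy: class -> direct superclass.
STANDARD_PARENT: Dict[str, str] = {
    "Unit": "Standard",
    "LengthUnit": "Unit",
    "Meter": "LengthUnit",
    "Centimeter": "LengthUnit",
    "ConcentrationUnit": "Unit",
    "MilligramPerLiter": "ConcentrationUnit",
    "MicromolePerLiter": "ConcentrationUnit",
}

# Hypothetical rule table: the standard class whose subclasses are valid
# choices once a given characteristic has been selected on the form.
ALLOWED_UNDER: Dict[str, str] = {
    "Height": "LengthUnit",
    "Concentration": "ConcentrationUnit",
}


def is_subclass_of(cls: Optional[str], ancestor: str) -> bool:
    """Walk up the (acyclic) hierarchy from cls looking for ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = STANDARD_PARENT.get(cls)
    return False


def selectable_standards(characteristic: Optional[str]) -> List[str]:
    """Classes the selection widget would offer for the Standard field."""
    root = ALLOWED_UNDER.get(characteristic) if characteristic else None
    if root is None:
        # No characteristic chosen (or no rule for it): leave the field unconstrained.
        return sorted(STANDARD_PARENT)
    return sorted(c for c in STANDARD_PARENT if c != root and is_subclass_of(c, root))


print(selectable_standards("Height"))         # ['Centimeter', 'Meter']
print(selectable_standards("Concentration"))  # ['MicromolePerLiter', 'MilligramPerLiter']
```

Driving the filter from a small rule table keeps the form logic separate from the ontology content; in practice such rules could themselves be read from axioms in the ontology.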
    Fig. 3 shows the standard Protégé view of the class defined in Fig. 2. As shown, defining
this class using the Measurement form results in a non-trivial DL axiom. In this case, we
assert Measurement types (such as the one in Fig. 2) using an equivalent class axiom.
A measurement type can be viewed as a combination of a number of other classes,
and users can annotate data set attributes either directly via a measurement type or by
specifying the individual components (i.e., the entity, characteristic, standard, and so
on). Because measurement types are defined by equivalent class axioms, a reasoner
(such as Pellet) can automatically classify attributes into measurement types, which also
allows data discovery searches to be based either on measurement types or on the
individual components of a measurement. We note that most other classes created using
the OBOE plug-in are defined using subclass axioms. As shown in Fig. 3, we also control
the use of universal and existential property restrictions, largely following the
conventions defined in [4].
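The following Python sketch illustrates, under stated assumptions, how filled-in Measurement form fields might be rendered as an equivalent class axiom in Manchester syntax, and why equivalence (rather than subclass) axioms let attributes annotated component by component be classified into measurement types. The exact axiom of Fig. 3 is not reproduced in this text, so the axiom shape, the use of `inverse hasMeasurement` to reach the observed entity, the omission of the context relationship (e.g., Within) and of universal (closure) restrictions, the entity and unit values, and the `classify` helper are all assumptions. `classify` matches class names exactly and merely stands in for a DL reasoner such as Pellet, which would also take subclass relationships into account.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MeasurementForm:
    """Hypothetical record of the fields filled in on the Measurement form."""
    name: str
    entity: Optional[str] = None
    characteristic: Optional[str] = None
    standard: Optional[str] = None
    protocol: Optional[str] = None
    context_entities: List[str] = field(default_factory=list)


def equivalent_class_axiom(form: MeasurementForm) -> str:
    """Render the form as a Manchester-syntax EquivalentTo axiom.
    Blank fields add no restriction, so sparse forms define broader types."""
    parts = ["Measurement"]
    if form.characteristic:
        parts.append(f"(ofCharacteristic some {form.characteristic})")
    if form.standard:
        parts.append(f"(usesStandard some {form.standard})")
    if form.protocol:
        parts.append(f"(usesProtocol some {form.protocol})")
    # Assumption: entity and context are reached via the observation that owns the measurement.
    obs = []
    if form.entity:
        obs.append(f"(ofEntity some {form.entity})")
    for ctx in form.context_entities:
        obs.append(f"(hasContext some (Observation and (ofEntity some {ctx})))")
    if obs:
        parts.append("(inverse hasMeasurement some (Observation and "
                     + " and ".join(obs) + "))")
    return f"Class: {form.name}\n    EquivalentTo: " + " and ".join(parts)


def classify(attribute: Dict[str, object], forms: List[MeasurementForm]) -> List[str]:
    """Name-matching stand-in for reasoner classification: an attribute
    annotated component by component falls under every measurement type
    whose filled-in fields it satisfies (no subclass reasoning here)."""
    hits = []
    for f in forms:
        required = {"entity": f.entity, "characteristic": f.characteristic,
                    "standard": f.standard, "protocol": f.protocol}
        fields_ok = all(attribute.get(k) == v for k, v in required.items() if v)
        context = attribute.get("context", [])
        if fields_ok and all(c in context for c in f.context_entities):
            hits.append(f.name)
    return hits


fwnc = MeasurementForm("FreshWaterNitrogenConcentration", entity="Nitrogen",
                       characteristic="Concentration", standard="MicromolePerLiter",
                       context_entities=["FreshWater"])
general = MeasurementForm("ConcentrationMeasurement", characteristic="Concentration")
print(equivalent_class_axiom(fwnc))

attribute = {"entity": "Nitrogen", "characteristic": "Concentration",
             "standard": "MicromolePerLiter", "context": ["FreshWater"]}
print(classify(attribute, [fwnc, general]))
# ['FreshWaterNitrogenConcentration', 'ConcentrationMeasurement']
```

The sparsely filled `ConcentrationMeasurement` form yields a broader type that also covers the annotated attribute; this is the behavior that lets searches target either a full measurement type or its individual components.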

3   Summary and Future Work
We have presented new extensions to Protégé-OWL to simplify the creation of observation
and measurement ontologies. The extensions are being used to develop controlled
vocabularies within the Santa Barbara Coastal Long-Term Ecological Research Project
and within TraitNet (for managing trait-based ecological and evolutionary research
data). These ontologies consist of thousands of terms created using the form-based
approach described here. Unlike spreadsheet-based approaches (e.g., [6]), our approach
takes advantage of existing features in Protégé (e.g., for displaying and navigating
ontologies), provides a variety of quality assurance controls (e.g., ensuring appropriate
measurement units are chosen based on given characteristics), and offers a more
structured approach to ontology editing (similar to “term templates” [7]). As future work
we are exploring ways to generalize the approach described here to allow developers to
automatically generate a set of Protégé forms for a given core ontology model.

References
1. Bowers, S., Madin, J., Schildhauer, M.: A conceptual modeling framework for expressing
   observational data semantics. In: ER. (2008)
2. Berkley, C., Bowers, S., Jones, M.B., Madin, J.S., Schildhauer, M.: Improving data discovery
   for metadata repositories through semantic search. In: CISIS. (2009) 1152–1159
3. Bowers, S., Kudo, J., Cao, H., Schildhauer, M.: ObsDB: A system for uniformly storing and
   querying heterogeneous observational data. In: e-Science. (2010)
4. Rector, A., et al.: OWL Pizzas: Practical experience of teaching OWL-DL: Common errors &
   common patterns. In: EKAW. (2004) 63–81
5. Mungall, C.: Representing phenotypes in OWL. In: OWLED. (2007)
6. O’Connor, M., Halaschek-Wiener, C., Musen, M.: Mapping Master: A flexible approach for
   mapping spreadsheets to OWL. In: ISWC. (2010)
7. Rocca-Serra, P., et al.: Overcoming the ontology enrichment bottleneck with quick term
   templates. Applied Ontology 6 (2011) 13–22