<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Protégé Extensions for Scientist-Oriented Modeling of Observation and Measurement Semantics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wesley Saunders</string-name>
          <email>wsaunders@zagmail.gonzaga.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shawn Bowers</string-name>
          <email>bowers@gonzaga.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Margaret O'Brien</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Gonzaga University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Marine Science Institute</institution>
          ,
          <addr-line>UC Santa Barbara</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present Protégé-OWL extensions designed to help scientists define domain-specific ontologies for describing observational data. The extensions provide high-level forms that users can fill out from within Protégé to specify classes used to describe scientific measurements. As a user fills out a form, the underlying OWL-DL axioms are asserted automatically, allowing users to specify relatively complex OWL-DL constraints without understanding the technical details of OWL. The constraints generated by the extensions encode a set of “best practices” for improving discovery and integration of observational data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Earth and environmental scientists often depend on data collected from multiple
research efforts to address broad scientific questions. These efforts rely on discovering,
interpreting, and integrating diverse and heterogeneous data sets covering a wide range
of semantic concepts. Employing domain-specific terms for describing earth and
environmental data has the potential to significantly improve discovery and integration;
however, only a relatively small number of ontologies have been created within these
domains. We see two main barriers to ontology development within these
communities: (1) the breadth of (specialized) concepts and phenomena studied requires a large
and diverse set of ontological terms, and (2) a high level of expertise is needed to
develop ontologies efficiently using current ontology languages and tools.</p>
      <p>
        The aim of this work is to help address these challenges by adding structured,
easy-to-use forms to Protégé-OWL that scientists can use to quickly create
meaningful domain-specific ontologies. Our approach leverages a generic, core observation and
measurement ontology [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that is designed for describing scientific data sets based
on metadata annotations (mappings from data attributes to specialized measurement
classes) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These annotations provide a uniform view over otherwise heterogeneous
data sets that can be used to enhance data discovery and integration applications (e.g.,
for improved precision and recall [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or analysis over an integrated data repository [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
      </p>
      <p>Our extensions to Protégé-OWL allow users to create sophisticated term definitions
using simple “fill-in-the-blank” forms, while automatically generating the underlying
DL axioms corresponding to the user’s input. This approach gives domain scientists
the option of creating high-quality ontologies without having to understand and
directly work with the underlying DL formalisms of OWL. The generated axioms
encode a number of OWL-DL “best practices” (e.g., as in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) to ensure term definitions
are well-suited for data discovery. We briefly describe the core observation and
measurement ontology, the approaches used within our Protégé extensions, and future work.
      </p>
      <p>[Figure: the core observation and measurement model, with classes Observation, Measurement, Entity, Characteristic, Standard, Protocol, and Relationship, connected by the properties hasContext, hasMeasurement, ofEntity, ofCharacteristic, usesStandard, usesProtocol, and hasValue.]</p>
      <p>This work was supported in part through NSF grants 0743429 and 0753144.</p>
    </sec>
    <sec id="sec-2">
      <title>Observation Modeling using Protégé-OWL</title>
      <p>Footnote 1: http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl</p>
      <p>[...] new classes via the standard class hierarchy panel. After a class is created or selected,
the plug-in displays the appropriate form on the right side of the window. Each form
consists of a comment section as well as fields that can (optionally) be filled in. Most
of these fields are filled with classes that are selected using a tree-based class selection
widget, which constrains the choice of classes based on the type of class to be selected
and, where appropriate, the values of other fields (e.g., depending on the characteristic
chosen, only certain unit types can be selected). The Measurement form (shown in Fig. 2)
contains the largest number of fields of all the forms, and includes fields for an observed
entity, characteristic, standard, protocol, and zero or more context observations. Each
context observation consists of an optional relationship type and an entity class (e.g.,
Fig. 2 shows the FreshWater entity and the Within relationship).</p>
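      <p>As a rough illustration of the field structure and the selection constraint described above, the following Python sketch models a Measurement form and restricts the selectable standards to those compatible with the chosen characteristic. It is not the plug-in's implementation: the class names MeasurementForm and ContextObservation, the ALLOWED_STANDARDS table, and the Temperature, Length, and unit entries are all hypothetical; only FreshWater and Within come from the example of Fig. 2.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lookup table: which standards (units) each characteristic admits.
ALLOWED_STANDARDS = {
    "Temperature": {"Celsius", "Kelvin"},
    "Length": {"Meter", "Centimeter"},
}


@dataclass
class ContextObservation:
    entity: str                         # entity class, e.g. "FreshWater"
    relationship: Optional[str] = None  # optional relationship type, e.g. "Within"


@dataclass
class MeasurementForm:
    name: str
    entity: Optional[str] = None
    characteristic: Optional[str] = None
    standard: Optional[str] = None
    protocol: Optional[str] = None
    contexts: list = field(default_factory=list)  # zero or more ContextObservation

    def selectable_standards(self):
        """Standards a selection widget would offer for the current characteristic."""
        if self.characteristic is None:
            # No characteristic chosen yet: offer every known standard.
            return set().union(*ALLOWED_STANDARDS.values())
        return set(ALLOWED_STANDARDS.get(self.characteristic, set()))

    def set_standard(self, standard):
        """Reject a standard that is incompatible with the chosen characteristic."""
        if standard not in self.selectable_standards():
            raise ValueError(f"{standard} is not valid for {self.characteristic}")
        self.standard = standard
```
      </preformat>
      <p>In this sketch every field except the name is optional, mirroring the forms, and an incompatible unit is rejected at selection time rather than after the class has been saved.</p>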
      <p>
        Fig. 3 shows the standard Protégé view for the class of Fig. 2. As shown, defining
this class using the Measurement form results in a non-trivial DL axiom. In this case, we
assert Measurement types (such as the one in Fig. 2) using an equivalent class axiom.
A measurement type can be viewed as a combination of a number of other classes,
and users can annotate data set attributes either directly via a measurement type or by
specifying the individual components (i.e., the entity, characteristic, standard, and so
on). Because measurement types are defined by equivalent class axioms, attributes can be
classified into measurement types automatically using a reasoner (such as Pellet), which also
allows data discovery searches based on either measurement types or the individual
components of a measurement. We note that most other classes created using the OBOE plug-in are
defined using subclass axioms. As shown in Fig. 3, we also control the use of universal
and existential property restrictions, largely following the conventions defined in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
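      <p>To make the idea of generating an equivalent class axiom from form fields more concrete, the sketch below renders a class definition in OWL Manchester syntax. The exact shape of the axiom produced by the plug-in is not reproduced here, so the structure of the expression is an assumption made for illustration; only the property names (ofEntity, ofCharacteristic, usesStandard, usesProtocol, hasContext, hasMeasurement) are taken from the core model. The function names and the WaterTemperature, Water, Temperature, and Celsius class names are hypothetical, and the optional relationship type of a context observation is omitted for brevity.</p>
      <preformat>
```python
def measurement_type_expression(entity, characteristic, standard=None,
                                protocol=None, context_entities=()):
    """Build a Manchester-syntax class expression from Measurement form fields.

    The expression shape is an illustrative assumption, not the plug-in's output.
    """
    # Describe the observation that the measurement belongs to.
    observation = ["Observation", f"(ofEntity some {entity})"]
    for context_entity in context_entities:
        observation.append(
            f"(hasContext some (Observation and (ofEntity some {context_entity})))"
        )
    observation_expr = " and ".join(observation)

    # Describe the measurement itself; unset optional fields add no restriction.
    parts = ["Measurement", f"(ofCharacteristic some {characteristic})"]
    if standard is not None:
        parts.append(f"(usesStandard some {standard})")
    if protocol is not None:
        parts.append(f"(usesProtocol some {protocol})")
    # Link the measurement back to its observation via the inverse property.
    parts.append(f"(inverse hasMeasurement some ({observation_expr}))")
    return " and ".join(parts)


def equivalent_class_axiom(name, expression):
    """Wrap a class expression in a Manchester-syntax class frame."""
    return f"Class: {name}\n    EquivalentTo: {expression}"


if __name__ == "__main__":
    expr = measurement_type_expression(
        "Water", "Temperature", standard="Celsius",
        context_entities=["FreshWater"],
    )
    print(equivalent_class_axiom("WaterTemperature", expr))
```
      </preformat>
      <p>The sketch uses only existential restrictions; a fuller version following the conventions of [4] would also decide where universal restrictions are appropriate.</p>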
    </sec>
    <sec id="sec-3">
      <title>Summary and Future Work</title>
      <p>
        We have presented new extensions to Protégé-OWL to simplify the creation of
observation and measurement ontologies. The extensions are being used to develop controlled
vocabularies within the Santa Barbara Coastal Long-Term Ecological Research Project
and within TraitNet (for managing trait-based ecological and evolutionary research
data). These ontologies consist of thousands of terms created using the form-based
approach described here. Unlike spreadsheet-based approaches (e.g., [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), our approach
takes advantage of existing features in Protégé (e.g., for displaying and navigating
ontologies), provides a variety of quality assurance controls (e.g., ensuring appropriate
measurement units are chosen based on given characteristics), and offers a more
structured approach to ontology editing (similar to “term templates” [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). As future work,
we are exploring ways to generalize the approach described here to allow developers to
automatically generate a set of Protégé forms for a given core ontology model.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bowers</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schildhauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A conceptual modeling framework for expressing observational data semantics</article-title>
          .
          <source>In: ER</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berkley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowers</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madin</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schildhauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Improving data discovery for metadata repositories through semantic search</article-title>
          . In: CISIS. (
          <year>2009</year>
          )
          <fpage>1152</fpage>
          -
          <lpage>1159</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bowers</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kudo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schildhauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>ObsDB: A system for uniformly storing and querying heterogeneous observational data</article-title>
          . In: e-Science. (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Rector</surname>
          </string-name>
          , et al.:
          <article-title>OWL Pizzas: Practical experience of teaching OWL-DL: Common errors &amp; common patterns</article-title>
          .
          <source>In: EKAW</source>
          . (
          <year>2004</year>
          )
          <fpage>63</fpage>
          -
          <lpage>81</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mungall</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Representing phenotypes in OWL</article-title>
          . In: OWLED. (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>O'Connor</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halaschek-Wiener</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Mapping Master: A flexible approach for mapping spreadsheets to OWL</article-title>
          . In: ISWC. (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>P.</given-names>
            <surname>Rocca-Serra</surname>
          </string-name>
          , et al.:
          <article-title>Overcoming the ontology enrichment bottleneck with quick term templates</article-title>
          .
          <source>Applied Ontology</source>
          <volume>6</volume>
          (
          <year>2011</year>
          )
          <fpage>13</fpage>
          -
          <lpage>22</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>