<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Ontology-Mediated Space Science Digital Repository</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>J. Steven Hughes</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>California Institute of Technology</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Planetary Data System, NASA's official archive for Solar System Exploration data, has transitioned to an archival information system based on ISO standards for the long-term preservation of digital data. The ontology-based PDS4 Information Model provides the informational requirements to address the system's mission to efficiently collect, archive, and make accessible the digital data and documentation produced by or relevant to NASA's planetary missions. Adopted internationally, this ontology-mediated information system brings together inputs from a variety of sources in an open and interoperable fashion.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital</kwd>
        <kwd>Repository</kwd>
        <kwd>Ontology</kwd>
        <kwd>Information</kwd>
        <kwd>Model</kwd>
        <kwd>Mediated</kwd>
        <kwd>Science</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The Planetary Data System (PDS) [<xref ref-type="bibr" rid="ref1">1</xref>] is NASA’s official archive for Solar System Exploration science data. It is a federation of science discipline nodes formed in response to the findings of the Committee on Data Management and Computing (CODMAC) [<xref ref-type="bibr" rid="ref2">2</xref>] that a “wealth of science data would ultimately cease to be useful and probably lost if a process was not developed to ensure that the science data were properly archived.”</p>
      <p>The PDS began operations in 1990. Its stated mission is to “facilitate achievement of NASA’s planetary science goals by efficiently collecting, archiving, and making accessible digital data and documentation produced by or relevant to NASA’s planetary missions, research programs, and data analysis programs.”</p>
      <p>After about twenty years of successful operations, the PDS transitioned to a more modern system [<xref ref-type="bibr" rid="ref3 ref4 ref5">3,4,5</xref>], applying lessons learned and foundational principles from the Open Archival Information System Reference Model (OAIS-RM) [<xref ref-type="bibr" rid="ref6">6</xref>]. The OAIS-RM states that the digital repository must define the designated community and its associated knowledge base. Together with the key CODMAC finding that “the science community must be engaged in all aspects of a science data repository if the data are to remain scientifically useful to the community over the long term,” this suggests that ontologies would be useful for capturing the planetary science knowledge base.</p>
      <p>A task was subsequently initiated to create ontologies that would remain independent of the archival system's implementation while actively driving its development and evolution. The resulting ontologies form the core of an agile data management and curation system characterized by adaptive planning, early delivery, evolutionary development, continuous improvement, and rapid, flexible response to change.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The PDS4 Information Model</title>
      <p>The Protégé [<xref ref-type="bibr" rid="ref7">7</xref>] information modeling tool was chosen for the development of two ontologies. The first ontology is an implementation of the ISO/IEC 11179 [<xref ref-type="bibr" rid="ref8">8</xref>] Metadata Registry (MDR) standard. This standard meets many of the requirements for the detailed definitions required for science object classes and their attributes. These requirements include the ability to define domain data types, value ranges and character lengths, units of measure, and terminological names, aliases, and sources.</p>
      <p>The MDR standard also provides strategies that help address significant issues associated with metadata governance, primarily the impact on metadata of changes in the science community. The PDS4 Information Model adopts a key strategy of the MDR standard: a multi-level governance hierarchy. The ontology is partitioned into namespaces, and each namespace is governed independently by a steward.</p>
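      <p>The kind of attribute definition and namespace stewardship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the <monospace>cart</monospace> namespace, the <monospace>latitude</monospace> attribute, and the steward names are hypothetical and are not taken from the PDS4 Information Model or from ISO/IEC 11179.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeDefinition:
    """One attribute definition, loosely in the spirit of ISO/IEC 11179."""
    namespace: str                    # governance partition that owns the attribute
    name: str                         # terminological name
    data_type: type                   # domain data type, simplified to a Python type
    minimum: Optional[float] = None   # lower bound of the value range
    maximum: Optional[float] = None   # upper bound of the value range
    max_length: Optional[int] = None  # character-length limit
    unit: Optional[str] = None        # unit of measure

    def check(self, value) -> list:
        """Return a list of problems; an empty list means the value conforms."""
        if not isinstance(value, self.data_type):
            return [f"{self.name}: expected {self.data_type.__name__}"]
        problems = []
        if self.minimum is not None and value < self.minimum:
            problems.append(f"{self.name}: {value} is below {self.minimum}")
        if self.maximum is not None and value > self.maximum:
            problems.append(f"{self.name}: {value} is above {self.maximum}")
        if self.max_length is not None and len(str(value)) > self.max_length:
            problems.append(f"{self.name}: longer than {self.max_length} characters")
        return problems


class Registry:
    """Attribute definitions partitioned into independently governed namespaces."""

    def __init__(self):
        self._stewards = {}     # namespace -> steward
        self._definitions = {}  # (namespace, name) -> AttributeDefinition

    def add_namespace(self, namespace: str, steward: str) -> None:
        self._stewards[namespace] = steward

    def register(self, definition: AttributeDefinition, requested_by: str) -> None:
        # Only the steward of a namespace may change that namespace.
        if self._stewards.get(definition.namespace) != requested_by:
            raise PermissionError(f"{requested_by} does not govern {definition.namespace}")
        self._definitions[(definition.namespace, definition.name)] = definition

    def lookup(self, namespace: str, name: str) -> AttributeDefinition:
        return self._definitions[(namespace, name)]


# Hypothetical namespaces and stewards, for illustration only.
registry = Registry()
registry.add_namespace("pds", steward="common-steward")
registry.add_namespace("cart", steward="cartography-steward")
registry.register(
    AttributeDefinition("cart", "latitude", float, minimum=-90.0, maximum=90.0, unit="deg"),
    requested_by="cartography-steward",
)
print(registry.lookup("cart", "latitude").check(95.0))
```
]]></preformat>
      <p>Because each namespace accepts changes only from its own steward, a discipline can revise its definitions without touching those of any other namespace, which is the point of the multi-level hierarchy.</p>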
      <p>The second, or core, ontology is “a representation of concepts, relationships, constraints, rules, and operations to specify data semantics for a chosen domain of discourse” [<xref ref-type="bibr" rid="ref9">9</xref>]. It provides a sharable, stable, and organized structure of information requirements that supports an agile data curation environment. The combination of the two ontologies is called the PDS4 Information Model.</p>
      <sec id="sec-2-4">
        <title>2.1. Foundational Concepts</title>
        <p>The foundational concepts for the core ontology of the PDS4 Information Model are derived from the Information Model provided in the OAIS-RM, starting with the “Information Object”. Its extensions include object classes for the following information categories:</p>
        <list list-type="bullet">
          <list-item><p>Identification: allows an information object to be discovered and accessed.</p></list-item>
          <list-item><p>Representation/Format: allows a data object to be interpreted.</p></list-item>
          <list-item><p>Fixity: ensures the information object has not been unintentionally altered.</p></list-item>
          <list-item><p>Provenance: provides essential authenticity of information objects.</p></list-item>
          <list-item><p>Context: describes the environment in which the data object was created.</p></list-item>
          <list-item><p>Reference: allows the information objects to be referenced.</p></list-item>
          <list-item><p>Access Rights: identifies the access restrictions pertaining to the data.</p></list-item>
        </list>
      </sec>
      <p>The PDS4 Information Model addresses these concepts, which are required for long-term preservation, and provides the means for the PDS mission's data and documentation to be efficiently collected, archived, and made accessible.</p>
      <p>The knowledge acquisition phase for the core ontology of the PDS4 Information Model was a multi-year effort that required the collaboration of information architects and experts from the diverse scientific domains in the planetary science community. This effort resulted in what is called the “Common dictionary,” consisting of the object classes and relationships used across the science domains. The Common dictionary, illustrated as a concept map in Figure 1, stabilized within a few years of use.</p>
      <p>Two useful characteristics result from the use of ontologies to drive agile development. First, as mentioned previously, the information architecture remains independent of the implemented system’s architecture, including its implementation choices. This allows the science domains and the information technology to evolve separately, which reduces the impact of change.</p>
      <p>Second, the Information Model’s multi-level governance hierarchy partitions the model into one common model and many discipline and local models, each governed by a steward. Each steward is given significant autonomy but is constrained by the ontology’s modeling principles. This creates a loosely coupled set of consistent models.</p>
      <p>To improve data interoperability, the original large set of diverse and often complex data formats was reduced to a small standard set of simple data formats designed for long-term stability. The reduced standard set is sufficient for most planetary science data. More complex data formats are accommodated by using compositions of the standard formats. However, this approach imposes additional restrictions, such as no longer allowing the interleaving of two or more data objects. For example, engineering data may no longer be prefixed to each row of a simple raster image but must be grouped into a separate data object.</p>
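      <p>The row-prefix example can be made concrete with a small Python sketch. It assumes a hypothetical legacy layout in which every record is a fixed-length engineering prefix followed by one fixed-length image row; the function name and the byte counts are illustrative and do not describe any actual PDS format.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def split_row_prefixes(raw: bytes, rows: int, prefix_len: int, row_len: int):
    """Separate per-row prefix bytes from pixel bytes.

    `raw` holds `rows` records, each made of `prefix_len` engineering bytes
    followed by `row_len` image bytes. Returns (engineering, image) as two
    contiguous byte strings, that is, two independent data objects.
    """
    record_len = prefix_len + row_len
    if len(raw) != rows * record_len:
        raise ValueError("buffer size does not match the stated layout")
    prefixes = bytearray()
    pixels = bytearray()
    for r in range(rows):
        start = r * record_len
        prefixes += raw[start:start + prefix_len]
        pixels += raw[start + prefix_len:start + record_len]
    return bytes(prefixes), bytes(pixels)


# Two rows, each a 2-byte prefix followed by 3 pixel bytes.
interleaved = bytes([0xA0, 0xA1, 1, 2, 3, 0xB0, 0xB1, 4, 5, 6])
engineering, image = split_row_prefixes(interleaved, rows=2, prefix_len=2, row_len=3)
```
]]></preformat>
      <p>After the split, the image is a plain two-dimensional array and the engineering data is a plain table, so each can be described by one of the simple standard formats.</p>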
      <p>The Information Model defines one Product class. This class is extended into a set of products sufficient for the various categories of data, including observational data, ancillary data, contextual information, and documents. Each Product consists of one or more data objects and a detached metadata label with a unique, immutable identifier. The identifier can be versioned. The Product label is defined from and validated against the PDS4 Information Model, and it provides data format, identification, reference, integrity, provenance, and context information.</p>
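      <p>A minimal sketch of a Product with a detached label might look as follows. The element names (<monospace>Product</monospace>, <monospace>Identification</monospace>, <monospace>Data_Object</monospace>), the example identifier, and the <monospace>::</monospace> separator between identifier and version are assumptions made for illustration; they are not the actual PDS4 schema, and real labels are validated against the generated XML Schema and Schematron files rather than by code like this.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the identifier cannot be changed after creation
class ProductId:
    logical_id: str
    version: str

    def __str__(self) -> str:
        return f"{self.logical_id}::{self.version}"


def build_label(pid: ProductId, data_files: dict) -> ET.Element:
    """Build a detached label describing one or more data objects.

    `data_files` maps a file name to its bytes. Element names are
    illustrative only, not the real PDS4 schema.
    """
    label = ET.Element("Product")
    ident = ET.SubElement(label, "Identification")
    ET.SubElement(ident, "logical_id").text = pid.logical_id
    ET.SubElement(ident, "version").text = pid.version
    for name, content in data_files.items():
        obj = ET.SubElement(label, "Data_Object", {"file": name})
        ET.SubElement(obj, "size_bytes").text = str(len(content))
        # A checksum supplies the integrity (fixity) information.
        ET.SubElement(obj, "sha256").text = hashlib.sha256(content).hexdigest()
    return label


def verify(label: ET.Element, data_files: dict) -> bool:
    """Check that every data object still matches its recorded checksum."""
    for obj in label.findall("Data_Object"):
        content = data_files.get(obj.get("file"))
        if content is None:
            return False
        if hashlib.sha256(content).hexdigest() != obj.findtext("sha256"):
            return False
    return True


# Hypothetical identifier and file, for illustration only.
pid = ProductId("urn:example:bundle:collection:image_001", "1.0")
files = {"image_001.img": b"\x00\x01\x02\x03"}
label = build_label(pid, files)
print(ET.tostring(label, encoding="unicode"))
```
]]></preformat>
      <p>Keeping the label separate from the data means the data bytes are never rewritten when metadata is corrected; a revised label would simply be issued under a new version of the same identifier.</p>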
      <p>PDS4 Products may reference each other, which results in a semantic network of linked data. There are two aggregate products, Collection and Bundle. A Collection groups related “base” products. A Bundle groups related Collections.</p>
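      <p>The aggregation and cross-referencing just described form a small graph, which the following Python sketch walks. The node structure and all identifiers are hypothetical; the sketch only shows how a Bundle resolves to base products through its Collections and how an unresolved reference could be detected.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A product in the network; members and references hold identifiers."""
    identifier: str
    kind: str                                        # "bundle", "collection", or "base"
    members: list = field(default_factory=list)      # aggregate membership
    references: list = field(default_factory=list)   # links to other products


def base_products(network: dict, identifier: str) -> list:
    """List the base products reachable through membership from `identifier`."""
    node = network[identifier]
    if node.kind == "base":
        return [node.identifier]
    found = []
    for member in node.members:
        found.extend(base_products(network, member))
    return found


def dangling_references(network: dict) -> list:
    """Return (source, target) pairs whose target is not in the network."""
    return [(node.identifier, target)
            for node in network.values()
            for target in node.members + node.references
            if target not in network]


# Hypothetical products, for illustration only.
nodes = [
    Node("bundle_a", "bundle", members=["coll_data", "coll_doc"]),
    Node("coll_data", "collection", members=["img_1", "img_2"]),
    Node("coll_doc", "collection", members=["guide"]),
    Node("img_1", "base", references=["guide"]),
    Node("img_2", "base", references=["calibration"]),  # target not registered
    Node("guide", "base"),
]
network = {node.identifier: node for node in nodes}
print(base_products(network, "bundle_a"))
print(dangling_references(network))
```
]]></preformat>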
      <p>The content of the PDS4 Information Model is filtered, translated, and written to various file formats, as shown in Figure 2. Since the PDS chose XML to label Products, the content of the Information Model is written to XML Schema and Schematron files. These files are subsequently used to create and validate each Product’s XML label. Product label validation is largely accomplished using these files, which amount to tens of thousands of lines of auto-generated declarations.</p>
      <p>[Figure 2. Labels recovered from the diagram. Inputs: PDS4 System Requirements; Planetary Science Common Requirements; Planetary Science Discipline/Mission Requirements; ISO 14721 Open Archival Information System (OAIS); ISO/IEC 11179 Metadata Registry. Processing: Ontology Modeling of the PDS4 Ontology in Protégé; IMTool (Filter, Translate, Export) applied to the PDS4 Information Model (Common); LDDTool (Parse, Validate, Ingest) applied to a Local Data Dictionary (IngestLDD), yielding the PDS Information Model (Common + Local Data Dictionary). Outputs: XML Schema and Schematron; PDS4 Information Model Specification (HTML); PDS4 Data Dictionary (DocBook, HTML, PDF); PDS4 Information Model Dump (JSON); PDS4 Information Model Dump (RDF, OWL, XMI, RDBM Schema); Software/Services Configuration Files.]</p>
      <sec id="sec-2-16">
        <title>2.2. Maintaining Relevancy in a Diverse and Evolving Science Discipline</title>
        <p>After the completion of the Common model, the development of discipline and local models started. These models are called Local Data Dictionaries (LDDs). Each discipline LDD covers a specific area of expertise, for example Cartography. To create a discipline LDD, one or more discipline specialists take “stewardship” responsibility to design and maintain an LDD for their area of expertise. To shield discipline experts from the complexities of the data modeling process, a modeling “template” and an associated validation tool were developed that constrain the designer to a simple design methodology and to selected references to classes and attributes defined in the Common model. The LDD template itself is defined in the Common model and so has an XML Schema and Schematron file. The LDD designer populates the LDD template to create the LDD.</p>
        <p>The tool validates the populated LDD template by temporarily “ingesting” the LDD into the Common model to check for consistency. This framework allows stewards to design and maintain their models in an environment that is loosely coupled to the Common model and other LDDs.</p>
        <p>This paradigm is repeated at the local level for missions and projects. The tool allows the “stacking” of two or more LDDs when cross-referencing is desired. References between LDDs for reuse of object class definitions are negotiated between stewards. The resulting hierarchy is illustrated in Figure 3. Currently there are fifteen LDDs at the discipline level and nine LDDs at the mission level. However, new missions and the migration of legacy mission data will substantially increase the number of LDDs at the mission level.</p>
      </sec>
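      <p>The temporary “ingest” check and the stacking of LDDs can be sketched as follows. This is a simplified illustration, not the behavior of the actual validation tool: models are reduced to a namespace plus a mapping from class names to the classes they reference, and the namespaces and class names shown are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def ingest_check(common: dict, ldd: dict, stacked: tuple = ()) -> list:
    """Trial-ingest `ldd` on top of the Common model and report problems.

    Each model is {"namespace": str, "classes": {class_name: [references]}},
    where a reference is a "namespace:class_name" string. The check works on
    a throwaway set of names, so the Common model itself is never modified.
    """
    known = set()
    for model in (common, *stacked):
        known.update(f'{model["namespace"]}:{name}' for name in model["classes"])

    problems = []
    namespace = ldd["namespace"]
    for name in ldd["classes"]:
        key = f"{namespace}:{name}"
        if key in known:
            problems.append(f"duplicate definition: {key}")
        known.add(key)
    for name, references in ldd["classes"].items():
        for reference in references:
            if reference not in known:
                problems.append(f"{namespace}:{name} references unknown {reference}")
    return problems


# Hypothetical models, for illustration only.
common = {"namespace": "pds", "classes": {"Product": [], "Array": [], "Table": []}}
cart = {"namespace": "cart", "classes": {"Map_Projection": ["pds:Array"]}}
mission = {"namespace": "msn",
           "classes": {"Camera_Frame": ["cart:Map_Projection", "pds:Image"]}}

print(ingest_check(common, cart))
print(ingest_check(common, mission))
print(ingest_check(common, mission, stacked=(cart,)))
```
]]></preformat>
      <p>In this sketch the mission LDD fails on its reference to the discipline class until the discipline LDD is stacked beneath it, while the reference to a class that no model defines is reported either way.</p>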
      <p>[Figure 3. Governance hierarchy. Common: Information Object; Product, Array, Table, … Discipline: Imaging, Cartography, Spectral, … Mission: InSight, Cassini, Voyager 1, Voyager 2, …]</p>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion</title>
      <p>The PDS4 Information Model consists of two stable ontologies and a rapidly expanding
group of discipline and local ontologies. These mediating ontologies form the core of an
independently managed model that provides the information requirements for software
and services configuration. An agile development cycle of use, feedback, planning,
change, test, and release keeps the PDS relevant in a diverse and evolving science
discipline and ensures future planetary scientists will have useful data for new science
inquiries.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Acknowledgements</title>
      <p>© 2019. All rights reserved.</p>
      <p>The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.</p>
      <p>The authors wish to acknowledge the PDS4 Data Design Working Group (DDWG) and the Systems Design Working Group (SDWG) for their significant efforts in the design, development, and implementation of the PDS4 information and system architectures. These discipline experts remained committed, sought excellence, and provided first-rate information without which PDS4 would not have been possible. The authors also wish to acknowledge Sean Hardman for his PDS Systems Development leadership and David Giaretta and the various teams responsible for the ISO standards. Finally, they would like to recognize the support of the PDS Management Council and NASA Headquarters.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          Special Issue:
          <article-title>The Planetary Data System</article-title>
          ,
          <source>Planetary and Space Science</source>
          , European Geophysical Society, ISSN 0032-0633, Volume
          <volume>44</volume>
          , Number
          <issue>1</issue>
          , January
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>National Research Council</collab>
          .
          <year>1986</year>
          .
          <source>Issues and Recommendations Associated with Distributed Computation and Data Management Systems for Space Science</source>
          , Committee on Data Management and Computing, Space Studies Board, National Academy Press, Washington, DC, pp.
          <fpage>95</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Hughes</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crichton</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mattmann</surname>
            ,
            <given-names>C. A.</given-names>
          </string-name>
          ,
          <article-title>“Ontology-Based Information Model Development for Science Information Reuse and Integration”</article-title>
          ,
          <source>IEEE International Conference on Information Reuse &amp; Integration</source>
          ,
          <year>2009</year>
          , doi:
          <pub-id pub-id-type="doi">10.1109/IRI.2009.5211603</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Hughes</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crichton</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hardman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Law</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joyner</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramirez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <article-title>“PDS4: A model-driven planetary science data architecture for long-term preservation”</article-title>
          ,
          <source>2014 IEEE 30th International Conference on Data Engineering Workshops (ICDEW)</source>
          , pp.
          <fpage>134</fpage>
          -
          <lpage>141</lpage>
          , March 31-April 4,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Crichton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hughes</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hardman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Law</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beebe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morgan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grayzeck</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <article-title>“Scalable Planetary Science Information Architecture for Big Science Data”</article-title>
          ,
          <source>2014 IEEE 10th International Conference on e-Science</source>
          , Volume
          <volume>2</volume>
          , 20-24 October
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] ISO 14721:2012,
          <source>Space data and information transfer systems -- Open archival information system (OAIS) -- Reference model</source>
          , ISO,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] (
          <year>2013</year>
          )
          <article-title>The Protégé Ontology Editor and Knowledge Acquisition System website</article-title>
          . [Online]. Available: http://protege.stanford.edu/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] ISO/IEC 11179: Information Technology --
          <source>Metadata registries (MDR)</source>
          ,
          <source>ISO/IEC</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>Y. T.</given-names>
          </string-name>
          ,
          <article-title>"Information Modeling: From Design To Implementation"</article-title>
          ,
          <source>Proceedings of the Second World Manufacturing Congress</source>
          , pp
          <fpage>315</fpage>
          -
          <lpage>321</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>