=Paper= {{Paper |id=Vol-2518/paper-DAOSI2 |storemode=property |title=An Ontology-Mediated Space Science Digital Repository |pdfUrl=https://ceur-ws.org/Vol-2518/paper-DAOSI2.pdf |volume=Vol-2518 |authors=J. Steven Hughes,Daniel J. Crichton,Ronald S. Joyner |dblpUrl=https://dblp.org/rec/conf/jowo/HughesCJ19 }} ==An Ontology-Mediated Space Science Digital Repository== https://ceur-ws.org/Vol-2518/paper-DAOSI2.pdf
         An Ontology-Mediated Space Science
                 Digital Repository
               J. Steven Hughesa,1, Daniel J. Crichton a, and Ronald S. Joyner a
                                a
                                  Jet Propulsion Laboratory
                            a
                              California Institute of Technology


             Abstract. The Planetary Data System, NASA’s official archive for Solar System
             Exploration data, has transitioned to an archival information system based on ISO
             standards for the long-term preservation of digital data. The ontology-based PDS4
             Information Model provides the informational requirements to address the system’s
             mission to efficiently collect, archive, and make accessible the digital data and
             documentation produced by or relevant to NASA’s planetary missions. Adopted
             internationally, this ontology-mediated information system brings together inputs
             from a variety of sources in an open and interoperable fashion.

             Keywords. Digital, Repository, Ontology, Information, Model, Mediated, Science



1. Introduction

The Planetary Data System (PDS) [1] is NASA’s official archive for Solar System
Exploration science data. It is a federation of science discipline nodes formed in response
to the findings of the Committee on Data Management and Computing (CODMAC) [2]
that a “wealth of science data would ultimately cease to be useful and probably lost if a
process was not developed to ensure that the science data were properly archived.”
     Starting operations in 1990, the stated mission of the PDS is to “facilitate
achievement of NASA’s planetary science goals by efficiently collecting, archiving, and
making accessible digital data and documentation produced by or relevant to NASA’s
planetary missions, research programs, and data analysis programs.”
     After about twenty years of successful operations, the PDS transitioned to a more
modern system [3,4,5] using lessons-learned and foundational principles from the Open
Archival Information System Reference Model (OAIS-RM) [6]. The OAIS-RM states
that the digital repository must define the designated community and its associated
knowledge base. Complementing the key CODMAC finding that “the science
community must be engaged in all aspects of a science data repository if the data are to
remain scientifically useful to the community over the long term” this suggests that
ontologies would be useful for capturing the planetary science knowledge base.
     A task was subsequently initiated to create ontologies that would remain
independent but actively drive the development and evolution of the archival system.
The resulting ontologies form the core of an agile data management and curation system

     1
       J. Steven Hughes, Architecture And Systems Engineering, Jet Propulsion Laboratory, 4800 Oak Grove
Drive, Pasadena, California, USA; steve.hughes@jpl.nasa.gov. Copyright © 2019 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
having characteristics of adaptive planning, early delivery, evolutionary development,
continuous improvement, and rapid and flexible response to change.


2. The PDS4 Information Model

The Protégé [7] information modeling tool was chosen for the development of two
ontologies. The first ontology was an implementation of the ISO/IEC 11179 [8] Metadata
Registry (MDR) standard. This standard meets many of the requirements for the detailed
definitions required for science object classes and their attributes. These requirements
include the ability to define domain data types, value ranges and character lengths, units
of measure, and terminological names, aliases, and sources.
     The MDR standard also provides strategies that help address significant issues
associated with metadata governance, primarily the impact on metadata due to changes
in the science community. The PDS Information Model adopts a key strategy of the MDR
standard that provides and implements a multi-level governance hierarchy. The ontology
is partitioned into namespaces and each namespace is governed independently by a
steward.
     The second or core ontology is “a representation of concepts, relationships,
constraints, rules, and operations to specify data semantics for a chosen domain of
discourse” [9]. It provides a sharable, stable, and organized structure of information
requirements that supports an agile data curation environment. The combination of the
two ontologies is called the PDS4 Information Model.

2.1. Foundational Concepts

The foundational concepts for the core ontology of the PDS4 Information Model are
derived from the Information Model provided in the OAIS RM, starting with the
“Information Object”. Its extensions include object classes for the following information
categories:
     • Identification – allows information object to be discovered and accessed.
     • Representation/Format - allows a data object to be interpreted.
     • Fixity - ensures the information object has not been unintentionally altered.
     • Provenance – provides essential authenticity of information objects.
     • Context - describes the environment in which the data object was created.
     • Reference - allows the information objects to be referenced.
     • Access Rights - identifies the access restrictions pertaining to the data.
The PDS4 Information Model addresses these concepts that are required for long-term
preservation and provides the means for the PDS mission's data and documentation to be
efficiently collected, archived, and made accessible.
     The knowledge acquisition phase for the core ontology of the PDS Information
Model was a multi-year effort that required the collaboration of information architects
and experts from the diverse scientific domains in the Planetary Science community.
This effort resulted in what is called the “Common dictionary” consisting of the object
classes and relationships used across the science domains. The Common dictionary,
illustrated as a concept map in Figure 1, has rapidly stabilized after a few years of use.
                                     Figure 1. Common Model


     Two useful characteristics result from the use of ontologies to drive agile
development. First, as mentioned previously, the information architecture remains
independent of the implemented system’s architecture including its implementation
choices. This allows the science domains and the information technology to evolve
separately. This reduces the impact of change.
     The second property, the Information Model’s multi-level governance hierarchy,
partitions the model into a common and many discipline and local models, each governed
by a steward. Each steward is given significant autonomy but is constrained by the
ontology’s modeling principles. This creates a loosely coupled set of consistent models.
     In order to improve data interoperability, an original large set of diverse and often
complex data formats are reduced to a small standard set of simple data formats that are
designed for long-term stability. The reduced standard set is sufficient for most planetary
science data. More complex data formats are accommodated by using compositions of
the standard formats. However, this conceptualization increases the complexity
restrictions, such as no longer allowing the interleaving of two or more data objects. For
example, engineering data may no longer be prefixed to each row of a simple raster
image but must be grouped into a separate data object.
     The Information Model defines one Product class. This class is extended into a set
of products sufficient for the various categories of data, including observational data,
ancillary data, contextual information, and documents. Each Product consists of a
detached metadata label with a unique, immutable identifier and one or more data objects.
The identifier can be versioned. The Product label is defined from and validated against
the PDS4 Information Model. The Product label provides data format, identification,
reference, integrity, provenance, and context information.
     PDS4 Products may reference each other. This results in the formation of a semantic
network of linked data. There are two aggregate products, Collection and Bundle. A
Collection groups related “base” products. A Bundle groups related Collections.
     The content of the PDS4 Information Model is filtered, translated, and written to
various file formats as shown in Figure 2. Since the PDS chose XML to label Products,
the content of the Information Model is written to XML Schema and Schematron files.
These files are subsequently used to create and validate the product’s XML label. Product
label validation is largely accomplished using the XML Schema and Schematron files,
effectively tens-of-thousands of lines of auto-generated declarations.

        PDS4 System
                                                                         XML Schema
        Requirements
                                                                         Schematron
                             Ontology                 IMTool
                             Modeling             Data Object Model




                                                                       PDS4 Information
      Planetary Science                                               Model Specification
          Common                                                           (HTML)
        Requirements       PDS4 Ontology                PDS4
                              Protégé               Information
                                                       Model
                                                     (Common)
                                                                          PDS4 Data
                                                                          Dictionary
        Metadata Model                                                    (DocBook,
           ISO-14721                                                      HTML, PDF)
         Open Archival
      Information System                                Filter
             (OAIS)                                   Translate
                                                       Export          PDS4 Information
                                                                         Model Dump
                                                                            (JSON)


       Metadata Model
         ISO/IEC 11179                                   PDS
      Metadata Registry                              Information       PDS4 Information
      Information Model                                 Model            Model Dump
                                                     (Common +          (RDF, OWL, XMI,
                                                      Local Data        RDBM Schema)
                                                      Dictionary)




                                                                      Software/Services
      Planetary Science     Local Data                                Configuration Files
                                                      LDDTool
      Discipline/Mission    Dictionary                 Parse
        Requirements        (IngestLDD)               Validate
                                                       Ingest



                                        Figure 2. Information Flow



2.2. Maintaining Relevancy in a Diverse and Evolving Science Discipline

After the completion of the Common model, the development of discipline and local
models started. These models are called Local Data Dictionaries (LDDs). Each discipline
LDD involves specific areas of expertise, for example Cartography. To create a
discipline LDD, one or more discipline specialists take “stewardship” responsibility to
design and maintain an LDD for their specific area of expertise. To shield discipline
experts from the complexities of the data modeling process, a modeling “template” and
associated validation tool were developed that constrain the designer to a simple design
methodology and selected references to classes and attributes defined in the Common
model. The LDD template itself is defined in the Common model and so has an XML
Schema and Schematron file. The LDD designer populates the LDD template to create
the LDD.
     The tool validates the populated LDD template by temporarily "ingesting" the LDD
into the Common model to check for consistency. This framework allows stewards to
design and maintain their models in an environment that is loosely coupled to the
Common model and other LDDs.
     This paradigm is repeated at the local level for missions and projects. The tool allows
the “stacking” of two or more LDDs when cross-referencing is desired. References
between LDDs for reuse of object class definitions are negotiated between stewards. The
resulting hierarchy is illustrated in Figure 3. Currently there are fifteen LDDs at the
discipline level and nine LDDs at the mission level. However new missions and the
migration of legacy mission data will substantially increase the number of LDDs at the
mission level.

                     Common               Information Object
                                          Product, Array, Table,
                                          …
                     Discipline           Imaging
                                          Cartography
                                          Spectral
                                          …
                     Mission              InSight
                                          Cassini
                                          Voyager 1
                                          Voyager 2
                                          …
                                    Figure 3. Model Hierarchy



3. Conclusion

The PDS4 Information Model consists of two stable ontologies and a rapidly expanding
group of discipline and local ontologies. These mediating ontologies form the core of an
independently managed model that provides the information requirements for software
and services configuration. An agile development cycle of use, feedback, planning,
change, test, and release keeps the PDS relevant in a diverse and evolving science
discipline and ensures future planetary scientists will have useful data for new science
inquiries.


4. Acknowledgements

The research was carried out at the Jet Propulsion Laboratory, California Institute of
Technology, under a contract with the National Aeronautics and Space Administration.

© 2019. All rights reserved.

The authors wish to acknowledge the PDS4 Data Design Working Group (DDWG) and
the Systems Design Working Group (SDWG) for their significant efforts in the design,
development, and implementation of the PDS4 information and system architectures.
These discipline experts remained committed, sought excellence, and provided first-rate
information without which PDS4 would not have been possible. The authors also wish
to acknowledge Sean Hardman for his PDS Systems Development leadership and David
Giaretta and the various teams responsible for the ISO standards. Finally, they would
like to recognize the support of the PDS Management Council and NASA Headquarters.
References

[1] Special Issue: The Planetary Data System, Planetary and Space Science, European Geophysical Society,
      ISSN 0032-0633, Volume 44, Number 1, January, 1996.
[2] National Research Council. 1986. Issues and Recommendations Associated with Distributed Computation
      and Data Management Systems for Space Science, Committee on Data Management and Computing,
      Space Studies Board, National Academy Press, Washington, DC, pp. 95.
[3] Hughes, J. S., Crichton, D. J., Mattman, C. A., “Ontology-Based Information Model Development for
      Science Information Reuse and Integration”, 10.1109/IRI.2009.5211603, IEEE International Conference
      on Information Reuse & Integration, 2009.
[4] Hughes, J. S., Crichton, D. J., Hardman, S., Law, E., Joyner, R., Ramirez, P., "PDS4: A model-driven
      planetary science data architecture for long-term preservation," Data Engineering Workshops (ICDEW),
      2014 IEEE 30th International Conference on , vol., no., pp.134,141, March 31 2014-April 4 2014.
[5] Crichton, D., Hughes, J.S., Sean Hardman, S., Law, E., Beebe, R., Morgan, T., Grayzeck, E., "Scalable
      Planetary Science Information Architecture for Big Science Data”, 2014 IEEE 10th International
      Conference on e-Science, Volume 2, 20-24 October 2014.
[6] ISO 14721:2012 - Space data and information transfer systems -- Open archival information system (OAIS)
      -- Reference model, ISO, 2003.
[7] (2013) The Protégé Ontology Editor and Knowledge Acquisition System website. [Online]. Available:
      http://protege.stanford.edu/.
[8] ISO/IEC 11179: Information Technology -- Metadata registries (MDR), ISO/IEC, 2008.
[9] Lee, Y. T., "Information Modeling: From Design To Implementation", Proceedings of the Second World
      Manufacturing Congress, pp 315-321, 1999.