<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Ontology Design Pattern towards Preservation of Computational Experiments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Da Huo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jarek Nabrzyski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Charles F. Vardeman II</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Notre Dame</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Several preservation mechanisms have been developed to capture the results of computational experiments. However, these preservation environments focus on preserving the computational replication of numeric results rather than the context of the calculation, which is needed for scientifically reproducible and extensible results. This is insufficient to help researchers comprehend preserved experiments and computational artifacts. Documentation is often lacking, so understanding an experiment requires face-to-face meetings with group members and external collaborators, and this knowledge is likely to be lost. Computational experiments therefore need to be described in a fashion that is both human- and machine-readable. This paper provides an ontology design pattern that conceptualizes computational experiment artifacts and is aligned with existing vocabularies such as the well-known W3C PROV vocabulary.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>As computational work becomes more and more integral to scientific research,
an efficient approach to preserving and sharing experiments has become a concern
for the reproducibility of the scientific process. Although a variety of
technologies have been created to capture computational experiments, it is still a
struggle for scientists to understand a preserved experiment without detailed
documentation.</p>
      <p>
        In this paper, we present early work towards building an ontology called
Smart Container (SC). It begins by conceptualizing computational
experiments from the perspective of computational environments, and the activities within
those environments, as provided by the Docker Linux container framework<sup>1</sup> used as a
preservation tool. Docker is a lightweight virtualization platform with several
properties that make it attractive as a preservation tool: a versioned file system, a modular design for
distributing software components, and a sustainable community. One community of collaborators at CERN<sup>2</sup>
is already exploring Docker to preserve high energy physics experiments. Our
approach is to eventually provide a mechanism that captures the additional
provenance of computational experiments in a machine-readable form using
the W3C standard RDF data model<sup>3</sup>, which has been shown to aid in the
contextualization of scientific experiments [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We reference or align to existing ontologies
and ontology patterns, such as PROV-O<sup>4</sup>, CSO<sup>5</sup> and ACT<sup>6</sup>, to aid
discoverability, interoperability, queryability and future extensibility [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. By starting
with a formal model of Docker and the provenance of Docker activities, we hope
to provide a basis for 1) existing scientific workflow frameworks and their
workflow descriptions to be captured within a Linux container environment, 2) data
to be integrated in a consistent manner, and 3) a common description
of the environment. The ultimate goal is to provide automated tools that
"wrap" the existing Docker command line tool and infrastructure such that they are
transparent to the scientist but capture the information necessary to populate the
metadata behind the scenes. Automated scientific gateways that use Docker
as a deployment and execution platform would provide a further degree of
transparency and give researchers a low barrier to adoption.
      </p>
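      <p>The idea of a tool that "wraps" a command and captures provenance behind the scenes can be illustrated in a few lines of Python. This is a minimal sketch under our own assumptions, not the Smart Container implementation: the function run_and_record, its runner parameter and the property sc:commandLine are hypothetical, plain tuples stand in for an RDF graph, and shlex.join requires Python 3.8 or later. The command runs unchanged, and its start and end times are recorded with the PROV-O properties prov:startedAtTime and prov:endedAtTime.</p>

```python
import shlex
import subprocess
from datetime import datetime, timezone
from typing import Callable, List, Tuple

# (subject, predicate, object) using prefixed names; a stand-in for an RDF graph.
Triple = Tuple[str, str, str]


def run_and_record(
    argv: List[str],
    activity_id: str,
    runner: Callable[[List[str]], object] = subprocess.run,
) -> Tuple[object, List[Triple]]:
    """Run a command (e.g. a docker invocation) unchanged and return its
    result together with triples describing the run as a prov:Activity.

    sc:commandLine is a hypothetical property; the prov: terms are PROV-O.
    """
    started = datetime.now(timezone.utc)
    result = runner(argv)  # the scientist's command is executed as-is
    ended = datetime.now(timezone.utc)
    triples: List[Triple] = [
        (activity_id, "rdf:type", "prov:Activity"),
        (activity_id, "prov:startedAtTime", started.isoformat()),
        (activity_id, "prov:endedAtTime", ended.isoformat()),
        # Keep the exact command line so the run can be inspected later.
        (activity_id, "sc:commandLine", shlex.join(argv)),
    ]
    return result, triples
```

      <p>Passing the runner in keeps the sketch testable without a Docker daemon; with the default subprocess.run it would execute, for example, the argument list ["docker", "run", "ubuntu:14.04", "bash"], where the image tag is only an example.</p>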
    </sec>
    <sec id="sec-2">
      <title>Toward the Formalization of "Smart Containers"</title>
      <p>
        We propose the construction of a "Smart Container Ontology" using a modular
approach: we systematically align the concepts present in Docker, viewed as a
computational environment in which computational activities occur, and reuse
existing vocabulary terms where possible to contextualize those concepts and their
creation. This pattern-based approach is believed to provide some control over the
unintended consequences of large ontologies and over entailment inconsistencies
introduced by their logical descriptions [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We build on three modular, widely used vocabularies: PROV, CSO and
ACT. The purpose of connecting these patterns
and creating a small specialization is to help answer competency
questions such as 1) "What are the requirements for a computational activity?",
2) "What was the environment in which the activity was performed, in terms
of software components?", 3) "What is the order in which provisioning activities
must occur?", and 4) "What software agents are responsible for a particular result
or outcome?". A brief description of these conceptual building blocks follows.
      </p>
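      <p>Competency question 3 can be answered mechanically once each provisioning activity records what it requires and what it produces. The following minimal Python sketch (Python 3.9 or later, for graphlib) assumes a hypothetical set of four provisioning steps and image names that are ours, not the paper's; it derives a valid execution order by topologically sorting the requirement and outcome links between steps.</p>

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical provisioning steps: each lists the images it requires
# (cf. act:Requirement) and the image it produces (cf. act:Outcome).
STEPS: Dict[str, dict] = {
    "sc:downloadData": {"requires": ["sc:imageLibs"], "produces": "sc:imageData"},
    "sc:installLibraries": {"requires": ["sc:imagePython"], "produces": "sc:imageLibs"},
    "sc:prepareOS": {"requires": ["sc:baseImage"], "produces": "sc:imageOS"},
    "sc:installPython": {"requires": ["sc:imageOS"], "produces": "sc:imagePython"},
}


def provisioning_order(steps: Dict[str, dict]) -> List[str]:
    """Return an order in which the steps can run so that every required
    image exists before the step that needs it."""
    producer = {spec["produces"]: name for name, spec in steps.items()}
    # Map each step to the steps whose outcomes it requires. A requirement
    # that no step produces (e.g. a base image pulled from a registry)
    # adds no dependency.
    graph = {
        name: {producer[img] for img in spec["requires"] if img in producer}
        for name, spec in steps.items()
    }
    # static_order() raises graphlib.CycleError (a ValueError) on circular requirements.
    return list(TopologicalSorter(graph).static_order())
```

      <p>With the steps above the dependencies form a single chain, so the only valid order is sc:prepareOS, sc:installPython, sc:installLibraries, sc:downloadData; when steps are independent, any order returned by the sort is acceptable.</p>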
      <p>
        PROV-O is a W3C recommendation that describes "information about
entities, activities, and people involved in producing a piece of data or thing, which
can be used to form assessments about its quality, reliability or trustworthiness"<sup>7</sup>,
and it facilitates the exchange of this information between different systems and
under different contexts. PROV-O has been demonstrated to have reasonable
flexibility and has shown promise for enhancing alignment between
different ontologies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Therefore, we propose to use PROV-O as a foundational
building block to facilitate connection to other vocabularies and preservation
efforts. The Core Software Ontology (CSO) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is a modular ontology formalizing
common concepts from software engineering, such as data, software and the
execution of software on some hardware. We reference CSO using rdfs:seeAlso but
do not import CSO directly, to avoid entailment of relations beyond the scope
of this application. The ACT<sup>8</sup> ontology design pattern, developed using the VoCamp
process for ontology engineering<sup>9</sup>, describes the common core concepts for
reasoning about activities [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. It combines two perspectives for modeling activity:
activities as temporally ordered entities in space-time, and activities as workflows in
planning-related applications. The axiomatization supporting the temporal ordering of
requirements and outputs, which is essential for capturing Docker as a provisioning
environment, is not provided by prov:Activity and must therefore be imported from ACT.
      </p>
      <p><sup>4</sup> http://www.w3.org/ns/prov#
<sup>5</sup> http://cos.ontoware.org/cso#
<sup>6</sup> http://descartes-core.org/ontologies/activity/1.0/ActivityPattern#
<sup>7</sup> http://www.w3.org/TR/2013/NOTE-prov-overview-20130430/
<sup>8</sup> http://ontologydesignpatterns.org/wiki/Submissions:An_Ontology_Design_Pattern_for_Activity_Reasoning</p>
      <sec id="sec-2-1">
        <title>A Proposed Pattern</title>
        <p>
A Docker-based execution engine can be divided into at least two types of
activities: provisioning activities, which create an appropriate environment for
a computational activity to occur, and computational activities, which directly produce
scientific observations. Provisioning activities include necessary procedures, such as preparing the OS,
installing software and software libraries, and downloading data, which contribute to
the "computational environment" for a scientific computation. Computational
activities are the actual experiment steps and workflows of a scientific investigation,
which involve generating and analyzing research data. We propose aligning
these activity concepts with prov:Activity, which represents "something that
occurs over a period of time and acts upon or with entities" [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Within Docker,
an individual activity requires inputs to execute and generates some result for
future steps. For a running container, the requirement is an image to start from,
and the outcome is a new image. The software agent involved in each activity, such as bash, python
or Docker, acts on behalf of a human; it is described by
prov:SoftwareAgent and can be connected to the human user, a prov:Person.
Both concepts are subclasses of prov:Agent. We propose identifying a human agent
using URIs constructed from ORCID identifiers, so that investigator and
potential publication identities can be propagated. Concepts referenced from CSO
using rdfs:seeAlso include SoftwareAsCode, the human-readable encoding of the whole
computational experiment, and the pattern that separates an
InformationObject from an InformationRealization, i.e., the executing code.
        </p>
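        <p>A single container run under this pattern can be written down as a handful of triples. The sketch below is a minimal illustration, assuming plain Python tuples in place of an RDF store; the function names, resource names such as sc:run1 and the placeholder ORCID iD are hypothetical, while prov:used, prov:wasGeneratedBy, prov:wasAssociatedWith and prov:actedOnBehalfOf are PROV-O properties. The helper responsible_agents shows how competency question 4 could be answered by following a result back to the activity that generated it and to that activity's agents.</p>

```python
from typing import List, Set, Tuple

# A triple is (subject, predicate, object); plain tuples stand in for an RDF graph.
Triple = Tuple[str, str, str]


def describe_container_run(run_id: str, required_image: str, outcome_image: str,
                           software_agent: str, orcid: str) -> List[Triple]:
    """Describe one running container with Smart Container and PROV-O terms."""
    # Human agents are identified by URIs built from ORCID identifiers.
    person = f"https://orcid.org/{orcid}"
    return [
        (run_id, "rdf:type", "sc:runningContainer"),
        (required_image, "rdf:type", "sc:RequiredImage"),
        (outcome_image, "rdf:type", "sc:OutcomeImage"),
        (software_agent, "rdf:type", "sc:SoftwareAgent"),
        (person, "rdf:type", "prov:Person"),
        (run_id, "prov:used", required_image),           # the image the run starts from
        (outcome_image, "prov:wasGeneratedBy", run_id),  # the new image it produces
        (run_id, "prov:wasAssociatedWith", software_agent),
        (software_agent, "prov:actedOnBehalfOf", person),
    ]


def responsible_agents(triples: List[Triple], entity: str) -> Set[str]:
    """Competency question 4: which software agents are responsible for a result?"""
    activities = {o for s, p, o in triples if s == entity and p == "prov:wasGeneratedBy"}
    return {o for s, p, o in triples
            if s in activities and p == "prov:wasAssociatedWith"}
```

        <p>For example, describing a run with the placeholder iD 0000-0002-1825-0097 links the software agent to https://orcid.org/0000-0002-1825-0097 through prov:actedOnBehalfOf, and asking for the agents responsible for the outcome image returns the software agent that ran the container.</p>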
        <p>In Fig. 1, we present a concept map of the application of the proposed pattern
to Docker. The class sc:runningContainer, representing a Docker
container, is a specialization of both prov:Activity and act:Activity. Its
requirement, in the sense of ACT, is an sc:RequiredImage, a subclass of prov:Entity
and act:Requirement. The ACT outcome is represented by
sc:OutcomeImage, a subclass of prov:Entity and act:Outcome. An
sc:SoftwareAgent is a direct subclass of prov:SoftwareAgent, representing
the software that executes activities on behalf of the human user. In Fig. 2, we present
a concept map of the specialization of the activity pattern. The classes
sc:provisionActivity and sc:computationalActivity are subclasses of
sc:runningContainer. Provisioning activities are influenced by an sc:provisionPlan,
which is a subclass of prov:Plan. An sc:computationalActivity is likewise influenced by
an sc:workflowPlan<sup>10</sup>. Provisioning activities produce a computational
environment that is a requirement for computational activities.</p>
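        <p>The subclass axioms described above imply further class memberships; for example, every provisioning activity is also a prov:Activity and an act:Activity. The following sketch is a hypothetical helper, not part of the ontology: it encodes the stated rdfs:subClassOf links, plus the PROV-O axioms that prov:SoftwareAgent and prov:Person are subclasses of prov:Agent and prov:Plan is a subclass of prov:Entity, and computes the transitive closure that an RDFS reasoner would entail.</p>

```python
from collections import deque
from typing import Dict, List, Set

# rdfs:subClassOf links as described for Fig. 1 and Fig. 2, plus PROV-O's own axioms.
SUBCLASS_OF: Dict[str, List[str]] = {
    "sc:runningContainer": ["prov:Activity", "act:Activity"],
    "sc:provisionActivity": ["sc:runningContainer"],
    "sc:computationalActivity": ["sc:runningContainer"],
    "sc:RequiredImage": ["prov:Entity", "act:Requirement"],
    "sc:OutcomeImage": ["prov:Entity", "act:Outcome"],
    "sc:SoftwareAgent": ["prov:SoftwareAgent"],
    "sc:provisionPlan": ["prov:Plan"],
    "prov:SoftwareAgent": ["prov:Agent"],
    "prov:Person": ["prov:Agent"],
    "prov:Plan": ["prov:Entity"],
}


def superclasses(cls: str) -> Set[str]:
    """All classes that cls is entailed to be a subclass of, following
    rdfs:subClassOf transitively with a breadth-first search."""
    seen: Set[str] = set()
    queue = deque([cls])
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

        <p>Here superclasses("sc:provisionActivity") yields sc:runningContainer, prov:Activity and act:Activity, which is why an individual typed only as a provisioning activity can still be found by a query for prov:Activity once entailment is applied.</p>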
      </sec>
      <sec id="sec-2-2">
        <title>Notes</title>
        <p><sup>9</sup> http://vocamp.org/wiki/Main_Page
<sup>10</sup> See http://www.w3.org/TR/2015/NOTE-hcls-dataset-20150514/#s7n_1, section 7.1.2, for details.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>In this paper, we present an ontology design pattern named Smart Container that
contextualizes computational experiments, using Docker as an example
infrastructure. We built the pattern by analyzing the main concepts in
computational experiments and aligned it with PROV-O, CSO and ACT to open
possibilities for wider extension.</p>
      <p>Acknowledgement We acknowledge funding from NSF grant PHY-1247316
"DASPOS: Data and Software Preservation for Open Science" and the Center for
Research Computing at UND. We also acknowledge helpful conversations with
Michelle Cheatham, Stian Soiland-Reyes, Matthew Gamble and Carole Goble.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abdalla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carral</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janowicz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>An ontology design pattern for activity reasoning</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Compton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corsar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Sensor data provenance: SSNO and PROV-O together at last</article-title>
          . To appear in: 7th International Semantic Sensor Networks Workshop (October
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Ontology design patterns for semantic web content</article-title>
          . pp.
          <fpage>262</fpage>
          –
          <lpage>276</lpage>
          . Springer (
          <year>2005</year>
          ), http://link.springer.com/chapter/10.1007/11574620_21
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Janowicz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vardeman</surname>
            ,
            <given-names>C.</given-names>
            <suffix>II</suffix>
          </string-name>
          :
          <article-title>Five stars of Linked Data vocabulary use</article-title>
          . Semantic Web, http://iospress.metapress.com/index/053766UR810L7274.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lebo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahoo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belhajjame</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheney</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corsar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garijo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soiland-Reyes</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zednik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>PROV-O: The PROV ontology</article-title>
          .
          <source>W3C Recommendation</source>
          , 30 April
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldstein</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zednik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duggan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aulenbach</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>West</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tilmes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Ontology engineering in provenance enablement for the National Climate Assessment</article-title>
          .
          <volume>61</volume>
          , 191–205, http://linkinghub.elsevier.com/retrieve/pii/S1364815214002254
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Oberle</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grimm</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>An ontology for software</article-title>
          . In:
          <source>Handbook on Ontologies</source>
          , pp.
          <fpage>383</fpage>
          –
          <lpage>402</lpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>