<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Smart Container: an ontology towards conceptualizing Docker</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Da Huo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaroslaw Nabrzyski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Charles F. Vardeman II</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Formalize the SC Domain</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Notre Dame</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Because of the growing demand to preserve and share reproducible computational experiments in the scientific community, there has been interest in using Docker Linux containers as a preservation mechanism. However, this alone is insufficient to help researchers comprehend "Dockerized" experiments and connect computational artifacts with concepts in peer-reviewed publications. We present here an ontology and software, Smart Container, that conceptualizes Docker artifacts and is aligned with other existing vocabularies, such as the well-known W3C PROV vocabulary.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Given the complexity of software systems, an initial ontology certainly cannot
cover every single aspect related to Docker. Therefore, we focused on
conceptualizing the essential terms involved in running a computational experiment in Docker,
importing existing ontologies and conceptual terms into an ontology pattern. The
purpose of this work is to fill the provenance gap between Docker infrastructure
and scientific experiment artifacts, and to provide a vocabulary prototype that is
capable of future extension.</p>
      <sec id="sec-1-1">
        <title>Background</title>
        <p>The Smart Container Ontology was constructed from a systematic alignment
between the main concepts present in existing Docker metadata, utilizing
existing vocabulary terms where possible to contextualize those metadata
concepts. In our work, two widely used vocabularies, PROV and CSO, provide
the foundation of the Smart Container Ontology.</p>
        <p>
          PROV-O is a W3C recommendation that supports representing and interchanging
provenance information generated in different systems and under different contexts [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. PROV-O is built on three main
types of concepts: prov:Entity, which represents objects; prov:Activity, which
describes an event that happens over time and involves entities; and prov:Agent, which is
responsible for an activity or an entity. PROV-O has been
demonstrated to have reasonable flexibility and has been shown to enable
alignment with other ontologies [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Therefore, we chose PROV-O as the
foundational "upper" ontology for the Smart Container Ontology to facilitate
connections with other vocabularies.
        </p>
        <p>
          The Core Software Ontology (CSO) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is an ontology formalizing common
concepts in software engineering, such as data, software with its different shades
of meaning, classes, and methods. CSO uses DOLCE<sup>4</sup> as a foundational ontology,
together with its extensions: Descriptions&amp;Situations (DnS)<sup>5</sup>, the Ontology of Plans (OoP)<sup>6</sup>,
and the Ontology of Information Objects (OIO)<sup>7</sup>. CSO provides us with a
formalization of "software" concepts that we can apply to the Smart Container domain.
However, because of the complexity of DOLCE, we do not import CSO directly,
which avoids entailing relations beyond the scope of this application.
        </p>
        <p>Docker is an application based on Linux Containers (LXC). It isolates an
application with its dependencies in a single process, which is more lightweight
than full hypervisor virtualization of a guest operating system. It can be
provisioned by a simple, text-based Dockerfile workflow. Docker also adopts a layered
file system to achieve versioning and component re-use. A Docker image is
a stateless, read-only layer. A container has states: when it is running,
it represents a tree of processes isolated from other processes on the host; when
it exits, it represents a read-write layer generated by the process, along with
all of the stateless images underneath it. We differentiate these two concepts in our
ontology.</p>
        <p><sup>4</sup> http://purl.org/ifgi/dolce#
<sup>5</sup> http://www.loa.istc.cnr.it/ontologies/ExtendedDnS.owl
<sup>6</sup> http://www.loa.istc.cnr.it/ontologies/Plans.owl
<sup>7</sup> http://www.ontologydesignpatterns.org/ont/dul/IOLite.owl</p>
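        <p>The distinction between a stateless image and a stateful container can be sketched as a small data model. This is a minimal illustration only: the class names, field names, and layer labels are assumptions made for the example and are not part of Docker's API or of the Smart Container software.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    EXITED = "exited"


@dataclass(frozen=True)
class Image:
    """A stateless stack of read-only layers (hypothetical model)."""
    image_id: str
    layers: Tuple[str, ...]


@dataclass
class Container:
    """A container built on an image; an exited one owns a read-write layer."""
    image: Image
    state: State = State.CREATED
    rw_layer: Optional[str] = None

    def start(self) -> None:
        self.state = State.RUNNING

    def exit(self, rw_layer: str) -> None:
        # On exit the process leaves behind one read-write layer.
        self.state = State.EXITED
        self.rw_layer = rw_layer

    def is_static(self) -> bool:
        # Static means not currently running (just created or exited).
        return self.state is not State.RUNNING

    def layer_stack(self) -> Tuple[str, ...]:
        # Read-only image layers first, then the read-write layer if present.
        top = (self.rw_layer,) if self.rw_layer is not None else ()
        return self.image.layers + top


base = Image(image_id="a" * 64, layers=("base-os", "python-runtime"))
box = Container(image=base)
box.start()
running_is_static = box.is_static()
box.exit(rw_layer="experiment-output")
print(box.layer_stack())
```
]]></preformat>
        <p>Keeping the image immutable and attaching the read-write layer only to the exited container mirrors the two concepts that the ontology keeps apart.</p>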
      </sec>
      <sec id="sec-1-2">
        <title>Alignment Pattern</title>
        <p>
          From inspection, a Docker image is a digital object with some attributes. It
matches the description of a prov:Entity: a physical, digital, conceptual, or other
kind of thing with some fixed aspects [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The Dockerfile is a text file with lines of
Dockerfile commands. Docker fetches these commands and invokes the relevant
software to execute them. Each line in the Dockerfile generates an execution
inside the container. Because a container has states, we treat a static (just created
or exited) container as a prov:Entity, similar to a Docker image. A
running container is represented by a prov:Activity, which represents something
that occurs over a period of time and acts upon or with entities [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. For each
Docker container, the software that executes commands can be bash, python, Docker itself, or any other agent.
We extracted the software responsible for each command
execution as a prov:SoftwareAgent, which is a subclass of prov:Agent. From a
macro perspective, a series of operations in a computational experiment is always
associated with a human user. We aligned the human user with prov:Person,
a subclass of prov:Agent, and use prov:actedOnBehalfOf to connect the two.
        </p>
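        <p>The alignment described above can be sketched as a handful of provenance statements for a single container run. This is an illustration under stated assumptions: the http://example.org/run/ namespace and the resource names are hypothetical, and the use of prov:used, prov:wasGeneratedBy, and prov:wasAssociatedWith to link the resources is our own choice for the example; the text itself only names prov:actedOnBehalfOf.</p>
        <preformat><![CDATA[
```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/run/"  # hypothetical namespace, for illustration only


def describe_run(image, container, software, person):
    """Return (subject, predicate, object) triples for one container run."""
    img = EX + image
    running = EX + container + "/running"
    exited = EX + container + "/exited"
    agent = EX + software
    user = EX + person
    return [
        # Typing follows the alignment in the text.
        (img, RDF_TYPE, PROV + "Entity"),
        (running, RDF_TYPE, PROV + "Activity"),
        (exited, RDF_TYPE, PROV + "Entity"),
        (agent, RDF_TYPE, PROV + "SoftwareAgent"),
        (user, RDF_TYPE, PROV + "Person"),
        # Linking properties below are assumptions made for this sketch.
        (running, PROV + "used", img),
        (exited, PROV + "wasGeneratedBy", running),
        (running, PROV + "wasAssociatedWith", agent),
        # Named in the text: the software acts on behalf of the human user.
        (agent, PROV + "actedOnBehalfOf", user),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = describe_run("image-1", "container-1", "bash", "researcher-1")
print(to_ntriples(triples))
```
]]></preformat>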
        <p>In computational experiments, we have to be very careful about some special
concepts from the computer science domain. The encoding of the whole
computational experiment, such as the Dockerfile, is similar to the InformationObject
concept from CSO. Docker itself in the experiment, on the other hand, is
similar to a form of CSO:InformationRealization, which is a realization of code in
the machine. If we break the Dockerfile down line by line, we can also treat each line of
command as a smaller InformationObject. The running container is analogous to
ComputationalActivity in CSO, where the software manifests itself through a sequence
of tasks contained in a plan. Our approach is ontology-pattern based: we create
our own specializations but apply rdfs:seeAlso to terms in CSO, which conveys a weak sense
of identity without making the strong ontological commitments based on DOLCE.</p>
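        <p>The weak linking strategy can be illustrated by generating rdfs:seeAlso statements in Turtle syntax. The sketch below is hypothetical: the sc: and cso: prefixed names are placeholders chosen for the example (no namespace IRIs are given in the text), and the mapping only restates the analogies drawn in the paragraph above.</p>
        <preformat><![CDATA[
```python
# Hypothetical prefixed names; the pairs restate the analogies in the text.
SEE_ALSO = {
    "sc:Dockerfile": "cso:InformationObject",
    "sc:Command": "cso:InformationObject",
    "sc:RunningContainer": "cso:ComputationalActivity",
}


def see_also_turtle(links):
    """Emit one rdfs:seeAlso statement per term, sorted for stable output."""
    lines = [
        f"{term} rdfs:seeAlso {target} ."
        for term, target in sorted(links.items())
    ]
    return "\n".join(lines)


turtle = see_also_turtle(SEE_ALSO)
print(turtle)
```
]]></preformat>
        <p>Because rdfs:seeAlso carries no class-level semantics, a reasoner would not pull DOLCE axioms into the pattern, whereas a subclass or equivalence axiom against CSO would.</p>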
        <p>In Fig. 1, sc:Image, representing a Docker image, is a specialization of
prov:Entity. Docker containers were divided into three parts: sc:startContainer,
sc:runningContainer and sc:endContainer. sc:startContainer and sc:endContainer
subclass prov:Entity, representing static conditions. sc:RunningContainer, on the
other hand, subclasses prov:Activity as an event over time. A Dockerfile,
referenced as sc:Dockerfile, and a line of command, referenced as sc:Command, are both
subclasses of prov:Plan. The human user of the Dockerfile is identified as
sc:User, which subclasses prov:Person. sc:SoftwareAgent is a direct subclass of
prov:SoftwareAgent, standing for the software that executes commands. We use a
technique similar to that of TrustURI: the 64-digit Docker image code, which
can be uniquely resolved through a uniform resource name (URN) with specific protocols, is used to create
a URI for a static image. Each running container can be exposed through an HTTP
address, which is dereferenceable, so we construct a Uniform Resource Locator (URL)
in the normal manner. We identify a human agent using URIs constructed
from the ORCID (Open Researcher and Contributor ID) identifier, a non-proprietary
alphanumeric code that uniquely identifies scientific and other academic authors,
allowing investigator and potential publication identities to be propagated.</p>
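        <p>The class hierarchy and the identifier scheme can be sketched as follows. The subclass table restates the axioms given in the text, with term spellings kept as they appear there. The identifier helpers are assumptions for illustration: the urn:docker:image: prefix is hypothetical, since the text does not state the URN form, while the https://orcid.org/ form follows ORCID's published identifier structure.</p>
        <preformat><![CDATA[
```python
import re

# Subclass axioms as stated in the text (prefixed names, not full IRIs).
SUBCLASS_OF = {
    "sc:Image": "prov:Entity",
    "sc:startContainer": "prov:Entity",
    "sc:endContainer": "prov:Entity",
    "sc:RunningContainer": "prov:Activity",
    "sc:Dockerfile": "prov:Plan",
    "sc:Command": "prov:Plan",
    "sc:User": "prov:Person",
    "sc:SoftwareAgent": "prov:SoftwareAgent",
}

IMAGE_ID = re.compile(r"[0-9a-f]{64}")
ORCID_ID = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def image_urn(image_id: str) -> str:
    """Mint a URN for a static image from its 64-character hex ID.

    The 'urn:docker:image:' prefix is a hypothetical placeholder.
    """
    if not IMAGE_ID.fullmatch(image_id):
        raise ValueError("expected a 64-character lowercase hex image ID")
    return "urn:docker:image:" + image_id


def orcid_uri(orcid: str) -> str:
    """Build a URI for a human agent from an ORCID identifier."""
    if not ORCID_ID.fullmatch(orcid):
        raise ValueError("expected an ORCID iD such as 0000-0002-1825-0097")
    return "https://orcid.org/" + orcid


for term, parent in SUBCLASS_OF.items():
    print(f"{term} rdfs:subClassOf {parent} .")
print(image_urn("ab" * 32))
print(orcid_uri("0000-0002-1825-0097"))
```
]]></preformat>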
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Conclusion</title>
      <p>In this paper, we present an ontology pattern named Smart Container that
contextualizes the Docker software system acting as an infrastructure for
computational experiments. We populated our ontology design pattern by analyzing the main
concepts in Docker and aligning them with PROV-O and CSO to provide possibilities
for wider extensions.</p>
      <p>Acknowledgements. We acknowledge funding from NSF grant PHY-1247316
"DASPOS: Data and Software Preservation for Open Science."</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Compton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corsar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
          </string-name>
          , K.:
          <article-title>Sensor data provenance: SSNO and PROV-O together at last</article-title>
          . In: To appear, 7th International Semantic Sensor Networks Workshop (October
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Lebo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sahoo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belhajjame</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheney</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corsar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garijo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soiland-Reyes</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zednik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>PROV-O: The PROV ontology</article-title>
          .
          <source>W3C Recommendation</source>
          (30 April
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Oberle</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grimm</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>An ontology for software</article-title>
          . In:
          <source>Handbook on Ontologies</source>
          , pp.
          <fpage>383</fpage>
          –
          <lpage>402</lpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez-Perez</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belhajjame</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klyne</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Cuesta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garrido</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hettne</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roos</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Roure</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Why workflows break: understanding and combating decay in Taverna workflows</article-title>
          . In:
          <source>E-Science (e-Science), 2012 IEEE 8th International Conference on</source>
          , pp.
          <fpage>1</fpage>
          –
          <lpage>9</lpage>
          . IEEE (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>