Representing Dockerfiles in RDF

Riccardo Tommasini¹, Ben De Meester², Pieter Heyvaert², Ruben Verborgh²,
Erik Mannens², Emanuele Della Valle¹

¹ Politecnico di Milano, DEIB, Milan, Italy
  {riccardo.tommasini,emanuele.dellavalle}@polimi.it
² Ghent University – imec – IDLab,
  Department of Electronics and Information Systems, Ghent, Belgium
  {ben.demeester,pheyvaer.heyvaert,ruben.verborgh,erik.mannens}@ugent.be


        Abstract. Containers – lightweight, stand-alone software executables –
        are everywhere. Industries exploit container managers to orchestrate
        complex cloud infrastructures and researchers in academia use them to
        foster reproducibility of computational experiments. Among existing so-
        lutions, Docker is the de facto standard in the container industry. In this
        paper, we advocate the value of applying the Linked Data paradigm to
        the container ecosystem’s building scripts, as it will allow adding
        additional knowledge, ease decentralized references, and foster
        interoperability. In particular, we define Dockeronto, a vocabulary for
        semantically annotating Dockerfiles.

        Keywords: container, Docker, Linked Data, vocabulary


1     Introduction

Linux Containers³ (e.g., LXC) are an operating system-level virtualization
technique that revolutionized the way software is packaged and distributed.
Companies exploit LXC to manage complex infrastructures, either internally,
e.g., by means of OpenStack⁴, or deployed on one of the available cloud
solutions, e.g., Microsoft Azure⁵ and Amazon Web Services⁶. Among the available
container solutions, Docker⁷ rapidly became the de facto standard and, more
recently, it started influencing academic research, because it helps address a
number of fundamental concerns regarding the reproducibility and repeatability
of experiments, as put forward by Boettiger [2]. Docker guarantees (i) modular
reuse of software packages, (ii) a portable environment, (iii) public sharing
by means of web repositories (Docker Registry), and (iv) versioning.
    Docker provides a set of concepts for the creation and initialization of
containers: (i) a Docker Image is a software package containing a single
application, (ii) a Dockerfile is the script that contains the instructions
used to build the image, and (iii) a Docker Container is a runnable instance of
an image.

³ https://linuxcontainers.org/
⁴ https://www.openstack.org/
⁵ https://azure.microsoft.com/
⁶ https://aws.amazon.com/
⁷ https://www.docker.com/

1   FROM ubuntu:latest
2   RUN apt-get update && apt-get install -y python python-pip wget
3   RUN pip install Flask
4   ADD hello.py /home/hello.py

    Listing 1.1: A Dockerfile that installs a Python application on Ubuntu.

    The build instructions are at the core of the functionality a container
offers. Although this works within the Docker ecosystem, outside it these
instructions cannot be shared or extended in a machine-understandable manner.
For example, (i) additional information about specific instructions can only be
provided through comments in the script, (ii) referring to a specific
instruction outside the complete context of its script is not easily
achievable, and (iii) machine-understandability is limited, as the build
instructions are not self-descriptive. Therefore, we advocate the use of Linked
Data principles⁸ to make these build instructions available. However, applying
these principles requires a vocabulary to semantically annotate the
instructions.
    Previous efforts toward such annotation include Label-Schema.org⁹ and Smart
Containers [4]. However, neither considers the build instructions.
Label-Schema.org proposes a set of build-time labels for containers in the form
of org.label-schema.[key]=[value] that can be used to add metadata to the built
Docker Image. Smart Containers model Docker concepts using PROV-O [1], focusing
on the environment where computational experiments are executed, but they
remain high level.
    In this paper, we present Dockeronto¹⁰, a vocabulary that builds on the
idea of Smart Containers to semantically annotate Dockerfiles. Furthermore, it
uses the Function Ontology (FnO) [5] to represent a Dockerfile’s instructions.


2     Dockeronto in a Nutshell

In this section, we introduce Dockeronto via an example¹¹. The Dockerfile of
Listing 1.1 installs a Python application on top of the latest available
Ubuntu image and uses the following types of instructions:

 1. FROM specifies the base image from which the current Dockerfile inherits
    all functionality,
 2. RUN executes a command within the image (at build time), and
 3. ADD copies a file from the host file system into the image file system.
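
As a rough illustration of the three instruction types above, the following sketch splits a Dockerfile into ordered (keyword, arguments) pairs. It is not part of Dockeronto: the helper name parse_dockerfile is hypothetical, and the sketch assumes one instruction per line with no line continuations.

```python
# Minimal sketch: split a Dockerfile into ordered (keyword, arguments) pairs.
# Assumptions: one instruction per line, no line continuations, and only
# full-line '#' comments. parse_dockerfile is a hypothetical helper.

SUPPORTED = {"FROM", "RUN", "ADD"}  # only the instruction types discussed here


def parse_dockerfile(text):
    """Return the instructions of a Dockerfile in their original order."""
    instructions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, _, arguments = line.partition(" ")
        keyword = keyword.upper()
        if keyword not in SUPPORTED:
            raise ValueError(f"unsupported instruction: {keyword}")
        instructions.append((keyword, arguments.strip()))
    return instructions


# The Dockerfile of Listing 1.1.
DOCKERFILE = """\
FROM ubuntu:latest
RUN apt-get update && apt-get install -y python python-pip wget
RUN pip install Flask
ADD hello.py /home/hello.py
"""

for keyword, arguments in parse_dockerfile(DOCKERFILE):
    print(keyword, "|", arguments)
```

The result is a list rather than a set or dictionary because the order of the instructions determines the resulting image.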
⁸ https://www.w3.org/DesignIssues/LinkedData.html
⁹ http://label-schema.org/rc1/
¹⁰ https://github.com/riccardotommasini/dockeronto
¹¹ The documentation and more elaborate examples are available at
   https://github.com/riccardotommasini/dockeronto.


 1   do:from a fno:Function, do:Instruction ;
 2      fno:expects ( do:imageInputParam ) ;
 3      fno:returns ( do:imageOutputParam ) .
 4
 5   do:run a fno:Function, do:Instruction ;
 6      fno:expects ( do:imageInputParam do:runInputCommand ) ;
 7      fno:returns ( do:imageOutputParam ) .
 8
 9   do:runInputCommand a fno:Parameter ;
10      fno:predicate do:runCmd ;       fno:type do:Command .
11   do:imageInputParam a fno:Parameter ;
12      fno:predicate do:imageInput ;   fno:type do:Image .
13   do:imageOutputParam a fno:Output ;
14      fno:predicate do:imageOutput ;  fno:type do:Image .

    Listing 1.2: FROM and RUN instructions in Dockeronto.


ex:dockerfile1 a do:Dockerfile ;
    do:contains ( ex:ins1 ex:ins2 ex:ins3 ex:ins4 ) .

ex:ins1 fno:executes do:from ; do:fromValue <ubuntu:latest> ;
    rdfs:label "Install latest Ubuntu" ;
    rdfs:comment "We always want to have the latest Ubuntu updates." ;
    dcterms:creator ex:riccardo .
ex:ins2 fno:executes do:run ; do:runCmd "apt-get update && ..." ;
    rdfs:label "Install necessary dependencies." .
ex:ins3 fno:executes do:run ; do:runCmd "pip install Flask" ;
    dcterms:creator ex:ben .
ex:ins4 fno:executes do:add ; do:src "hello.py" ; do:dst "/home/hello.py" ;
    rdfs:comment "This script shows hello world." .

    Listing 1.3: RDF representation of a Dockerfile with Dockeronto.


    We represent these instructions using the Function Ontology (FnO) [5].
Listing 1.2 shows the FnO descriptions for FROM and RUN. For example, the RUN
instruction expects an image and a command as input parameters (line 6): the
image is the intermediate image from the previous instruction, and the command
is, e.g., apt-get update && apt-get install -y python python-pip wget
(Listing 1.1, line 2). Note that modeling the individual run commands is out of
scope for this work, as it is not specific to the Dockerfile syntax. For the
time being, they are described as string values. The instruction’s output is a
new image (line 7), which is either used by the next instruction or is the
resulting image of the Dockerfile.
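
The function view of instructions can be sketched in a few lines of Python. This is only an illustration of the signatures in Listing 1.2, not of how Docker actually builds images: an image is modeled as a plain tuple of layer descriptions, and the names from_, run, and add are hypothetical.

```python
# Sketch only: an "image" is modeled as a tuple of layer descriptions.
# The function names and the tuple model are assumptions for illustration.


def from_(base):
    """FROM: takes a base image reference and returns an image."""
    return (f"FROM {base}",)


def run(image, command):
    """RUN: takes an image and a command and returns a new image."""
    return image + (f"RUN {command}",)


def add(image, src, dst):
    """ADD: takes an image plus source and destination, returns a new image."""
    return image + (f"ADD {src} {dst}",)


# The remaining steps of Listing 1.1, each waiting for its input image.
steps = [
    lambda img: run(img, "apt-get update && apt-get install -y python python-pip wget"),
    lambda img: run(img, "pip install Flask"),
    lambda img: add(img, "hello.py", "/home/hello.py"),
]

# Each step consumes the intermediate image produced by the previous one.
intermediate = [from_("ubuntu:latest")]
for step in steps:
    intermediate.append(step(intermediate[-1]))

final_image = intermediate[-1]
print(len(intermediate), "images;", len(final_image), "layers in the last one")
```

Because every instruction only depends on the image returned by its predecessor, the chain of intermediate images follows from the ordered instructions alone.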
    Listing 1.3 shows the Turtle serialization of the example Dockerfile using
Dockeronto. This representation is queryable yet still executable, because we
use an rdf:List to retain the ordering of the Dockerfile instructions, which
influences the output Docker Image. Intermediate images are generated at build
time, but since they can be inferred, we do not describe them explicitly. Last
but not least, we added further information to the instructions, such as
labels, comments, and their creators.
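
To make the "queryable yet still executable" point concrete, the sketch below serializes an ordered list of instruction records to Turtle in the style of Listing 1.3 and rebuilds the Dockerfile from the same list. It rests on several assumptions: prefix declarations are omitted (as in Listing 1.3), only FROM, RUN, and ADD are handled, and the helpers literal, to_turtle, and to_dockerfile are hypothetical names, not part of Dockeronto.

```python
# Sketch under assumptions: no @prefix declarations are emitted, only FROM,
# RUN and ADD are covered, and all helper names are hypothetical.

INSTRUCTIONS = [  # order matters, mirroring the rdf:List in Listing 1.3
    ("ex:ins1", "from", {"fromValue": "ubuntu:latest"}),
    ("ex:ins2", "run", {"runCmd": "apt-get update && apt-get install -y python python-pip wget"}),
    ("ex:ins3", "run", {"runCmd": "pip install Flask"}),
    ("ex:ins4", "add", {"src": "hello.py", "dst": "/home/hello.py"}),
]


def literal(value):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(dockerfile_id, instructions):
    """Serialize the instructions; '( ... )' is Turtle's RDF list syntax."""
    members = " ".join(node for node, _, _ in instructions)
    lines = [f"{dockerfile_id} a do:Dockerfile ;",
             f"    do:contains ( {members} ) ."]
    for node, kind, params in instructions:
        pairs = [f"fno:executes do:{kind}"]
        for key, value in params.items():
            # Listing 1.3 writes the base image as an IRI, other values as strings.
            obj = f"<{value}>" if key == "fromValue" else literal(value)
            pairs.append(f"do:{key} {obj}")
        lines.append(f"{node} " + " ; ".join(pairs) + " .")
    return "\n".join(lines)


def to_dockerfile(instructions):
    """Rebuild the build script by walking the list in order."""
    return "\n".join(
        f"{kind.upper()} " + " ".join(params.values())
        for _, kind, params in instructions
    )


print(to_turtle("ex:dockerfile1", INSTRUCTIONS))
print()
print(to_dockerfile(INSTRUCTIONS))
```

If the instructions were stored as an unordered set of resources instead, the second function could not reproduce the original script.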
    Outside the context of this Dockerfile, it is also possible to refer to
specific instructions without knowing the complete Dockerfile, or even the fact
that the instructions are related to Docker. For example, Listing 1.4 provides
additional knowledge about who created the instructions, together with a review
of one specific instruction.

ex:riccardo dbo:created ex:ins2, ex:ins4 .

ex:reviewIns3 a schema:Review ;
    schema:itemReviewed ex:ins3 ;
    schema:contributor ex:riccardo .

Listing 1.4: Knowledge about specific instructions outside the Dockerfile context.
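
Statements of this kind can be queried without any knowledge of Docker. The following sketch holds some of the statements from Listings 1.3 and 1.4 as plain (subject, predicate, object) tuples and matches simple patterns against them. The match function is a hypothetical stand-in for a real RDF store or SPARQL endpoint, which the paper does not prescribe.

```python
# Sketch: statements from Listings 1.3 and 1.4 as plain tuples.
# match() is a hypothetical, simplified stand-in for an RDF store.

TRIPLES = [
    ("ex:ins1", "dcterms:creator", "ex:riccardo"),
    ("ex:ins3", "dcterms:creator", "ex:ben"),
    ("ex:riccardo", "dbo:created", "ex:ins2"),
    ("ex:riccardo", "dbo:created", "ex:ins4"),
    ("ex:reviewIns3", "rdf:type", "schema:Review"),
    ("ex:reviewIns3", "schema:itemReviewed", "ex:ins3"),
    ("ex:reviewIns3", "schema:contributor", "ex:riccardo"),
]


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for s, p, o in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Which instructions have been reviewed, and by whom?
for review, _, instruction in match(predicate="schema:itemReviewed"):
    reviewers = [o for _, _, o in match(review, "schema:contributor")]
    print(instruction, "reviewed by", reviewers)
```

Neither the data nor the query mentions the Dockerfile the instructions belong to, which is the kind of decentralized reference the vocabulary is meant to enable.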


3      Conclusion
The development of Dockeronto is an important step towards improving the use of
Docker build instructions outside the context of a Dockerfile, which in turn
makes it possible to apply the Linked Data principles to them. Extensibility
and shareability are improved, and the build instructions are now
self-descriptive. As shown in the example, (i) additional knowledge can easily
be added to the instructions, (ii) (references to) instructions can be shared
outside the context of a Dockerfile, and (iii) semantic annotations via
Dockeronto allow for self-descriptive instructions that are understandable even
outside the Docker ecosystem.
     For the future, we envision a Linked Container ecosystem, where semantic
technologies are used to empower development workflows, track provenance, and
develop semantic microservices [3].


References
1. Belhajjame, K., Cheney, J., Corsar, D., Garijo, D., Soiland-Reyes, S., Zednik, S.,
   Zhao, J.: PROV-O: The PROV Ontology. Recommendation, World Wide Web Con-
   sortium (W3C) (Apr 2013), https://www.w3.org/TR/prov-o/, accessed June 14th,
   2017
2. Boettiger, C.: An introduction to docker for reproducible research. Operating Sys-
   tems Review 49(1), 71–79 (2015), http://doi.acm.org/10.1145/2723872.2723882
3. Fernández-Villamor, J.I., Iglesias, C.A., Garijo, M.: Microservices - lightweight ser-
   vice descriptions for REST architectural style. In: ICAART 2010 - Proceedings of
   the International Conference on Agents and Artificial Intelligence, Volume 1 - Ar-
   tificial Intelligence, Valencia, Spain, January 22-24, 2010. pp. 576–579 (2010)
4. Huo, D., Nabrzyski, J., Vardeman, C.: Smart container: an ontology towards con-
   ceptualizing docker. In: Proceedings of the ISWC 2015 Posters & Demonstrations
   Track, Bethlehem, PA, USA, October 11, 2015. (2015)
5. De Meester, B., Dimou, A., Verborgh, R., Mannens, E.: An ontology to semantically
   declare and describe functions. In: The Semantic Web - ESWC 2016 Satellite Events,
   Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers. pp. 46–49
   (2016), https://doi.org/10.1007/978-3-319-47602-5_10