Representing Dockerfiles in RDF

Riccardo Tommasini¹, Ben De Meester², Pieter Heyvaert², Ruben Verborgh², Erik Mannens², Emanuele Della Valle¹

¹ Politecnico di Milano, DEIB, Milan, Italy
{riccardo.tommasini,emanuele.dellavalle}@polimi.it
² Ghent University – imec – IDLab, Department of Electronics and Information Systems, Ghent, Belgium
{ben.demeester,pheyvaer.heyvaert,ruben.verborgh,erik.mannens}@ugent.be

Abstract. Containers, lightweight stand-alone software executables, are everywhere. Industries exploit container managers to orchestrate complex cloud infrastructures, and researchers in academia use them to foster reproducibility of computational experiments. Among existing solutions, Docker is the de facto standard in the container industry. In this paper, we advocate the value of applying the Linked Data paradigm to the container ecosystem's building scripts, as it allows adding additional knowledge, eases decentralized references, and fosters interoperability. In particular, we define a vocabulary, Dockeronto, that allows Dockerfiles to be semantically annotated.

Keywords: container, Docker, Linked Data, vocabulary

1 Introduction

Linux Containers³ (e.g., lxc) are an operating-system-level virtualization technique that revolutionized the way software is packaged and distributed. Companies exploit lxc to manage complex infrastructures, either internally, e.g., by means of OpenStack⁴, or deployed on one of the available cloud solutions, e.g., Microsoft Azure⁵ and Amazon Web Services⁶. Among the available container solutions, Docker⁷ rapidly became the de facto standard and, more recently, it started influencing academic research, because it helps address a number of fundamental concerns regarding the reproducibility and repeatability of experiments, as put forward by Boettiger [2].
Docker guarantees (i) modular reuse of software packages, (ii) a portable environment, (iii) public sharing by means of web repositories (Docker Registry), and (iv) versioning.

Docker provides a set of concepts for the creation and initialization of containers: (i) a Docker Image is a software package containing a single application, (ii) a Dockerfile is the script that contains the instructions used to build the image, and (iii) a Docker Container is a runnable instance of an image.

³ https://linuxcontainers.org/
⁴ https://www.openstack.org/
⁵ https://azure.microsoft.com/
⁶ https://aws.amazon.com/
⁷ https://www.docker.com/

1 FROM ubuntu:latest
2 RUN apt-get update && apt-get install -y python python-pip wget
3 RUN pip install Flask
4 ADD hello.py /home/hello.py
Listing 1.1: A Dockerfile that installs a Python application on Ubuntu.

The build instructions are at the core of the functionality a container offers. Although this works within the Docker ecosystem, outside it these instructions are not sharable and extendable in a machine-understandable manner. For example, (i) additional information about specific instructions can only be provided through comments in the script, (ii) referring to specific instructions outside the complete context of their script is not easily achievable, and (iii) machine-understandability is limited, as the build instructions are not self-descriptive. Therefore, we advocate the use of Linked Data principles⁸ to make these build instructions available. Applying these principles, however, requires a vocabulary to semantically annotate these instructions. Previous efforts in this direction are Label-Schema.org⁹ and Smart Containers [4], but they do not consider the build instructions.
Label-Schema.org proposes a set of build-time labels for containers in the form of org.label-schema.[key]=[value] that can be used to add metadata to the built Docker Image. Smart Containers model Docker concepts using prov-o [1], focusing on the environment where computational experiments are executed, but they remain high level.

In this paper, we present Dockeronto¹⁰, a vocabulary that builds on the idea of Smart Containers to semantically annotate Dockerfiles. Furthermore, it uses the Function Ontology (fno) [5] to represent a Dockerfile's instructions.

2 Dockeronto in a Nutshell

In this section, we introduce Dockeronto via an example¹¹. The Dockerfile of Listing 1.1 installs and runs a Python application on top of the latest available Ubuntu image and executes the following types of instructions:

1. FROM specifies the base image from which the current Dockerfile inherits all functionality,
2. RUN executes a command within the image (at build time), and
3. ADD copies a file from the host file system into the image file system.

⁸ https://www.w3.org/DesignIssues/LinkedData.html
⁹ http://label-schema.org/rc1/
¹⁰ https://github.com/riccardotommasini/dockeronto
¹¹ The documentation and more elaborate examples are available at https://github.com/riccardotommasini/dockeronto.

 1 do:from a fno:Function, do:Instruction ;
 2     fno:expects ( do:imageInputParam ) ;
 3     fno:returns ( do:imageOutputParam ) .
 4
 5 do:run a fno:Function, do:Instruction ;
 6     fno:expects ( do:imageInputParam do:runInputCommand ) ;
 7     fno:returns ( do:imageOutputParam ) .
 8
 9 do:runInputCommand a fno:Parameter ;
10     fno:predicate do:runCmd ; fno:type do:Command .
11 do:imageInputParam a fno:Parameter ;
12     fno:predicate do:imageInput ; fno:type do:Image .
13 do:imageOutputParam a fno:Output ;
14     fno:predicate do:imageOutput ; fno:type do:Image .
Listing 1.2: from and run instruction in Dockeronto

ex:dockerfile1 a do:Dockerfile ;
    do:contains ( ex:ins1 ex:ins2 ex:ins3 ex:ins4 ) .

ex:ins1 fno:executes do:from ; do:fromValue ;
    rdfs:label "Install latest Ubuntu" ;
    rdfs:comment "We always want to have the latest Ubuntu updates." ;
    dcterms:creator ex:riccardo .

ex:ins2 fno:executes do:run ;
    do:runCmd "apt-get update && ..." ;
    rdfs:label "Install necessary dependencies." .

ex:ins3 fno:executes do:run ;
    do:runCmd "pip install Flask" ;
    dcterms:creator ex:ben .

ex:ins4 fno:executes do:add ;
    do:src "hello.py" ; do:dst "/home/hello.py" ;
    rdfs:comment "This script shows hello world." .
Listing 1.3: rdf representation of a Dockerfile with Dockeronto.

We represent these instructions using the Function Ontology (fno) [5]. Listing 1.2 shows the fno descriptions for FROM and RUN. For example, the RUN instruction expects an image and a command as input parameters (line 6): the image is the intermediate image from the previous instruction, and the command is, e.g., apt-get update && apt-get install -y python python-pip wget (Listing 1.1, line 2). Note that modeling the individual run commands is out of scope for this work, as it is not specific to the Dockerfile syntax. For the time being, they are described as string values. The instruction's output is a new image (line 7), either to be used for the next instruction or as the resulting image of the Dockerfile.

Listing 1.3 shows the Turtle serialization of the example Dockerfile using Dockeronto.
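To make the mapping from Listing 1.1 to Listing 1.3 more concrete, the following is a minimal sketch of how such a serialization could be generated. It is not the authors' tooling: the function names are hypothetical, only FROM, RUN, and ADD are handled, line continuations are ignored, prefix declarations are assumed to exist elsewhere, and emitting the value of do:fromValue as a plain string literal is an assumption, since the listing does not show that value.

```python
EXAMPLE = """FROM ubuntu:latest
RUN apt-get update && apt-get install -y python python-pip wget
RUN pip install Flask
ADD hello.py /home/hello.py
"""


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def instruction_to_turtle(node: str, line: str) -> str:
    """Describe one Dockerfile line with the properties used in Listing 1.3."""
    keyword, _, rest = line.partition(" ")
    keyword, rest = keyword.upper(), rest.strip()
    if keyword == "FROM":
        # Assumption: the base image is written as a plain string literal.
        body = ["fno:executes do:from", f'do:fromValue "{escape_literal(rest)}"']
    elif keyword == "RUN":
        # Commands stay opaque strings, as in the paper.
        body = ["fno:executes do:run", f'do:runCmd "{escape_literal(rest)}"']
    elif keyword == "ADD":
        parts = rest.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'ADD <src> <dst>', got: {line!r}")
        src, dst = parts
        body = [
            "fno:executes do:add",
            f'do:src "{escape_literal(src)}"',
            f'do:dst "{escape_literal(dst)}"',
        ]
    else:
        raise ValueError(f"instruction not covered by this sketch: {keyword}")
    return f"{node} " + " ;\n    ".join(body) + " ."


def dockerfile_to_turtle(text: str, subject: str = "ex:dockerfile1") -> str:
    """Emit an ordered do:contains list followed by one block per instruction."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    # Hypothetical node names, numbered like ex:ins1 ... in Listing 1.3.
    nodes = [f"ex:ins{i}" for i in range(1, len(lines) + 1)]
    header = f"{subject} a do:Dockerfile ;\n    do:contains ( {' '.join(nodes)} ) ."
    blocks = [instruction_to_turtle(n, ln) for n, ln in zip(nodes, lines)]
    return "\n\n".join([header, *blocks])


if __name__ == "__main__":
    print(dockerfile_to_turtle(EXAMPLE))
```

The instruction nodes are written inside a Turtle collection, so their order in the output mirrors their order in the script. Annotations such as rdfs:label or dcterms:creator are not derivable from the Dockerfile itself and would be added afterwards.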
This representation is queryable yet still executable, because we use an rdf:List to retain the ordering of the Dockerfile instructions, which influences the output Docker Image. We consider intermediate images, which are generated at build time, but since they can be inferred we did not describe them explicitly. Last but not least, we added additional information to the instructions, such as labels, comments, and their creators.

Outside the context of this Dockerfile, it is also possible to refer to specific instructions without the need to know the complete Dockerfile or even the fact that the instructions are related to Docker. For example, in Listing 1.4 additional knowledge about who created the instructions is provided, together with a review of one specific instruction.

ex:riccardo dbo:created ex:ins2 , ex:ins4 .

ex:reviewIns3 a schema:Review ;
    schema:itemReviewed ex:ins3 ;
    schema:contributor ex:riccardo .
Listing 1.4: Knowledge about specific instructions outside the Dockerfile context.

3 Conclusion

The development of Dockeronto is an important step towards improving the use of Docker build instructions outside the context of a Dockerfile, which in turn makes it possible to work towards applying the Linked Data principles. Extensibility and shareability are improved, and the build instructions are now self-descriptive. As shown in the example, (i) additional knowledge can be easily added to the instructions, (ii) (references to) instructions can be shared outside the context of a Dockerfile, and (iii) the use of semantic annotations via Dockeronto allows for self-descriptive instructions that are understandable even outside the Docker ecosystem. For the future, we envision a Linked Container ecosystem, where semantic technologies are used to empower development workflows, track provenance, and develop semantic microservices [3].

References

1.
Belhajjame, K., Cheney, J., Corsar, D., Garijo, D., Soiland-Reyes, S., Zednik, S., Zhao, J.: PROV-O: The PROV Ontology. Recommendation, World Wide Web Consortium (W3C) (Apr 2013), https://www.w3.org/TR/prov-o/, accessed June 14th, 2017
2. Boettiger, C.: An introduction to docker for reproducible research. Operating Systems Review 49(1), 71–79 (2015), http://doi.acm.org/10.1145/2723872.2723882
3. Fernández-Villamor, J.I., Iglesias, C.A., Garijo, M.: Microservices - lightweight service descriptions for REST architectural style. In: ICAART 2010 - Proceedings of the International Conference on Agents and Artificial Intelligence, Volume 1 - Artificial Intelligence, Valencia, Spain, January 22-24, 2010. pp. 576–579 (2010)
4. Huo, D., Nabrzyski, J., Vardeman, C.: Smart container: an ontology towards conceptualizing docker. In: Proceedings of the ISWC 2015 Posters & Demonstrations Track, Bethlehem, PA, USA, October 11, 2015. (2015)
5. Meester, B.D., Dimou, A., Verborgh, R., Mannens, E.: An ontology to semantically declare and describe functions. In: The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers. pp. 46–49 (2016), https://doi.org/10.1007/978-3-319-47602-5_10