=Paper=
{{Paper
|id=Vol-2878/paper4
|storemode=property
|title=A Smart Development Environment for Infrastructure as Code
|pdfUrl=https://ceur-ws.org/Vol-2878/paper4.pdf
|volume=Vol-2878
|authors=Jesús Gorroñogoitia,Zoe Vasileiou,Emilio Imperiali,Indika Kumara,Dragan Radolović,Georgios Meditskos
}}
==A Smart Development Environment for Infrastructure as Code==
<pdf width="1500px">https://ceur-ws.org/Vol-2878/paper4.pdf</pdf>
<pre>
A Smart Development Environment for
Infrastructure as Code
Jesús Gorroñogoitia¹, Zoe Vasileiou², Emilio Imperiali³, Indika Kumara⁴,
Dragan Radolović⁵ and Georgios Meditskos²
¹ ATOS, Spain
² Information Technologies Institute, Centre for Research and Technology Hellas, Greece
³ Politecnico di Milano, Italy
⁴ Jheronimus Academy of Data Science, Tilburg University, The Netherlands
⁵ XLAB Research, Slovenia


Abstract
Cloud computing is a mature paradigm that has evolved to accommodate increasingly complex
applications, such as those in the AI and HPC domains. If applications are complex, infrastructure
can be even more so, spanning hybrid architectures. Producing a less error-prone deployment while
offering high performance therefore requires application and infrastructure awareness, as well as
deep knowledge of the IaC languages. In this paper, we present the SODALITE IDE, a suite that
assists users in authoring application deployment topologies and infrastructure models for IaC.
With a focus on quality and performance, the IDE enables faster and simpler development of IaC by
offering features such as in-sync multiple model viewpoints, smart context-aware content assistance
and semantic validation, powered by a Knowledge Base.

Keywords
Infrastructure as Code, TOSCA, Ansible, IDE, Semantic, Ontology


1. Introduction
In recent years, the global market has seen a tremendous rise in utility computing serving as
the backend for practically any new technology, methodology or advancement from healthcare
to aerospace. SODALITE addresses complex tasks of configuration, deployment, and operation
of complex applications. Carrying out these tasks requires knowledge of multiple IaC
scripting languages and the ability to manage the whole IaC development process. Given
those intricacies, simplifying and abstracting those DevOps processes is paramount.
To this end, SODALITE provides tools for simpler and faster development of IaC, and for the
deployment and execution of heterogeneous applications in HPC, Cloud and software-defined
computing environments, with a particular focus on quality, performance, and manageability.
Following this vision, SODALITE offers smart modeling capabilities to help non-expert DevOps
teams in defining Abstract Application Deployment Models (AADMs). The main novelty of
SODALITE in supporting the definition of AADMs is its ability to simplify this task
by offering semantic-based guidance to the end users.

First workshop on trustworthy software and open source, March 23-25, 2021, Virtual Conference
Email: jesus.gorronogoitia@atos.net (J. Gorroñogoitia); zvasilei@iti.gr (Z. Vasileiou); emilio.imperiali@mail.polimi.it
(E. Imperiali); i.p.k.weerasinghadewage@tilburguniversity.edu (I. Kumara); dragan.radolovic@xlab.si
(D. Radolović); gmeditsk@iti.gr (G. Meditskos)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   The purpose of this paper is to describe the SODALITE Integrated Development Environment
(SODALITE IDE) highlighting its ability to create trustworthy AADMs. The paper is structured
as follows: Section 2 presents the state of the art in the creation of AADMs; Section 3 presents
the design and implementation of the SODALITE IDE in detail; Section 4 provides a description
of the preliminary evaluation of the IDE; finally, Section 5 draws conclusions and outlines
the plan for future work.


2. Background and Related Work
2.1. TOSCA and IaC
TOSCA (The Topology and Orchestration Specification for Cloud Applications) [1] is an
OASIS standard language that enables the interoperable description of applications and cloud
infrastructure services, their relationships, and the operational behavior of those services. The
TOSCA metamodel supports the definition and description of those services through a topology
definition which consists of Node Templates representing components of the application such as
databases, virtual machines or web servers. These templates are instances of Node Types, which
define the schema of an infrastructural or application component, such as which properties and
requirements can be used. Regarding operational behavior, node types may define lifecycle
operations that implement the behavior an orchestration engine can invoke at runtime when
instantiating a node. These operations are used, for example, to configure a database or start a
server, and are backed by script implementation artifacts such as Chef or Ansible.
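
The relationship between Node Types, Node Templates and lifecycle operations can be illustrated with a minimal Python sketch. It is not the TOSCA metamodel itself: the class names, the example type `example.nodes.Database`, the playbook paths and the chosen operation order (`create`, `configure`, `start`) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NodeType:
    """Schema of a component: allowed properties, requirements, operations."""
    name: str
    properties: List[str] = field(default_factory=list)
    requirements: List[str] = field(default_factory=list)
    # Operation name -> implementation artifact (hypothetical playbook paths).
    operations: Dict[str, str] = field(default_factory=dict)


@dataclass
class NodeTemplate:
    """A concrete component in the topology, instantiated from a NodeType."""
    name: str
    node_type: NodeType
    properties: Dict[str, object] = field(default_factory=dict)
    requirements: Dict[str, str] = field(default_factory=dict)


def undeclared_properties(template: NodeTemplate) -> List[str]:
    """Properties set on the template that its type does not declare."""
    declared = template.node_type.properties
    return sorted(p for p in template.properties if p not in declared)


def operations_to_invoke(
    template: NodeTemplate,
    order: Tuple[str, ...] = ("create", "configure", "start"),
) -> List[Tuple[str, str]]:
    """Operations an orchestrator would run, in the assumed lifecycle order."""
    ops = template.node_type.operations
    return [(name, ops[name]) for name in order if name in ops]


# Hypothetical example data.
db_type = NodeType(
    name="example.nodes.Database",
    properties=["port", "db_name"],
    requirements=["host"],
    operations={
        "create": "playbooks/db_create.yml",
        "configure": "playbooks/db_configure.yml",
    },
)
db = NodeTemplate(
    name="my_db",
    node_type=db_type,
    properties={"port": 5432, "colour": "blue"},
    requirements={"host": "my_vm"},
)
```

Here `undeclared_properties(db)` flags `colour`, which the type schema does not declare, and `operations_to_invoke(db)` lists the two operations backed by script artifacts.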

2.2. Related Work
Several modeling languages have been proposed for supporting the specification of complex
application topologies and their deployment into infrastructures including Cloud [1] [2] [3] [4]
and HPC [5] [6]. TOSCA is a fast-emerging standard in the Cloud realm, but it lacks a visual
notation, which has led to authoring tools with their own non-standardized visual notations.
To address this limitation, researchers have proposed ways to standardize the
visual notation for TOSCA.
   Winery [7] is a Web-based TOSCA visual editor that can also be included as a plugin. It separates
modeling concerns to support not only resource experts in the specification of TOSCA types,
but also application owners in the definition of their application topologies. CloudCAMP
DSML [3], a web-based editor, supports the generation of IaC deployment models from users’
abstract business-oriented requirements for the creation of application component topologies by
utilizing TOSCA node templates and relationships. Alien4Cloud¹, a Web-based editor, enables
application owners to design their deployment topology as an orchestration of components
instantiated from types retrieved from a common TOSCA types catalogue.
   The main novelty of the SODALITE IDE is its support for the complete specification of both
application deployment topologies and the resources the application requires on the target
infrastructure, as model instances of the SODALITE DSL. This DSL has been designed as
an abstraction that leverages TOSCA to facilitate the export of AADM topologies as TOSCA
blueprints into the SODALITE IaC environment. Moreover, the SODALITE IDE includes support
for the creation of Ansible Models (AMs) explicitly associated with the resource node types
defined in a Resource Model (RM) and the application node types defined in AADMs, thus fully
covering the design and operation of the node types.

   ¹ https://alien4cloud.github.io/


3. SODALITE IDE: A Smart Environment for Developing
   Trustworthy IaC
3.1. Design Goals
The primary goals for the SODALITE IDE are as follows:

    • Portability. An application should be deployable over various types of infrastructures
      with little or no modification to the description of its deployment model.
      To support portability, we use the TOSCA standard and the IaC approach for defining
      and enacting the application’s deployment, containerization for packaging application
      components, and the MDE (model-driven engineering) approach for converting the abstract
      deployment models into concrete TOSCA and IaC scripts.
    • Reduce Coding Effort. End-users with little or no expertise in heterogeneous
      infrastructures or complex IaC should be able to design the deployment models of their
      applications with little effort. To reduce coding effort, the IDE provides a high-level
      programming model (DSL) and helps users with context assistance and recommendations
      for various modeling tasks.
    • Support Developing Trustworthy Deployment Code. End-users should be able
      to develop deployment artifacts that do not contain known smells, bugs, and errors.
      To achieve this goal, the SODALITE IDE uses various types of reasoning support
      for TOSCA and IaC, from syntax validation to smell detection.

3.2. Architecture of SODALITE IDE
Figure 1 shows the architecture of the SODALITE IDE, which consists of four main layers:
deployment modeling, deployment preparation, orchestration, and intelligence.

3.2.1. Deployment Modeling Layer
This layer contains the main interface with the user through the SODALITE IDE. All the model
representations are instances of the SODALITE DSL, which has been designed as a meta-model
for creating model instances based on that schema. All the models are designed to collect
from users the minimum set of deployment information needed for synthesizing the TOSCA
blueprint, based on the assistance provided by the Intelligence Layer. Additionally, the IDE
provides various dashboards that give access to the whole lifecycle of the
application.
Figure 1: The SODALITE generic architecture framework


3.2.2. Intelligence Layer
This layer is vertical to all the other layers, as it is the main storage of all the resources. In
this layer, cloud applications and infrastructures are semantically represented as abstracted
reusable patterns. Those semantic abstractions are realised in the form of RDF Knowledge
Graphs, aiming at the formal representation and linking of the information, which enables a
semantic reasoning framework to be developed on top of those RDF graphs to support search,
discovery, validation and reuse. Through the mapping services, the DSL models given as input
from the Deployment Modeling layer are transformed into RDF graphs. To make deployments less
error-prone, the IaC artifacts in this layer, such as TOSCA blueprints and Ansible scripts, are
verified against smells and bugs using data-driven and rule-based techniques.
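
The mapping of a DSL model to an RDF graph can be pictured with the following Python sketch. It uses plain (subject, predicate, object) tuples instead of a real RDF library. The `ex:` prefix, the predicate naming scheme and the function names are assumptions for illustration, and do not reflect the actual SODALITE ontologies or mapping services.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
EX = "ex:"  # hypothetical namespace prefix, not a SODALITE ontology


def template_to_triples(
    name: str,
    type_name: str,
    properties: Dict[str, object],
    requirements: Dict[str, str],
) -> Set[Triple]:
    """Map one node template to a set of RDF-like triples."""
    subject = EX + name
    triples: Set[Triple] = {(subject, "rdf:type", EX + type_name)}
    for key, value in properties.items():
        # Property values become literals (here: their Python repr).
        triples.add((subject, EX + "property/" + key, repr(value)))
    for key, target in requirements.items():
        # Requirements link this template to another resource in the graph.
        triples.add((subject, EX + "requirement/" + key, EX + target))
    return triples


def match(
    graph: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Hypothetical two-template model.
graph = template_to_triples("my_db", "Database", {"port": 5432}, {"host": "my_vm"})
graph |= template_to_triples("my_vm", "Compute", {}, {})
```

Once the model is a graph, search and discovery reduce to pattern matching: `match(graph, p="rdf:type", o="ex:Compute")` finds every template typed as `ex:Compute`.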

3.2.3. Deployment Preparation Layer
This layer connects the Deployment Modeling and Intelligence layers with the runtime
execution of the models (Orchestration layer). The Image Builder supports pre-building of
images targeting an OS virtualizer such as Docker, while the IaC Builder produces a complete
and executable TOSCA blueprint based on the AADM verified by the Intelligence layer.

3.2.4. Orchestration Layer
This layer is in charge of deploying applications into heterogeneous infrastructures
through the Orchestrator, which receives the application deployment model as a TOSCA
blueprint from the Deployment Preparation Layer. To improve the application’s
performance, this layer is also responsible for refactoring the application based on monitoring
parameters. The Refactorer searches for the best alternative deployment variant using the
reasoning capabilities (matchmaking) of the Knowledge Base (the Intelligence layer).

3.3. Using SODALITE IDE to Develop Trustworthy IaC
One of the main workflows supported by the SODALITE platform is the authoring of an AADM
for preparing the deployment of a complex application. The Resource Models (RMs) provide
TOSCA-related modeling concepts supporting the definition of types for infrastructure resources,
and AADMs support the specification of application component instances as TOSCA templates.
To foster reusability, the authoring of the AADM draws on the resource
types stored in a common, shared SODALITE Knowledge Base (KB). The intelligence of the IDE
mainly derives from this semantic KB, which assists users developing the model
with hints, suggestions, and validation during the process of AADM development.


Figure 2: BPMN model describing the AADM authoring workflow.


   Figure 2 shows the AADM authoring process using a BPMN workflow diagram. Initially, the
user defines the AADM in the SODALITE DSL, iteratively receiving content assistance
for both syntactic and semantic content (the latter provided by the KB). This assistance could
include retrieving concepts, such as properties and attributes, that are compatible with the type
schema of each template, locating compatible hosts for an application, or reusing resources
matching a component’s requirements. Subsequently, the user submits the model and, in the back end,
the Semantic Reasoner processes it. This process includes the mapping of the DSL model
to RDF Knowledge Graphs, after which three activities are performed:

    • Validation. In TOSCA, the type system supports inheritance: a type can extend another,
      inheriting all its concepts (e.g. properties, capabilities). Each template of the AADM
      is an instance of a specific type, namely an infrastructure resource, and is validated
      against this type definition. Some of the validation cases are: (i) verification of TOSCA
      application topologies with respect to all inter-component relationships [8]; (ii) detection
      of missing required properties.
    • Semi-automatic Model Completion. To further abstract the model, some information
      can be omitted by the user and be suggested or auto-completed by the Semantic Reasoner.
      For instance, the Semantic Reasoner suggests node templates that could serve as
      requirements, according to the type definition of the template, representing connections such
      as where the application/middleware component can be hosted or dependencies on other
      components. This missing information can be either suggested at design time or
      auto-completed at deployment time.
    • Quality Assurance of IaC. To assure the quality and trustworthiness of TOSCA and
      IaC programs, we support calculating different quality metrics for them, and detecting
      various smells and linguistic issues in them using ontology reasoning and machine
      learning and NLP (Natural Language Processing) based techniques. More information
      about this support can be found in our earlier publications [9, 10, 11].
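
The validation and model completion activities above can be sketched in a few lines of Python. This is a simplified illustration, not the Semantic Reasoner: the type names (`example.Compute`, `example.DockerHost`, `example.DockerizedComponent`), their properties and the helper functions are hypothetical, and the real reasoning runs over the RDF graphs in the KB.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TypeDef:
    name: str
    parent: Optional[str] = None
    required_properties: List[str] = field(default_factory=list)
    # Requirement name -> node type that may satisfy it.
    requirements: Dict[str, str] = field(default_factory=dict)


def lineage(types: Dict[str, TypeDef], name: str) -> List[str]:
    """The type itself followed by all its ancestors."""
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = types[current].parent
    return chain


def missing_required(types: Dict[str, TypeDef], template: dict) -> List[str]:
    """Required properties (own and inherited) the template does not set."""
    required: List[str] = []
    for type_name in reversed(lineage(types, template["type"])):
        for prop in types[type_name].required_properties:
            if prop not in required:
                required.append(prop)
    return [p for p in required if p not in template.get("properties", {})]


def suggest_targets(types: Dict[str, TypeDef], templates: List[dict],
                    template: dict, requirement: str) -> List[str]:
    """Templates whose type is, or derives from, the type the requirement expects."""
    wanted = None
    for type_name in lineage(types, template["type"]):
        if requirement in types[type_name].requirements:
            wanted = types[type_name].requirements[requirement]
            break
    if wanted is None:
        return []
    return [t["name"] for t in templates
            if t is not template and wanted in lineage(types, t["type"])]


# Hypothetical type hierarchy and model.
types = {
    "example.Compute": TypeDef("example.Compute", required_properties=["flavor"]),
    "example.DockerHost": TypeDef("example.DockerHost", parent="example.Compute",
                                  required_properties=["registry_ip"]),
    "example.DockerizedComponent": TypeDef(
        "example.DockerizedComponent",
        required_properties=["image_name"],
        requirements={"host": "example.Compute"},
    ),
}
templates = [
    {"name": "vm", "type": "example.Compute", "properties": {"flavor": "m1.small"}},
    {"name": "docker", "type": "example.DockerHost",
     "properties": {"registry_ip": "10.0.0.1"}},
    {"name": "app", "type": "example.DockerizedComponent",
     "properties": {"image_name": "app:1"}},
]
```

In this sketch, `missing_required(types, templates[1])` reports the inherited `flavor` property that the `docker` template omits, and `suggest_targets(types, templates, templates[2], "host")` proposes both `vm` and `docker` as candidate hosts for `app`.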

The output of these activities can lead to different paths. If the validation
process detects errors, the model is not saved in the KB, the errors are returned,
and the user should redefine the model. If no errors are detected, the model is saved in the KB. If
suggestions and/or IaC smells are returned, the user can optionally accept them and adjust
the model by following the same workflow again.
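
The branching just described can be summarised as a small decision function. The sketch below is an assumption-laden illustration: `ReasonerResult`, `submit` and the dictionary standing in for the KB are hypothetical names, and it assumes that suggestions and smells are only reported once validation has passed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReasonerResult:
    errors: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)
    smells: List[str] = field(default_factory=list)


def submit(model_name: str, model: dict, result: ReasonerResult,
           kb: Dict[str, dict]) -> dict:
    """Apply the outcome of the reasoning activities to a submitted model."""
    if result.errors:
        # Validation failed: nothing is saved, the user must redefine the model.
        return {"saved": False, "errors": result.errors, "optional_feedback": []}
    # No errors: the model is stored in the (stand-in) knowledge base.
    kb[model_name] = model
    # Suggestions and smells are optional; the user may accept them and resubmit.
    return {
        "saved": True,
        "errors": [],
        "optional_feedback": result.suggestions + result.smells,
    }


kb: Dict[str, dict] = {}
rejected = submit("aadm_v1", {"templates": []},
                  ReasonerResult(errors=["required property 'flavor' is missing"]), kb)
accepted = submit("aadm_v2", {"templates": ["vm"]},
                  ReasonerResult(smells=["hard-coded secret"]), kb)
```

After these two submissions only `aadm_v2` is present in `kb`, and its smell is returned as optional feedback rather than blocking the save.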
   A similar process is followed in the design of the RMs and the optimization models (OMs).
The modelling of OMs is related to the static optimization of the application, for which
recommendations with quick-fixes are provided. Additionally, the modelling of Ansible Models
(AMs) is supported; these represent the operation implementations of the
infrastructure resources as Ansible playbooks. KB-powered assistance is planned for the AMs.

3.4. Implementation
The SODALITE IDE is part of the SODALITE H2020 project [12]. It is designed and
implemented as a set of Eclipse plugins and offers editors for the models described in Section 3.3,
namely RMs, AADMs, OMs, and AMs, which are instances compliant with the SODALITE
DSLs. The RM and AADM DSLs are closely related to the TOSCA specification. This separation
of modeling concerns, targeting different modeling roles, reduces the complexity of TOSCA and
promotes the reusability of resources stored in the shared Knowledge Base (KB).
   The current release of the IDE² offers textual (Xtext) editors for all the SODALITE DSLs,
and a graphical view representation (Sirius) for the AADM, which is automatically generated
(Figure 3). Sirius technology permits the creation of multiple graphical viewpoints of the
same DSL. These representations are synchronised, so that changes made in one representation
propagate to the others sharing the same underlying model. The IDE textual editors offer most
of the features provided by Xtext, notably syntax coloring and validation, formatting and
auto-completion, cross-reference navigation, quick-fix proposals, outline view, and others.
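
The idea of synchronised representations over one underlying model can be illustrated with a generic observer-style sketch in Python. It does not use or imitate the Xtext or Sirius APIs; all class and method names are hypothetical and only show why an edit made in one view appears in the other.

```python
from typing import Callable, Dict, List


class SharedModel:
    """Single underlying model; every view observes it."""

    def __init__(self) -> None:
        self._nodes: Dict[str, str] = {}
        self._listeners: List[Callable[[Dict[str, str]], None]] = []

    def subscribe(self, listener: Callable[[Dict[str, str]], None]) -> None:
        self._listeners.append(listener)
        listener(dict(self._nodes))  # Render the current state immediately.

    def set_node(self, name: str, type_name: str) -> None:
        self._nodes[name] = type_name
        for listener in self._listeners:
            listener(dict(self._nodes))  # Propagate the change to all views.


class TextualView:
    def __init__(self, model: SharedModel) -> None:
        self.model = model
        self.text = ""
        model.subscribe(self.refresh)

    def refresh(self, nodes: Dict[str, str]) -> None:
        self.text = "\n".join(f"{n}: {t}" for n, t in sorted(nodes.items()))

    def type_line(self, line: str) -> None:
        """Simulate the user typing 'name: Type' in the textual editor."""
        name, type_name = (part.strip() for part in line.split(":", 1))
        self.model.set_node(name, type_name)


class GraphicalView:
    def __init__(self, model: SharedModel) -> None:
        self.model = model
        self.boxes: List[str] = []
        model.subscribe(self.refresh)

    def refresh(self, nodes: Dict[str, str]) -> None:
        self.boxes = [f"[{n} <{t}>]" for n, t in sorted(nodes.items())]

    def add_box(self, name: str, type_name: str) -> None:
        """Simulate the user adding a node in the diagram."""
        self.model.set_node(name, type_name)


model = SharedModel()
text_view = TextualView(model)
graph_view = GraphicalView(model)
text_view.type_line("my_vm: Compute")
graph_view.add_box("my_db", "Database")
```

Because both views write to and read from `model`, the node typed in the textual view shows up as a box in the graphical view, and vice versa.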
   Smart context-aware content assistance and semantic validation are largely supported through
the symbolic inference capabilities of the KB. The RDF graphs in the KB capture both the structural
and semantic relationships of the models, promoting reusability and interoperability using
standard ontology languages (OWL2) and reasoning tools. In Figure 4, a part of a custom node
type is depicted as a UML graph inheriting from other TOSCA normative types and containing a
description with various concepts. More information about the SODALITE ontologies can be found
in an earlier publication [13].
   AADMs and RMs are stored in the KB, enabling the reuse of the resource instances and
types defined in them. Additionally, the SODALITE IDE provides support to browse the KB and to
retrieve, modify, and delete models stored therein. It also supports other SODALITE processes,
notably the deployment of AADMs into target infrastructures, the creation of resource images,
and the governance of deployments.

   ² https://github.com/SODALITE-EU/ide

Figure 3: AADM graphical and textual view representation

Figure 4: sodalite.nodes.DockerHost custom node type as a UML graph in Topbraid ME Composer

   Ansible Models are used to implement the operations defined in the RM or AADM. The
IDE supports this integration between TOSCA and Ansible by allowing an AM to be coupled
with an operation from an RM or AADM node definition. In this way, while the user
is defining the AM, the IDE suggests the available inputs provided by the TOSCA operation
that can be used in the AM. Furthermore, the AM DSL, while being related to the Ansible
specification, conceptually groups the Ansible attributes into categories based on their usage, for
better organization of the code. The IDE takes care of generating the concrete Ansible playbook
from the abstract AM, which can then be executed for deployment in the target infrastructures.
Two demo videos, one for the AADM and OM editors and one for the Ansible DSL editors, are available
at figshare³.
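
The coupling between an AM and a TOSCA operation can be sketched as follows. The dictionaries used for the AM and the operation, and the functions `available_inputs` and `generate_playbook`, are hypothetical and do not reproduce the AM DSL or the IDE's generator. The only thing taken from Ansible is the general shape of a playbook: a list of plays with `hosts`, `vars` and `tasks`.

```python
from typing import Dict, List


def available_inputs(operation: dict) -> List[str]:
    """Inputs of the coupled TOSCA operation that the editor could suggest."""
    return sorted(operation["inputs"])


def generate_playbook(am: dict, operation: dict) -> List[Dict[str, object]]:
    """Turn an abstract AM into playbook data (a list containing one play)."""
    unknown = [name for name in am["used_inputs"] if name not in operation["inputs"]]
    if unknown:
        raise ValueError(f"inputs not provided by operation: {unknown}")
    play = {
        "hosts": am["hosts"],
        # Only the inputs the AM actually uses are passed on as variables.
        "vars": {name: operation["inputs"][name] for name in am["used_inputs"]},
        "tasks": list(am["tasks"]),
    }
    return [play]


# Hypothetical operation of a node type and the AM that implements it.
operation = {"name": "create", "inputs": {"image_name": "app:1", "port": 8080}}
am = {
    "hosts": "all",
    "used_inputs": ["image_name"],
    "tasks": [
        {"name": "Show the image to deploy",
         "ansible.builtin.debug": {"msg": "{{ image_name }}"}},
    ],
}
playbook = generate_playbook(am, operation)
```

Serialising `playbook` to YAML would give a file an Ansible runner could execute. Referencing an input that the operation does not provide raises an error instead of producing a broken playbook.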


   ³ https://figshare.com/projects/SODALITE/101705

4. Evaluation
We first evaluated the feasibility of the SODALITE IDE by using it to implement the three industrial
case studies of the SODALITE project, namely clinical trials, vehicle IoT, and Snow. The clinical
trials case study focuses on the development of a simulation process chain supporting in-silico
clinical trials of bone-implant systems. The vehicle IoT case study deploys a distributed system
for processing vehicular data over Cloud and Edge environments. The Snow case study deploys
a workflow that processes snow images from multiple data sources to derive information on
mountain snow coverage. For each use case, using the SODALITE IDE, we developed the AADMs
for the application and deployed the application based on the developed models. The case
study resources are in our GitHub repository.
   We next performed controlled experiments with two different types of users, namely non-
expert users (9 participants) and TOSCA experts (5 participants), with the objective of receiving
feedback on the perceived ease of use, usefulness and intention to use of the DSL and IDE. These
factors usually determine the user acceptance of a particular technology.
All experiment participants were asked to focus on the development of AADMs for a
Machine Learning (ML) application, which consists of 1) a database that stores training data,
2) a component that trains an ML model, 3) a repository that stores trained machine learning
models, and 4) a component that makes predictions/inferences based on the trained models. The
use case owners used their corresponding use cases.
   The experiments with TOSCA experts showed that SODALITE can help achieve a 27.73%
improvement over the baseline, which consisted of using their typical textual editor for creating
a TOSCA topology. The experiments with both groups showed that the users consider the
SODALITE IDE very useful and easy to use, and think it has a high adoption potential. On the
negative side, some missing features, partial stability, and incomplete documentation
of the IDE negatively affected the users’ effort and time. We plan to work on these
aspects in the next period.


5. Conclusion
In this paper, we presented an overview of the SODALITE IDE, a smart suite for IaC, providing
full support for the modeling of infrastructure resources, application deployment topologies,
application optimizations, and operation implementations, and also for the management of
deployed applications at runtime. In the future, we intend to: (i) provide more advanced
context-assistance and validation services, (ii) abstract the application model by permitting the
concretization of the AADM at deployment time by relying on Semantic Reasoning services, (iii)
improve the AADM visual modelling authoring from palettes, (iv) enhance OM authoring, and
(v) further interconnect the different DSLs.


Acknowledgments
This work has received funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement no. 825480 (H2020), SODALITE. We thank all members
of the SODALITE consortium for their input and feedback on the development of this paper.


References
 [1] OASIS, OASIS topology and orchestration specification for cloud applications version 1.0,
     2013. URL: http://docs.oasis-open.org/tosca/TOSCA/v1.0/os/TOSCA-v1.0-os.pdf.
 [2] M. Carlson, et al., Cloud application management for platforms (2012). URL:
     https://www.oasis-open.org/committees/download.php/47278/CAMP-v1.0.pdf.
 [3] A. Bhattacharjee, Y. D. Barve, A. Gokhale, CloudCAMP: A model-driven generative
     approach for automating cloud application deployment and management, 2017.
 [4] N. Ferry, et al., Towards model-driven provisioning, deployment, monitoring, and
     adaptation of multi-cloud systems, in: 2013 IEEE Sixth International Conference on Cloud
     Computing, Santa Clara, CA, USA, 2013, pp. 887–894.
 [5] M. Palyart, D. Lugato, I. Ober, J.-M. Bruel, MDE4HPC: An approach for using model-driven
     engineering in high-performance computing, in: SDL 2011: Integrating System and
     Software Modeling, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 247–261.
 [6] M. Śmiałek, K. Rybiński, R. Roszczyk, K. Marek, Towards a unified requirements model
     for distributed high performance computing, Springer International Publishing, Cham,
     2020, pp. 1–20.
 [7] O. Kopp, T. Binz, U. Breitenbücher, F. Leymann, Winery - a modeling tool for TOSCA-based
     cloud applications, in: ICSOC, 2013.
 [8] A. Brogi, A. D. Tommaso, J. Soldani, Sommelier: A tool for validating TOSCA application
     topologies, in: 5th International Conference on Model-Driven Engineering and Software
     Development, 2017, pp. 1–22.
 [9] I. Kumara, G. Quattrocchi, D. Tamburri, W.-J. Van Den Heuvel, Quality assurance of
     heterogeneous applications: The SODALITE approach, in: Advances in Service-Oriented and
     Cloud Computing, Springer International Publishing, 2021, pp. 173–178.
[10] I. Kumara, Z. Vasileiou, G. Meditskos, D. A. Tamburri, W.-J. Van Den Heuvel, A. Karakostas,
     S. Vrochidis, I. Kompatsiaris, Towards semantic detection of smells in cloud infrastructure
     code, WIMS 2020, Association for Computing Machinery, 2020, pp. 63–67.
[11] N. Borovits, I. Kumara, P. Krishnan, S. D. Palma, D. Di Nucci, F. Palomba, D. A. Tamburri,
     W.-J. van den Heuvel, DeepIaC: Deep learning-based linguistic anti-pattern detection in
     IaC, MaLTeSQuE 2020, Association for Computing Machinery, 2020, pp. 7–12.
[12] E. Di Nitto, J. Gorroñogoitia, I. Kumara, G. Meditskos, D. Radolović, K. Sivalingam, R. S.
     González, An approach to support automated deployment of applications on heterogeneous
     cloud-HPC infrastructures, in: 2020 22nd International Symposium on Symbolic and
     Numeric Algorithms for Scientific Computing (SYNASC), 2020, pp. 133–140.
[13] G. Meditskos, Z. Vasileiou, A. Karakostas, S. Vrochidis, I. Kompatsiaris, A pattern-based
     semantic lifting of cloud and HPC applications using OWL 2 meta-modelling, in: Special
     Session on High Performance Services Computing and Internet Technologies, HPCS, 2020.

</pre>