Linking provenance with system logs: a context-aware information integration and exploration framework for analyzing workflow execution

Elias el Khaldi Ahanach, Spiros Koulouzis, Zhiming Zhao
elias.el.khaldi@gmail.com, {S.Koulouzis|Z.Zhao}@uva.nl
Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract—When executing scientific workflows in a distributed environment, anomalies in workflow behavior are often caused by a mixture of different issues, e.g., careless design of the workflow logic, buggy workflow components, unexpected performance bottlenecks, or resource failures in the underlying infrastructure. Provenance information only describes data evolution at the workflow level and has no explicit connection with the system logs provided by the underlying infrastructure. Analyzing provenance information together with the relevant system metrics requires expertise and a considerable amount of manual effort. Moreover, it is often time-consuming to aggregate this information and to correlate events occurring at different levels of the infrastructure. In this paper, we propose an architecture that automates the integration of workflow provenance information with the performance information collected from the infrastructure nodes running the workflow tasks. Our architecture enables workflow developers or domain scientists to effectively browse workflow execution information together with the system metrics, and to analyze contextual information for possible anomalies.

I. INTRODUCTION

In the last decades, researchers have been using sophisticated research support environments for efficient data discovery, experiment management, and workflow composition and execution. These research support environments typically include:

1) e-Infrastructures, e.g., EGI (http://www.egi.eu/) and EUDAT (http://www.eudat.eu/), which focus on the management of the service lifecycle of computing, storage and network resources, and provide services to research communities or other user groups to provision dedicated infrastructure and to manage persistent services and their underlying storage, data processing and networking requirements.
2) Research Infrastructures (RIs), which are facilities, resources, and services constructed for specific scientific communities to conduct research. They can include scientific equipment and knowledge-based resources such as collections, archives or scientific data. Examples of RIs include the Integrated Carbon Observation System (ICOS, https://www.icos-ri.eu/) for carbon monitoring in atmosphere, ecosystems and marine environments, the European Plate Observing System (EPOS, https://www.epos-ip.org/) for solid earth science, and Euro-Argo (http://www.euro-argo.eu/) for collecting environmental observations from large-scale deployments of robotic floats in the world's oceans.
3) Virtual Research Environments (VREs), which provide user-centric support for discovering and selecting data and software services from different sources, and for composing and executing application workflows [1], [2]; they are also referred to as Virtual Laboratories or Science Gateways [3].

Although the roles and functions of these different kinds of environments may substantially overlap, we can distinguish that e-Infrastructures focus on generic Information and Communications Technology (ICT) resources (e.g., computing or networking), RIs manage data and services focused on specific scientific domains, and VREs support the lifecycle of specific research activities. Although the boundaries between these environments are not always entirely clear (they often share services for infrastructure and data management [4]), collectively they represent an important trend in many international research and development projects.

Research support environments combine multiple resources, including federated clouds and repositories, that allow the sharing of service-oriented architecture (SOA) based scientific workflows [5]. These environments also offer the means to store provenance data concerning the execution of scientific workflows.
When performing an experiment using a Workflow Management System (WFMS) in a VRE, different types of contextual information can be collected:

• Provenance information provided by the workflow system.
• Application logs monitored by the platform (e.g., by Apache Tomcat or the Java virtual machine).
• System logs collected by the infrastructure monitoring systems.

Provenance (PROV, https://www.w3.org/TR/prov-overview/) is a typical model often used for workflow provenance. It models the causality of workflow events using the concepts of agents, entities, and activities involved in data evolution [6]. It has been used in workflow systems such as Apache Taverna to export information on workflow executions. Provenance can be stored using the XML or RDF (https://www.w3.org/RDF/) standards, and many tools are available for parsing and querying the data.
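As an illustration of how such provenance can be queried, the sketch below uses Python with rdflib to extract the start and end time of every activity from a PROV document serialized as RDF (Turtle). This is a minimal sketch under stated assumptions, not part of our prototype: the file name is a placeholder, and a concrete WFMS export may require a different serialization or parsing route.

```python
# Minimal sketch: extract activity time ranges from a PROV document
# serialized as RDF/Turtle. The file name is a placeholder.
from rdflib import Graph

PROV_QUERY = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?activity ?start ?end WHERE {
    ?activity a prov:Activity ;
              prov:startedAtTime ?start ;
              prov:endedAtTime   ?end .
}
ORDER BY ?start
"""

g = Graph()
g.parse("workflow-prov.ttl", format="turtle")  # placeholder file name

for activity, start, end in g.query(PROV_QUERY):
    # One row per workflow activity: exactly the time-range information
    # that later allows provenance to be aligned with system metrics.
    print(f"{activity}  {start} -> {end}")
```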
At the same time, monitoring systems provide metrics about the usage of e-Infrastructure resources, e.g., CPU usage, memory consumption, and network traffic. Those metrics can be useful for workflow developers to investigate application behavior at the level of the low-level resources, e.g., to locate workflow failures caused by an underlying infrastructure resource. However, provenance and system metrics differ in the scope of the information they cover and are provided by different sources, which makes an integrated analysis difficult and time-consuming.

In this paper, we propose a context-aware information integration and exploration framework that allows users to effectively investigate possible workflow execution bottlenecks by combining provenance with the system logs. We discuss an architecture that allows scientists or service developers to analyze the execution of service-based scientific workflows and to visualize possible bottlenecks related to the infrastructure by getting a detailed view of the resource usage of each workflow task. By taking advantage of this information, infrastructure administrators, service developers or scaling controllers may reconfigure the provisioned virtualized infrastructure.

The rest of the paper is organized as follows: in Section II we discuss the motivation for our solution. Section III presents an overview of related work. Our proposed system architecture is given in Section IV, while Sections V and VI are devoted to the assessment of our proposed solution. The paper concludes with Section VII.

II. MOTIVATION

By abstracting the application logic of the steps or processes in an experiment, a WFMS allows scientists to efficiently construct, execute and validate complex applications at a high level [7].

The adoption of cloud technologies, together with the increasing popularity of containerization and DevOps practices, has made the development, deployment, and monitoring of web services faster and more efficient.

Scientific workflows usually interact with multiple web services which are often hosted in different cloud infrastructures, and they may be composed of numerous tasks [8]. When executing such complex workflows, it is often hard to detect the underlying cause of execution bottlenecks related to the performance of the infrastructure. If we break the infrastructure down into multiple abstraction levels, we can see that workflows are created and executed at the highest level, while resource usage is measured at the lowest levels [9]. As a result, the measured resource metrics are often unreachable and obscured for the user. Although the use of cloud technologies provides scientists with a dedicated infrastructure which combines data and computation [10], along with sophisticated monitoring tools, it is very difficult for a scientist or service developer to discover which Virtual Machine (VM) or container is responsible for execution bottlenecks or failed workflow executions.

To bridge the gap between the workflow abstraction level and the dedicated infrastructure, we set the following objectives:

1) Analyse the execution time of service-based scientific workflows.
2) Detect bottlenecks that cause workflow performance degradation.
3) Detect the cause of those bottlenecks.
4) Present the analysis to a user.
5) Make resource scaling suggestions that can be used by scaling controllers.

III. RELATED WORK

In the past, there have been many efforts to detect the sources of performance loss, or to detect anomalies in the infrastructure, while executing scientific workflows.

In [11] the authors propose an approach that aims to help workflow end-users and middleware developers understand the sources of performance loss when executing scientific workflows in Grid environments. To achieve that, they propose a model for estimating the ideal (lowest) execution time of a workflow and then calculate the total overhead as the difference between the workflow's measured Grid execution time and its ideal time. This work is focused on Grid environments and depends on the presence of a specific scheduler, GRAM [12], to collect the state of each workflow task submitted to a Grid site. Moreover, in Grid environments the performance of the node that is assigned to execute a workflow task is more or less stable: once a workflow task is scheduled on a node, it has exclusive use of its resources.

Also focusing on Grid environments, the authors of [13] presented a simple approach for the autonomous detection and handling of operational incidents in workflow activities. In this work, the authors attempt to classify the state of each task in a workflow and apply the appropriate rule in case of an error. Issues like resource scaling are not addressed, since Grid environments assign tasks to a static resource. In [14] the authors gathered eight months of workflow activity from the e-BioInfra platform and present an analysis of task failures in their e-Infrastructure. Although this work provides some useful insight into the behavior of workflow tasks, it does not connect the higher-level workflow tasks with the underlying usage of resources.

The work presented in [15] proposes an online mechanism for detecting anomalies while executing scientific workflows on networked clouds. The authors use an integrated framework to collect online monitoring time-series data from the workflow tasks and the infrastructure. This approach is tightly coupled with the Pegasus WFMS, which depends on specific worker nodes to execute workflow tasks.

In summary, existing solutions are either outdated or depend on specific task schedulers or WFMSs. Although scientific workflows traditionally preserve provenance information, this information is generally not used together with information collected from the infrastructure nodes to analyze contextual information for possible anomalies.
IV. ARCHITECTURE

Fig. 1. The CWEA is made of four main components: 1) the WCDR for querying provenance data and parsing workflows, 2) the RCDR for querying performance data, 3) the WFEA for implementing diagnosis algorithms, and 4) the interactive GUI for presenting the results.

Our architecture, named the Cross-context Workflow Execution Analyzer (CWEA), enables workflow developers or domain scientists to effectively browse workflow execution information together with the system metrics, and to analyze contextual information for possible anomalies. To achieve that, it relies on the following components:

• Workflow Context Data Retriever (WCDR): this component queries the provenance data generated by the WFMS to extract the name, start time, and end time of each web service call described in the workflow. The WCDR also parses the workflow itself to extract the endpoints of the web services used and the type of call made (e.g., GET, POST, etc.).
• Resource Context Data Retriever (RCDR): this component uses the service endpoints obtained by the WCDR to query the corresponding hosts and retrieve the available performance data within each web service's call time range. The performance data typically include CPU utilization, memory and network usage.
• Workflow Execution Analyzer (WFEA): this component abstracts a set of diagnostic algorithms that analyze various performance metrics to find correlations between workflow execution and resource usage. Currently, we have implemented an algorithm that identifies the most time-consuming web services within the context of a workflow execution (a sketch of such a ranking is given after this list). The findings of the WFEA are presented in the interactive GUI for visualization, or can be consumed in the form of JSON by other software components such as infrastructure scaling controllers.
• Interactive GUI: a web-based interface that allows the user to combine and visualize the workflow execution steps with the performance metrics of the underlying resources.
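The time-consumption analysis currently implemented in the WFEA can be summarized by the following minimal sketch. It is illustrative rather than the prototype's actual code: the Invocation record and the function name are ours, and the input is assumed to be the (name, endpoint, start, end) tuples that the WCDR extracts from the provenance.

```python
# Illustrative sketch of the WFEA's "most time-consuming services"
# analysis; the Invocation record and function name are not the
# prototype's API.
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class Invocation:
    name: str        # task name from the provenance trace
    endpoint: str    # web service endpoint from the workflow file
    start: datetime  # invocation start time
    end: datetime    # invocation end time

    @property
    def duration(self) -> float:
        return (self.end - self.start).total_seconds()

def rank_by_duration(invocations: List[Invocation]) -> List[dict]:
    """Rank tasks by execution time, with each task's share of the
    summed task time (cf. Table I in Section VI)."""
    total = sum(inv.duration for inv in invocations)
    ranked = sorted(invocations, key=lambda inv: inv.duration, reverse=True)
    return [
        {
            "task": inv.name,
            "endpoint": inv.endpoint,
            "duration_sec": round(inv.duration, 2),
            "percent_of_total": round(100 * inv.duration / total, 2),
        }
        for inv in ranked
    ]
```

The returned list of dictionaries can be serialized directly to JSON, which matches the way the WFEA's findings are consumed by other components.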
Figure 1 shows the overall architecture and its individual components. Each of the components is implemented as a RESTful microservice with its own functionality. For our prototype, the WCDR is able to parse t2flow workflows, a specification used by the Taverna WFMS. Our microservice architecture also allows us to support other workflow specifications, such as SCUFL2, by implementing additional parsers.

As mentioned above, the WCDR parses the provenance data to obtain the execution trace of a workflow. Complex workflow structures such as loops or conditions are therefore already recorded in the provenance data. As a next step, the WCDR parses the workflow itself to extract the web service endpoints and the type of call made.
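t2flow is an XML-based format. The sketch below shows one simplified way to pull service endpoints out of a t2flow file; it deliberately ignores the details of the t2flow schema and merely scans all elements and attributes for HTTP(S) URLs, so it should be read as an approximation of what a real parser does.

```python
# Simplified, schema-agnostic endpoint extraction from a t2flow file.
# A production WCDR would walk the actual t2flow activity definitions;
# here we only scan the XML for HTTP(S) URLs.
import re
import xml.etree.ElementTree as ET

URL_RE = re.compile(r"https?://[^\s<>\"']+")

def extract_endpoints(t2flow_path: str) -> set:
    tree = ET.parse(t2flow_path)
    endpoints = set()
    for elem in tree.iter():
        candidates = [elem.text or ""] + list(elem.attrib.values())
        for text in candidates:
            endpoints.update(URL_RE.findall(text))
    return endpoints

# Example (placeholder file name):
# print(extract_endpoints("workflow.t2flow"))
```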
To be able to query more data sources for performance data, the RCDR may include additional implementations (at the moment we support the Prometheus database). It is therefore necessary that each VM hosts a performance metrics collector and a metrics database.
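As an illustration of the RCDR's job, the following sketch retrieves CPU usage for one invocation time range through Prometheus' standard range-query HTTP API. The port and the choice of cAdvisor metric are assumptions based on the experimental setup in Section V, not the prototype's actual code; memory and network data can be fetched analogously via container_memory_usage_bytes and the container_network_receive_bytes_total / container_network_transmit_bytes_total counters.

```python
# Illustrative RCDR query against a per-VM Prometheus instance using
# Prometheus' range-query HTTP API. Host, port, and metric choice are
# assumptions based on the cAdvisor/Prometheus setup in Section V.
import requests

def query_cpu_usage(host: str, start: float, end: float, step: str = "5s"):
    """Fetch per-container CPU usage (rate over the cAdvisor counter)
    between two unix timestamps bounding a service invocation."""
    resp = requests.get(
        f"http://{host}:9090/api/v1/query_range",  # default Prometheus port
        params={
            "query": "rate(container_cpu_usage_seconds_total[1m])",
            "start": start,
            "end": end,
            "step": step,
        },
        timeout=10,
    )
    resp.raise_for_status()
    # Prometheus returns a "matrix": one time series per label set, each
    # with a list of [timestamp, value] samples.
    return resp.json()["data"]["result"]
```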
The GUI component is the only component that is accessible from the outside. Besides acting as a graphical interface for the user, the back end of the GUI component exposes a REST API for calls made by other applications. This design allows both manual and programmatic interaction.

For our prototype, the provenance data and the workflow are manually uploaded by the user to the GUI. However, with our design we aim to be able to connect to provenance and workflow repositories.

To analyze and visualize potential bottlenecks in the execution of a workflow, a user performs the following steps:

1) Once the WFMS, in our case Taverna, has executed a workflow, the user exports the workflow's provenance as a file.
2) Once the execution is over, the user uploads the provenance and workflow files to the GUI.
3) The GUI sends the files to the WCDR, which parses them and returns, for each service in the workflow: 1) its name, 2) its endpoint, 3) its invocation start time and 4) its invocation end time. This information is returned in the form of a list and visualized by the GUI. The list can be filtered to select the hosts the user wishes to analyze further.
4) The user specifies the hosts to be analyzed and sends a request to the RCDR via the GUI to gather the relevant performance data. The RCDR queries the databases on those hosts to retrieve the performance data bounded by the timestamps of each web service invocation.
5) Once the performance data are available to the WFEA, it performs its analysis and returns the results to the GUI.

The sequence diagram in Figure 2 shows the process described above.

Fig. 2. A sequence diagram showing the interactions between the different components. We assume the use case in which an actor is using the GUI to gather and analyze performance data, giving workflow descriptions as input.

V. EXPERIMENTS

For our experiments, we hosted several services on three distributed VMs. We used Taverna to create and execute a workflow comprising these services, each implementing a set of methods that exhaust the system resources. By doing this, we simulated heavy CPU/memory/network use that can be traced back in the performance metrics. To conduct our experiment we used the following components:

• The Taverna WFMS, where we composed and executed a workflow. We also used Taverna to extract the workflow provenance data.
• Three VMs (labeled A, B and C), each containing:
  – A web service that offers the following methods: 1) a lightweight call which requires very few resources from the VM, 2) a CPU-intensive call and 3) a memory-intensive call; the latter two run the stress-ng command for 15 seconds. Using these methods, we can simulate heavy CPU and memory usage that can be traced back in the performance metrics (a sketch of such a service is shown after this list).
  – A performance metrics collector. For our experimental setup, we used cAdvisor [16], a daemon that collects, aggregates, processes, and exports a range of system metrics about running containers. By default, these metrics include CPU, memory, and network usage.
  – A metrics database. In our setup, we used Prometheus [17], a time-series database used to gather and store the performance metrics from cAdvisor. We use this database to query the metrics concerning the workflow execution.
• The CWEA and its components, to parse the workflow, gather the relevant metrics from the different hosts, and perform the visualization.
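For illustration, a load-generating service of this kind could look like the following minimal sketch, assuming Flask for the web layer and a stress-ng binary available on the VM; the route names and the 256M memory load are illustrative choices rather than our exact implementation.

```python
# Illustrative load-generating web service: one lightweight method and
# two methods that run stress-ng for 15 seconds (cf. the experiment
# description above). Flask and the route names are assumptions.
import subprocess
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/lightweight")
def lightweight():
    # Returns immediately; consumes almost no resources on the VM.
    return jsonify(status="ok")

@app.route("/cpu_intensive")
def cpu_intensive():
    # Saturate one CPU core for 15 seconds.
    subprocess.run(["stress-ng", "--cpu", "1", "--timeout", "15s"], check=True)
    return jsonify(status="done")

@app.route("/mem_intensive")
def mem_intensive():
    # Allocate and touch memory for 15 seconds (256M is an arbitrary size).
    subprocess.run(
        ["stress-ng", "--vm", "1", "--vm-bytes", "256M", "--timeout", "15s"],
        check=True,
    )
    return jsonify(status="done")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```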
Since we set up our experiments using VMs, we opted for Docker to run all of these components as separate containers. Deploying Docker containers in combination with virtualized resources (e.g., VMs) is common practice in DevOps, as the VMs offer exclusive access to their resources. Moreover, a wide range of monitoring tools is available as Docker containers. Therefore, it requires minimal effort to set up a realistic performance-monitoring framework on each VM. Singularity [18] is another option for the containerization of applications; it focuses more on high-performance computing (HPC) by providing access to devices such as GPUs or MPI hardware. Nevertheless, the wide adoption of Docker, its wide range of tools, and the fact that the scope of this work lies outside HPC made us choose Docker.

Fig. 3. This diagram depicts our experimental setup. We use three VMs (named A, B and C). On each VM we hosted a simple web service, cAdvisor and Prometheus. After the workflow execution, the user provides the provenance and workflow information to the CWEA (Figure 1 shows its architecture), which uses it to query Prometheus on each VM.

Figure 3 shows the configuration of each VM. We used Taverna to create and execute the test workflow shown in Figure 4. We examined published Taverna workflows (taken from www.myexperiment.org) to create a workflow with a realistic structure. As a result, we constructed our test workflow comprising a total of six tasks ending in one output port. The workflow contains both sequential and parallel executions. The tasks that are executed are of the following types: 1) lightweight, 2) CPU-intensive and 3) memory-intensive.

Fig. 4. A simple workflow made of six tasks spread over three VMs. Tasks lightweight 1, CPU intensive 2 and mem intensive 2 were hosted on VM A, tasks CPU intensive 1 and mem intensive 1 on VM B, and task CPU intensive 3 on VM C.

VI. RESULTS

We executed the workflow described in the previous section and analyzed it with the CWEA. Figure 5 shows the results as visualized by the GUI. In Figure 5(a) we see the workflow execution analysis: a table that shows, for each invocation, the name of the task, its endpoint, the HTTP method, and the start and end times. Next, in Figure 5(b) we see the execution timeline of the workflow and the time required to execute each task. In this timeline, each task is represented by a bar of a different color, which is also highlighted in the resource usage graphs below. The position and length of each bar correspond to the start time and duration of each task. Table I shows the execution times in more detail.

TABLE I
EXECUTION TIMES AND PERCENTAGE OF TOTAL EXECUTION FOR EACH TASK IN THE WORKFLOW.

Task Name        | Exec. Duration (sec) | Perc. of Total Exec. (%)
mem intensive 1  | 26.04                | 22.34
mem intensive 2  | 24.57                | 21.09
CPU intensive 1  | 22.83                | 19.59
CPU intensive 3  | 21.80                | 18.71
CPU intensive 2  | 21.11                | 18.11
lightweight 1    |  0.19                |  0.16

Fig. 5. Workflow execution analysis (a) and execution timeline (b) as presented to the user by the GUI. The execution timeline provides the user with an overview of the time taken to execute each task.

Figure 6 shows the metrics collected from each VM. All sub-figures highlight each service execution (when each service call started and ended) with a separate color which corresponds to the colors shown in Figure 5(b). More specifically, Figure 6(a) presents the CPU used by each task, highlighted with the color that corresponds to each task in the timeline of Figure 5(b); the x-axis is the percentage of the CPU used and the y-axis the time of the recorded metric. Similarly, Figure 6(b) presents the memory used by each task, also color-highlighted; in this graph, the x-axis shows the memory used in MB and the y-axis the time. Figures 6(c) and 6(d) show the network usage, where the x-axis represents incoming or outgoing data in KB/s and the y-axis the time.

Fig. 6. Combined results after parsing the workflow, querying the provenance file and querying the relevant resource usage: (a) CPU resources utilized by each service during the workflow execution; (b) memory resources utilized by each service; (c) incoming network traffic for each task execution; (d) outgoing network traffic for each task execution. All sub-figures highlight each service execution with a separate color which corresponds to the colors shown in Figure 5(b).

Considering the performance of task CPU intensive 1, we see in Figure 6(a) that it started at approximately 21:45:05, ended at 21:45:28, and used more than 80% of the CPU. However, in Figure 6(b) we see that the same task did not require any memory from its hosting VM, although it did use some network resources, as can be seen in Figure 6(d).

From the results presented here, we can see that the most time-consuming task of the workflow presented in Section V is the memory-intensive one (mem intensive 1). Looking at the results in Figure 6, we see that this task uses CPU, memory and network resources, which may account for its increased execution time.

VII. CONCLUSIONS AND FUTURE WORK

In this paper, we have presented and evaluated our proposed architecture, which allows scientists or service developers to analyze the execution of service-based scientific workflows and to visualize possible bottlenecks related to the infrastructure by getting a detailed view of the resource usage of each workflow task. By taking advantage of our solution, infrastructure administrators, service developers or scaling controllers may reconfigure the provisioned virtualized infrastructure.

As described in Section IV, our proposed architecture relies on the WFMS to collect provenance data. This means that the workflow execution needs to complete before the CWEA can use the provenance data. However, being able to collect provenance data as each task completes would greatly benefit our architecture, as results could then be presented on the fly. Such an approach requires further investigation into how to use the appropriate APIs of multiple WFMSs, or how to abstract this process by relying on a provenance repository that collects provenance data as workflow tasks complete.

Another issue that will require our attention in the future is the persistence of the performance data on each VM. It can be the case that, as soon as the workflow execution is over or the VMs are no longer used, the VMs are deleted, causing the loss of all performance data. In the future, in combination with the on-the-fly data gathering described above, the performance data shall be copied to a separate performance database. Having performance data from many workflow executions will also enable us to make use of statistical and AI algorithms to detect and predict possible workflow execution failures due to errors in the resource infrastructure.

ACKNOWLEDGMENT

This work was supported by the European Union's Horizon 2020 research and innovation programme under grant agreements No. 824068 (ENVRI-FAIR), 654182 (ENVRIPLUS), 825134 (ARTICONF), 676247 (VRE4EIC) and 643963 (SWITCH).

REFERENCES

[1] L. Candela, D. Castelli, and P. Pagano, "Virtual research environments: an overview and a research agenda," Data Science Journal, vol. 12, pp. GRDI75–GRDI81, 2013.
[2] Z. Zhao, A. Belloum, C. De Laat, P. Adriaans, and B. Hertzberger, "Distributed execution of aggregated multi domain workflows using an agent framework," in Services, 2007 IEEE Congress on, pp. 183–190, 2007.
[3] M. A. Miller, W. Pfeiffer, and T. Schwartz, "The CIPRES science gateway: enabling high-impact science for phylogenetics researchers with limited resources," in Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond, p. 39, 2012.
[4] S. Koulouzis, A. S. Belloum, M. T. Bubak, Z. Zhao, M. Živković, and C. T. de Laat, "SDN-aware federation of distributed data," Future Generation Computer Systems, vol. 56, pp. 64–76, 2016.
[5] K. Evans, A. Jones, A. Preece, F. Quevedo, D. Rogers, I. Spasić, I. Taylor, V. Stankovski, S. Taherizadeh, J. Trnkoczy, G. Suciu, V. Suciu, P. Martin, J. Wang, and Z. Zhao, "Dynamically reconfigurable workflows for time-critical applications," in Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science, pp. 7:1–7:10, 2015.
[6] P. Groth and L. Moreau, "PROV-Overview: an overview of the PROV family of documents," W3C Working Group Note, 2013.
[7] R. Cushing, S. Koulouzis, A. Belloum, and M. Bubak, "Applying workflow as a service paradigm to application farming," Concurrency and Computation: Practice and Experience, vol. 26, no. 6, pp. 1297–1312, 2014.
[8] R. F. da Silva, R. Filgueira, I. Pietri, M. Jiang, R. Sakellariou, and E. Deelman, "A characterization of workflow management systems for extreme-scale applications," Future Generation Computer Systems, vol. 75, pp. 228–238, 2017.
[9] Y. Demchenko, Z. Zhao, P. Grosso, A. Wibisono, and C. De Laat, "Addressing big data challenges for scientific data infrastructure," in 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, pp. 614–617, IEEE, 2012.
[10] S. Koulouzis, D. Vasyunin, R. Cushing, A. Belloum, and M. Bubak, "Cloud data federation for scientific applications," in Euro-Par 2013: Parallel Processing Workshops, pp. 13–22, Springer Berlin Heidelberg, 2014.
[11] R. Prodan and T. Fahringer, "Overhead analysis of scientific workflows in Grid environments," IEEE Transactions on Parallel and Distributed Systems, vol. 19, pp. 378–393, March 2008.
[12] I. Foster, "Globus Toolkit version 4: software for service-oriented systems," Journal of Computer Science and Technology, vol. 21, no. 4, p. 513, 2006.
[13] R. Ferreira da Silva, T. Glatard, and F. Desprez, "Self-healing of operational workflow incidents on distributed computing infrastructures," in 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), pp. 318–325, May 2012.
[14] S. Madougou, S. Shahand, M. Santcroos, B. van Schaik, A. Benabdelkader, A. van Kampen, and S. Olabarriaga, "Characterizing workflow-based activity on a production e-infrastructure using provenance data," Future Generation Computer Systems, vol. 29, no. 8, pp. 1931–1942, 2013.
[15] P. Gaikwad, A. Mandal, P. Ruth, G. Juve, D. Król, and E. Deelman, "Anomaly detection for scientific workflow applications on networked clouds," in 2016 International Conference on High Performance Computing & Simulation (HPCS), pp. 645–652, July 2016.
[16] "cAdvisor (container advisor), official GitHub page." https://github.com/google/cadvisor. Accessed: 2019-03-28.
[17] "Prometheus, an open-source systems monitoring and alerting toolkit." https://prometheus.io/docs/introduction/overview/. Accessed: 2019-03-28.
[18] G. M. Kurtzer, "Singularity 2.1.2: Linux application and environment containers for science," 2016. https://doi.org/10.5281/zenodo.60736.