=Paper= {{Paper |id=Vol-2526/short1 |storemode=property |title=Linking Abstract Plans of Scientific Experiments to their Corresponding Execution Traces (short paper) |pdfUrl=https://ceur-ws.org/Vol-2526/short1.pdf |volume=Vol-2526 |authors=Milan Markovic,Daniel Garijo,Peter Edwards |dblpUrl=https://dblp.org/rec/conf/kcap/MarkovicGE19 }} ==Linking Abstract Plans of Scientific Experiments to their Corresponding Execution Traces (short paper)== https://ceur-ws.org/Vol-2526/short1.pdf
Linking Abstract Plans of Scientific Experiments to their Corresponding Execution Traces

Milan Markovic, University of Aberdeen, Aberdeen, UK, milan.markovic@abdn.ac.uk
Daniel Garijo, Information Sciences Institute, University of Southern California, Los Angeles, USA, dgarijo@isi.edu
Peter Edwards, University of Aberdeen, Aberdeen, UK, p.edwards@abdn.ac.uk

ABSTRACT
Provenance describes the creation, manipulation and delivery processes of scientific results, and has become a crucial requirement for debugging, understanding, inspecting and reproducing the outcomes of scientific publications. Scientific experiments, in particular computational workflows, often include provenance collection mechanisms that link execution traces to their respective planned specifications. Such provenance traces are typically very fine-grained, and may quickly become too complex or difficult for humans to interpret. In this paper we describe our approach to represent workflow plans and provenance at different levels of abstraction. We describe EP-Plan, a W3C PROV ontology extension, and we illustrate our approach with a use case using the WINGS workflow system.

KEYWORDS
Plan, scientific workflows, provenance, abstractions

1 INTRODUCTION
Scientific workflows describe the computational steps and data dependencies that are necessary to carry out a scientific experiment [13]. Scientific workflows can be found in a wide range of domains, ranging from Geosciences to Bioinformatics, as they have demonstrated their utility for reproducing previous experiments, improving standardization practices in a research lab and educating students on existing methods [2]. Scientific workflow systems usually have the ability to capture the provenance traces of executed experiments, to support inspection of results and debugging of workflow errors [12]. The W3C recommendation PROV-O [9] is a standard model for representing the provenance of any entity on the Web, by exposing the series of activities that used or generated such entities. PROV-O is often used as a reference model by workflow systems when exposing provenance traces to users.
   While PROV-O provides mechanisms to represent provenance in detail, it does not describe how to represent the plan that was used to produce a provenance trace. Therefore, in previous work we described the P-Plan ontology [3], a simple representation of the set of planned steps that guided an execution. However, plan specifications may become complex (with hundreds or thousands of steps), and thus workflow designers tend to simplify them by separating them into smaller plans (sub-workflows). In addition, workflow specifications may be parallelised into hundreds or thousands of jobs when submitted to a distributed environment by a workflow engine. Similarly, scientific workflows may contain high-level abstract steps that lead to different implementations depending on the algorithms selected for execution [4]. Linking these abstract plans with their execution traces requires additional mechanisms. These have not been defined in P-Plan or in other recent efforts for provenance representation in scientific workflows, such as the Research Object Model [1] and the ProvOne¹ specification.
   In this paper, we use the Extended P-Plan ontology (EP-Plan)² to link together different abstractions of scientific workflow plans and their execution traces described using PROV-O.
   We first detail the challenges for linking provenance to abstract plans in Section 2, using examples from the WINGS workflow system [5]. We then describe how we have addressed these challenges with the EP-Plan ontology in Section 3, and we conclude with a discussion of our future work.

¹ http://purl.org/provone
² https://w3id.org/ep-plan

Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
SciKnow, Marina Del Rey, CA, USA, November, 2019. Milan Markovic, Daniel Garijo, and Peter Edwards.

2 ABSTRACT PLANS AND PROVENANCE IN SCIENTIFIC WORKFLOWS
During their lifecycle, scientific workflows may be defined at different levels of abstraction, from an abstract original specification by a user to a fully detailed execution plan prepared for a workflow engine [4]. Here we focus on supporting three common use cases in scientific workflows:

   • Collections of activities and entities: Plans may contain abstractions that summarize execution activities to be performed in parallel. Figure 1 shows an example using a workflow for water quality analysis in the WINGS workflow system [6]. As shown on the left of the figure, some steps represent collections of executions, depicted as stacked boxes. For example, the step MetabolismCalcEmpirical receives a collection of HourlyData files, which will be processed in parallel. The right side of the figure shows a fragment of the corresponding execution plan of the workflow on the left. This is the full execution plan that the system prepares once a user has specified the input files. Since provenance is tracked at the granularity of the execution plan (shown at the right of the figure), it is necessary to define properties to group entities and associate them with the corresponding abstract plan.
   • Workflow fragments: Workflow systems often include the ability to define sub-workflows to simplify complex workflow plans. A sub-workflow would then appear as a single step in the bigger workflow. While this mechanism is helpful for easing the understandability of the scientific workflow, it requires the means to link the provenance of the sub-workflow execution back to the provenance of the workflow where it was included.
   • Execution summaries: When workflow execution plans become complicated, the corresponding provenance traces may be too convoluted to explore. This affects users who only want to know more about the inputs used to generate the result of a workflow. Figure 2 shows a simple example of this behaviour: the workflow execution summary on the left represents the execution of the workflow as a single activity. The provenance trace on the right represents the full workflow execution trace. Both the execution summary and the full provenance trace represent valid views of a workflow execution, and should be linked together.

Figure 1: An abstract workflow for water quality analysis (left) and a fragment of its execution plan (right). Some of the steps represent collections of executions which will be executed in parallel. Arrows represent the dataflow.

Figure 2: An example of a simple provenance summary. The execution of a workflow on the right is summarized as a single activity on the left. Arrows represent usage/generation dependencies.

   Linking these different levels of granularity together is crucial for workflow systems to inform a user in case of execution errors. Plans should also contain additional metadata that provide information about the context in which the individual planned steps were deployed and executed. Such metadata further helps users understand errors and how plans and their fragments may be reused. This may include information about any associated constraints (e.g. for input validation), the agents expected to perform individual steps, the objectives the plan is trying to achieve, etc.
   Specifications such as the Research Object Model, D-PROV [11], ProvOne and CWL-PROV [7] define mechanisms to describe sub-workflows and entity collections (e.g., by defining part-of relationships). However, they do not define clear mechanisms to link together provenance traces at different levels of granularity. Below we describe how we address these issues with the EP-Plan ontology.

3 USING EP-PLAN TO REPRESENT PLANS AT DIFFERENT LEVELS OF ABSTRACTION
EP-Plan builds on P-Plan³ [3], a vocabulary designed for aligning simple plans to their corresponding provenance traces. EP-Plan was designed for cross-domain applications (e.g. the use of EP-Plan for enhancing Internet of Things deployments is detailed in [10]). It uses ep-plan:Step to denote any planned process, and ep-plan:Variable to represent the inputs and outputs of steps.

³ p-plan namespace: http://purl.org/net/p-plan
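The Step/Variable pattern just described can be sketched in a few lines. This is a minimal Python sketch that holds a plan description as plain (subject, predicate, object) triples of prefixed-name strings rather than using an RDF library. The plan and step names (:ExampleWf, :AggregateStep and its variables) are hypothetical. The property names ep-plan:isElementOfPlan, ep-plan:hasInputVariable and ep-plan:hasOutputVariable are assumed from the EP-Plan vocabulary and should be checked against https://w3id.org/ep-plan.

```python
# Minimal sketch (assumption): an EP-Plan plan as a set of
# (subject, predicate, object) triples using prefixed-name strings.
# A real system would use an RDF library and full IRIs; the property
# names below are assumed and should be checked against the vocabulary.
Triple = tuple[str, str, str]
TYPE = "rdf:type"


def add_step(graph: set[Triple], plan: str, step: str,
             inputs: list[str], outputs: list[str]) -> None:
    """Record a planned step, its variables, and their membership in a plan."""
    graph.add((step, TYPE, "ep-plan:Step"))
    graph.add((step, "ep-plan:isElementOfPlan", plan))
    for var in inputs + outputs:
        graph.add((var, TYPE, "ep-plan:Variable"))
        graph.add((var, "ep-plan:isElementOfPlan", plan))
    for var in inputs:
        graph.add((step, "ep-plan:hasInputVariable", var))
    for var in outputs:
        graph.add((step, "ep-plan:hasOutputVariable", var))


def objects(graph: set[Triple], s: str, p: str) -> list[str]:
    """All objects o such that (s, p, o) is in the graph, sorted."""
    return sorted(o for (s2, p2, o) in graph if s2 == s and p2 == p)


def steps_of(graph: set[Triple], plan: str) -> list[str]:
    """All ep-plan:Step instances that are elements of the given plan."""
    return sorted(s for (s, p, o) in graph
                  if p == "ep-plan:isElementOfPlan" and o == plan
                  and (s, TYPE, "ep-plan:Step") in graph)


# Hypothetical plan with one step that aggregates two input files.
g: set[Triple] = {(":ExampleWf", TYPE, "ep-plan:Plan")}
add_step(g, ":ExampleWf", ":AggregateStep",
         inputs=[":File1Var", ":File2Var"], outputs=[":AggregatedDataVar"])

for step in steps_of(g, ":ExampleWf"):
    # prints: :AggregateStep [':File1Var', ':File2Var'] [':AggregatedDataVar']
    print(step, objects(g, step, "ep-plan:hasInputVariable"),
          objects(g, step, "ep-plan:hasOutputVariable"))
```

The sketch records plan membership (ep-plan:isElementOfPlan) separately from the input/output links. As a result, the same lookup works for any plan, which matters once plans are nested inside other plans.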
Figure 3: An overview of a subset of EP-Plan concepts for describing and linking plan specifications with their execution traces. (Legend: owl:ObjectProperty; rdfs:subClassOf; classes for describing plan elements; classes for describing execution trace elements.)
Figure 4: An example illustrating the decomposition of an ep-plan:MultiStep into a sub-plan and the linking of variables across different levels of plan abstraction.

   Figure 3 illustrates a subset of EP-Plan concepts that define mechanisms for linking plan specifications and execution traces at different levels of abstraction. Both steps and variables belong to an ep-plan:Plan, which is modelled as a subclass of prov:Plan defined in PROV-O⁴. They are linked to their corresponding executions, described as ep-plan:Activity and ep-plan:Entity (modelled as subclasses of prov:Activity and prov:Entity). A workflow execution typically produces an execution trace that consists of a number of activities and entities representing instantiations of different parts of a plan. In EP-Plan, a single execution trace is grouped into an ep-plan:ExecutionTraceBundle (a subclass of prov:Bundle). A single plan specification may then be linked to multiple execution traces using prov:wasDerivedFrom. To allow linking of different levels of workflow abstraction, EP-Plan can group related workflow steps defined at a finer level of detail into a sub-plan. That sub-plan then further describes a step of a more abstract plan, denoted as an ep-plan:MultiStep. The left side of Figure 4 illustrates a high-level abstraction of a workflow plan (:SummarizedWf) containing a single ep-plan:MultiStep (:ExecuteWorkflowStep). The right side of the figure describes this step in more detail as a sub-plan (:ExecutedWf). In the same figure, the abstract workflow (:SummarizedWf) also includes abstractions of two variables (:InputFilesVar and :OutputFilesVar) described using the class ep-plan:MultiVariable. In the sub-plan specification, each of the multivariables is decomposed into two individual variables (e.g., :InputFilesVar decomposes into :File1Var and :File2Var) and linked using ep-plan:hasPart.
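The decomposition of :ExecuteWorkflowStep and :InputFilesVar described above can be sketched as plain (subject, predicate, object) triples in Python. The sketch rests on several assumptions:
- ep-plan:isDecomposedAsPlan links a multistep to its sub-plan.
- prov:wasDerivedFrom points from a trace bundle to its plan.
- The steps :AggregateStep and :SortStep are hypothetical members of :ExecutedWf.
- The helpers expand and traces_of are illustrative names, not part of EP-Plan.

```python
# Minimal sketch (assumption): abstract plan, sub-plan and trace bundles as
# (subject, predicate, object) string triples. Property names and directions
# are assumed; verify them against the EP-Plan and PROV-O vocabularies.
Triple = tuple[str, str, str]
TYPE = "rdf:type"

g: set[Triple] = {
    # Abstract plan :SummarizedWf with one multistep and two multivariables.
    (":ExecuteWorkflowStep", TYPE, "ep-plan:MultiStep"),
    (":ExecuteWorkflowStep", "ep-plan:isElementOfPlan", ":SummarizedWf"),
    (":InputFilesVar", TYPE, "ep-plan:MultiVariable"),
    (":OutputFilesVar", TYPE, "ep-plan:MultiVariable"),
    # The multistep is described in more detail by the sub-plan :ExecutedWf.
    (":ExecuteWorkflowStep", "ep-plan:isDecomposedAsPlan", ":ExecutedWf"),
    (":AggregateStep", TYPE, "ep-plan:Step"),  # hypothetical step
    (":AggregateStep", "ep-plan:isElementOfPlan", ":ExecutedWf"),
    (":SortStep", TYPE, "ep-plan:Step"),       # hypothetical step
    (":SortStep", "ep-plan:isElementOfPlan", ":ExecutedWf"),
    # The input multivariable is decomposed into individual variables.
    (":InputFilesVar", "ep-plan:hasPart", ":File1Var"),
    (":InputFilesVar", "ep-plan:hasPart", ":File2Var"),
    # One execution trace bundle per plan, linked with prov:wasDerivedFrom.
    (":AbstractExecutionTrace", "prov:wasDerivedFrom", ":SummarizedWf"),
    (":ExecutionTrace", "prov:wasDerivedFrom", ":ExecutedWf"),
}


def objects(graph: set[Triple], s: str, p: str) -> list[str]:
    return sorted(o for (s2, p2, o) in graph if s2 == s and p2 == p)


def subjects(graph: set[Triple], p: str, o: str) -> list[str]:
    return sorted(s for (s, p2, o2) in graph if p2 == p and o2 == o)


def expand(graph: set[Triple], element: str) -> list[str]:
    """Return the finer-grained plan elements behind an abstract element."""
    if (element, TYPE, "ep-plan:MultiVariable") in graph:
        return objects(graph, element, "ep-plan:hasPart")
    if (element, TYPE, "ep-plan:MultiStep") in graph:
        steps = []
        for sub_plan in objects(graph, element, "ep-plan:isDecomposedAsPlan"):
            steps += [s for s in subjects(graph, "ep-plan:isElementOfPlan", sub_plan)
                      if (s, TYPE, "ep-plan:Step") in graph]
        return sorted(steps)
    return [element]  # already at the finest level described


def traces_of(graph: set[Triple], plan: str) -> list[str]:
    """Execution trace bundles derived from the given plan."""
    return subjects(graph, "prov:wasDerivedFrom", plan)


print(expand(g, ":InputFilesVar"))        # [':File1Var', ':File2Var']
print(expand(g, ":ExecuteWorkflowStep"))  # [':AggregateStep', ':SortStep']
for plan in objects(g, ":ExecuteWorkflowStep", "ep-plan:isDecomposedAsPlan"):
    print(plan, traces_of(g, plan))       # :ExecutedWf [':ExecutionTrace']
```

Because the sub-plan is an ordinary plan, the same lookup can move from an abstract element to its detailed description and then to the traces recorded against that description. :OutputFilesVar has no parts declared in this sketch, so expand returns an empty list for it.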
   Figure 5 illustrates an example execution trace with two execution trace bundles corresponding to the plan and its sub-plan shown in Figure 4. Execution trace elements corresponding⁵ to the multivariables defined in the :SummarizedWf plan (see Figure 4) are of the type ep-plan:EntityCollection, a subclass of prov:Collection (see :InputFiles and :OutputFiles in Figure 5). The usage and generation of these entity collections are ascribed to a trace element :WorkflowExecution (modelled as ep-plan:MultiActivity) using the relationships prov:used and prov:wasGeneratedBy. The trace element :WorkflowExecution corresponds⁶ to the plan element :ExecuteWorkflowStep shown in Figure 4. The right side of Figure 5 shows a more detailed execution trace corresponding⁷ to the :ExecutedWf plan specification shown on the right side of Figure 4. Instantiations of plan variables are captured as instances of ep-plan:Entity (e.g. see :File1), and instantiations of steps are captured as instances of ep-plan:Activity (e.g. see :Aggregate). The prov:hadMember relationships link the trace elements corresponding to abstract multivariables (modelled as ep-plan:EntityCollection in :AbstractExecutionTrace) with their more detailed descriptions in :ExecutionTrace, produced by the sub-workflow specification.

Figure 5: An example description of execution traces corresponding to workflows defined at different levels of abstraction.

⁴ prov namespace: http://www.w3.org/ns/prov#
⁵ The ep-plan:correspondsToVariable links from ep-plan:EntityCollection in the execution trace record to ep-plan:MultiVariable in the plan specification are not shown in the figure.
⁶ The ep-plan:correspondsToStep links from ep-plan:MultiActivity in the execution trace record to ep-plan:MultiStep in the plan specification are not shown in the figure.
⁷ The ep-plan:correspondsToVariable and ep-plan:correspondsToStep links from ep-plan:Entity and ep-plan:Activity in the execution trace record to ep-plan:Variable and ep-plan:Step, respectively, in the plan specification are not shown in the figure.

   To summarise, using the mechanisms outlined above, EP-Plan enables modelling of abstracted workflow specifications by collapsing multiple steps and variables into aggregated plan elements (i.e. multisteps and multivariables). Sub-plans containing more detailed descriptions of plan abstractions may be linked and reused by different plans (i.e. as workflow fragments), as these are modelled as individual plan specifications (including any relevant metadata). Furthermore, by leveraging the concept of collections, we are also able to maintain links between different abstractions of execution traces without violating PROV-O semantics.
   Finally, in contrast with P-Plan, EP-Plan provides a richer vocabulary for capturing plan metadata, which (for reasons of space) is not discussed in detail in this paper. Briefly, this includes the ability to associate descriptions of agents that are allowed to execute different steps of a plan, to link descriptions of policies, and to describe specifications of how data should be exchanged between steps. Plan elements can also be associated with descriptions of constraints. These constraints provide a high-level reference to any restrictions that can be linked to and evaluated against elements of an execution trace. EP-Plan also enables descriptions of objectives to be associated with the plan. Objectives may then be linked to the individual plan elements that achieve them. Each element may also be linked to a rationale (e.g. a user-readable description) which details why the element was included in the plan specification. These concepts are important for describing the execution context of a scientific experiment. This may include, for example:
   • specifications of individual scientists that are allowed to control certain steps of a plan;
   • links to a data protection policy applicable to an experiment using sensitive or personal data;
   • constraint descriptions which provide further information about the portions of a workflow that failed to execute due to constraint violation, etc.

4 CONCLUSIONS & FUTURE WORK
In this paper, we introduced the EP-Plan ontology for describing scientific experiments. In particular, we focused on describing experiments at different levels of abstraction. In our future work we aim to use EP-Plan to enhance the provenance traces generated by WINGS with additional plan descriptions. WINGS currently uses the OPMW ontology⁸. OPMW extends both P-Plan and PROV-O, and therefore it should be possible to align existing provenance descriptions generated by WINGS with the EP-Plan vocabulary. The WINGS system also uses semantic implementations of constraints to plan and execute scientific workflows [8]. We will explore how these can be mapped to the constraint concepts defined in EP-Plan, and hence included as part of the experiment metadata with the plan specification.

⁸ http://www.opmw.org/model/OPMW/

ACKNOWLEDGMENTS
The work described in this paper was funded by the following awards:
   • the RCUK Digital Economy programme award to the University of Aberdeen (EP/N028074/1);
   • a SICSA PECE travel award;
   • the Defense Advanced Research Projects Agency, award W911NF-18-1-0027;
   • the SIMPLEX program, award W911NF-15-1-0555;
   • the National Institutes of Health, awards 1U01CA196387 and 1R01GM117097.

REFERENCES
[1] Khalid Belhajjame, Jun Zhao, Daniel Garijo, Matthew Gamble, Kristina Hettne, Raul Palma, Eleni Mina, Oscar Corcho, José Manuel Gómez-Pérez, Sean Bechhofer, et al. 2015. Using a suite of ontologies for preserving workflow-centric research objects. Web Semantics: Science, Services and Agents on the World Wide Web 32 (2015), 16–42. https://doi.org/10.1016/j.websem.2015.01.003
[2] Daniel Garijo, Oscar Corcho, Yolanda Gil, Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad, Paul Thompson, and Arthur W. Toga. 2014. Workflow reuse in practice: a study of neuroimaging pipeline users. In e-Science (e-Science), 2014 IEEE 10th International Conference on, Vol. 1. IEEE, 239–246. https://doi.org/10.1109/eScience.2014.33
[3] D. Garijo and Y. Gil. 2012. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science, Vol. 951. CEUR Workshop Proceedings.
[4] Y. Gil, D. Garijo, M. Knoblock, A. Deng, R. Adusumilli, V. Ratnakar, and P. Mallick. 2017. Improving Publication and Reproducibility of Computational Experiments through Workflow Abstractions. In Proceedings of the Workshop on Capturing Scientific Knowledge (SciKnow). Austin, Texas.
[5] Y. Gil, V. Ratnakar, E. Deelman, G. Mehta, and J. Kim. 2007. Wings for Pegasus: Creating large-scale scientific applications using semantic representations of computational workflows. In Proceedings of the National Conference on Artificial Intelligence, Vol. 22. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999, 1767.
[6] Y. Gil, V. Ratnakar, J. Kim, P. Antonio Gonzalez-Calero, P. Groth, J. Moody, and E. Deelman. 2011. Wings: Intelligent Workflow-Based Design of Computational Experiments. IEEE Intelligent Systems 26, 1 (2011).
[7] F. Khan, S. Soiland-Reyes, R. Sinnott, A. Lonie, C. Goble, and M. Crusoe. 2018. Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv. (Dec. 2018). https://doi.org/10.5281/zenodo.1966881 Submitted to GigaScience (GIGA-D-18-00483).
[8] J. Kim, E. Deelman, Y. Gil, G. Mehta, and V. Ratnakar. 2008. Provenance trails in the Wings/Pegasus system. Concurrency and Computation: Practice and Experience 20, 5 (2008), 587–597.
[9] T. Lebo, S. Sahoo, and D. McGuinness. 2013. PROV-O: The PROV Ontology. Technical Report, April 2013. https://www.w3.org/TR/2013/REC-prov-o-20130430/
[10] M. Markovic, D. Garijo, P. Edwards, and W. Vasconcelos. 2019. Semantic Modelling of Plans and Execution Traces for Enhancing Transparency of IoT Systems. In Proceedings of the 6th IEEE International Conference on Internet of Things. IEEE Xplore.
[11] P. Missier, S. Dey, K. Belhajjame, V. Cuevas-Vicenttín, and B. Ludäscher. 2013. D-PROV: Extending the PROV Provenance Model with Workflow Structure. In 5th USENIX Workshop on the Theory and Practice of Provenance (TaPP 13).
[12] L. Moreau, P. Groth, J. Cheney, T. Lebo, and S. Miles. 2015. The rationale of PROV. Journal of Web Semantics 35 (2015), 235–257.
[13] Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields. 2014. Workflows for e-Science: Scientific Workflows for Grids. Springer Publishing Company, Incorporated.