=Paper=
{{Paper
|id=Vol-2526/short1
|storemode=property
|title=Linking Abstract Plans of Scientific Experiments to their Corresponding Execution Traces (short paper)
|pdfUrl=https://ceur-ws.org/Vol-2526/short1.pdf
|volume=Vol-2526
|authors=Milan Markovic,Daniel Garijo,Peter Edwards
|dblpUrl=https://dblp.org/rec/conf/kcap/MarkovicGE19
}}
==Linking Abstract Plans of Scientific Experiments to their Corresponding Execution Traces (short paper)==
Milan Markovic, University of Aberdeen, Aberdeen, UK, milan.markovic@abdn.ac.uk

Daniel Garijo, Information Sciences Institute, University of Southern California, Los Angeles, USA, dgarijo@isi.edu

Peter Edwards, University of Aberdeen, Aberdeen, UK, p.edwards@abdn.ac.uk

SciKnow, Marina Del Rey, CA, USA, November, 2019. Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT

Provenance describes the creation, manipulation and delivery processes of scientific results, and has become a crucial requirement for debugging, understanding, inspecting and reproducing the outcomes of scientific publications. Scientific experiments, in particular computational workflows, often include provenance collection mechanisms that link execution traces to their respective planned specifications. Such provenance traces are typically very fine-grained, and may quickly become too complex or difficult for humans to interpret. In this paper we describe our approach to representing workflow plans and provenance at different levels of abstraction. We describe EP-Plan, an extension of the W3C PROV ontology, and we illustrate our approach with a use case from the WINGS workflow system.

KEYWORDS

Plan, scientific workflows, provenance, abstractions

1 INTRODUCTION

Scientific workflows describe the computational steps and data dependencies that are necessary to carry out a scientific experiment [13]. Scientific workflows can be found in a wide range of domains, from Geosciences to Bioinformatics, as they have demonstrated their utility for reproducing previous experiments, improving standardization practices in a research lab and educating students on existing methods [2]. Scientific workflow systems usually have the ability to capture the provenance traces of executed experiments, to support inspection of results and debugging of workflow errors [12]. The W3C recommendation PROV-O [9] is a standard model for representing the provenance of any entity on the Web, by exposing the series of activities that used or generated such entities. PROV-O is often used as a reference model by workflow systems when exposing provenance traces to users.

While PROV-O provides mechanisms to represent provenance in detail, it does not describe how to represent the plan that was used to produce a provenance trace. Therefore, in previous work we described the P-Plan ontology [3], a simple representation of the set of planned steps that guided an execution. However, plan specifications may become complex (with hundreds or thousands of steps) and thus workflow designers tend to simplify them by separating them into smaller plans (sub-workflows). In addition, workflow specifications may be parallelised into hundreds or thousands of jobs when submitted to a distributed environment by a workflow engine. Similarly, scientific workflows may contain high-level abstract steps that lead to different implementations depending on the algorithms selected for execution [4]. Linking these abstract plans with their execution traces requires additional mechanisms which have not been defined in P-Plan or in other recent efforts for provenance representation in scientific workflows, such as the Research Object Model [1] and ProvOne¹ specifications.

In this paper, we use the Extended P-Plan ontology (EP-Plan)² to link together different abstractions of scientific workflow plans and their execution traces described using PROV-O. We first detail the challenges for linking provenance to abstract plans in Section 2 by using examples from the WINGS workflow system [5]. We then describe how we have addressed these challenges with the EP-Plan ontology in Section 3, and we conclude with a discussion of our future work.

2 ABSTRACT PLANS AND PROVENANCE IN SCIENTIFIC WORKFLOWS

During their lifecycle, scientific workflows may be defined at different levels of abstraction, from an abstract original specification by a user to a fully detailed execution plan prepared for a workflow engine [4]. Here we focus on supporting three common use cases in scientific workflows:

• Collections of activities and entities: Plans may contain abstractions that summarize execution activities to be performed in parallel. Figure 1 shows an example using a workflow for water quality analysis in the WINGS workflow system [6]. As shown on the left of the figure, some steps represent collections of executions, depicted as stacked boxes. For example, the step MetabolismCalcEmpirical receives a collection of HourlyData files which will be processed in parallel. The right side of the figure shows a fragment of the corresponding execution plan of the workflow on the left, after a user has specified the input files and hence the system has prepared the full execution plan. Since provenance is tracked at the granularity of the execution plan (shown at the right of the figure), it is necessary to define properties to group entities and associate them with the corresponding abstract plan.

• Workflow fragments: Workflow systems often include the ability to define sub-workflows to simplify complex workflow plans. A sub-workflow would then appear as a single step in the bigger workflow. While this mechanism makes the scientific workflow easier to understand, it requires the means to link the provenance of the sub-workflow execution back to the provenance of the workflow where it was included.

• Execution summaries: When workflow execution plans become complicated, the corresponding provenance traces may be too convoluted for users to explore when they only want to know more about the inputs used to generate the result of a workflow. Figure 2 shows a simple example of this behaviour: the workflow execution summary on the left represents the execution of the workflow as a single activity. The provenance trace on the right represents the full workflow execution trace. Both the execution summary and the full provenance trace represent valid views of a workflow execution, and should be linked together.

Figure 1: An abstract workflow for water quality analysis (left) and a fragment of its execution plan (right). Some of the steps represent collections of executions which will be executed in parallel. Arrows represent the dataflow.

Figure 2: An example of a simple provenance summary. The execution of a workflow on the right is summarized as a single activity on the left. Arrows represent usage/generation dependencies.

Linking these different levels of granularity together is crucial for workflow systems to inform a user in case of execution errors. To further support users’ ability to understand errors and how plans and their fragments may be reused, plans should also contain additional metadata that provide information about the context in which the individual planned steps were deployed and executed. This may include information about any associated constraints (e.g. for input validation), the agents expected to perform individual steps, objectives the plan is trying to achieve, etc.

While specifications such as the Research Object Model, D-PROV [11], ProvOne or CWL-PROV [7] define mechanisms to describe sub-workflows and entity collections (e.g., by defining part-of relationships), they do not define clear mechanisms to link together provenance traces at different levels of granularity. Below we describe how we address these issues with the EP-Plan ontology.

3 USING EP-PLAN TO REPRESENT PLANS AT DIFFERENT LEVELS OF ABSTRACTION

EP-Plan builds on P-Plan³ [3], a vocabulary designed for aligning simple plans to their corresponding provenance traces. EP-Plan was designed for cross-domain applications (e.g. the use of EP-Plan for enhancing Internet of Things deployments is detailed in [10]) and uses ep-plan:Step to denote any planned process, and ep-plan:Variable to represent inputs and outputs of steps.

Figure 3 illustrates a subset of EP-Plan concepts that define mechanisms for linking plan specifications and execution traces at different levels of abstraction. Both steps and variables belong to an ep-plan:Plan (modelled as a subclass of prov:Plan defined in PROV-O⁴) and are linked to their corresponding executions described as ep-plan:Activity and ep-plan:Entity (modelled as subclasses of prov:Activity and prov:Entity). A workflow execution typically produces an execution trace that consists of a number of activities and entities representing instantiations of different parts of a plan. In EP-Plan, a single execution trace is grouped by an ep-plan:ExecutionTraceBundle (a subclass of prov:Bundle). A single plan specification may then be linked to multiple execution traces using prov:wasDerivedFrom. To allow linking of different levels of workflow abstractions, EP-Plan provides mechanisms to group related workflow steps defined at a finer level of detail together as a sub-plan that then further describes a step of a more abstract plan, denoted as ep-plan:MultiStep. The left side of Figure 4 illustrates a high-level abstraction of a workflow plan (:SummarizedWf) containing a single ep-plan:MultiStep (:ExecuteWorkflowStep) that is then described in more detail on the right side of the figure as a sub-plan (:ExecutedWf). In the same figure, the abstract workflow (:SummarizedWf) also includes abstractions of two variables (:InputFilesVar and :OutputFilesVar) described using the class ep-plan:MultiVariable. In the sub-plan specification, each of the multivariables is decomposed into two individual variables (e.g., :InputFilesVar decomposes into :File1Var and :File2Var) and linked using ep-plan:hasPart.

Figure 3: An overview of a subset of EP-Plan concepts for describing and linking plan specifications with their execution traces. (Diagram not reproduced. Classes shown: ep-plan:Plan, ep-plan:Step, ep-plan:MultiStep, ep-plan:Variable, ep-plan:MultiVariable, ep-plan:ExecutionTraceBundle, ep-plan:Activity, ep-plan:MultiActivity, ep-plan:Entity and ep-plan:EntityCollection. Properties shown: ep-plan:hasInputVariable, ep-plan:hasOutputVariable, ep-plan:isElementOfPlan, ep-plan:isSubPlanOfPlan, ep-plan:isDecomposedAsPlan, ep-plan:hasPart, ep-plan:hasTraceElement, ep-plan:correspondsToStep, ep-plan:correspondsToVariable, prov:wasDerivedFrom, prov:used, prov:wasGeneratedBy and prov:hadMember.)

Figure 4: An example illustrating decomposition of ep-plan:MultiStep into a sub-plan and linking of variables across different levels of plan abstractions. (Diagram not reproduced. Individuals shown: :SummarizedWf and :ExecutedWf (ep-plan:Plan); :ExecuteWorkflowStep (ep-plan:MultiStep); :AggregateStep and :SortStep (ep-plan:Step); :InputFilesVar and :OutputFilesVar (ep-plan:MultiVariable); :File1Var, :File2Var, :AggregatedDataVar, :OutputVar and :ErrorLogVar (ep-plan:Variable).)

Figure 5 illustrates an example execution trace with two execution trace bundles corresponding to the plan and its sub-plan shown in Figure 4. Execution trace elements corresponding⁵ to multivariables defined in the :SummarizedWf plan (see Figure 4) are of the type ep-plan:EntityCollection, which is a subclass of prov:Collection (see :InputFiles and :OutputFiles in Figure 5). The usage and generation of these entity collections is ascribed to a trace element :WorkflowExecution (modelled as ep-plan:MultiActivity) using the relationships prov:used and prov:wasGeneratedBy. The trace element :WorkflowExecution corresponds⁶ to the plan element :ExecuteWorkflowStep shown in Figure 4. The right side of Figure 5 shows a more detailed execution trace corresponding⁷ to the :ExecutedWf plan specification shown on the right side of Figure 4. Instantiations of plan variables are captured as instances of ep-plan:Entity (e.g. see :File1) and instantiations of steps are captured as instances of ep-plan:Activity (e.g. see :Aggregate). prov:hadMember relationships are used to link trace elements corresponding to abstract multivariables (modelled as ep-plan:EntityCollection in :AbstractExecutionTrace) to their more detailed description in :ExecutionTrace produced by the sub-workflow specification.

Figure 5: An example description of execution traces corresponding to workflows defined at different levels of abstraction. (Diagram not reproduced. Individuals shown: :AbstractExecutionTrace and :ExecutionTrace (ep-plan:ExecutionTraceBundle), derived from the plans :SummarizedWf and :ExecutedWf; :WorkflowExecution (ep-plan:MultiActivity); :Aggregate and :Sort (ep-plan:Activity); :InputFiles and :OutputFiles (ep-plan:EntityCollection); :File1, :File2, :AggregatedData, :Output and :ErrorLog (ep-plan:Entity).)

To summarise, using the mechanisms outlined above, EP-Plan enables modelling of abstracted workflow specifications by collapsing multiple steps and variables into aggregated plan elements (i.e. multisteps and multivariables). Sub-plans containing more detailed descriptions of plan abstractions may be linked and reused by different plans (i.e. as workflow fragments), as these are modelled as individual plan specifications (including any relevant metadata). Furthermore, by leveraging the concept of collections, we are also able to maintain links between different abstractions of execution traces without violating PROV-O semantics.

Finally, in contrast with P-Plan, EP-Plan provides a richer vocabulary for capturing plan metadata which (for reasons of space) is not discussed in detail in this paper. Briefly, this includes the ability to associate descriptions of agents that are allowed to execute different steps of a plan, to link descriptions of policies, and to describe specifications of how data should be exchanged between steps. Plan elements can also be associated with descriptions of constraints that provide a high-level reference to any restrictions that can be linked to and evaluated against elements of an execution trace. EP-Plan also enables descriptions of objectives to be associated with the plan. Objectives may then be linked to the individual plan elements that achieve them. Each element may also be linked to a rationale (e.g. a user-readable description) which details why the element was included in the plan specification. These concepts are important for describing the execution context of a scientific experiment. This may include, for example, specifications of individual scientists that are allowed to control certain steps of a plan, links to a data protection policy applicable to an experiment using sensitive or personal data, constraint descriptions which provide further information about the portions of a workflow that failed to execute due to constraint violation, etc.

4 CONCLUSIONS & FUTURE WORK

In this paper, we introduced the EP-Plan ontology for describing scientific experiments. In particular, we focused on describing experiments at different levels of abstraction. In our future work we aim to focus on using EP-Plan to enhance the provenance traces generated by WINGS (which currently uses the OPMW ontology⁸) with additional plan descriptions. OPMW extends both P-Plan and PROV-O and therefore it should be possible to align existing provenance descriptions generated by WINGS with the EP-Plan vocabulary. The WINGS system also uses semantic implementations of constraints to plan and execute scientific workflows [8]. We will explore how these can be mapped to the constraint concepts defined in EP-Plan and hence included as a part of the experiment metadata with the plan specification.

ACKNOWLEDGMENTS

The work described in this paper was funded by the award made by the RCUK Digital Economy programme to the University of Aberdeen (EP/N028074/1), a SICSA PECE travel award, the Defense Advanced Research Projects Agency with award W911NF-18-1-0027, the SIMPLEX program with award W911NF-15-1-0555 and from the National Institutes of Health under awards 1U01CA196387 and 1R01GM117097.

REFERENCES

[1] Khalid Belhajjame, Jun Zhao, Daniel Garijo, Matthew Gamble, Kristina Hettne, Raul Palma, Eleni Mina, Oscar Corcho, José Manuel Gómez-Pérez, Sean Bechhofer, et al. 2015. Using a suite of ontologies for preserving workflow-centric research objects. Web Semantics: Science, Services and Agents on the World Wide Web 32 (2015), 16–42. https://doi.org/10.1016/j.websem.2015.01.003

[2] Daniel Garijo, Oscar Corcho, Yolanda Gil, Meredith N Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad, Paul Thompson, and Arthur W Toga. 2014. Workflow reuse in practice: a study of neuroimaging pipeline users. In e-Science (e-Science), 2014 IEEE 10th International Conference on, Vol. 1. IEEE, 239–246. https://doi.org/10.1109/eScience.2014.33

[3] D. Garijo and Y. Gil. 2012. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science, Vol. 951. CEUR Workshop Proceedings.

[4] Y. Gil, D. Garijo, M. Knoblock, A. Deng, R. Adusumilli, V. Ratnakar, and P. Mallick. 2017. Improving Publication and Reproducibility of Computational Experiments through Workflow Abstractions. In Proceedings of the Workshop on Capturing Scientific Knowledge (SciKnow). Austin, Texas.

[5] Y. Gil, V. Ratnakar, E. Deelman, G. Mehta, and J. Kim. 2007. Wings for pegasus: Creating large-scale scientific applications using semantic representations of computational workflows. In Proceedings of the National Conference on Artificial Intelligence, Vol. 22. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999, 1767.

[6] Y. Gil, V. Ratnakar, J. Kim, P. Antonio Gonzalez-Calero, P. Groth, J Moody, and E. Deelman. 2011. Wings: Intelligent Workflow-Based Design of Computational Experiments. IEEE Intelligent Systems 26, 1 (2011).

[7] F. Khan, S. Soiland-Reyes, R. Sinnott, A. Lonie, C. Goble, and M. Crusoe. 2018. Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv. (Dec. 2018). https://doi.org/10.5281/zenodo.1966881 Submitted to GigaScience (GIGA-D-18-00483).

[8] J. Kim, E. Deelman, Y. Gil, G. Mehta, and V. Ratnakar. 2008. Provenance trails in the wings/pegasus system. Concurrency and Computation: Practice and Experience 20, 5 (2008), 587–597.

[9] T. Lebo, S. Sahoo, and D. McGuinness. April 2013. PROV-O: The PROV ontology. Technical Report. https://www.w3.org/TR/2013/REC-prov-o-20130430/

[10] M. Markovic, D. Garijo, P. Edwards, and W. Vasconcelos. 2019. Semantic Modelling of Plans and Execution Traces for Enhancing Transparency of IoT Systems. In Proceedings of the 6th IEEE International Conference on Internet of Things. IEEE Explore.

[11] P. Missier, S. Dey, K. Belhajjame, V. Cuevas-Vicenttín, and B. Ludäscher. 2013. D-PROV: Extending the PROV Provenance Model with Workflow Structure. In 5th USENIX Workshop on the Theory and Practice of Provenance (TaPP 13).

[12] L. Moreau, P. Groth, J. Cheney, T. Lebo, and S. Miles. 2015. The rationale of PROV. Journal of Web Semantics 35 (2015), 235–257.

[13] Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields. 2014. Workflows for e-Science: Scientific Workflows for Grids. Springer Publishing Company, Incorporated.

1 http://purl.org/provone
2 https://w3id.org/ep-plan
3 p-plan namespace: http://purl.org/net/p-plan
4 prov namespace: http://www.w3.org/ns/prov#
5 Links ep-plan:correspondsToVariable that link ep-plan:EntityCollection from the execution trace record to ep-plan:MultiVariable in the plan specification are not shown in the figure.
6 Links ep-plan:correspondsToStep that link ep-plan:MultiActivity from the execution trace record to ep-plan:MultiStep in the plan specification are not shown in the figure.
7 Links ep-plan:correspondsToVariable and ep-plan:correspondsToStep that link ep-plan:Entity and ep-plan:Activity from the execution trace record to ep-plan:Variable and ep-plan:Step in the plan specification respectively are not shown in the figure.
8 http://www.opmw.org/model/OPMW/
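
As a supplementary illustration of the linking mechanisms described in Section 3, the following is a minimal sketch in Python that uses plain tuples in place of RDF triples. The individual and property names are taken from the text and from Figures 4 and 5; the exact set of triples, the edge directions and the helper functions (objects, detailed_inputs, collection_is_consistent) are assumptions made for illustration and are not part of EP-Plan or of any workflow system. The sketch expands a summarised activity into the individual entities behind its input collection, and checks that each member of a collection corresponds to a variable that is a part of the matching multivariable.

```python
# Triples are (subject, predicate, object) strings. Names follow Figures 4 and 5;
# the edge directions below are assumptions read off the prose, not a normative model.
TRIPLES = {
    # Plan level (Figure 4)
    (":ExecutedWf", "ep-plan:isSubPlanOfPlan", ":SummarizedWf"),
    (":ExecuteWorkflowStep", "ep-plan:isDecomposedAsPlan", ":ExecutedWf"),
    (":InputFilesVar", "ep-plan:hasPart", ":File1Var"),
    (":InputFilesVar", "ep-plan:hasPart", ":File2Var"),
    # Trace level (Figure 5)
    (":WorkflowExecution", "prov:used", ":InputFiles"),
    (":InputFiles", "prov:hadMember", ":File1"),
    (":InputFiles", "prov:hadMember", ":File2"),
    # Trace-to-plan links (mentioned in the footnotes, not drawn in the figures)
    (":WorkflowExecution", "ep-plan:correspondsToStep", ":ExecuteWorkflowStep"),
    (":InputFiles", "ep-plan:correspondsToVariable", ":InputFilesVar"),
    (":File1", "ep-plan:correspondsToVariable", ":File1Var"),
    (":File2", "ep-plan:correspondsToVariable", ":File2Var"),
}


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def detailed_inputs(triples, multi_activity):
    """Expand the collections used by a summarised activity into their members."""
    members = []
    for collection in objects(triples, multi_activity, "prov:used"):
        members.extend(objects(triples, collection, "prov:hadMember"))
    return sorted(members)


def collection_is_consistent(triples, collection):
    """Check that every member maps to a part of the collection's multivariable."""
    allowed = set()
    for multivar in objects(triples, collection, "ep-plan:correspondsToVariable"):
        allowed.update(objects(triples, multivar, "ep-plan:hasPart"))
    for member in objects(triples, collection, "prov:hadMember"):
        variables = set(objects(triples, member, "ep-plan:correspondsToVariable"))
        # A member with no plan variable, or one outside the multivariable, fails.
        if not variables or not variables <= allowed:
            return False
    return True


if __name__ == "__main__":
    print(detailed_inputs(TRIPLES, ":WorkflowExecution"))
    print(collection_is_consistent(TRIPLES, ":InputFiles"))
```

A real deployment would more likely store these statements in an RDF store and express the same traversals as graph queries; the tuple-based form is used here only to keep the sketch self-contained.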