=Paper=
{{Paper
|id=Vol-1800/short2
|storemode=property
|title=Integrating Domain-Data Steering with Code-Profiling Tools to Debug Data-Intensive Workflows
|pdfUrl=https://ceur-ws.org/Vol-1800/short2.pdf
|volume=Vol-1800
|authors=Vitor Silva,Leonardo Neves,Renan Souza,Alvaro Coutinho,Daniel de Oliveira,Marta Mattoso
|dblpUrl=https://dblp.org/rec/conf/sc/SousaNSC0M16
}}
==Integrating Domain-Data Steering with Code-Profiling Tools to Debug Data-Intensive Workflows==
Vítor Silva§, Leonardo Neves§, Renan Souza§◊, Alvaro Coutinho§, Daniel de Oliveira★, Marta Mattoso§

§Federal University of Rio de Janeiro (COPPE/UFRJ), Rio de Janeiro, Brazil
★Computing Institute, Fluminense Federal University (IC/UFF), Niterói, Brazil
◊IBM Research Brazil, Rio de Janeiro, Brazil

{silva, renanfs, marta}@cos.ufrj.br, lrmneves@poli.ufrj.br, alvaro@nacad.ufrj.br, danielcmo@ic.uff.br

Copyright held by the author(s). WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah.

ABSTRACT

Computer simulations may be composed of scientific programs chained in a coherent flow and executed in High Performance Computing environments. These executions may present anomalies associated with the data that flows in parallel among programs. Several parallel code-profiling tools already support performance analysis, such as Tuning and Analysis Utilities (TAU), or provide fine-grained performance statistics, such as the System Activity Report (SAR). However, these tools do not associate their results with the corresponding dataflows. Such an association is fundamental to trace back the data origins of an error. In this paper, we propose to couple a workflow data monitoring approach to parallel code-profiling tools for workflow executions. The goal is to profile and debug parallel workflow executions by querying a database that integrates performance, resource consumption, provenance, and domain data from simulation programs at runtime. We have implemented our data monitoring approach as a software component coupled to the TAU and SAR code-profiling tools. We show how querying the resulting integrated database enables domain-aware runtime steering of performance anomalies, using the astronomy Montage workflow as a motivating example. We observe that the overhead introduced by our approach is negligible.

Keywords

Performance analysis; debugging; scientific workflow; provenance.

1. INTRODUCTION

A workflow is an abstraction that defines a set of activities and a dataflow among them [1]. Each activity is associated with a simulation program, which consumes an input dataset and produces an output dataset. Many workflows process a large volume of data, requiring the effective use of High Performance Computing (HPC) or High-Throughput Computing (HTC) environments allied to parallelization techniques such as data parallelism or parameter sweep [2].

To support the modeling and execution of workflows in those environments, standalone parallel Scientific Workflow Management Systems (SWfMS) were developed, such as Swift/T [3], Pegasus [4] and Chiron [5], as well as SWfMS embedded in science gateways such as WorkWays [6]. To foster data parallelism, the activities of a workflow can be instantiated as tasks for each input data element, known as activations [6] (we use the term activation consistently throughout this paper). Each activation executes a specific program or computational service in parallel, consuming a set of parameter values and input data and producing output data. Besides the activations, parallel SWfMS control the data dependencies among activities. This dependency management of the dataflow, together with provenance support, is among the advantages of SWfMS over executing workflows with Python scripts [8] or Spark [7].

It is far from trivial to monitor and steer the performance and resource consumption related to domain data during the parallel execution of workflows [8]. Users need to relate performance and resource consumption information with domain data to plan actions. For example, executing simulation programs with several combinations of parameter values produces many data files, whose contents hold domain data that are relevant to the result analysis. Besides the challenge of finding those raw data files, users have to develop ad-hoc programs to access and extract the contents of these files (often in binary or domain-specific formats) to analyze the workflow results.

Most parallel SWfMS have addressed the need for performance and resource consumption monitoring by adding new components to their workflow engines, or by loading data into databases (at runtime or after the workflow execution) to be further queried by users [9]. Approaches such as STAMPEDE [10], which has been coupled to Pegasus, are also able to monitor the execution of workflows in HPC environments at runtime. However, such coarse-grained information prevents users from understanding the behavior of the data derivation (i.e., the dataflow path) associated with the performance and resource consumption.

To address low-level execution information, there are code-profiling tools that support debugging and profiling of HPC scientific applications, such as Tuning and Analysis Utilities (TAU) [11]. TAU instruments the application code to capture performance data and invokes ParaProf to present these data, for instance, using 3D visualizations. Other tools, such as the System Activity Report (SAR, see http://pubs.opengroup.org/onlinepubs/7908799/xsh/sysstat.h.html), provide periodic system statistics, but disconnected from the workflow execution data. When performance and resource consumption data are not related to fine-grained domain data (i.e., data values within raw data files), the user may not see that a certain data value from a huge file is presenting an anomalous behavior.

In this paper, we present a database-oriented approach that extracts and represents fine-grained performance and resource consumption data associated with workflow information, provenance, and domain-specific data, all in a single database managed by a relational database management system at runtime.
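As a concrete reading of the activation concept above, the sketch below models an activation as a small record and instantiates one activation per input element of a parameter sweep. This is a minimal illustration only: the class and field names are ours, not Chiron's actual data model, and `mProject` is used merely as an example program name.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Activation:
    """One parallel task: a program run over one input data element.

    Illustrative sketch of the activation concept; the field names
    are ours, not the actual Chiron data model.
    """
    program: str                  # simulation program to invoke
    parameters: Dict[str, float]  # parameter values consumed
    input_files: List[str]        # slice of the input dataset
    output_files: List[str] = field(default_factory=list)

# A SWfMS would create one activation per input element, e.g. for a
# parameter sweep over two values of a hypothetical 'threshold':
activations = [
    Activation("mProject", {"threshold": t}, [f"img_{i}.fits"])
    for i, t in enumerate([0.1, 0.2])
]
```

Each such record, together with the data dependencies between activities, is what the SWfMS tracks as provenance during parallel execution.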
In [12] and [13], we showed how domain and provenance data associated with execution data can improve steering, debugging, and workflow execution time. However, the execution data was limited to the performance data captured by the SWfMS. Thus, users still had to turn to TAU or other tools to improve debugging, while having a hard time associating those debugging tools with the enriched provenance database. In this paper, we also contribute a component for capturing performance and resource consumption metrics, designed on top of the TAU and SAR tools, which we named PerfMetricEval. We coupled PerfMetricEval to the Chiron SWfMS.

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 describes our approach for performance and resource consumption monitoring and the integration of PerfMetricEval with the Chiron SWfMS; it also presents the evaluation of the proposed approach using the Montage workflow in a cluster environment. Section 4 concludes the paper and presents some final remarks.

2. RELATED WORK

Related work falls into two broad categories: systems that monitor the execution of workflows, and tools that monitor low-level performance and resource consumption information.

Several SWfMS provide monitoring and performance analysis mechanisms within their engines. ASKALON [14], Swift/T, Pegasus (through the Kickstart tool [15]), Makeflow [16] and Chiron provide monitoring mechanisms for users to follow the execution of the workflow and to analyze its behavior. Swift/T and Pegasus provide interfaces to follow the execution, while Chiron allows database queries to be submitted at runtime for workflow monitoring. Swift/T and Pegasus provide information about the number of activations executed, the execution time of each activation, and the resources used. Specifically, Pegasus uses the Kickstart tool to perform performance analyses that can also be associated with provenance data. Makeflow enables performance monitoring and debugging in HPC environments, e.g., of the application elapsed time. Chiron provides information about the number of activations, their execution times, and domain data in the same database, but it provides neither performance metrics nor resource consumption data. STAMPEDE [10] is a SWfMS-independent solution that provides a common model for workflow monitoring, but it does not consider domain-specific data. These SWfMS solutions fail to combine domain data and workflow execution data with performance and resource consumption information.

WorkWays [6] and FireWorks [17] enable users to monitor the status and elapsed time of tasks, and to correlate those data with domain-specific data. They also display performance data related to memory usage and I/O operations. Those performance data allow for identifying performance bottlenecks in HPC environments; however, they are not related to domain data.

There are other approaches that provide detailed performance and resource consumption information for applications, i.e., disconnected from the workflow concept. Tuning and Analysis Utilities (TAU) [11] is a profiling tool that gathers performance information and visualizes it on interactive graphs using ParaProf. TAU gathers performance information by instrumenting functions, methods, basic blocks, and statements, as well as by event-based sampling. To use TAU in workflows, both the applications and the workflow engine must be instrumented to collect performance data. Similar to TAU, DARSHAN is a resource consumption profiler that monitors I/O operations in applications with a non-intrusive solution.

The System Activity Report (SAR) is a Linux monitoring command that reports system load, including CPU activity, memory usage statistics, etc. The statistics provided by SAR are fine-grained, but this approach is disconnected from the workflow concept. To use SAR in a SWfMS, one must couple it to the workflow engine or call it within the invoked program. We have used SAR to help in other workflow applications, but associating SAR information with provenance and domain-specific data is far from trivial. Similarly to SAR, CCTools (http://ccl.cse.nd.edu/software) provides a resource monitor tool for gathering performance data during the execution of applications, which enables visualization of performance data such as memory usage and CPU usage percentage. Ganglia [18] is a monitoring system for distributed infrastructures. Ganglia captures performance information from the infrastructure and presents similar visualizations for memory, disk usage, network statistics, number of running processes, etc. However, Ganglia usage is similar to SAR's and CCTools', and thus they all present the same difficulties in associating performance data with provenance and domain-specific data.

Therefore, we observe that existing approaches do provide valuable information for computer science experts to debug code, to understand the performance of a workflow execution, or to follow a scientific workflow execution in an HPC environment. However, when using those existing solutions, users may miss important opportunities to understand the behavior of data derivation based on resource consumption information. When resource consumption data are not related to domain data, it may be hard to find which specific data value is presenting anomalous memory consumption or execution behavior, and to act directly on it. In data-intensive workflows this lack becomes a real issue.

3. THE PERFMETRICEVAL COMPONENT

TAU supports debugging and profiling of HPC scientific applications for computer experts, e.g., by instrumenting the application code to capture performance data and presenting these data graphically. However, users from the application domain also need to analyze performance and resource consumption together with the domain-specific data, as well as to be aware of all data transformations that have occurred in the parallel workflow execution. In this section, we show how TAU, SAR and other tools may be integrated with the provenance database of a SWfMS. To integrate TAU, SAR and the provenance database, we have extended a W3C PROV-compliant provenance database schema with performance and resource consumption information.

In this paper we consider metrics such as total elapsed time (which can be decomposed to identify bottlenecks related to the computational simulation, for example, a communication bottleneck), CPU usage, memory consumption, and I/O and transfer rate statistics, to be captured and stored in the provenance database. We decomposed the total workflow elapsed time (T_wf), which corresponds to the workflow wall-clock time, into three different metrics: useful computing time (time needed to execute a specific activation, T_comp), communication time (time needed to perform communication between processes/machines, T_comm), and time taken to access the provenance database (T_prov); thus T_wf = T_comp + T_comm + T_prov.

For CPU usage, we consider the cumulative runtime CPU usage of all machines involved in the execution of the workflow. The CPU usage can be decomposed into the percentage of CPU utilization while executing the application (usr), the percentage of CPU utilization while executing at the system level (sys), the percentage of time that the CPU was idle while the system had an outstanding disk I/O request (iowait), and the percentage of time that the CPU was idle and the system did not have an outstanding disk I/O request (idle).

PerfMetricEval captures performance and resource consumption metrics using both TAU and SAR. Using TAU, we capture the elapsed time of computing, communication, and provenance operations. However, since TAU does not provide memory, CPU and I/O statistics, we obtain those from SAR. The integration of Chiron with PerfMetricEval is based on inserting an invocation of the PerfMetricEval component (after the execution of each activation) to gather fine-grained performance information from TAU and SAR, insert these data into the provenance database, and then convert them into TAU files (profiles for each computing node, e.g., profile.0.x.0 for node x). The generated TAU files serve as input for TAU to plot, for instance, 3D graphs using ParaProf [11]. The PerfMetricEval execution flow is presented in Figure 1. At https://github.com/hpcdb/PerfMetricEval, the component is available for download, with explanations on how to configure a database and invoke Chiron with PerfMetricEval.

We use the well-known scientific workflow Montage [19], presented in Figure 2, from the astronomy domain, as the case study for PerfMetricEval. A basic analysis is the workflow computing time on each machine. One simple SQL query can sum the actual computing time, the time spent on communication, and the time needed for storing provenance for each activation, grouped by machine. Based on these queries, we report experimental results for a 17-hour workflow execution on an SGI Altix ICE 8200 at NACAD/COPPE/UFRJ with four machines, each with 2x Quad-Core Intel Xeon X5355 2.66 GHz (32 cores in total). The Montage execution consumed 1,585 input image files, which produced 17,503 activations executed in parallel.
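The per-machine time decomposition described above (T_wf = T_comp + T_comm + T_prov, summed per machine) can be recovered with a single aggregation query. The sketch below uses a hypothetical, simplified `activation_metrics` table; the actual PerfMetricEval database schema and column names may differ.

```python
import sqlite3

# Hypothetical, simplified schema; the real PerfMetricEval
# provenance database schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activation_metrics (
        activation_id INTEGER,
        machine       TEXT,
        t_comp        REAL,  -- useful computing time (s)
        t_comm        REAL,  -- communication time (s)
        t_prov        REAL   -- time storing provenance (s)
    )""")
conn.executemany(
    "INSERT INTO activation_metrics VALUES (?, ?, ?, ?, ?)",
    [(1, "Machine0", 10.0, 4.0, 2.0),
     (2, "Machine1", 12.0, 1.0, 0.0),
     (3, "Machine0",  8.0, 3.0, 1.5)])

# Sum T_comp, T_comm and T_prov per machine;
# T_wf = T_comp + T_comm + T_prov.
rows = conn.execute("""
    SELECT machine,
           SUM(t_comp), SUM(t_comm), SUM(t_prov),
           SUM(t_comp + t_comm + t_prov) AS t_wf
    FROM activation_metrics
    GROUP BY machine
    ORDER BY machine""").fetchall()
# e.g. rows[0] -> ('Machine0', 18.0, 7.0, 3.5, 28.5)
```

The point is that, once per-activation timings sit in one relational database, a plain GROUP BY reproduces the per-machine decomposition without any extra tooling.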
Figure 1. The PerfMetricEval execution flow: raw data files are processed with SAR and TAU; PerfMetricEval extracts the metrics, loads them into the provenance database, and generates profile files in TAU's format.

After the execution of each activation, Chiron invokes the PerfMetricEval component, which identifies the elapsed time of the activation and invokes SAR to gather resource consumption information for the corresponding activation. For each metric, a new file is created and stored in the workflow workspace. By creating those files, PerfMetricEval is able to parse them, extract performance information, and asynchronously load it into the provenance database. After loading the performance and resource consumption data into the same database, PerfMetricEval provides a feature to query the data relevant to the user-defined parameters for performance and resource consumption analysis, by executing standard SQL queries written by users. After querying the database and gathering the results, PerfMetricEval provides a component to generate a file in TAU format to profile the execution. The TAU visualization tool (the paraprof command) graphically displays the generated files. Since the format used is compatible with this code-profiling tool, it enables the creation of images such as bar graphs, 3D meshes and scatter charts, all interactive and with the customization capabilities inherent to TAU. This integration of PerfMetricEval and Chiron enables in-depth analysis graphs generated within seconds and supports decision-making by users at runtime.

Figure 2. Montage workflow.

Figure 3. The computing time of each machine.

Analyzing Figure 3, we can also state that only Machine0 loads provenance data into the database. This is due to the architectural characteristics of Chiron, where only the master node is responsible for storing provenance data in the database. This was an architectural choice for Chiron, since the slave nodes can process new activations without being blocked by provenance management. We can also state that Machine0 presents the highest communication overhead, because it hosts the provenance data storage and all machines send provenance data to it using messages, increasing its communication cost.

Another important analysis considers the average resource consumption of each activity per machine. Moreover, users need to relate the resource consumption to the domain-specific data. Domain scientists commonly have a fairly good execution time estimate for a specific activation (based on their experience and previous executions, all registered in the provenance database). Using Chiron + PerfMetricEval, they can check whether the actual activation execution time or resource consumption meets the estimate. If the actual execution time is considerably higher than the estimate, they can identify an anomalous behavior together with the corresponding domain files and parameter data of that particular anomaly. For example, it is well known to Montage users that the image region of interest can impact the performance and resource consumption of workflow activations. Thus, users often need to analyze the behavior of a specific subset of the input data. In this case, the image region of interest can be defined by setting the domain attributes CRPIX1 and CRPIX2 (whose values are also loaded from the data files into the provenance database), which represent the pixels that define the region of interest. In this small-scale Montage workflow execution with Chiron, we observed the generation of 10,647 files. Finding which file has the region of interest with the anomalous behavior is very error-prone when the performance data is separated from the workflow execution data and domain data. With the data integration, one query can retrieve the average memory consumption, the average memory used, the average CPU usage, the average CPU usage at the system level, the average number of disk blocks read/written, etc.

All these data are associated with the domain attributes CRPIX1 and CRPIX2 and their corresponding FITS file ids. With the result of such a query, the domain user can monitor the performance of the activations that build the mosaic (Create Mosaic), limited to one specific image region, to check whether there is an anomaly in the execution or in the data file contents. The data extracted from these queries allowed us to generate several TAU graphs. In Figure 4 we see the CPU usage per machine when executing activations where 100 < CRPIX1 < 150 and 50 < CRPIX2 < 80. Since Chiron considers the CPU usage in its scheduling algorithm, we can state that there is load balancing among the machines from the CPU usage perspective, i.e., all machines present an equivalent CPU use, considering the metrics idle (in light blue), iowait (dark blue), sys (green), and usr (red). However, the same behavior is not found when we consider memory and disk usage statistics.

Figure 4. The CPU statistics for activations executed.

4. CONCLUSIONS AND FINAL REMARKS

Performing analytical queries over workflows in distributed environments is an open, yet important, issue. It is fundamental to follow the status of the workflow execution, especially when workflows execute for weeks or even months. Being aware of bottlenecks, resource consumption, and other performance issues is essential.

Most SWfMS already provide some level of monitoring capability. However, their monitoring mechanisms are limited to following the number of activations executed, the volume of data transferred, the average execution time of activities, etc. In this paper, we provided an important opportunity to understand the behavior of the data derivation based on performance and resource consumption metrics. When performance and resource consumption data are not related to domain data, users may not see that a certain data value is presenting an anomalous behavior.

This paper proposes an approach that integrates provenance data, domain data, performance information, and resource consumption information in the same database. To achieve this, we introduce PerfMetricEval, a component for capturing performance and resource consumption data using specialized tools such as SAR and TAU. We integrated PerfMetricEval with the Chiron SWfMS. We evaluated the approach by monitoring the Montage workflow and performing analytical queries that mix different types of data, thus leaving room for domain specialists and code developers to fine-tune activities, such as investigating the input and output data of a particular activity execution when it is taking too long. Using the Montage workflow, we also observed that the overhead is negligible when compared to the total time needed to execute the workflow without PerfMetricEval. In this experiment, it was around 0.6% of the total workflow execution time. Although the results are promising, we still have to evaluate Chiron + PerfMetricEval in large-scale scientific experiments. Although our component has only been integrated with the Chiron SWfMS, we intend to adapt and integrate it with other SWfMS in the near future. The only restriction is that the SWfMS needs to store provenance in a database, as Pegasus and Swift/T do.

5. ACKNOWLEDGMENTS

The authors would like to thank CNPq, FAPERJ, HPC4E (EU H2020 Programme and MCTI/RNP-Brazil, grant no. 689772), and Intel for partially funding this work. This research made use of Montage, which is funded by the National Science Foundation (NSF). Leonardo Neves is currently at the Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA.

6. REFERENCES

[1] I. J. Taylor, E. Deelman, D. B. Gannon, and M. Shields, Workflows for e-Science: Scientific Workflows for Grids, 1st ed. Springer, 2007.

[2] E. Walker and C. Guiang, "Challenges in executing large parameter sweep studies across widely distributed computing environments," in Workshop on Challenges of Large Applications in Distributed Environments, Monterey, California, USA, 2007, pp. 11–18.

[3] J. M. Wozniak, T. G. Armstrong, M. Wilde, D. S. Katz, E. Lusk, and I. T. Foster, "Swift/T: Large-Scale Application Composition via Distributed-Memory Dataflow Processing," in CCGrid, 2013, pp. 95–102.

[4] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen, R. Ferreira, M. Livny, and K. Wenger, "Pegasus, a workflow management system for science automation," FGCS, vol. 46, pp. 17–35, 2015.

[5] E. Ogasawara, J. Dias, V. Silva, F. Chirigati, D. Oliveira, F. Porto, P. Valduriez, and M. Mattoso, "Chiron: A Parallel Engine for Algebraic Scientific Workflows," CCPE, vol. 25, no. 16, pp. 2327–2341, 2013.

[6] H. A. Nguyen, D. Abramson, T. Kipouros, A. Janke, and G. Galloway, "WorkWays: interacting with scientific workflows," CCPE, vol. 27, no. 16, pp. 4377–4397, Nov. 2015.

[7] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, "Spark: cluster computing with working sets," in USENIX Conference on Hot Topics in Cloud Computing, Boston, 2010, pp. 10–17.

[8] B. Balis, M. Bubak, and B. Łabno, "Monitoring of Grid scientific workflows," Scientific Programming, vol. 16, no. 2–3, pp. 205–216, 2008.

[9] M. Mattoso, J. Dias, K. A. C. S. Ocaña, E. Ogasawara, F. Costa, F. Horta, V. Silva, and D. de Oliveira, "Dynamic steering of HPC scientific workflows: A survey," FGCS, vol. 46, pp. 100–113, May 2015.

[10] D. Gunter, E. Deelman, T. Samak, C. H. Brooks, M. Goode, G. Juve, G. Mehta, P. Moraes, F. Silva, M. Swany, and K. Vahi, "Online workflow management and performance analysis with Stampede," in CNSM, 2011, pp. 1–10.

[11] S. S. Shende and A. D. Malony, "The TAU Parallel Performance System," International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, May 2006.

[12] V. Silva, D. de Oliveira, P. Valduriez, and M. Mattoso, "Analyzing related raw data files through dataflows," CCPE, vol. 28, no. 8, pp. 2528–2545, 2016.

[13] J. Dias, G. Guerra, F. Rochinha, A. L. G. A. Coutinho, P. Valduriez, and M. Mattoso, "Data-centric iteration in dynamic workflows," FGCS, vol. 46, pp. 114–126, 2015.

[14] R. Prodan, S. Ostermann, and K. Plankensteiner, "Performance analysis of grid applications in the ASKALON environment," in 10th IEEE/ACM International Conference on Grid Computing, 2009, pp. 97–104.

[15] J. S. Vockler, G. Mehta, Y. Zhao, E. Deelman, and M. Wilde, "Kickstarting remote applications," in International Workshop on Grid Computing Environments, 2007.

[16] M. Albrecht, P. Donnelly, P. Bui, and D. Thain, "Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids," in Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, 2012, p. 1.

[17] A. Jain, S. P. Ong, W. Chen, B. Medasani, X. Qu, M. Kocher, M. Brafman, G. Petretto, G.-M. Rignanese, G. Hautier, D. Gunter, and K. A. Persson, "FireWorks: a dynamic workflow system designed for high-throughput applications," CCPE, vol. 27, no. 17, pp. 5037–5059, 2015.

[18] Monitoring with Ganglia, 1st ed. Sebastopol, CA: O'Reilly Media, 2012.

[19] J. C. Jacob, D. S. Katz, G. B. Berriman, J. C. Good, A. C. Laity, E. Deelman, C. Kesselman, G. Singh, M.-H. Su, T. A. Prince, and R. Williams, "Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking," IJCSE, vol. 4, no. 2, pp. 73–87, 2009.