=Paper=
{{Paper
|id=Vol-1800/short2
|storemode=property
|title=Integrating Domain-Data Steering with Code-Profiling Tools to Debug Data-Intensive Workflows
|pdfUrl=https://ceur-ws.org/Vol-1800/short2.pdf
|volume=Vol-1800
|authors=Vitor Silva,Leonardo Neves,Renan Souza,Alvaro Coutinho,Daniel de Oliveira,Marta Mattoso
|dblpUrl=https://dblp.org/rec/conf/sc/SousaNSC0M16
}}
==Integrating Domain-Data Steering with Code-Profiling Tools to Debug Data-Intensive Workflows==
Vítor Silva§, Leonardo Neves§, Renan Souza§◊, Alvaro Coutinho§, Daniel de Oliveira★, Marta Mattoso§

§Federal University of Rio de Janeiro (COPPE/UFRJ), Rio de Janeiro, Brazil
★Computing Institute, Fluminense Federal University (IC/UFF), Niterói, Brazil
◊IBM Research Brazil, Rio de Janeiro, Brazil

{silva, renanfs, marta}@cos.ufrj.br, lrmneves@poli.ufrj.br, alvaro@nacad.ufrj.br, danielcmo@ic.uff.br

Copyright held by the author(s). WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah.

ABSTRACT

Computer simulations may be composed of scientific programs chained in a coherent flow and executed in High Performance Computing environments. These executions may present anomalies associated with the data that flows in parallel among programs. Several parallel code-profiling tools already support performance analysis, such as Tuning and Analysis Utilities (TAU), or provide fine-grained performance statistics, such as the System Activity Report (SAR). However, these tools do not associate their results with the corresponding dataflows. Such an association is fundamental to trace back the data origins of an error. In this paper, we propose to couple a workflow data monitoring approach to parallel code-profiling tools for workflow executions. The goal is to profile and debug parallel workflow executions by querying a database that integrates performance, resource consumption, provenance, and domain data from simulation programs at runtime. We have implemented our data monitoring approach as a software component coupled to the TAU and SAR code-profiling tools. We show how querying the resulting integrated database enables domain-aware runtime steering of performance anomalies, using the astronomy Montage workflow as a motivating example. We observe that the overhead introduced by our approach is negligible.

Keywords

Performance analysis; debugging; scientific workflow; provenance.

1. INTRODUCTION

A workflow is an abstraction that defines a set of activities and a dataflow among them [1]. Each activity is associated with a simulation program, which consumes an input dataset and produces an output dataset. Many workflows process a large volume of data, requiring the effective use of High Performance Computing (HPC) or High-Throughput Computing (HTC) environments allied to parallelization techniques such as data parallelism or parameter sweep [2].

To support the modeling and execution of workflows in those environments, standalone parallel Scientific Workflow Management Systems (SWfMS) were developed, such as Swift/T [3], Pegasus [4] and Chiron [5], as well as SWfMS embedded in science gateways such as WorkWays [6]. To foster data parallelism, the activities of a workflow can be instantiated as tasks for each input data element, known as activations [6] (we use the term activation consistently throughout this paper). Each activation executes a specific program or computational service in parallel, consuming a set of parameter values and input data and producing output data. Besides the activations, parallel SWfMS control the data dependencies among activities. This dependency management of the dataflow, together with provenance support, is among the advantages of SWfMS over executing workflows with Python scripts [8] or Spark [7].

It is far from trivial to monitor and steer the performance and resource consumption related to domain data during the parallel execution of workflows [8]. Users need to relate performance and resource consumption information with domain data to plan actions. For example, executing simulation programs with several combinations of parameter values produces many data files, whose contents hold domain data that are relevant to the result analysis. Besides the challenge of finding those raw data files, users have to develop ad-hoc programs to access and extract the contents of these files (often in binary or domain-specific formats) to analyze the workflow results.

Most parallel SWfMS have addressed the need for performance and resource consumption monitoring by adding new components to their workflow engines, or by loading data into databases (at runtime or after the workflow execution) to be further queried by users [9]. Approaches such as STAMPEDE [10], which has been coupled to Pegasus, are also able to monitor the execution of workflows in HPC environments at runtime. However, such coarse-grained information prevents users from understanding the behavior of the data derivation (i.e., the dataflow path) associated with the performance and resource consumption.

To address low-level execution information, there are code-profiling tools that support debugging and profiling of HPC scientific applications, such as Tuning and Analysis Utilities (TAU) [11]. TAU instruments the application code to capture performance data and invokes ParaProf to present these data, for instance, using 3D visualizations. Other tools, such as the System Activity Report (SAR, see http://pubs.opengroup.org/onlinepubs/7908799/xsh/sysstat.h.html), provide periodic system statistics, but disconnected from the workflow execution data. When performance and resource consumption data are not related to fine-grained domain data (i.e., data values within raw data files), the user may not see that a certain data value from a huge file is presenting an anomalous behavior.

In this paper, we present a database-oriented approach that extracts and represents fine-grained performance and resource consumption data associated with workflow information, provenance, and domain-specific data, all in a single database managed by a relational database management system at runtime.
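As a concrete reading of the activation concept above, the sketch below models an activation as a small record and instantiates one activation per input element of a parameter sweep. This is a minimal illustration only: the class and field names are ours, not Chiron's actual data model, and `mProject` is used merely as an example program name.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Activation:
    """One parallel task: a program run over one input data element.

    Illustrative sketch of the activation concept; the field names
    are ours, not the actual Chiron data model.
    """
    program: str                  # simulation program to invoke
    parameters: Dict[str, float]  # parameter values consumed
    input_files: List[str]        # slice of the input dataset
    output_files: List[str] = field(default_factory=list)

# A SWfMS would create one activation per input element, e.g. for a
# parameter sweep over two values of a hypothetical 'threshold':
activations = [
    Activation("mProject", {"threshold": t}, [f"img_{i}.fits"])
    for i, t in enumerate([0.1, 0.2])
]
```

Each such record, together with the data dependencies between activities, is what the SWfMS tracks as provenance during parallel execution.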
In [12] and [13], we showed how domain and provenance data associated with execution data can improve steering, debugging, and workflow execution time. However, the execution data was limited to the performance data captured by the SWfMS. Thus, users still had to turn to TAU or other tools to improve debugging, while having a hard time associating those debugging tools with the enriched provenance database. In this paper, we also contribute a component for capturing performance and resource consumption metrics, designed on top of the TAU and SAR tools, which we named PerfMetricEval. We coupled PerfMetricEval to the Chiron SWfMS.

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 describes our approach for performance and resource consumption monitoring and the integration of PerfMetricEval with the Chiron SWfMS; it also presents the evaluation of the proposed approach using the Montage workflow in a cluster environment. Section 4 concludes the paper and presents some final remarks.

2. RELATED WORK

Related work falls into two broad categories: systems that monitor the execution of workflows, and tools that monitor low-level performance and resource consumption information.

Several SWfMS provide monitoring and performance analysis mechanisms within their engines. ASKALON [14], Swift/T, Pegasus (through the Kickstart tool [15]), Makeflow [16] and Chiron provide monitoring mechanisms for users to follow the execution of the workflow and to analyze its behavior. Swift/T and Pegasus provide interfaces to follow the execution, while Chiron allows database queries to be submitted at runtime for workflow monitoring. Swift/T and Pegasus provide information about the number of activations executed, the execution time of each activation, and the resources used. Specifically, Pegasus uses the Kickstart tool to perform performance analyses that can also be associated with provenance data. Makeflow enables performance monitoring and debugging in HPC environments, e.g., of the application elapsed time. Chiron provides information about the number of activations, their execution times, and domain data in the same database, but it provides neither performance metrics nor resource consumption data. STAMPEDE [10] is a SWfMS-independent solution that provides a common model for workflow monitoring, but it does not consider domain-specific data. These SWfMS solutions fail to combine domain data and workflow execution data with performance and resource consumption information.

WorkWays [6] and FireWorks [17] enable users to monitor the status and elapsed time of tasks, and to correlate those data with domain-specific data. They also display performance data related to memory usage and I/O operations. Those performance data allow for identifying performance bottlenecks in HPC environments; however, they are not related to domain data.

There are other approaches that provide detailed performance and resource consumption information for applications, i.e., disconnected from the workflow concept. Tuning and Analysis Utilities (TAU) [11] is a profiling tool that gathers performance information and visualizes it on interactive graphs using ParaProf. TAU gathers performance information by instrumenting functions, methods, basic blocks, and statements, as well as by event-based sampling. To use TAU in workflows, both the applications and the workflow engine must be instrumented to collect performance data. Similar to TAU, DARSHAN is a resource consumption profiler that monitors I/O operations in applications with a non-intrusive solution.

The System Activity Report (SAR) is a Linux monitoring command that reports system load, including CPU activity, memory usage statistics, etc. The statistics provided by SAR are fine-grained, but this approach is disconnected from the workflow concept. To use SAR in a SWfMS, one must couple it to the workflow engine or call it within the invoked program. We have used SAR to help in other workflow applications, but associating SAR information with provenance and domain-specific data is far from trivial. Similarly to SAR, CCTools (http://ccl.cse.nd.edu/software) provides a resource monitor tool for gathering performance data during the execution of applications, which enables visualization of performance data such as memory usage and CPU usage percentage. Ganglia [18] is a monitoring system for distributed infrastructures. Ganglia captures performance information from the infrastructure and presents similar visualizations for memory, disk usage, network statistics, number of running processes, etc. However, Ganglia usage is similar to SAR's and CCTools', and thus they all present the same difficulties in associating performance data with provenance and domain-specific data.

Therefore, we observe that existing approaches do provide valuable information for computer science experts to debug code, to understand the performance of a workflow execution, or to follow a scientific workflow execution in an HPC environment. However, when using those existing solutions, users may miss important opportunities to understand the behavior of data derivation based on resource consumption information. When resource consumption data are not related to domain data, it may be hard to find which specific data value is presenting anomalous memory consumption or execution behavior, and to act directly on it. In data-intensive workflows this lack becomes a real issue.

3. THE PERFMETRICEVAL COMPONENT

TAU supports debugging and profiling of HPC scientific applications for computer experts, e.g., by instrumenting the application code to capture performance data and presenting these data graphically. However, users from the application domain also need to analyze performance and resource consumption together with the domain-specific data, as well as to be aware of all data transformations that have occurred in the parallel workflow execution. In this section, we show how TAU, SAR and other tools may be integrated with the provenance database of a SWfMS. To integrate TAU, SAR and the provenance database, we have extended a W3C PROV-compliant provenance database schema with performance and resource consumption information.

In this paper we consider metrics such as total elapsed time (which can be decomposed to identify bottlenecks related to the computational simulation, for example, a communication bottleneck), CPU usage, memory consumption, and I/O and transfer rate statistics, to be captured and stored in the provenance database. We decomposed the total workflow elapsed time (T_wf), which corresponds to the workflow wall-clock time, into three different metrics: useful computing time (time needed to execute a specific activation, T_comp), communication time (time needed to perform communication between processes/machines, T_comm), and time taken to access the provenance database (T_prov); thus T_wf = T_comp + T_comm + T_prov.

For CPU usage, we consider the cumulative runtime CPU usage of all machines involved in the execution of the workflow. The CPU usage can be decomposed into the percentage of CPU utilization while executing the application (usr), the percentage of CPU utilization while executing at the system level (sys), the percentage of time that the CPU was idle while the system had an outstanding disk I/O request (iowait), and the percentage of time that the CPU was idle and the system did not have an outstanding disk I/O request (idle).

PerfMetricEval captures performance and resource consumption metrics using both TAU and SAR. Using TAU, we capture the elapsed time of computing, communication, and provenance operations. However, since TAU does not provide memory, CPU and I/O statistics, we obtain those from SAR. The integration of Chiron with PerfMetricEval is based on inserting an invocation of the PerfMetricEval component (after the execution of each activation) to gather fine-grained performance information from TAU and SAR, insert these data into the provenance database, and then convert them into TAU files (profiles for each computing node, e.g., profile.0.x.0 for node x). The generated TAU files serve as input for TAU to plot, for instance, 3D graphs using ParaProf [11]. The PerfMetricEval execution flow is presented in Figure 1. At https://github.com/hpcdb/PerfMetricEval, the component is available for download, with explanations on how to configure a database and invoke Chiron with PerfMetricEval.

We use the well-known scientific workflow Montage [19], presented in Figure 2, from the astronomy domain, as the case study for PerfMetricEval. A basic analysis is the workflow computing time on each machine. One simple SQL query can sum the actual computing time, the time spent on communication, and the time needed for storing provenance for each activation, grouped by machine. Based on these queries, we report experimental results for a 17-hour workflow execution on an SGI Altix ICE 8200 at NACAD/COPPE/UFRJ with four machines, each with 2x Quad-Core Intel Xeon X5355 2.66 GHz (32 cores in total). The Montage execution consumed 1,585 input image files, which produced 17,503 activations executed in parallel.
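The per-machine time decomposition described above (T_wf = T_comp + T_comm + T_prov, summed per machine) can be recovered with a single aggregation query. The sketch below uses a hypothetical, simplified `activation_metrics` table; the actual PerfMetricEval database schema and column names may differ.

```python
import sqlite3

# Hypothetical, simplified schema; the real PerfMetricEval
# provenance database schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activation_metrics (
        activation_id INTEGER,
        machine       TEXT,
        t_comp        REAL,  -- useful computing time (s)
        t_comm        REAL,  -- communication time (s)
        t_prov        REAL   -- time storing provenance (s)
    )""")
conn.executemany(
    "INSERT INTO activation_metrics VALUES (?, ?, ?, ?, ?)",
    [(1, "Machine0", 10.0, 4.0, 2.0),
     (2, "Machine1", 12.0, 1.0, 0.0),
     (3, "Machine0",  8.0, 3.0, 1.5)])

# Sum T_comp, T_comm and T_prov per machine;
# T_wf = T_comp + T_comm + T_prov.
rows = conn.execute("""
    SELECT machine,
           SUM(t_comp), SUM(t_comm), SUM(t_prov),
           SUM(t_comp + t_comm + t_prov) AS t_wf
    FROM activation_metrics
    GROUP BY machine
    ORDER BY machine""").fetchall()
# e.g. rows[0] -> ('Machine0', 18.0, 7.0, 3.5, 28.5)
```

The point is that, once per-activation timings sit in one relational database, a plain GROUP BY reproduces the per-machine decomposition without any extra tooling.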
Figure 1. The PerfMetricEval execution flow: raw data files are processed with SAR and TAU; PerfMetricEval extracts the metrics, loads them into the provenance database, and generates profile files in TAU's format.

After the execution of each activation, Chiron invokes the PerfMetricEval component, which identifies the elapsed time of the activation and invokes SAR to gather resource consumption information for the corresponding activation. For each metric, a new file is created and stored in the workflow workspace. By creating those files, PerfMetricEval is able to parse them, extract performance information, and asynchronously load it into the provenance database. After loading the performance and resource consumption data into the same database, PerfMetricEval provides a feature to query the data relevant to the user-defined parameters for performance and resource consumption analysis, by executing standard SQL queries written by users. After querying the database and gathering the results, PerfMetricEval provides a component to generate a file in TAU format to profile the execution. The TAU visualization tool (the paraprof command) graphically displays the generated files. Since the format used is compatible with this code-profiling tool, it enables the creation of images such as bar graphs, 3D meshes and scatter charts, all interactive and with the customization capabilities inherent to TAU. This integration of PerfMetricEval and Chiron enables in-depth analysis graphs generated within seconds and supports decision-making by users at runtime.

Figure 2. Montage workflow.

Figure 3. The computing time of each machine.

Analyzing Figure 3, we can also state that only Machine0 loads provenance data into the database. This is due to the architectural characteristics of Chiron, where only the master node is responsible for storing provenance data in the database. This was an architectural choice for Chiron, since the slave nodes can process new activations without being blocked by provenance management. We can also state that Machine0 presents the highest communication overhead, because it hosts the provenance data storage and all machines send provenance data to it using messages, increasing its communication cost.

Another important analysis considers the average resource consumption of each activity per machine. Moreover, users need to relate the resource consumption to the domain-specific data. Domain scientists commonly have a fairly good execution time estimate for a specific activation (based on their experience and previous executions, all registered in the provenance database). Using Chiron + PerfMetricEval, they can check whether the actual activation execution time or resource consumption meets the estimate. If the actual execution time is considerably higher than the estimate, they can identify an anomalous behavior together with the corresponding domain files and parameter data of that particular anomaly. For example, it is well known to Montage users that the image region of interest can impact the performance and resource consumption of workflow activations. Thus, users often need to analyze the behavior of a specific subset of the input data. In this case, the image region of interest can be defined by setting the domain attributes CRPIX1 and CRPIX2 (whose values are also loaded from the data files into the provenance database), which represent the pixels that define the region of interest. In this small-scale Montage workflow execution with Chiron, we observed the generation of 10,647 files. Finding which file has the region of interest with the anomalous behavior is very error-prone when the performance data is separated from the workflow execution data and domain data. With the data integration, one query can retrieve the average memory consumption, the average memory used, the average CPU usage, the average CPU usage at the system level, the average number of disk blocks read/written, etc.

All these data are associated with the domain attributes CRPIX1 and CRPIX2 and their corresponding FITS file ids. With the result of such a query, the domain user can monitor the performance of the activations that build the mosaic (Create Mosaic), limited to one specific image region, to check whether there is an anomaly in the execution or in the data file contents. The data extracted from these queries allowed us to generate several TAU graphs. In Figure 4 we see the CPU usage per machine when executing activations where 100 < CRPIX1 < 150 and 50 < CRPIX2 < 80. Since Chiron considers the CPU usage in its scheduling algorithm, we can state that there is load balancing among the machines from the CPU usage perspective, i.e., all machines present an equivalent CPU use, considering the metrics idle (in light blue), iowait (dark blue), sys (green), and usr (red). However, the same behavior is not found when we consider memory and disk usage statistics.

Figure 4. The CPU statistics for activations executed.

4. CONCLUSIONS AND FINAL REMARKS

Performing analytical queries over workflows in distributed environments is an open, yet important, issue. It is fundamental to follow the status of the workflow execution, especially when workflows execute for weeks or even months. Being aware of bottlenecks, resource consumption, and other performance issues is essential.

Most SWfMS already provide some level of monitoring capability. However, their monitoring mechanisms are limited to following the number of activations executed, the volume of data transferred, the average execution time of activities, etc. In this paper, we provided an important opportunity to understand the behavior of the data derivation based on performance and resource consumption metrics. When performance and resource consumption data are not related to domain data, users may not see that a certain data value is presenting an anomalous behavior.

This paper proposes an approach that integrates provenance data, domain data, performance information, and resource consumption information in the same database. To achieve this, we introduce PerfMetricEval, a component for capturing performance and resource consumption data using specialized tools such as SAR and TAU. We integrated PerfMetricEval with the Chiron SWfMS. We evaluated the approach by monitoring the Montage workflow and performing analytical queries that mix different types of data, thus leaving room for domain specialists and code developers to fine-tune activities, such as investigating the input and output data of a particular activity execution when it is taking too long. Using the Montage workflow, we also observed that the overhead is negligible when compared to the total time needed to execute the workflow without PerfMetricEval. In this experiment, it was around 0.6% of the total workflow execution time. Although the results are promising, we still have to evaluate Chiron + PerfMetricEval in large-scale scientific experiments. Although our component has only been integrated with the Chiron SWfMS, we intend to adapt and integrate it with other SWfMS in the near future. The only restriction is that the SWfMS needs to store provenance in a database, as Pegasus and Swift/T do.

5. ACKNOWLEDGMENTS

The authors would like to thank CNPq, FAPERJ, HPC4E (EU H2020 Programme and MCTI/RNP-Brazil, grant no. 689772), and Intel for partially funding this work. This research made use of Montage, which is funded by the National Science Foundation (NSF). Leonardo Neves is currently at the Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA.

6. REFERENCES

[1] I. J. Taylor, E. Deelman, D. B. Gannon, and M. Shields, Workflows for e-Science: Scientific Workflows for Grids, 1st ed. Springer, 2007.

[2] E. Walker and C. Guiang, "Challenges in executing large parameter sweep studies across widely distributed computing environments," in Workshop on Challenges of Large Applications in Distributed Environments, Monterey, California, USA, 2007, pp. 11–18.

[3] J. M. Wozniak, T. G. Armstrong, M. Wilde, D. S. Katz, E. Lusk, and I. T. Foster, "Swift/T: Large-Scale Application Composition via Distributed-Memory Dataflow Processing," in CCGrid, 2013, pp. 95–102.

[4] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen, R. Ferreira, M. Livny, and K. Wenger, "Pegasus, a workflow management system for science automation," FGCS, vol. 46, pp. 17–35, 2015.

[5] E. Ogasawara, J. Dias, V. Silva, F. Chirigati, D. Oliveira, F. Porto, P. Valduriez, and M. Mattoso, "Chiron: A Parallel Engine for Algebraic Scientific Workflows," CCPE, vol. 25, no. 16, pp. 2327–2341, 2013.

[6] H. A. Nguyen, D. Abramson, T. Kipouros, A. Janke, and G. Galloway, "WorkWays: interacting with scientific workflows," CCPE, vol. 27, no. 16, pp. 4377–4397, Nov. 2015.

[7] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, "Spark: cluster computing with working sets," in USENIX Conference on Hot Topics in Cloud Computing, Boston, 2010, pp. 10–17.

[8] B. Balis, M. Bubak, and B. Łabno, "Monitoring of Grid scientific workflows," Scientific Programming, vol. 16, no. 2–3, pp. 205–216, 2008.

[9] M. Mattoso, J. Dias, K. A. C. S. Ocaña, E. Ogasawara, F. Costa, F. Horta, V. Silva, and D. de Oliveira, "Dynamic steering of HPC scientific workflows: A survey," FGCS, vol. 46, pp. 100–113, May 2015.

[10] D. Gunter, E. Deelman, T. Samak, C. H. Brooks, M. Goode, G. Juve, G. Mehta, P. Moraes, F. Silva, M. Swany, and K. Vahi, "Online workflow management and performance analysis with Stampede," in CNSM, 2011, pp. 1–10.

[11] S. S. Shende and A. D. Malony, "The TAU Parallel Performance System," International Journal of High Performance Computing Applications, vol. 20, no. 2, pp. 287–311, May 2006.

[12] V. Silva, D. de Oliveira, P. Valduriez, and M. Mattoso, "Analyzing related raw data files through dataflows," CCPE, vol. 28, no. 8, pp. 2528–2545, 2016.

[13] J. Dias, G. Guerra, F. Rochinha, A. L. G. A. Coutinho, P. Valduriez, and M. Mattoso, "Data-centric iteration in dynamic workflows," FGCS, vol. 46, pp. 114–126, 2015.

[14] R. Prodan, S. Ostermann, and K. Plankensteiner, "Performance analysis of grid applications in the ASKALON environment," in 10th IEEE/ACM International Conference on Grid Computing, 2009, pp. 97–104.

[15] J. S. Vockler, G. Mehta, Y. Zhao, E. Deelman, and M. Wilde, "Kickstarting remote applications," in International Workshop on Grid Computing Environments, 2007.

[16] M. Albrecht, P. Donnelly, P. Bui, and D. Thain, "Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids," in Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, 2012, p. 1.

[17] A. Jain, S. P. Ong, W. Chen, B. Medasani, X. Qu, M. Kocher, M. Brafman, G. Petretto, G.-M. Rignanese, G. Hautier, D. Gunter, and K. A. Persson, "FireWorks: a dynamic workflow system designed for high-throughput applications," CCPE, vol. 27, no. 17, pp. 5037–5059, 2015.

[18] Monitoring with Ganglia, 1st ed. Sebastopol, CA: O'Reilly Media, 2012.

[19] J. C. Jacob, D. S. Katz, G. B. Berriman, J. C. Good, A. C. Laity, E. Deelman, C. Kesselman, G. Singh, M.-H. Su, T. A. Prince, and R. Williams, "Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking," IJCSE, vol. 4, no. 2, pp. 73–87, 2009.