Mathematical and Information Technologies, MIT-2016 — Information technologies

LiFlow: A Workflow Automation System for Reproducible Simulation Studies

Evgheniy Kuklin1,2,3, Andrey Sozykin1,2,3, Konstantin Ushenin2,3, and Dmitriy Byordov2,3

1 Krasovskii Institute of Mathematics and Mechanics, Ekaterinburg, Russia
2 Institute of Immunology and Physiology UrB RAS, Ekaterinburg, Russia
3 Ural Federal University, Ekaterinburg, Russia
key@imm.uran.ru

Abstract. Simulation of living systems often requires numerous computational experiments on the same model for different parameter values. This paper describes the design of LiFlow, a user-friendly workflow automation system for simulation of living systems that is capable of conducting such large series of computational experiments on supercomputers. The system provides a convenient interface for preparing input experimental data, executing the experiments on a supercomputer, and storing experimental results in a storage system. A distinctive feature of LiFlow is its simplicity and usability: the system is intended to be used by researchers in mathematical biology and biophysics without extensive knowledge of parallel computing. The paper provides examples of the use of the LiFlow system for simulation of the human heart left ventricle.

Keywords: parallel computing systems, supercomputers, living system simulation, computational workflow, computational experiment reproducibility.

1 Introduction

Simulation of living systems requires significant computational resources. Such investigations tend to be time-consuming and are therefore hard to conduct in a reasonable time without parallel computing systems and supercomputers. However, the use of parallel computing systems requires a high degree of qualification in computer science, which many researchers involved in living systems modeling do not possess or do not wish to acquire.
Moreover, the data preparation for computational experiments is routine and time-consuming. The user needs to copy data to a supercomputer, compile the source code if necessary, enqueue the jobs with the supercomputer resource manager, and keep track of their completion. Such routine tasks should be automated.

Living system simulation often demands a large number of computational experiments on the same model but with varying parameter values. Nowadays, researchers have to prepare the configuration, input data, and the desired parameter values, and then separately execute the simulation software for each experiment. As the number of required computational experiments grows to hundreds or even thousands, which is typical for living systems simulation, manual preparation and execution of experiments becomes very labor-intensive and often nearly impossible. In such a case, scientists often execute only a fraction of the required computational experiments, which negatively affects the research results; hence the need to automate the execution of series of computational experiments with varying parameter values.

Another problem that researchers in simulation of living systems often face is non-reproducibility of computational experiments. This problem is directly related to the large number of computational experiments that scientists have to carry out in order to obtain meaningful results. Due to pressure for publishing, scientists devote little time to keeping records of experimental details, especially in the case of hundreds or thousands of experiments. In addition, many other factors can affect computational results, such as a change in the version of the compiler or of a required library on the supercomputer. Automated recording of experimental details and storage of simulation results can help ensure reproducibility of the computational experiments.
We developed LiFlow (LIving system simulation workFLOW), a workflow system that addresses this need for automation. LiFlow provides scientists with a convenient graphical user interface (GUI) that allows them to prepare and execute a series of computational experiments on a parallel computing system with a single click.

One of the important goals of creating the LiFlow system was to make the initial learning process of the workflow tool very simple. Otherwise, busy scientists will not invest their time in studying the capabilities of the new system, and it will be useless.

The LiFlow system is primarily intended for simulation of living systems; we provide some examples of using LiFlow to simulate the human heart left ventricle. However, LiFlow could also be used in other areas that require conducting a large number of computational experiments on parallel computing systems.

2 Related Work

To bridge the gap between researchers and software engineers and reduce experiment preparation time, scientific computation workflow systems [1] are being developed. The most frequently used among them are Taverna, Kepler, and Triana. Taverna [2] is an open source workflow system particularly focused on bioinformatics applications and services; it is based on the XScufl language. Kepler [3] is a scientific workflow system that builds on Ptolemy II, a visual modeling tool written in Java. Triana [4] is a GUI-based workflow system for coordinating and executing a collection of services. All these tools have visual interfaces that allow graphical composition of operations. The systems provide the ability to integrate distributed computing resources, applications, data sets, and tools for computational experiments. In addition, the systems hide the complexity of distributed computing systems from users, enabling them to describe the workflow graphically.
The existing systems for scientific computing workflows are able to use computing resources of various types (GRID, supercomputers, distributed systems, etc.), data stores (local, network, cloud), and tools (visualization, statistical processing, etc.); they also include provenance tracking, either as an integral part or as an optional module. As a result, such systems are very complicated and difficult to install, maintain, and use. Their main disadvantage is that creating a new component can require considerable effort and detailed knowledge of the workflow system architecture. However, when simplified workflows are sufficient, there is no need for unwieldy options with a lot of settings. On the contrary, computational experiments should preferably launch “in one click.”

Another possible solution to the problem is to use an environment that provides the integration of application software packages with supercomputers. An example of such a system is DiVTB [5], which provides a user-friendly graphical interface where the parameters of a computational experiment can be specified, and the experiment can then be executed on a supercomputer. However, such systems do not automate the tasks that are common in living systems modeling, such as launching a series of computational experiments with the same model but varying parameter values; nor do they support metadata tracking.

To address the reproducibility problem, special software tools can be used. They provide the ability to automatically capture and store for future use the entire environment of a computational experiment, such as the simulation software, the input and output data, the hardware and software configuration of the computing system, etc. There are two basic methods used by reproducibility improvement tools. One method is based on executing experiments in a virtual environment, such as virtual machines or a cloud [6].
After an experiment completes, a snapshot of the virtual machine is saved together with the simulation software, the output data, the experimental log, and so on. Furthermore, the snapshot can be made publicly available; other scientists can use it to reproduce the experiment and cite it in their papers. Unfortunately, this approach is not suitable for parallel computing systems because virtualization considerably reduces their performance. In addition, such an approach would require capturing snapshots of all the cluster nodes that were used for running the experiment, which is not feasible.

The second method is based on capturing a snapshot not of the entire virtual machine but only of the simulation software executable and the output data. This approach is used in the CDE system (Code, Data, and Environment packaging) [7]. However, a package prepared by the CDE system depends on the software configuration of the computational system. Although the configuration of a personal computer or a virtual machine is relatively easy to replicate, it can be very difficult to adjust the configuration of a parallel computing system. Most such systems are shared among a great number of users; only qualified administrators can install or configure the software. Hence, such an approach is also not suitable for parallel computing systems.

In order to ensure the reproducibility of computational experiments, we aimed at integrating LiFlow with Sumatra [8], an open source tool to support reproducible computational research. Sumatra aims to capture the information required to recreate the computational experiment environment instead of capturing the experimental context itself.
Sumatra uses the source code of the program instead of the binaries, stores the logs of the compilation process, and saves information about all dependencies and the general operating system configuration. Furthermore, Sumatra provides the ability to store the output data for future use in a database. In addition, Sumatra allows indexing and searching the data about previously conducted experiments, including additional information provided by scientists. For example, if experimental data was published, scientists can add tags with the name of the paper (and, perhaps, additional information such as the figure or table containing the data) to the experiment record in the catalog. This allows researchers to quickly find the information required to reproduce the experiment they are interested in among a large number of experiment records. Unfortunately, Sumatra lacks a convenient desktop user interface. Although Sumatra is a standalone project, it can be used as a library for third-party development and has its own API. LiFlow can use Sumatra for capturing and storing the information about previously conducted experiments in the database.

3 LiFlow system

3.1 Workflow

The workflow in the LiFlow system corresponds to the one shown in Fig. 1. During the first stage, researchers prepare the description of the so-called experiment series, which is a set of experiments with the same model and varying parameter values. The preparation includes the selection of the simulation software that implements the required model, generation of the configuration files with the required parameters, and creation of the input data files for each experiment. Next, the experiments are launched on a parallel computing system.

Fig. 1. LiFlow system workflow

When the experiments are completed, the obtained results are automatically stored in the archive in a form ready for processing (visualization, statistical processing, etc.).
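The central notion here is the experiment series: the same model run once for every combination of parameter values. A minimal Python sketch can illustrate what such a series description expands into; every name below is illustrative and not part of LiFlow itself.

```python
from itertools import product

def describe_series(model, varying, fixed=None):
    """Expand a series description into one parameter set per experiment.

    `varying` maps each parameter name to the list of values it takes;
    the series contains one experiment per combination of those values.
    """
    fixed = dict(fixed or {})
    names = sorted(varying)
    series = []
    for combo in product(*(varying[n] for n in names)):
        params = dict(fixed)
        params.update(zip(names, combo))
        series.append({"model": model, "params": params})
    return series

# A series on one model with two varying parameters (values are made up):
series = describe_series(
    "lv_electrophysiology",
    varying={"fiber_angle_endo": [40, 60, 80], "fiber_angle_epi": [-40, -60]},
    fixed={"duration_ms": 500},
)
print(len(series))  # 3 * 2 = 6 experiments
```

Even this toy version shows why manual preparation stops scaling: adding one more varying parameter multiplies the number of experiments.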
Thus, a user is only required to create a description of the experiment series; all the rest is done automatically. In addition, the user is able to process the results of experiments from the archive manually using third-party tools.

3.2 Computational Package

Similarly to the CDE system, LiFlow uses the concept of a computational package that contains all the information required to execute a series of experiments. The LiFlow computational package consists of the following components:

– Source code of the simulation software, which can be loaded from a code repository of a version control system, such as Git.
– Generator of the experiment series, which describes how to produce the desired parameter values for the experiment series.
– Initial data and parameters to launch the simulation software.

A distinctive feature of the LiFlow computational package is that it describes not one experiment but a whole series of experiments. Each experiment in the series uses the same simulation software but different values of the model parameters. The parameter values for every experiment are produced by the generators, which are part of the computational package and operate according to rules specified by the user.

3.3 LiFlow Architecture

The LiFlow system consists of four main components (Fig. 2). The Computational Package Preparation Tool and the Experiment Execution GUI are installed on the researcher’s personal computer, while the Experiment Execution Engine and the Parallel Computing System Adapter are deployed to the parallel computing system. A user creates a computational package with the help of the Computational Package Preparation Tool and uses the Experiment Execution GUI to transmit the package to the desired parallel computing system and run the experiment series.
The Experiment Execution Engine on the computational cluster receives the package, compiles the source code of the simulation software, and executes the generator of the experiment series to produce a set of input data files for the simulation software with various parameter values. Next, a set of computational jobs is generated with the same simulation software but different input files. The jobs are queued on the computational cluster using the Parallel Computing System Adapter, which interacts with the resource manager of the cluster. Once a job is completed, the results of the experiment are automatically recorded to the Experiment Archive on the storage system. After all the jobs in the experiment series are completed, the Experiment Execution Engine sends an email with a report on the experiments’ execution to the user.

The planned Sumatra module will be able to capture the environment of a computational experiment and store it in the Experiment Catalog in order to share the initial data and simulation results among researchers and to improve the experiments’ reproducibility.

Fig. 2. LiFlow system architecture

4 Technical Details

The first stage of the LiFlow system implementation has been completed. The computational package in the current implementation is represented by a directory in a file system that contains subdirectories with the following components: the source code of the simulation software, the generator of the experiment series, the initial data for the generator, and the script for executing the experiments.
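The package directory just described can be sketched with a small validation helper. The subdirectory names are assumptions for illustration only: the paper lists the four components, not the actual directory names LiFlow uses.

```python
import os
import tempfile

# Illustrative names for the four package components described above.
REQUIRED_SUBDIRS = [
    "src",        # source code of the simulation software
    "generator",  # generator of the experiment series
    "data",       # initial data for the generator
    "scripts",    # script for executing the experiments
]

def check_package(path):
    """Return the list of required components missing from a package directory."""
    return [d for d in REQUIRED_SUBDIRS if not os.path.isdir(os.path.join(path, d))]

# Build a partial package in a temporary directory and inspect it:
root = tempfile.mkdtemp()
for d in REQUIRED_SUBDIRS[:3]:
    os.mkdir(os.path.join(root, d))
missing = check_package(root)
print(missing)  # ['scripts'] -- the execution script has not been added yet
```

A check of this kind is the natural first step for an execution engine that receives a package over the network, since a missing component should be reported before any cluster time is spent.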
In the current implementation, the generator is a script that creates a series of experiments by varying the parameters in the configuration file of the simulation software. The LiFlow system supports two options for specifying the parameter values:

– A parameter range: an initial value, a final value, and an increment. One record in a configuration file of the generator produces the input data for several experiments.
– An explicit declaration of parameter values. The parameter values must be specified for each experiment in the series.

The prepared computational package is transferred to the parallel computing system using the SSH or SFTP protocols. Next, the source code of the simulation software is built on the computational cluster. If the build process fails, LiFlow warns the user and sends the build log file back. In the case of a successful compilation, the system runs the generator of the experiment series to produce the input data for the experiments.

Currently, only one version of the Parallel Computing System Adapter is implemented, which is based on the SLURM Workload Manager [9]. The experiment startup script from the computational package enqueues the generated tasks for the series of experiments in the SLURM job queue. After a job is complete, the LiFlow system copies the output data to the Experiment Archive using the NFS protocol.

The scripts in the LiFlow system are written in Python. The storage of the simulation software source code is implemented as a Git repository provided by a third-party service.

Fig. 3. LiFlow system GUI

Users are provided with a simple graphical interface, which allows one to execute a series of experiments on a parallel computing system in one click (Fig. 3).
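The two parameter-specification options, and the hand-off of generated jobs to SLURM, can be sketched as follows. Function names, the config format, and the inclusive range semantics are illustrative assumptions, not LiFlow's actual scripts; only the `sbatch` command itself comes from SLURM.

```python
import subprocess

def expand_range(start, stop, step):
    """Expand a range record (initial value, final value, increment)
    into explicit parameter values, inclusive of the final value."""
    values, v = [], start
    while v <= stop + 1e-9:  # tolerance for float accumulation
        values.append(round(v, 10))
        v += step
    return values

def generate_configs(template, name, spec):
    """Produce one configuration-file text per experiment.

    `spec` is either a (start, stop, step) range record, which yields
    several experiments, or an explicit list of values, one per experiment.
    """
    values = expand_range(*spec) if isinstance(spec, tuple) else list(spec)
    return [template.format(**{name: v}) for v in values]

def enqueue(script_path):
    """Submit one generated job script to the SLURM queue (sketch only)."""
    subprocess.run(["sbatch", script_path], check=True)

# One range record in the generator config produces input data for
# several experiments:
configs = generate_configs("fiber_angle = {angle}\n", "angle", (40.0, 80.0, 10.0))
print(len(configs))  # values 40, 50, 60, 70, 80 -> five experiments
```

In the real system each generated configuration would be written to its own input file and passed to `enqueue`, so the series maps onto independent SLURM jobs, which is what lets one failed experiment leave the rest of the series running.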
The user needs to select the parallel computing system to perform the computation and specify the credentials (login and password), the path to the folder with the computational package, and the email address (for job completion notifications). When the user clicks the Launch button, the LiFlow system starts the workflow process. The text output shows the current stage of the experiment setup process and, if an error occurs, specifies where it happened. Fig. 3 demonstrates an example of a successfully submitted experiment. The LiFlow GUI is also written in Python using the PyQt4 library and is designed to work both on Windows and Linux.

A disadvantage of the current LiFlow implementation is the lack of a failover mechanism. If an error occurs, the experiment will not be repeated. This approach was chosen because a failure can be caused not only by problems with hardware or system software, but also, more frequently, by an error in the simulation software or a wrong combination of parameters, for which the computation cannot be performed. In such a case, restarting the experiment will not solve the problem; it will only unnecessarily load the computational cluster. Still, a failure in one experiment does not lead to termination of the entire experiment series.

5 Using LiFlow for Heart Simulation

Currently, the LiFlow system is integrated with the URAN supercomputer at the Krasovskii Institute of Mathematics and Mechanics and the computational cluster of the Ural Federal University. The system has been used on these clusters to execute several series of experiments in simulation of the human heart left ventricle (LV) using the LeVen simulation system [10].

The study of the influence of the fiber direction in the LV anatomical model on the speed and consistency of its electrophysiological activation was performed using the LiFlow system [11].
A series of 55 experiments was performed, in which two parameters corresponding to the direction of the fiber course in the electrophysiological models were varied.

The same system can be used to reproduce the results of research conducted manually before. Two series of experiments were performed in the investigation of the excitation speed of the LV myocardial tissue using an anatomical model that allows changing the shape of the ventricle and the direction of the fiber course in it [12]. In one of the series of experiments, the area of the initial activation, the fiber direction of the anatomical model, and the ratio of coefficients in the diffusion tensor of the electrophysiological model were varied. In total, the work was based on more than 36 experiments with the parameter values generated by certain rules.

The paper [13] describes research on the dynamics of spiral waves in the LV of the human heart model with different geometries and directions of the fiber course. In this research, several series of experiments were carried out with anatomies that approximate the normal and pathological anatomy of the LV. In each series of experiments, the following parameters were varied: the thickness of the top, the value of the diffusion tensor, and the starting point of the initial wave. In total, the work was based on more than 84 experiments.

6 Discussion

The users of the LiFlow system, researchers in mathematical biology and biophysics from the Institute of Immunology and Physiology UrB RAS, provided generally positive feedback. Previously, they needed approximately 30 minutes to manually prepare and run one computational experiment on a parallel computing system. With the help of the LiFlow system, they could execute a series of dozens or hundreds of experiments in less than one hour.
The users appreciated the convenience of the LiFlow GUI and the ability to obtain the simulation results from the storage system. As a result, they do not have to deal with the Linux operating system on the computational cluster, which is unfamiliar to them. Overall, LiFlow helped the researchers from the Institute of Immunology and Physiology UrB RAS to conduct computational experiments more efficiently.

As opposed to popular scientific computation workflow systems such as Taverna, Kepler, and Triana, which provide for building large, complex computational workflows, the LiFlow system provides only one simple workflow. However, this limitation makes it possible to keep the LiFlow system extremely easy to use. Complicated computational workflow systems are especially useful in domains with standardized data formats and tools, such as bioinformatics (the Taverna system is specifically targeted at bioinformatics applications). However, adding new components to such workflow systems is rather difficult. In contrast, the LiFlow system is more suitable for researchers who write the simulation code themselves. Since the project is currently in beta, the source code will be published later.

7 Conclusion and Future Work

The paper presents the LiFlow computational workflow system intended to automate the execution of a large number of computational experiments for living systems simulation on parallel clusters. A distinctive feature of LiFlow is the automatic generation of the input data and parameters for carrying out experiment series. The system has been used for simulation of the human heart left ventricle. The use of LiFlow can significantly reduce the preparation time of a series of experiments, as well as make the processing of their results more convenient.

Directions for future work include:

– Full integration with the Sumatra tool to ensure the reproducibility of launched experiments.
– Developing mechanisms for secure integration of several computational clusters from different organizations with a single LiFlow instance in order to share computational resources and simulation results.
– Implementing Parallel Computing System Adapters for cluster resource managers other than SLURM, as well as for cloud platforms.
– Creating more advanced and flexible generators of experiment series integrated with the GUI.

Acknowledgments. The work is supported by the RAS Presidium grant I.33P “Fundamental problems of mathematical modeling”, project no. 0401-2015-0025, and by the Research Program of the Ural Branch of RAS, project no. 15-7-1-26. Our study was performed using the Uran supercomputer of the Krasovskii Institute of Mathematics and Mechanics and the computational cluster of the Ural Federal University.

References

1. Talia, D.: Workflow Systems for Science: Concepts and Tools. In: ISRN Software Engineering 2013, 15 pages (2013). doi:10.1155/2013/404525
2. Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock, M.R., Li, P., Oinn, T.: Taverna: a tool for building and running workflows of services. In: Nucleic Acids Research 34(Web Server issue), pp. 729–732 (2006).
3. Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E.A., Tao, J., Zhao, Y.: Scientific workflow management and the Kepler system. In: Concurrency and Computation: Practice and Experience, Vol. 18, No. 10, pp. 1039–1065 (2006).
4. Taylor, I., Shields, M., Wang, I., Harrison, A.: Visual Grid Workflow in Triana. In: Journal of Grid Computing, Vol. 3, No. 3, pp. 153–169 (2005).
5. Savchenko, D.I., Radchenko, G.I.: DiVTB Server: sreda vypolneniya virtual’nykh eksperimentov [DiVTB Server: an environment for virtual experiment execution].
In: Parallel’nye vychislitel’nye tekhnologii (PaVT’2013): trudy mezhdunarodnoy nauchnoy konferentsii [Parallel Computational Technologies (PCT’2013): Proceedings of the International Scientific Conference] (1-5 April 2013, Chelyabinsk, Russia). Chelyabinsk, Publishing of the South Ural State University, pp. 532–539 (2013).
6. Howe, B.: Virtual Appliances, Cloud Computing, and Reproducible Research. In: Computing in Science & Engineering, Vol. 14, No. 4, pp. 36–41 (2012).
7. Guo, P.: CDE: A Tool for Creating Portable Experimental Software Packages. In: Computing in Science & Engineering, Vol. 14, No. 4, pp. 32–35 (2012).
8. Davison, A.P.: Sumatra: a toolkit for reproducible research. In: Stodden, V., Leisch, F., Peng, R.D. (eds.) Implementing Reproducible Research, pp. 57–78. CRC Press (2014).
9. Yoo, A.B., Jette, M.A., Grondona, M.: SLURM: Simple Linux Utility for Resource Management. In: Job Scheduling Strategies for Parallel Processing (JSSPP), Lecture Notes in Computer Science, Vol. 2862, pp. 44–60 (2003).
10. Sozykin, A., Pravdin, S., Koshelev, A., Zverev, V., Ushenin, K., Solovyova, O.: LeVen – a parallel system for simulation of the heart left ventricle. In: 9th International Conference on Application of Information and Communication Technologies, AICT 2015 Proceedings, pp. 249–252 (2015).
11. Ushenin, K., Byordov, D.: An HPC-Based Approach to Study Living System Computational Model Parameter Dependency. In: Proceedings of the 1st Ural Workshop on Parallel, Distributed, and Cloud Computing for Young Scientists, Yekaterinburg, Russia. CEUR Workshop Proceedings, Vol. 1513, pp. 67–74 (2015).
12. Pravdin, S.F., Dierckx, H., Katsnelson, L.B., Solovyova, O., Markhasin, V.S., Panfilov, A.V.: Electrical Wave Propagation in an Anisotropic Model of the Left Ventricle Based on Analytical Description of Cardiac Architecture. In: PLoS ONE 9(5): e93617 (2014). doi:10.1371/journal.pone.0093617
13.
Pravdin, S., Dierckx, H., Markhasin, V.S., Panfilov, A.V.: Drift of Scroll Wave Filaments in an Anisotropic Model of the Left Ventricle of the Human Heart. In: BioMed Research International, Vol. 2015, Article ID 389830, 13 pages (2015). doi:10.1155/2015/389830