Principles of Computing Resources Planning in Cloud-Based Problem Solving Environment

Kirill Borodulin and Gleb Radchenko

South Ural State University, Chelyabinsk, Russia
borodulinkv@susu.ru

Abstract. “Problem-solving environments” have recently become a widely accepted approach to providing computational resources for solving complex e-Science problems. This approach represents a problem as a workflow that orchestrates a set of computational services. Existing methods for planning cloud computing resources do not take into account the relations between such services, the specifics of the problem domain, the predicted workflow execution time, etc. On the other hand, a cloud system uses HPC resources efficiently by distributing tasks to the most suitable resources. Therefore, we need to develop algorithms that use cloud system resources efficiently and take into account domain-specific information about the problem.

Keywords: cloud · problem-solving environment · workflow · cloud planning

1 Introduction

Cloud systems are becoming a primary provider of computing resources for solving problems in physical, biological, and social research. These problems are characterized by structural complexity, which requires different resources (informational, software, or hardware) to be integrated into a single solution. Such a problem can often be represented as a workflow that orchestrates a set of actions, each representing a computational part of the whole problem. Each action can require a different amount of computing resources and a specific execution environment. Compared with the “classical” approach to HPC resource provision, cloud computing systems use computational resources more efficiently: they give each task its own execution environment and map tasks to the most suitable resources. A cloud system can also provide a user-friendly web interface for submitting computational research tasks.
To give scientists and engineers transparent access to computing resources, the concept of a Problem Solving Environment (PSE) is commonly used. A PSE is a software solution that wraps computational resources and provides problem-oriented access to them in order to solve a specific class of e-Science problems. A PSE uses the language of the target problem domain, so users do not need specialized knowledge of the underlying hardware or software to submit a computational problem and obtain its solution [1].

A PSE problem domain consists of a finite set of task classes. Each task class is a set of tasks that have the same semantics and the same set of input parameters and output data. This limits the class of problems that can be solved using the PSE. On the other hand, it allows domain-specific information about a task to be used for selecting a set of computing resources and planning the task’s execution in the cloud system.

The aim of this paper is to describe the principles of computing resources planning in a cloud-based Problem Solving Environment. To achieve this aim, the following tasks must be solved:

1. Analyze related solutions for planning the execution of problem-solving workflows.
2. Define the structure of a cloud system for deploying a problem-solving environment.
3. Describe the scheme of an approach to computing resources planning in a cloud-based problem-solving environment.

The paper is organized as follows. In Section 2 we review systems that execute workflows and the methods used to schedule resources for workflow execution. In Section 3 we describe the main components of a cloud-based PSE. In Section 4 we present the scheme of an algorithm for planning computing resources in the cloud-based problem-solving environment. In Section 5 we summarize the results of our research and outline further research directions.

2 Scheduling Methods in Workflow Systems

Many research groups in e-Science and engineering use workflow platforms to carry out scientific computations. The authors of [3–5] analyze modern Scientific Workflow Management Systems (SWMS) that support the e-Science approach, such as Pegasus [6], Kepler [7], Taverna [8], and Galaxy [9].

The Pegasus system [6] is widely used to solve problems in fields such as astronomy, bioinformatics, and climate simulation. The system contains four components: Planner (Workflow Mapper), Workflow Manager, Task Dispatcher (Job Scheduler), and Monitoring System (Monitoring Component). A workflow is represented as a DAX file (an XML representation of a directed acyclic graph). Pegasus uses the HTCondor [11] task scheduler, which manages computing resources in cluster computing systems, grids [12], and cloud computing systems [13].

The authors of [14] present the cloud-based Code Execution Framework. The paper describes the architecture of a framework for running a problem-solving environment in a cloud system such as the Amazon EC2 cloud or OpenStack.

These systems can use numerous optimization algorithms to schedule resources in a computing system, including a grid or a cloud system. Algorithms such as the Improved Differential Evolutionary Algorithm combined with the Taguchi method, the Multi-Objective Evolutionary Algorithm based on NSGA-II, Case Library and Pareto Solution based hybrid GA Particle Swarm Optimization, and the Auction-Based Biobjective Scheduling Strategy [15] share one main drawback: they do not use information about previous executions.

A problem-oriented scheduling (POS) algorithm that takes into account both the specifics of problem-oriented jobs and the multi-core structure of the computing system nodes is proposed in [16].
The algorithm makes it possible to schedule the execution of one task on several processor cores, subject to constraints on the scalability of the task.

The paper [17] describes PO-HEFT, a scheduling algorithm for a workflow-based problem-solving environment that uses domain-specific information (such as task execution time, scalability limits, and the amount of data transfer) to predict the load on the resources of the cloud computing environment. The paper also presents a model of a cloud problem-solving environment.

These algorithms are designed to schedule the execution of workflow actions, that is, to decide when each task has to run, for example in order to minimize the makespan. The reviewed algorithms do not select the set of computing resources required for a task’s execution and do not distribute the services across the computing nodes so as to maximize the utilization of computing resources.

3 Model of Cloud-Based Problem Solving Environment

We can define the following main components of a cloud-based problem-solving environment:

1. Service is an entity that implements the functionality of a particular computational action. A service provides transparent access to its functionality and interacts with other components of the system through an open protocol (HTTP-based or binary). A service is implemented as a virtual machine file that includes a base image and the application software that implements the particular computational action.
2. Base image is a basic image of a computational environment (a virtual machine) that includes a set of system software tools (“middleware”). The common middleware set of the base image includes the following elements:
   – execution agent (provides a common interface for submission, execution, and life cycle management of tasks);
   – agent of the monitoring system;
   – storage client.
   A virtual machine is an isolated computational environment that provides a limited set of resources for the execution of a service. Virtual machines run on the computing nodes of a computing cluster.
   Every computing node of the cluster runs an agent of the virtualization platform, which executes virtual machines, and an agent of the cloud system, which manages them. Several virtual machines can share the resources of one computing node, such as main memory, computing resources, the bandwidth of the network adapter, and the local storage of the computing node. We can define a set of performance characteristics for every virtual machine [17].
3. Workflow executor is a cloud computing system service that orchestrates services, executing computational actions at the Workflow planner’s request.
4. Workflow planner generates a list of services together with the computing resources required for their execution and sends the list to the Workflow executor.
5. Workflow predictor predicts the execution parameters of all tasks in the job on the basis of the domain-specific parameters.
6. Cloud monitoring system monitors the execution of tasks and the consumption of computing resources. After a task finishes, the Cloud monitoring system sends statistics about the resources consumed during the task’s execution, together with the domain-specific arguments of the task, to improve further predictions.

4 Workflow-Based Resources Planning in Cloud-Based PSE

The proposed workflow planning approach consists of four layers (see Figure 1): the workflow layer, the service layer, the virtual machine layer, and the computing node layer. The input parameters for workflow planning are the abstract problem-solving workflow, the set of values of the domain-specific arguments, the paths of the input and result files, and a quality-of-service (QoS) description. The following agreements can be defined in the QoS description:

– workflow execution time;
– maximum amount of computing resources;
– cost of calculation.

4.1 Workflow Planning Layer

The workflow layer transforms the abstract workflow into an executable job.
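
As a rough illustration of the planner inputs listed at the start of this section, the sketch below models a planning request and a naive transformation of an abstract workflow into an executable job by binding every task parameter to a supplied value. It is only one possible reading of the text: the names QoS, AbstractTask, PlanningRequest and to_executable_job, the units, and the example values are hypothetical and are not taken from an existing implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class QoS:
    """QoS agreements named in Section 4; field names and units are assumptions."""
    max_execution_time_s: Optional[float] = None  # workflow execution time
    max_cores: Optional[int] = None               # maximum amount of computing resources
    max_cost: Optional[float] = None              # cost of calculation


@dataclass(frozen=True)
class AbstractTask:
    """A task of the abstract workflow (hypothetical representation)."""
    name: str
    parameters: List[str]  # names of the domain-specific input parameters


@dataclass
class PlanningRequest:
    """Inputs of workflow planning as listed in Section 4."""
    workflow: List[AbstractTask]   # abstract problem-solving workflow
    arguments: Dict[str, object]   # values of the domain-specific arguments
    input_paths: List[str]         # paths of the input files
    result_path: str               # path for the result files
    qos: QoS = field(default_factory=QoS)


def to_executable_job(request: PlanningRequest) -> Dict[str, Dict[str, object]]:
    """Bind each task parameter to a value supplied with the request.

    Raises ValueError if some parameter has no value before execution starts.
    """
    job: Dict[str, Dict[str, object]] = {}
    for task in request.workflow:
        missing = [p for p in task.parameters if p not in request.arguments]
        if missing:
            raise ValueError(f"task {task.name!r} has unbound parameters: {missing}")
        job[task.name] = {p: request.arguments[p] for p in task.parameters}
    return job


# Example with made-up tasks and values.
request = PlanningRequest(
    workflow=[
        AbstractTask("mesh", ["resolution"]),
        AbstractTask("solve", ["resolution", "tolerance"]),
    ],
    arguments={"resolution": 128, "tolerance": 1e-6},
    input_paths=["/data/in/model.dat"],
    result_path="/data/out",
    qos=QoS(max_execution_time_s=3600, max_cores=64),
)
job = to_executable_job(request)
```

In this sketch the QoS description is only carried along with the request; how the lower layers would enforce it is outside the scope of the example.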
During the transformation, the data sources of the input parameters are connected to particular tasks and sub-flows of the workflow. An abstract workflow [3] is executable if the input arguments of every task are independent of the results of other tasks’ execution. In other words, the arguments of workflow control nodes (such as decision and fork nodes) must be supplied before workflow execution starts; intermediate results must not influence them.

4.2 Service Planning Layer

The service layer assigns, for every task in the workflow, a particular service and the computing resources it requires. The service layer performs the assignment on the basis of the task execution predictions made by the Workflow predictor. The Workflow predictor sends the following prediction information to the Workflow planner:

– execution time of the task (on one computing core);
– the amount of main memory needed to execute the task;
– maximum task scaling (how many cores can be provided to the task);
– the amount of result data;
– prediction accuracy for each value.

Fig. 1. The layers of the workflow planning approach

4.3 Virtual Machine Planning Layer

The virtual machine layer selects a type (“instance”) for each service from the types available in the cloud computing system. This layer selects instances using the prediction of the computing resources required to execute a given task, with two exceptions: (1) the Workflow executor can allocate a different set of resources in order to satisfy the QoS; (2) if the prediction accuracy is low, i.e. the prediction is most likely false, the executor chooses the type of virtual machine that is the default for the given service.

4.4 Computing Node Planning Layer

The computing node planning layer maps virtual machines onto computing nodes on the basis of the virtual machines’ computing resources and the volume of
At this layer, planner tends to place related virtual machine from the workflow (which on this layer are presented as Task-to-VM list [18]) on the same node to reduce the amount of data are transferred between the computing nodes. 5 Conclusion In this paper, we provide a description of an approach for computing resources planning for workflows execution in the Cloud-based problem-solving environ- ment. We present the components of the Cloud-based problem-solving environ- ment. The scheme of workflow planning approach consists of 4 layers: workflow layer, service layer, virtual machines layer, computing nodes layer. Currently, we implement the problem-solving algorithm for planning workflows in the Cloud- based problem-solving environment. For the validation and evaluation of the planning algorithm, it is planned to implement a model of problem-based Cloud platform in the simulation environment, as well as on the basis of real HPC system. Acknowledgements. The reported paper was supported by RFBR, research project No. 15-29-07959. References 1. Kobashi, H., Kawata, S., Manabe, Y., Matsumoto, M., Usami, H., Barada, D.: PSE park: Framework for problem solving environments. J. Converg. Inf. Technol. 5, 225239 (2010). 2. Fox, G.C., Gannon, D.: Special issue: Workflow in grid systems. Concurr. Comput. Pract. Exp., 18, 1009-1019 (2006). 3. Deelman, E., Gannon, D., Shields, M., Taylor, I.: Workflows and e-Science: An overview of workflow system features and capabilities. Futur. Gener. Comput. Syst. 25, 528–540 (2009). 4. Taylor, I., Deelman, E., Gannon, D., Shields, M.S.: Workflows for e-Science. Work. e-Science Sci. Work. Grids. 1–523 (2007). 5. Miles, S., Wong, S.C., Fang, W., Groth, P., Zauner, K.P., Moreau, L.: Provenance- based validation of e-science experiments. Web Semant. 5, 28–38 (2007). 6. 
Deelman, E., Vahi, K., Juve, G., Rynge, M., Callaghan, S., Maechling, P.J., Mayani, R., Chen, W., Ferreira Da Silva, R., Livny, M., Wenger, K.: Pegasus, a workflow management system for science automation. Futur. Gener. Comput. Syst. 46, 17–35 (2015). 7. Altintas, I., Berkley, C., Jaeger, E., Jones, M., Ludascher, B., Mock, S.: Kepler: an extensible system for design and execution of scientific workflows. Sci. Stat. Database Manag. 2004. Proceedings. 16th Int. Conf. I, 423–424 (2004). 8. Oinn, T., Addis, M., Ferris, J., Marvin, D., Senger, M., Greenwood, M., Carver, T., Glover, K., Pocock, M.R., Wipat, A., Li, P.: Taverna: A tool for the composition and enactment of bioinformatics workflows. Bioinformatics. 20, 3045–3054 (2004). 28 Kirill Borodulin and Gleb Radchenko 9. Goecks, J., Nekrutenko, A., Taylor, J.: Galaxy: a comprehensive approach for sup- porting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86 (2010). 10. Wolstencroft, K., Haines, R., Fellows, D., Williams, A., Withers, D., Owen, S., Soiland-Reyes, S., Dunlop, I., Nenadic, A., Fisher, P., Bhagat, J., Belhajjame, K., Bacall, F., Hardisty, A., Nieva de la Hidalga, A., Balcazar Vargas, M.P., Sufi, S., Goble, C.: The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res. 41, (2013). 11. Thain, D., Tannenbaum, T., Livny, M.: Distributed computing in practice: The Condor experience. Concurrency Computation Practice and Experience. vol. 17, no. 24, 323-356 (2005). 12. Deelman, E.: Grids and Clouds: Making Workflow Applications Work in Heteroge- neous Distributed Environments. Int. J. High Perform. Comput. Appl. 24, 284-298 (2010). 13. Juve, G., Deelman, E.: Scientific workflows and clouds. Crossroads. 16, 14-18 (2010). 14. Ludescher, T., Feilhauer, T., Brezany, P.: Cloud-Based Code Execution Framework for scientific problem solving environments. J. Cloud Comput. Adv. Syst. 
Appl. 2, 1–16 (2013). 15. Pandey, S., Wu, L., Guru, S.M., Buyya, R.: A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments. In: Proceedings - International Conference on Advanced Information Networking and Applications, AINA. pp. 400407 (2010). 16. Sokolinsky, L.B., Shamakina, A. V.: Methods of resource management in problem- oriented computing environment. Program. Comput. Softw. 42, 17–26 (2016). 17. Nepovinnykh, E.A., Radchenko, G.I.: Problem-Oriented Scheduling of Cloud Ap- plications : PO-HEFT Algorithm Case Study. 2016 39th Int. Conv. Inf. Commun. Technol. Electron. Microelectron. MIPRO 2016 - Proc. 196–201 (2016). 18. Yang, Z.W.L.N.Y.: A market-oriented hierarchical scheduling strategy in cloud workflow systems. 63, (2013).