Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 SOME ASPECTS OF THE WORKFLOW SCHEDULING IN THE COMPUTING CONTINUUM SYSTEMS V. Kashansky1,a, R. Prodan1, G. Radchenko2 1 Institute of Information Technology, University of Klagenfurt, Austria 2 Silicon Austria E-mail: a vladislav.kashanskii@aau.at Contemporary computing systems are commonly characterized in terms of data-intensive workflows, that are managed by utilizing large number of heterogeneous computing and storage elements interconnected through complex communication topologies. As the scale of the system grows and workloads become more heterogeneous in both inner structure and the arrival patterns, scheduling problem becomes exponentially harder, requiring problem-specifc heuristics. Despite several decades of the active research on it, one issue that still requires effort is to enable efficient workflows scheduling in such complex environments, while preserving robustness of the results. Moreover, recent research trend coined under term "computing continuum" prescribes convergence of the multi- scale computational systems with complex spatio-temporal dynamics and diverse sets of the management policies. This paper contributes with the set of recommendations and brief analysis for the existing scheduling algorithms. Keywords: scheduling, algorithms, brief review, workflows Vladislav Kashansky, Radu Prodan, Gleb Radchenko Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 106 Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 1. Introduction Recent advancements [1-2] in the field of parallel and distributed computing led to the definition of the computing continuum [3] as the environment incorporating highly heterogeneous systems with dynamic spatio-temporal organizational structures [4], varying in-nature workloads (Fig 1-a), complex control hierarchies [5], governing computational clusters with multiple scales of the processing latencies, and diverse sets of the management policies [6-7]. The emergence of these systems is the natural response to the ever-growing variability of computational demands. However, architecting of the algorithms in such environments, e.g. task schedulers, I/O schedulers and resource scalers, is affected by the high degree of uncertainty in relation to the future operational conditions and suffers from tractability issues. (a) (b) Figure 1. Examples of networks studied in one of the last works [8]. (a) - directed acyclic graph of 60 tasks, rendered und library PSLIB. (b) is a network of 256 agents of the computational continuum with a scale-free topology [9]. As the scale of the system grows and the workloads become more heterogeneous in the inner structure and the arrival patterns, scheduling problem becomes exponentially harder, requiring problem-specific heuristics. This paper contributes with the set of recommendations and brief analysis for the existing scheduling algorithms. Due to the lack of space we address reader interested in theoretical aspects and definition of the computing continuum to the works [3,8]. 2. Scheduling Methods Due to the NP-hardness [11], the large varieties of algorithms were proposed. We summarized some of those in the Table 1. To begin with, one possible approach involves problem reformulation in order to have a more flexible and less computationally demanding control structures [7]. List- scheduling methods (Tab.1 #1, 2) do not provide full knowledge on the workflow execution, by effectively skipping scheduling phase, and replace it with the simplest possible ranking and matching policies [10]. Such methods typically run very fast in polynomial time, however precision is the major problem in this case. It is important to highlight, that those approaches are de-facto standard in the case of the large-scale computational systems / workflows and maintain significant robustness with low resource consumption. Further, non-stationary aspects (e.g. price, performance, reliability prediction etc.) of the computational network are normally out of scope in this approach and can be incorporated via various averaging techniques. More specifically, in case we have set of identified ordinary differential equations or hybrid stochastic/difference equations, we can propagate dynamics further in time, but use only averaged values in the ranking and matching phases. Interruptions are not considered here, since schedule is not generated at all. 107 Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 Table 1. Comparative summary on the various workflow scheduling algorithms Method Complexity Precision Dynamic System Int. Stoch. Based On Complete Method Type Type Aspects Aspects Schedule 1. HEFT Low Low Averaging Not No Averaging Recursive No Heuristic & Important Ranking [10,7] Variations 2. CPOP Low Low Averaging Not No Averaging Critical Path No Heuristic & Important [10,7] Variations 3. 2P- Medium Medium Averaging Discrete No Monte Direct Graph Yes Heuristic SGS Carlo Propagation [11] 4. GQAP1 High Medium Mean Field Not No Averaging Quadratic MIP No Heuristic Important Precedence Relaxation [12] 5. GAP High Medium Mean Field Not No Averaging Recursive No Heuristic Important Ranking / GAP2 Relaxation [13] 6. 2P- High Very ODE Continous Yes Monte Pontryagin’s Yes Exact DYNA High Formulation / Mixed Carlo Principle [14] (FDTO, FOTD) 7. MINLP Very High Very Any Mixed Yes Monte Direct / MILP Yes Exact High Carlo Family Relaxation Solution [15] Generally speaking, we would call these approaches highly entropic, since here we are dealing with coarse-grained processes. However there is still a very intriguing research question on how to integrate simple HEFT-like ranking policies [10] and FCFS insertion-based polices with dynamics propagation at different time-scales. Another two methods, namely GQAP [12] and GAP [13] are based on the quadratic and linear precedence relaxation in MIP3 formulation. Precedence constraints, prescribed by the directed acyclic graph (DAG), are not reflected in the equations. These approaches can be effectively combined with previous two and offer global picture, when performing matching of the tasks to the partitions of machines. Again, applicability of these techniques significantly depends on the structure of the optimization objectives, as quadratic and linear relaxations can inadequately model minimum makespan problems with the workflow structure. It contrasts, for example, with cost optimization, which can be independent of the task running times. Dynamic aspects and interruptions are not considered, because schedule is not generated. One of the drawbacks here is that unbalanced centralized optimization approach with conventional optimization packages CPLEX or SCIP becomes impractical for large-scale systems, due to the large number of variables, delays and information exchange volumes. It leads to huge computational and communication costs. An open question to investigate is how to define static/dynamic relationships between these controllers and groups of controlled agents. Therefore, several hybrid schemes have been proposed to solve large-scale problems, allowing to divide a complex problem into several less complex subtasks. Distributed hierarchical system management strategies are expected to allow sub-optimal performance of control systems in large-scale environments, using the principles of locality while balancing the performance of the whole system, thus enabling scalable infrastructures. 1 GQAP – Generalized Quadratic Assignment Problem 2 GAP – Generalized Assignment Problem 3 MIP – Mixed-Integer Programming 108 Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 2P-SGS is a polynomial heuristic method in MRCPSP notation [11] which takes an intermediate place as it allows to compute complete schedule in two phases. First phase prescribes matching of the tasks to machines and second phase performs activity sequencing subject to the precedence and resource constraints. Algorithm works for discrete execution times and normally suitable for medium scales, as complexity of the SGS phase is , where n is the number of tasks and k is the number of resource constraints. More information on that approach can be found in [11]. We call 2P-DYNA family of the exact algorithms based on the ODE formulation and optimal control theory principles [14-16]. Interruptions can be considered here, however discretization intervals must allow that. Non-stationary aspects (e.g. price, performance, reliability prediction etc.) of the computational network can be modeled very well in this approach and can be incorporated also with various averaging techniques. Both FDTO4 and FOTD5 formulations are possible, but each will lead to the different family of methods, including solution of the TPBVP 6 and MIP [15, 16]. The general possible approach to optimization at the planning horizon is to implement it by solving the MILP or MINLP problem (Tab. 1. #7) with appropriate direct, heuristic and metaheuristics methods. Book [17] contains excellent review of the various mathematical models, including exact and heuristic methods. Both methods #6 and #7 offer very high precision, but also pose extreme difficulty to solve. Normally, attempt to find exact optimum within these techniques can be used in frames of Monte Carlo supercomputing simulations of the complex multilayered computing continuum networks. 4. Conclusion In this paper we attempted to summarize the set of recommendations and brief analysis for the existing scheduling algorithms. Ideally this paper will help beginners in the field to get on track and select corresponding research direction. It can also serve as the reference point. 5. Acknowledgement This work has been supported by ADAPT Project funded by the Austrian Research Promotion Agency (FFG) under grant agreement No. 881703 and ASPIDE Project funded by the European Union's Horizon 2020 Programme (H2020) under grant agreement No. 801091. 4 FDTO – First Discretize Then Optimize 5 FOTD – First Optimize Then Discretize 6 TPBVP – Two Point Boundary Value Problem 109 Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 References [1] Reed, D.A. and Dongarra, J., 2015. Exascale computing and big data. Communications of the ACM, 58(7), pp.56-68. [2] Asch, M., et al., 2018. Big data and extreme-scale computing: Pathways to convergence-toward a shaping strategy for a future software and data ecosystem for scientific inquiry. The International Journal of High Performance Computing Applications, 32(4), pp.435-479. [3] Beckman, P., Dongarra, J., Ferrier, N., Fox, G., Moore, T., Reed, D. and Beck, M., 2020. Harnessing the computing continuum for programming our world. Fog Computing: Theory and Practice, pp.215-230. [4] Kashansky, V., et al. 2021, September. The ADAPT Project: Adaptive and Autonomous Data Performance Connectivity and Decentralized Transport Network. In Proceedings of the Conference on Information Technology for Social Good (pp. 115-120). [5] Copil, G., Moldovan, D., Truong, H.L. and Dustdar, S., 2013, December. Multi-level elasticity control of cloud services. In International Conference on Service-Oriented Computing (pp. 429-436). Springer, Berlin, Heidelberg. [6] Reuther, A., et al. 2018. Scalable system scheduling for HPC and big data. Journal of Parallel and Distributed Computing, 111, pp.76-92. [7] Ilyushkin, A., Ali-Eldin, A., Herbst, N., Papadopoulos, A.V., Ghit, B., Epema, D. and Iosup, A., 2017, April. An experimental performance evaluation of autoscaling policies for complex workflows. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering (pp. 75-86). [8] Kashansky, V., Radchenko, G. and Prodan, R., 2021, June. Monte Carlo Approach to the Computational Capacities Analysis of the Computing Continuum. In International Conference on Computational Science (pp. 779-793). Springer, Cham. [9] Albert, R. and Barabási, A.L., 2002. Statistical mechanics of complex networks. Reviews of modern physics, 74(1), p.47. [10] Topcuoglu, H., Hariri, S. and Wu, M.Y., 2002. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE transactions on parallel and distributed systems, 13(3), pp.260-274. [11] Kolisch, R. and Hartmann, S., 1999. Heuristic algorithms for the resource-constrained project scheduling problem: Classification and computational analysis. In Project scheduling (pp. 147-178). Springer, Boston, MA. [12] Lee, C.G. and Ma, Z., 2004. The generalized quadratic assignment problem. Research Rep., Dept., Mechanical Industrial Eng., Univ. Toronto, Canada, p.M5S. [13] Cattrysse, D.G. and Van Wassenhove, L.N., 1992. A survey of algorithms for the generalized assignment problem. European journal of operational research, 60(3), pp.260-272. [14] Dolgui, A., Ivanov, D., Sethi, S.P. and Sokolov, B., 2019. Scheduling in production, supply chain and Industry 4.0 systems by optimal control: fundamentals, state-of-the-art and applications. International Journal of Production Research, 57(2), pp.411-432. [15] Moiseev, N. N. 1974. Element of the Optimal Systems Theory. Moscow: Nauka (in Russian). [16] Zimin, I.N. and Ivanilov, Y.P., 1971. Solution of network planning problems by reducing them to optimal control problems. USSR Computational Mathematics and Mathematical Physics, 11(3), pp.113-124. [17] Brucker P, Knust S (2012) Complex scheduling, 2nd edn. Springer, Berlin. DOI: https://doi.org/10.1007/978-3-642-23929-8 110