=Paper= {{Paper |id=Vol-2267/293-297-paper-55 |storemode=property |title=Modeling of task scheduling in desktop grid systems at the initial stage of development |pdfUrl=https://ceur-ws.org/Vol-2267/293-297-paper-55.pdf |volume=Vol-2267 |authors=Ilya I. Kurochkin,Evgeny A. Gerk }} ==Modeling of task scheduling in desktop grid systems at the initial stage of development== https://ceur-ws.org/Vol-2267/293-297-paper-55.pdf
Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and
             Education" (GRID 2018), Dubna, Moscow region, Russia, September 10 - 14, 2018




     MODELING OF TASK SCHEDULING IN DESKTOP GRID
     SYSTEMS AT THE INITIAL STAGE OF DEVELOPMENT
                              I.I. Kurochkin 1,2, a, E.A. Gerk 2, b
 1
     Institute for Information Transmission Problems of Russian Academy of Sciences, Bolshoy Karetny
                                  per. 19, build.1, Moscow, 127051, Russia
 2
     The National University of Science and Technology MISiS, Leninskiy prospekt 4, Moscow, 119049,
                                                 Russia

                         E-mail: a kurochkin@iitp.ru, b gerkevgeny@gmail.com


The paper presents an overview of modern methods for task scheduling in desktop grid systems,
estimates of the quality of methods, including: the time of execution of all tasks, the level of resource
utilization. Heuristic approach to task scheduling is considered, which allows ensuring high
performance and reliability of such systems at the early stages of development. A comparative
analysis of the results of computational experiments performed with the help of the GridSim high
performance computing simulation tool for various desktop grid system configurations is carried out.

Keywords: desktop grid, grid system, task scheduling, GridSim, simulation of desktop grid

                                                                   © 2018 Ilya I. Kurochkin, Evgeny A. Gerk




                                                                                                        293
Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and
             Education" (GRID 2018), Dubna, Moscow region, Russia, September 10 - 14, 2018




1. Introduction
        The desktop grid is a grid system that unites personal devices (personal computers, laptops,
smartphones, etc.) using telecommunications networks and uses their idle computing resources for
calculations. In the early 2000s, the desktop grid implied the unification of only personal computers.
Today, the desktop grid is a heterogeneous system that combines not only various non-specialized
computing devices, but also multiprocessor computing systems with a large amount of hardware and
software heterogeneity. The idea of recycling idle resources of a large number of personal computers
is extremely attractive. Such a solution, which requires almost no costs, has enormous potential. The
technology of organization desktop grid has the following advantages: ease of deployment; low cost;
high scalability (hundreds of thousands computing nodes); high potential peak performance; low
financial cost of creation and maintenance. But there are also a number of features characteristic of
such grid systems: high hardware and software heterogeneity; lack of information about the
availability of nodes; low reliability of nodes. And the simultaneous provision of energy efficiency,
safety, reliability and high performance is a pressing issue for modern grid systems. These goals
conflict with each other and require increased attention to the problem of scheduling tasks at the
desktop grid.


2. Overview of existing task scheduling algorithms
         In papers [1] [2], a wide range of heuristics and algorithms were proposed for solving separate
scheduling problems in grid systems. These heuristics use additional information, such as reliability
ratings, estimation of availability periods, node failure functions, etc. The MET (Minimum Execution
Time) algorithm proposed in [3] assigns each task in an arbitrary order to the compute node with the
minimum time to complete this task without regard to its availability. Ignoring the availability of a
computational node when assigning tasks (jobs) to it can lead to a load imbalance in the grid system.
In turn, MCT (Minimum Completion Time), also proposed in [3], assigns each task in an arbitrary
order to the compute node with the earliest completion time for this task (the sum of the compute node
ready time and the task execution time at the computation node), thereby taking into account the
current resource load when assigning the next task. This algorithm allows you to balance the load on
the computational nodes of the grid system, but at the same time leads to the execution of tasks on less
fast computational nodes. The Min-min algorithm, described in [4], selects the machine with the
minimum completion time and assigns the task according to the MCT. The main difference between
this algorithm and the MCT algorithm is that Min-min considers all unselected tasks while making an
assignment decision and MCT considers only one randomly selected task. This advantage allows you
to build a more efficient map of task performance, comprehensively assessing the performance of the
MCT algorithm. The Max-min algorithm [4], in contrast to the Min-min algorithm, selects the
machine with the minimum completion time, and the task with the maximum completion time. The
RASA heuristic uses the Min-min algorithm to assign the first task (job), if the number of available
resources is odd, otherwise it uses the Max-min algorithm. Further, sequential alternation of the
specified algorithms is applied when assigning a task to the available nodes of the system. Heuristics
LBMM (Load Balanced Min-Min) allows to reduce the total time to complete all tasks and increase
the degree of utilization of resources.


3. Proposed approach to scheduling tasks
         The proposed approach to scheduling jobs – Sort-Median (SM). The main objective of the
proposed heuristics is to reduce the overall execution time of all tasks and increase the level of
utilization of the resources of the grid system. Let T={ti : i=1, 2, …, n} – a set of computing tasks in
the desktop grid, with each task ti has a length lengthti in FLOP. Let H={hj : j=1, 2, …, m} – a set of
computational nodes, with each node hj have performance powerhj in FLOPS. Then the schedule for
performing the set of tasks T on the set of computing nodes H of the system is determined by the
following expression:


                                                                                                        294
Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and
             Education" (GRID 2018), Dubna, Moscow region, Russia, September 10 - 14, 2018



                                     𝑆 = {𝑠𝑖𝑗 ∶ 𝑖 ∈ 𝑁, 𝑗 ∈ 𝑀},                             (1)
        where N is the set of numbers of computing tasks to be performed on the nodes of the grid
system, according to schedule S, and M is the set of numbers of computational nodes to which these
tasks will be assigned. We will also define a matrix in which we will store estimates for the
completion of tasks (Completion Time, CT):
                                           𝑐𝑖𝑗 = 𝑒𝑖𝑗 + 𝑟𝑗 ,                                (2)
        where eij is the task execution time ti on the hj node, rj is the release time of the hj resource.
        The description of this algorithm can be divided into the following steps:
1. CT matrix initialization — for each task, the execution time on each computing node is estimated;
2. bypassing a set of tasks is performed if it is not empty:
    2.1. The CT matrix is sorted and the SCT (Sorted Completion Time) matrix contains the medians
         in rows (for each computational task) – this characteristic, unlike the average value, reflects
         how “expensive” the miscalculation of a particular task will cost, if it is not counted first;
    2.2. assigns the task tk with the maximum median to the computation node hp with the minimum
         completion time;
    2.3. the rp value is updated, as well as the column with the index p of the matrix CT;
    2.4. removes the tk task sent for rendering from the set.
        Unlike the previously proposed Min-Min or Min-Max algorithms, this algorithm allows you to
get a comprehensive assessment when assigning tasks to a computational node, namely, to show the
degree of efficiency of calculating a particular task in the first place.


4. Grid system modeling tools
        To model distributed computing systems, a special class of software is used. Depending on the
method of reproducing the processes of functioning of distributed systems, these tools are divided into
two classes: emulators and simulators. Simulators include the following systems: OptorSim [5],
GridSim [6], WorkflowSim, SimGrid, etc. These systems provide an opportunity to obtain statistical
data on the most important characteristics of the simulated environment. The GridSim platform allows
users to simulate the operation of a grid system with the ability to simulate the characteristics of
resources and computer networks in various configurations. With the help of GridSim, it is possible to
carry out reproducible experiments that are difficult to implement in the present environment of
dynamic grid systems.


5. Results and discussions
        As part of the studies conducted on the basis of the GridSim software, an application was
developed for modeling the operation of the grid system and testing the tasks scheduling algorithms.
The following modeling scenarios were highlighted (Table 1), the main task of which is to evaluate
the effectiveness of the proposed approach to scheduling in comparison with existing algorithms
according to the above criteria and the behavior of the grid system at the early stages of its
development, when the number of computing nodes is sufficiently small.
                                                           Table 1. Scenarios for desktop grid simulation
Number         Number of              Number of      Performance of computational          Complexity of
           computational nodes          tasks              nodes (𝐺𝐹𝐿𝑂𝑃𝑆)                  tasks (𝑇𝐹𝐿𝑂𝑃)
   1               10                   1000                  N(50,10)                       N(700, 50)
   2               50                   6000                  N(50,10)                       N(700, 50)
   3              100                  15000                  N(50,10)                       N(700, 50)




                                                                                                        295
Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and
             Education" (GRID 2018), Dubna, Moscow region, Russia, September 10 - 14, 2018



         For a comparative analysis of the proposed scheduling algorithm with existing ones, we
introduce two quality assessments: the level of utilization of the resources of the grid system (Grid
Utilization, GU) and the total execution time (Makespan). Let the execution time of all tasks:
                                𝑀𝑎𝑘𝑒𝑠𝑝𝑎𝑛 = max{𝑟𝑗 |∀ 1 ≤ 𝑗 ≤ 𝑚}.                     (3)
         Then the level of utilization of the resources of the grid system:
                                                    ∑𝑚
                                                     𝑗=1 𝑟𝑗                         (4)
                                          𝐺𝑈 =               .
                                                 𝑚×𝑀𝑎𝑘𝑒𝑠𝑝𝑎𝑛
        The proposed quality estimates were calculated for the following scheduling algorithms: Max-
Min, Min-Min, RASA, Sort-Median. The diagram of the total task execution time (Figure 1) shows
the advantage of the proposed Sort-Median heuristics for all grid modeling scenarios.




              Figure 1. Total execution time            Figure 2. Resource utilization chart on nodes
         When scaling the grid system and increasing the number of tasks, it was possible to reduce the
total time to complete all tasks. The indicators of the resource utilization diagram (Figure 2) show a
significant advantage of the proposed scheduling algorithm, which allows you to evenly distribute the
load among all the computational nodes of the system. Despite the fact that the Sort-Median algorithm
has improved the performance of the grid system and increased the level of resource utilization, using
it in the proposed form for scheduling tasks on a real grid system is unacceptable. This approach
allows to take into account the variation in the performance of resources and the complexity of
computing tasks, but does not take into account the possibility of failure of the computational nodes,
as well as their return of erroneous results. These restrictions make it difficult to complete tasks on
time, they require additional time spent on re-execution of tasks. Therefore, it is necessary to modify
the Sort-Median algorithm to ensure the reliability of the calculations. Since the planning of tasks is
considered at the early stages of the development of the grid system, when there are no statistics on the
activity of computational nodes, reputational methods lose their relevance. The use of various
prediction heuristics is also difficult, since it is difficult to estimate the number and reliability of
computational nodes that can join or leave the system at any time. Therefore, premature scheduling of
all tasks on all compute nodes is the wrong approach. It is advisable to use short-term planning: break
up the set of all tasks into small subsets and make a decision about planning a specific subset of tasks
after returning a certain percentage of correct results. This approach will allow assessing the state of
computational nodes set at each scheduling iteration and make a decision on the assignment of tasks.
         Let the size of each subset be equal to the number of nodes in the grid system, and each new
iteration of short-term scheduling occurs when returning 35% of the valid results of the subset. It is
proposed to calculate the delay for each node:
                                                  𝑞
                                                 ∑𝑖=1(𝑒𝑖𝑗 −𝑎𝑖𝑗 )
                                          𝑑𝑗 =                 ,                            (5)
                                                       𝑞
        where aij is the actual time of execution of the task ti on the hj node, and q is the number of
correctly calculated tasks by the hj node. Then we introduce two types of penalties for each node: for
delay and for failure (failure or incorrect answer). The size of the penalty for failure for each
computing node is determined by the expression:
                                                  𝑔
                                           𝑝𝑗 = ∑𝑖=1 𝑒𝑖𝑗 ,                           (6)
        where g is the number of failures and incorrect answers for the hj node. Define the resulting
matrix as a weighted sum:
                                      𝑓𝑖𝑗 = 𝑐𝑖𝑗 + 𝑤(𝑑𝑗 + 𝑝𝑗 ),                        (7)
        The weighting factor w is the proportion of correct calculated results from all tasks.
Accordingly, as the number of tasks not counted decreases, the importance of the above penalties

                                                                                                        296
Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and
             Education" (GRID 2018), Dubna, Moscow region, Russia, September 10 - 14, 2018



increases during planning. The modified heuristics of the algorithm Sort-Median called PSM (Penalty
Sort-Median). Let us determine the characteristics of scenarios with the presence of failures in the
operation of the grid system for comparing the heuristics of Sort-Median and PSM. The simulated
percentage of failover computing nodes will vary from 20 to 35. The results show a significant
decrease in the total runtime of the script (about 60%) when using short-term planning and the PSM
algorithm in comparison with the Sort-Median algorithm. The main disadvantage of this solution is a
decrease in the level of utilization of resources due to an increase in their idle time.


6. Conclusion
         An approach using Sort-Median heuristics for scheduling tasks in desktop grid is proposed. A
comparative analysis of the algorithms showed the advantage of the proposed approach. Sort-Median
utilizes more than 99% of the grid system's resources in all cases, and also improves the performance
of the grid system by 3%. Penalty Sort-Median (PSM) heuristic was also proposed using short-term
scheduling, which is important for desktop grids at the early stages of their development.


7. Acknowledgement
       This work was supported by the Russian Foundation for Basic Research (grants No. 18-29-
03264, 18-57-06003).


References
[1] Casanova H. F. Dufossé, Y. Robert, F. Vivien. Scheduling parallel iterative applications on volatile
resources // Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International. – IEEE,
2011. – pp. 1012-1023.
[2] Wang X. Ch. Sh. Yeo, R. Buyya, J. Su. Optimizing the makespan and reliability for workflow
applications with reputation and a look-ahead genetic algorithm // Future Generation Computer
Systems. – 2011. – Vol. 27. – № 8. – pp. 1124-1134.
[3] Freund R. F. Scheduling resources in multi-user, heterogeneous, computing environments with
SmartNet //Heterogeneous Computing Workshop, 1998. (HCW 98) Proceedings. 1998 Seventh. –
IEEE, 1998. – pp. 184-199.
[4] Anousha S., Ahmadi M. An improved Min-Min task scheduling algorithm in grid computing //
International Conference on Grid and Pervasive Computing. – Springer, Berlin, Heidelberg, 2013. –
pp. 103-113.
[5] Korenkov, V. V., & Nechaevskiy, A. V. (2009). DataGrid simulation packages. System Analysi s
in Science and Education (Online), ISSN, 2071-9612, pp.21-35.
[6] Buyya R. Gridsim – Toolkit for the modeling and simulation of distributed resource management
and scheduling for grid computing / R. Buyya, M. Murshed // Сoncurrency and computation: practice
and experience. – 2002. – Vol. 14. – pp. 1175-1220.




                                                                                                        297