Task scheduling in Desktop GRID by FSA method: a practical example Taras A. Uzdenova,b a Zhytomyr Polytechnic State University, 103 Chudnivska Str., Zhytomyr, 10005, Ukraine b G.E. Pukhov Institute for Modelling in Energy Engineering of NAS of Ukraine, 15 General Naumova Str., Kyiv, 03164, Ukraine Abstract The paper considers a new approach to solving the problem of dispatching task flows, the complexity of which is known, for GRID-systems with inalienable resources, the performance of which can be determined. A method based on this approach has been developed. The efficiency of the proposed method is compared with the well-known and widely used in various projects method FCFS. A feature of this method is the simplicity of implementation. An example of a simple practical problem that can be solved using the proposed method is described in this paper. Keywords GRID-systems, Desktop GRID, task scheduling, non-alienable resources, FCFS 1. Introduction Every modern organization has a number of computers on which its staff works, their use is not the most efficient, as most of them tasks performed on them do not take up 10–20% of the maximum performance of the PC. Therefore, it makes sense to use free resources to solve other problems. Therefore, the idea arose to create on the basis of such resources computer systems that would allow other tasks to be performed in parallel with the current ones for each of the nodes. Such systems are called GRID systems with non-alienable resources. Such systems are also known as Desktop GRID. Thus, Desktop GRID is a GRID system that uses non-specialized computing resources as computing nodes, but disparate computing nodes (computers, laptops, smartphones, etc.) using local and global networks and special software. Back in 1999, the first large-scale project of distributed voluntary computing SETI @ home was launched. Today, Desktop Grid is part of the high-performance computing industry along with clusters and GRID. One of the main tasks in creating a GRID-system with inalienable resources, as well as for GRID-systems, is the task of task scheduling. Therefore, in GRID-systems, a planning mecha- QuaInT 2021: Workshop on the Quantum Information Technologies, April 11, 2021, Zhytomyr, Ukraine doors 2021: Edge Computing Workshop, April 11, 2021, Zhytomyr, Ukraine " uzdenov.taras@gmail.com (T.A. Uzdenov)  0000-0002-0731-7620 (T.A. Uzdenov) Β© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) Figure 1: New approach nism must be implemented. It is necessary for the distribution of tasks for execution between the nodes of the system, in order to minimize the execution time and balance the load of the system. One of the problems that needs to be solved when developing software for a GRID-system with inalienable resources is the task of task scheduling. Task scheduling is quite complex and now there is no clear and unambiguous solution [1, 2]. Analysis of scheduling methods used in real systems shows that most systems use mainly the FCFS method [3]. In addition, were analyzed a number of publications in recent years on this topic , in which developers offer new methods and various modifications of known ones. They are compared with FCFS and SJF (Shortest Job First) [4, 5, 6, 7, 8, 9, 10, 11, 12]. But since in real systems the FCFS method is most often used, it was decided to make a comparison with it in this study as well. This is due to the fact that this method is very simple and reliable in both development and operation. The use of other methods significantly complicates the system, making it less reli- able. Since such systems are quite unstable, it is clear why developers abandon complex meth- ods and prefer FCFS. This leads to the conclusion that it is necessary to develop new methods, the main characteristics of which should be simplicity and better performance compared to FCFS. 2. New approach This article further studies the methods developed on the basis of the new approach outlined in [13]. In particular, a simple practical task that could be performed on a GRID-system with non-alienable resources is considered, and the FSA method is used to distribute tasks between nodes. In figure 1 schematically shows the main essence of the proposed in [13] approach. The proposed methods are developed on the basis of a new approach, which proposes to consider tasks as one force, and the nodes on which they should be performed as another force, like 98 Newton’s third law. And distribute the tasks in such a way as to maintain a balance of forces. Given that such concepts as the strength of the task and the strength of the node are quite abstract concepts, the author proposes to use the concepts of task power and node power. And to carry out distribution already according to balance of capacities. This choice is not accidental, and can be explained as follows. It is known from physics that power is equal to the ratio of work to time. And the work, in turn, is equal to the force multiplied by the path to be traversed. Given that when performing a task on a computing node, the time to determine the power of the task will be equivalent to the time to determine the power of the node, and the path that the task must pass is equal to the path that the node must pass, such a replacement is quite logical and acceptable. In the developed methods, these concepts are quantities relative and therefore can be calculated in different ways, depending on different characteristics of tasks (volume, computational complexity, algorithmic complexity, etc.) and different characteristics of nodes (CPU clock speed, RAM volume, communication channel speed and others). The power of the task means the totality of all the characteristics of the task, and the power of the PC means the totality of all the characteristics of the PC, compiled in some way. Moreover, if for a PC it is still possible to use some general formula for calculation, then for the power of the problem such a formula will change each time, depending on the type of problem that needs to be solved. The approach described above is quite simple and effective to use, but it actually divides the scheduling task into three subtasks: 1. Calculation powers of tasks 2. Calculation powers of nodes 3. Distribution (FSA method) Both the first and the second subtasks are quite complex and today there are no unambiguous and universal solution for they. The fact is that any task has a number of characteristics, which have already been written above, and to compare them and somehow reduce to one value is quite difficult. This requires the development of additional methods that would provide such an opportu- nity. On the other hand, the task of calculating the powers of nodes is no less complex and also requires a separate study and solution. But at the same time there are a number of tasks for which the calculation of capacity will not cause much difficulty. This is well illustrated by the example of a simple practical problem described in the last section. 2.1. Formulation of the problem Suppose there is a GRID system with 𝑁 task and 𝑀 nodes. By nodes we mean a compu- tational element. We introduce the concept of task power 𝑃𝑛 and node power π‘…π‘š . There- fore, we have the set of power of tasks 𝑃 = {𝑃1 , 𝑃2 , 𝑃3 , ..., 𝑃𝑛 } and the set of power of nodes 𝑅 = {𝑅1 , 𝑅2 , 𝑅3 , ..., π‘…π‘š }. We need to optimally distribute tasks across nodes. Schematically, a given task is shown in figure 2. 2.2. Flow Scheduling Algorithm (FSA) The method of flow scheduling has the following form: 99 Figure 2: Scheduling task 1. calculate the power of tasks and the power of nodes 2. choose the 𝑖-th task 3. find the pair 𝑖 βˆ’ 𝑗, for which the ratio of the power of the task 𝑃𝑖 to the power of the node 𝑅𝑗 will be as close as possible to unity 4. send the 𝑖-th task to the 𝑗-th node 5. recalculate the power without 𝑃𝑖 and 𝑅𝑗 6. if there are unsent tasks, then go to point 2 7. completion 2.3. Flow Scheduling Algorithm Parallel (FSA_P) The described approach is universal and allows to develop not only methods for distribution of consecutive tasks but also for tasks which can be parallelized. Below is the following method: 1. calculate the power of the nodes, and the total power π‘ƒπ‘ π‘’π‘š 2. select the 𝑖-th task from the task queue 3. distribute it proportionally, according to the power of the nodes 4. send for execution 5. if there are unallocated tasks in the queue, go to point 2 6. completion 3. Software package To create software that allows you to test and investigate the effectiveness of the proposed methods, developed its client-server architectural model (figure 3). This model is based on the WCF service [14] for software that requires distributed computing in computer networks and the Internet, as well as to create a Desktop GRID. Based on this model, a software package for simulating the operation of a GRID-system with inalienable resources and the study of scheduling algorithms was developed, which was named SgridAR-1 [15]. A number of tests were performed using the developed simulator. To determine their ef- fectiveness, the FCFS scheduling method was chosen because it is most often used for GRID 100 Figure 3: Architectural model of Desktop GRID systems with inalienable resources. In addition, other methods cannot be used because in this case they do not involve a change in the power of the nodes, and other characteristics, such as time quantum or priority, it was decided not to enter. Figure 4 highlights nine areas in the Server window, which during testing displays informa- tion about the results of the simulation: 1. the total volume of all tasks to be solved 2. total execution time of all tasks for each algorithm 3. the area in which messages about the beginning and end of calculations are displayed, as well as information about which algorithm is currently working 4. in this area the conditions for testing are set: the number of tasks, their minimum and maximum value, the choice of algorithm 101 Figure 4: SGridAR-1 server window 5. list of all generated tasks, their scope and power, status (performed or not), execution time for each algorithm 6. after completing the test, this area receives summary information about the operating time of each algorithm, after which in area 2 the information is updated to perform a new test, and the button β€œExport to Excel” becomes active 7. the number of Clients to run 8. the maximum and minimum value of the power of the system nodes, after pressing the β€œChange Power PC” button, the system will automatically change the power in any way for all nodes, in the range from minimum to maximum values 9. information about the nodes connected to the system, their capacity and status (free or busy) SGridAR-1 allows to show work of the offered method and to compare results of its work with other, well-known, methods of dispatching. With the help of this software you can con- duct experiments and explore the work of algorithms, changing the number and size of tasks, the number and power of the PC. In this program the mechanism of generation of tasks was implemented. The task of which is to create a tasks to be executed in an arbitrary way, specifying the time required for the task to be performed. Power of tasks are calculated in proportion to the given time. The power of the same nodes is also generated arbitrarily. But depending on their size slow down the timer. The implemented visual interface clearly shows how the GRID system works. This set of programs can be used not only for research but also for the educational process. 102 4. Test results The SGridAR-1 system implements a testing mechanism that provides for the execution of all tasks generated by the system, using different algorithms for the distribution of tasks between nodes. That is, first the tasks are distributed according to algorithm 1, then according to algo- rithm 2 and so on until the last task. The following scheduling methods are introduced into the system: ETALON_P; FCFS_P; FSA_P; ETALON; FSA, FSA_Min, FSA_Max. Parallel methods: β€’ ETALON_P – the method is taken as a reference. This is a FCFS method that does not take into account the power of the personal computer (PC) and tasks. A system is modeled in which all nodes are the same in power. Parallel tasks are possible β€’ FCFS_P – FCFS method, which takes into account the power of the PC and the ability to distribute one task to several PCs β€’ FSA_P is a proposed method that takes into account the power of PCs and the ability to distribute one task to multiple PCs The following methods: β€’ ETALON – the method is taken as a reference. This is a FCFS method that does not take into account the power of the PC and the task cannot be distributed β€’ FCFS – a method that takes into account the power of the PC and the task can not be distributed between nodes β€’ FSA – the proposed methods, which take into account the power of the PC and the task can not be distributed β€’ FSA_Min – modified FSA method combined with Min-Min method [16] β€’ FSA_Max – modified FSA method combined with Max-Min method [17] For comparative experiments, 100 tasks with a runtime of 1 to 10 seconds were generated and 10 nodes were run on which they should be executed. Testing was as follows: first, the calculation of the total execution time of all tasks for 1 method, entered into the system, at an average power of the system 10%, then programmatically increased the system power by 5% and again performed calculations for 1 method. Then again increased the power of the system by 5% and performed the calculation by 1 method. And so on until the power of the system has grown to 100%. Thus, testing was performed for each of the methods. Each time performing the same tasks, changing only the average power of the system. A total of 19 average power tests were performed for each of the planning methods intro- duced into the system. A total of 152 tests were conducted. Such studies were conducted to show how the average power of the system affects the effi- ciency of the methods. 103 Figure 5: Test results for parallel methods In figures 5-7 are diagrams based on test results. Based on the results obtained, it can be concluded that the FSA and FSA_P methods give better results than the FCFS and FCFS_P methods. As can be seen in figure 5, the curve of the FSA_P method has a smooth shape, and the curve of the FCFS_P method is broken. This allows us to conclude that the FSA_P method is well predictable and, knowing the power of the system and the amount of tasks, it is possible to predict the completion of calculations. However, solving problems by the FCFS_P method, it is quite difficult to predict the completion of calculations, because a lot depends on which node, which task will be performed. In figure 6, it is seen that the sequential FSA method has a smoother curve than the FCFS method, which also indicates a better predictability. In figure 7, which shows additional methods FSA Min and FSA Max in comparison with the FSA method, it is seen that the differences in the results are insignificant, but this is only for the situation when the number of tasks exceeds the number of nodes. As can be seen from the graphs, the system also has ETALON and ETALON_P methods, and at first glance it may seem that their efficiency is much better than the FSA and FSA_P methods. But this is not the case. The fact is that the methods ETALON and ETALON_P are shown not for comparison with others, but to demonstrate the reference state of the system, when all the nodes in it have a capacity of 100%. As can be seen from the graphs, when the av- erage power increases to 1, the execution time by different methods approaches the maximum possible reference value. Figures 5, 6 show that the behavior of the FCFS method is very unstable, large jumps in the results. This is because the distribution by the FCFS method very much depends on a random factor, and which node will have what task. This is quite logical, because if, for example, a node with a capacity of 0.1 receives a task with a capacity of 1, then the execution time in this case will increase significantly, because there may be a situation that other nodes will work 104 Figure 6: Test results for following methods Figure 7: Test results for methods FSA, FSA_Min, FSA_Max and will wait too long to complete this task. 5. Practical task It is proposed to consider such a task. If, for example, we have a web resource with a large number of images (for example, a portfolio, an online store, etc.), and we want to promote it in search engines in order to attract more users and increase revenue, then we need will perform page optimization according to search engine rules (for example, using Google PageSpeed [18]), which includes image optimization. 105 Figure 8: Practical task In figure 8 schematically shows the problem described above. Let’s say we have 1000 images that need to be optimized according to Google Page Speed. Total files size is 5 GB. If you perform this task on one PC, it will take a long time (maybe even a full day). It all depends on the speed of the PC on which to do it. Therefore, to speed up this task, it would be advisable to divide it into different PCs that we have. To do this, we will need to determine which method will be used for distribution. In fact, as mentioned above, there is a new method of FSA, based on the described approach. The main difficulty of this method is to calculate the power of tasks and nodes. But for this practical example, this problem is quite simple to solve. Given that all images differ from each other only in size, the largest image (π‘†π‘šπ‘Žπ‘₯ ) is assigned power (π‘ƒπ‘šπ‘Žπ‘₯ = 1), and the remaining (𝑛 βˆ’ 1) images are calculated by the power of the tasks by the formula: 𝑆𝑖 β‹… π‘ƒπ‘šπ‘Žπ‘₯ 𝑃𝑖 = π‘†π‘šπ‘Žπ‘₯ Given that (π‘ƒπ‘šπ‘Žπ‘₯ = 1) we can shorten the expression: 𝑆𝑖 𝑃𝑖 = (1) π‘†π‘šπ‘Žπ‘₯ On the other hand, we have π‘š PCs (computing nodes), which differ from each other, for example, only the clock speed of the processor. Then, similarly to the calculation of the power of the problem, we can find the power of the nodes, according to the proportion. The largest by the clock frequency of the node (πΉπ‘šπ‘Žπ‘₯ ) is assigned power (π‘…π‘šπ‘Žπ‘₯ = 1), and the remaining (π‘š βˆ’ 1) power of the nodes is calculated by the formula: 106 𝐹𝑗 β‹… π‘…π‘šπ‘Žπ‘₯ 𝑅𝑗 = πΉπ‘šπ‘Žπ‘₯ Given that (π‘…π‘šπ‘Žπ‘₯ = 1) we can shorten the expression: 𝐹𝑗 𝑅𝑗 = (2) πΉπ‘šπ‘Žπ‘₯ Then the algorithm of the method of scheduling FSA tasks can be reduced to the following form: 1. calculate 𝑃 = 𝑃1 , 𝑃2 , 𝑃3 , ..., 𝑃𝑛 by formula (1) 2. calculate 𝑅 = 𝑅1 , 𝑅2 , 𝑅3 , ..., π‘…π‘š by formula (2) 3. choose the 𝑖-th task 4. find the pair 𝑖 βˆ’ 𝑗, for which the condition { } |𝑃 | | 𝑖 | min | βˆ’ 1| | 𝑅𝑗 | | | is fulfilled. 5. send the 𝑖-th task to the 𝑗-th node 6. recalculate the powers without 𝑃𝑖 and 𝑅𝑗 7. if there are unsent tasks, then go to point 2 8. completion Thus, we showed that calculating the power of tasks and the power of nodes is not such a difficult task when you need to distribute the image between nodes in order to optimize the size. If you use different nodes, but different in speed of connection to the network, then the power of the nodes must be calculated in some other way, because not always the node with the processor with the highest clock speed will have the highest power. And if this point is not taken into account, then the distribution will not be as effective as in the first case. As mentioned above, this method is universal and the distribution of tasks according to it makes it possible to significantly speed up the execution of the task queue and thus increase the efficiency of GRID-systems with inalienable resources compared to the FCFS method. 6. Conclusions The results of the study of the effectiveness of the proposed methods showed that their use for task allocation in GRID-systems with non-alienable resources provides a significant reduction in task queue time compared to the FCFS scheduling method, provided that the number of tasks exceeds the number of nodes. All the proposed methods are quite stable and well-predicted, which means that their use in GRID-systems will give advantages not only in time and performance, but also allow more efficient planning of the system work. The FCFS method works well, but for GRID systems that 107 have different power resources, its performance depends heavily on a random fact that can be considered a significant drawback. The main difficulty in using these methods is the need to somehow reduce all the character- istics of tasks and nodes to one relative value. But there are a number of tasks for which this can be done quite easily. As shown in the example of the problem of image optimization, it is quite easy to calculate both the power of tasks and the power of nodes. References [1] V. Kropyvnytska, B. Klim, A. Romanchuk, M. Slabinoga, Investigation of scheduling al- gorithms in computer systems, Rozvidka ta rozrobka naftovykh i hazovykh rodovyshch 2 (2011) 93–105. [2] S. Node, Desktop grids, connecting everyone to science, 2021. URL: https://sciencenode. org/feature/desktop-grids-connecting-everyonescience.php. [3] S. Choi, H. Kim, E. Byun, C. Hwang, A Taxonomy of Desktop Grid Systems Focusing on Scheduling, Technical Report, KU-CSE-2006-1120-02, 2006. [4] S. Sahana, Evolutionary based hybrid GA for solving multi-objective grid scheduling problem, Microsystem Technologies (2019). URL: https://doi.org/10.1007/ s00542-019-04673-z. doi:10.1007/s00542-019-04673-z. [5] D. Carastan-Santos, R. Y. De Camargo, D. Trystram, One can only gain by replacing EASY Backfilling, a simple scheduling policies case study, in: 19th IEEE/ACM Interna- tional Symposium on Cluster, Cloud and Grid Computing (CCGRID), IEEE, 2019, pp. 1–10. doi:10.1109/CCGRID.2019.00010. [6] K. Dheenadayalan, V. N. Muralidhara, G. Srinivasaraghavan, Storage load control through meta-scheduler using predictive analytics, in: N. BjΓΈrner, S. Prasad, L. Parida (Eds.), Dis- tributed Computing and Internet Technology, Springer International Publishing, Cham, 2016, pp. 75–86. [7] A. A. Haruna, L. T. Jung, N. Zakaria, Design and Development of Hybrid Integrated Thermal Aware Job Scheduling on Computational Grid Environment, in: 2015 Interna- tional Symposium on Mathematical Sciences and Computing Research, 2015, pp. 13–17. doi:10.1109/ismsc.2015.7594020. [8] Y. Thet, H. Hlaing, T. T. Yee, Static Independent Task Scheduling on Virtualized Servers in Cloud Computing Environment, in: 2019 International Conference on Advanced In- formation Technologies, IEEE, 2019, pp. 55–59. doi:10.1109/aitc.2019.8920865. [9] P. S. Kumar, L. Parthiban, V. Jegatheeswari, Privacy and security issues in cloud computing using idyllic approach Latha Parthiban, Networking and Virtual Organisations 21 (2019) 30–42. doi:10.1504/IJNVO.2019.101146. [10] M. Kaur, Multi-objective Evolution-Based Scheduling of Computational Intensive Ap- plications in Grid Environment, in: Proceedings of the International Conference on Data Engineering and Communication Technology, 2017, pp. 457–467. doi:10.1007/ 978-981-10-1678-3_44. [11] P. Naithani, Genetic Algorithm Based Scheduling To Reduce Energy Consumption In 108 Cloud, in: 2018 Fifth International Conference on Parallel, Distributed and Grid Comput- ing, IEEE, 2018, pp. 616–620. doi:10.1109/pdgc.2018.8745801. [12] A. Pujiyanta, L. E. Nugroho, Planning and Scheduling Jobs on Grid Computing, in: 2018 International Symposium on Advanced Intelligent Informatics, IEEE, 2018, pp. 162–166. doi:10.1109/icic47613.2019.8985978. [13] T. Uzdenov, A New Task Scheduling Algorithm for GRID Systems with Non-alienable Resources, Springer International Publishing, Cham, 2021, pp. 207–220. doi:10.1007/ 978-3-030-69189-9_12. [14] M. Docs, What Is Windows Communication Foundation β€” WCF, 2021. URL: https://docs. microsoft.com/en-us/dotnet/framework/wcf/whats-wcf. [15] T. Uzdenov, Simulator of Task Sheduling in Geographicaly Distributed Computer sys- temswith Non-Alienable Resources, Electronic Modeling 42 (2021). [16] T. Kokilavani, D. I. G. Amalarethinam, Load Balanced Min-Min Algorithm for Static Meta- Task Scheduling in Grid Computing, International Journal of Computer Applications 20 (2011) 43–49. [17] D. Ramyachitra, P. P. Kumar, Frog leap algorithm for homology modelling in grid envi- ronment, Journal of Emerging Technologies and Innovative Research 7 (2016). [18] G. Docs, Pagespeed insights, 2021. URL: https://developers.google.com/speed/docs/ insights/v5/about. 109