Dynamic task scheduling in cloud computing based on Naïve Bayesian classifier Fatemeh Ebadifard Seyed Morteza Babamir Department of Computer Engineering Department of Computer Engineering University of Kashan University of Kashan Kashan, Iran Kashan, Iran e-mail: ebadifard.fatemeh@gmail.com e-mail: babamir@kashanu.ac.ir Abstract—the issue of task scheduling in a cloud environment time, increases the efficiency of resources. The advantage of the is one of the most important issues that must be considered proposed algorithm in the above method is to reduce the by the cloud platform providers in data centers. The use of the overload and increase resource utilization. Simulation results right solution to solve this problem enables cloud platform show that this method performs well in completion time of the providers to have the most use of available resources; and also increase the customer satisfaction by providing quality longest task and increasing the level of load balancing. In of service parameters. In this paper it has been tried to provide summary it can be said that our main focus in this article a dynamic scheduling algorithm using machine learning includes the following: techniques and naïve-Bayes classifier in a cloud 1. Providing an appropriate method for scheduling the requests environment. The proposed method is one of the dynamic task using data mining techniques to reduce makespan time and scheduling methods and load distribution at any moment is increase resource utilization; conducted according to the latest information from previous 2. The use of classification techniques for equitable distribution and current server status. The distinction of this method of requests; with previous studies is the use of data mining techniques 3. Targeted analysis to show the effectiveness of the proposed (classification) in load distribution. Since this classification method has higher accuracy and speed compared with algorithm, compared to previous algorithms; other methods, therefore this classifier helps us to achieve the The remainder of this paper consists of the following sections: optimal solution in less time. Simulation results show that Section 2 reviews related work, in section 3 after the the proposed method has a good improvement in terms of introduction of Naïve Bayesian classifier and formulation of the Makespan time and load balancing degree. problem, details of the proposed method is described in detail. In Section 4, simulation and evaluation of the proposed method Keywords—task scheduling; cloud computing; naïve- is presented and finally, in Section 5, conclusions and future Bayes classifier; Virtual machine; Makespan works are expressed. I. INTRODUCTION II. RELATED WORK Cloud environment provides a huge context of servers in the Different studies are carried out in conjunction with load data center, so that when users request resources, provides balancing in cloud environment, load balancing algorithms are them in shared mode. To enable the applications to use the generally divided into two categories: static and dynamic: In the resources in accordance with their requirements, an static method, the allocation of tasks to virtual machines is based appropriate mechanism to distribute requests to virtual on the functionality of virtual machine and initial condition of machines is required. That is why the scheduling of the each machine; in other words, this process is only based on the requests is one of the most important issues in cloud data of the nodes and their features. This information includes environment and in recent years it has attracted much the amount of processing power, internal memory and storage attention of researchers. Task scheduling in cloud capabilities and the power of communications between other computing means optimal allocation of requests to the virtual machines. An important feature of static algorithms is computing resources in the data center [1]. In scheduling, that, these algorithms do not consider changes occurred tasks are allocated to different types of virtual dynamically on virtual machines at any moment. In addition, machines with regard to the limitations specified by the they do not have the ability to adapt to changing workloads on user and service provider. One of the major challenges in each virtual machine over time. Some static algorithms are task scheduling is the equitable distribution requests on Round Robin (RR) algorithms or weighted RR or load balancing the resources according to application requirements. algorithm using ant colony algorithm [2] or procedures based on Providing scheduling algorithm with the aim of load the amount of the resources of physical machines [3]. balancing can reduce makespan time and increase Unlike static algorithms, dynamic method of distribution in productivity of machines. addition to the basic functionality of each virtual machine, In this paper, a dynamic task scheduling algorithm has been proposed to increase the load balancing in the cloud environment. Using Naïve Bayesian classifier technique, the proposed algorithm has tried to put the requests on the machines in a balanced way; as in addition to the reduction of makespan Copyright © 2017 held by the authors 91 assigns tasks to virtual machines based on the current status of In Eq. (3-11) probability is calculated using Eq. (3-2). the machine and the workload on it. Such algorithms are required to review each moment of machines and based on the P(X|Yi)×P(Yi) P(Yi|X) = (3-2) results of this review, the requests are transferred from one P(X) machine to another. These methods, however, are of greater complexity than the static method, but they are more efficient In the Eq. (3-2), P (X) is constant for all classes and only the [4]. A lot of related work has been done on dynamic load values P (X | Yi) × P (Yi) should be highest value. To calculate balancing and each has been presented with different objective P (X | Yi) under the assumptions of Naïve Bayes, it is assumed such as response time[5,6,7], scalability[6,8], reducing that class conditional is independent and based on previous migration time in requests [9,10] and etc. since our objective in training is obtained according to Eq. (3-3). load balancing is reducing the response time, we mention some new studies in this field. Nakai et al. have presented a load P(X|Yi) = ∏nk=1 P(xk |Yi) (3-3) balancing mechanism in 2014 [6] based on reserve policy to distribute requests between replicated servers. This allowed overload servers to reserve a part of the capacity of remote In Eq. (3-3) if the values of P (X | Yi) × P (Yi) is more than most servers before receiving a new request and if requests were other classes in Yi class, this class is selected. higher than the amount of shared capacity of remote server, some of requests were discarded. Simulation results show that B. Details of the proposed method the proposed method reduces the response time and increases Suppose VM = {VM1, VM2… VMm} is a set of virtual load balancing. Even though the proposed method reduced the machines used to host user requests. Also Task = {T1, T2... Tn} response time, still some of the requests would be discarded and is a set of tasks that are intended to be run on virtual machines. that is why this method was not appropriate for our work. Details of the proposed method for assigning requests are as Yuan et al. [7] tried to improve the performance of load follows: balancing algorithm in 2015. They considered the network 1. First for initial allocation of the requests on virtual machines structure in addition to technical factors of load balancing. Thus, MAX-MIN method is used. As from the request queue the they provided a method which was efficiently applicable in request with the highest runtime is selected and put on the network and reduced the level of overhead in network in machine where the runtime of this request is less. Then the addition to reducing response time. This proposed algorithm was requests in the queue waiting and the runtime of requests on also not suitable for cloud environment due to low productivity each machine are updated. And it will continue until the and lack of scalability. completion of all requests. Mittal et al. [16] provided a method for scheduling requests in 2. After the initial allocation of the requests to the method by 2016 with the objective of reducing makespan time using load Max-Min, in this step, the load balancing status of the balancing algorithm in cloud platform. They compared their system is investigated. To do this, the standard deviation of algorithm with some of the most well-known algorithms of load the system load is obtained and thereby the load balancing balancing such as MIN-MIN [13], MAX-MIN [11] and of the system will be evaluated. To calculate the standard improved algorithms of MAX_MIN [14, 15] and RASA [12]. deviation of the load in the system the Eq.(3-4) can be used: Simulation results show that their algorithm has better 1 performance than other listed algorithms. We have also σ = √ ∑m i=1(PTi − PT) 2 (3-4) m compared our proposed algorithm with this algorithm. Where m is the number of machines in the system and PTi III. PROPOSED METHOD is the processing time of virtual machine i and PT is the processing time of the system. If the standard deviation is A. Naïve Bayesian Classification Method greater than the desired threshold, the system is unbalanced Naïve Bayesian is a statistical method for classification. and requires a request transfer from overloaded machines to Research shows that although this method compared with other under-loaded machines. For this, the machines are divided methods such as classification decision tree and selected neural into three categories: overload, balanced and under load and network classifiers have equal efficiency; in contrast has higher the request is then removed from the overload set and then accuracy and speed than other methods [17]. in accordance with the following procedure they will Assume that, there is m classes called {Y1, Y2, … , Ym} and tuple transfer to the members of the set of under-load machines. X have been given as input. Using the Classifier it is predicted 3. To transfer the requests from the overload class to the that X belongs to the class with the highest posterior probability, under-load class, Naïve Bayesian Classification Method is in other words, X belongs to the class Yi if and only if the Eq. used. For this purpose, each of the machines of under-load (3-1) is true. class is considered as a class. Request tk is selected from the overload class, and then Eq. (3-5) is calculated for it. P(Yi|X) ≥ P(Yj|X) j ∈ 1, . . m (3-1) 92 P(tk = cmpatible|VMi) × P(VMi) platform, as it has the minimum makespan time and maximum P(VMi|tk = compatible) = P(t k=compatible ) load balancing degree with equitable distribution requests on the (3-5) virtual machines. Input :list of tasks output : list of Vm Id for running any task; For all virtual machines P(t k=compatible ) is constant and is 1. Allocate Tasks to VMs Base on MAX-MIN Algorithm. obtained according to Eq. (3-6). 2.Calculate Standard deviation for load of all VMs If (Standard deviation>threshold) if(Global Capacity on all VM > Requested Capacity for Task) Group VMs based on load as UVM, BVM and OVM P(t k=compatible ) = 1 if (OVM≠ 𝜑) Select VM by maximum processing Time in OVM else TaskID: Select task by minimum processing time in Selected P(t k=compatible ) = −(Global Capacity on all VM−Requested Capacity for Task) VM in OVM Requested Capacity for Task Calculate Probablity value Of any VM in UVM based on (3-6) equation (3-5) Sort Probablity value any VMs in UVM by ascending order. P(VMi) For each machine is based on the utilization of the Select VM by maximum Probablity value machine. Since the utilization of each machine is less, it is better VMID=selected VM id to be selected, to achieve this goal, the probability of the Allocate selected task to VMID Else machine with less utilization increases; therefore P(VMi) is break; obtained according to the Eq. (3-7): 3. Repeat step 2 for load of all VMs (3-7) P(VMi) = 1 − Ucpu Figure 1. pseudo code of the proposed method used CPU Capacity Where Ucpu = ALL CPU Capacity TABLE I. HOST SPECIFICATION The amount of P(tk = cmpatible|VMi) is also calculated according to Eq. (3-3) based on the product of the probability of Number of Processing Ram Hard Bandwidth HostId processing previous requests put on that machine. speed(MIPS) (MB) (MB) (Mbps) Cores As a result, using Naïve Bayesian Classification Method for transfer of requests from overload machine to under-load 1 4 50000 204800 1048576 102400 machines, a machine from under-load machines is selected which its compatibility with the considered request is higher 2 2 25000 102400 1048576 102400 than other machines and it has less utilization. This trend has 3 1 10000 51200 1048576 102400 continued until load balancing in the whole system. Figure (1) shows the pseudo-code of the proposed algorithm. IV. RESULT EVALUATION If Makespan is calculated according to the Eq. (4-1); Figure (2) In this section we will discuss the details of the simulation of the indicates a comparison between the Makespan in the Round algorithm presented in the previous section. Then through the Robin method and scheduling of paper [16] and improved charts the proposed method are evaluated. This fact that, the algorithm MAX-MIN [18] and the proposed algorithm. environment is software as a service and also our tools for simulation is CloudSim software [18], would be useful. This Makespan = max{CTij |i ∈ T, i = 1,2, … , n and j ∈ VM, j = simulator allows us to create a virtualized environment and 1,2, … , m} (4-1) supports the allocation of resources based on the request. In fact, the core of the simulator for modeling our method is extended. Figure. (2) Shows the use of proposed method has relatively Cloud computing has been simulated to assess this sector of a better Ref. [16] at Makespan time. Since in ref. [16] the data center consisting 3 hosts with virtualization capabilities. In combination of algorithms MIN-MIN and RASA are used and fact, it is assumed that the virtual instruments such as Xen have in these algorithms only the runtime of the request on the virtual been installed, which can share resources. The properties of each machine is considered but in the proposed algorithm in addition host are according to Table (1). to the selection of algorithm MAX-MIN, load balancing 16 virtual machines with different characteristics are put on this algorithm is also used for suitable primary distribution and use data center. Each virtual machine runs some applications with of its benefits; and with the displacement of requests from variable number of instructions between 500 and 4500. As overload machines and putting them on the under-load mentioned earlier, we aim to provide comprehensive and machines, based on the utilization of the machine and appropriate algorithms for request scheduling in the cloud compatibility of the request with machine with classification techniques, the amount of Makespan decreases. 93 20 1.6 1.4 15 Degree Of Iimbalance Makespan(s) 1.2 10 1 0.8 5 0.6 0 0.4 0 20 40 60 80 100 0.2 Number Of Task 0 0 20 40 60 80 100 RR Improved RASA[16] Number Of Tasks Improved Max-MIN[18] Proposed Improved RASA[16] Proposed Figure 2. Makespan Comparison Figure 4. degree of imbalance Comparison Figure (3) indicate a comparison between the average response It is clear that the use of load balancing methods reduces the time for the requests, between the proposed method and Round degree of imbalance and thus the proposed method has lower Robin method and task scheduling of paper [16]. load balancing degree than other methods. This is due to the reduction in the runtime of the longest duration and balancing 10 the requests between virtual machines in the proposed method. 9 As indicated in the Figure (4), if the number of requests is lower, 8 due to fewer machines with overload and equal longest runtime Response Time(S) 7 in both methods, the degree of load imbalance is close to each 6 other, with the increasing requests of the proposed method, and 5 decreasing runtime of the longest task and balanced distribution 4 of requests, the degree of imbalance decreases. There are many 3 classification methods available including linear classifiers, 2 support vector machines, decision trees and neural networks. Figure. (5) Shows the comparison between the makespan for 1 proposed method by using support vector machines 0 Classification and proposed method by using Naïve Bayesian 0 20 40 60 80 100 Classification. Number Of Tasks 15 RR Improved RASA[16] Makespan Improved Max-MIN[18] Proposed 10 Figure 3. Response time Comparison 5 Another criterion which is important is the degree of imbalance 0 defined as the Eq. (4-2) [5]. 0 20 40 60 80 100 Tmax −Tmin Number of Tasks DI = (4-2) Tavg In Eq. (4-2), Tmax and Tmin are the highest and lowest runtimes Proposed by using SVM Proposed between virtual machines and Tavg is the average runtime among all virtual machines. Figure. (4) Shows the comparison between Figure 5. Makespan Comparison the degree of imbalance in the method provided in algorithm of paper [16] and the proposed algorithm. The horizontal axis shows the number of requests and vertical axis shows the degree of imbalance. 94 V. Conclusions and future works [15] U. Bhoi, P. N. Ramanuj, "Enhanced Maxmin Task Scheduling Algorithm in Cloud Computing," International Journal of Application In this paper we have provided a task scheduling method with or Innovation in Engineering & Management, ISSN2319-4847, vol. Naïve Bayesian Classification Method aim at creating load 2(4), 2013. balancing. This algorithm with classification of virtual machines [16] S. Mittal, A. Katal, "An Optimized Task Scheduling Algorithm in Cloud Computing," in 2016 IEEE 6th International Conference and selection of suitable machine for the existing requests on Advanced Computing (IACC), pp. 197-202, 2016. reduces the makespan time and increases the level of load [17] H. W. Lijuan Zhou, W. Wang, "Parallel Implementation balancing. In the following, the proposed method is compared of Classification Algorithms Based on Cloud Computing with the basic Round Robin method and the scheduling method Environment," Indonesian Journal of Electrical Engineering and [16] in terms of the criteria of Makespan, response time and load Computer Science, vol. 10(5), pp. 1087-1092, 2012. balancing level. In the future we plan to do this proposed method [18] R. Kuar, P. Luthra, "Load Balancing in Cloud System using Max Min and Min Min Algorithm", International Journal of for workflow scheduling; and also consider other criteria such as Computer Applications, National Conference on Emerging reduced cost for the service providers. Trends in Computer Technology (NCETCT-2014). REFERENCES [1] K. A. Nuaimi, N. Mohamed, M. A. Nuaimi, J. Al-Jaroodi, "A survey of load balancing in cloud computing: Challenges and algorithms," in Network Cloud Computing and Applications (NCCA), 2012 Second Symposium on, pp. 137-142, 2012. [2] S. K. Chaharsooghi, A. H. M. Kermani, "An effective ant colony optimization algorithm (ACO) for multi-objective resource allocation problem (MORAP)," Applied Mathematics and Computation, vol. 200, pp. 167-177, 2008. [3] M. Ajit, G. Vidya, "VM level load balancing in cloud environment," in 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), pp. 1-5, 2013. [4] L. Guo, S. Zhao, S. Shen, C. Jiang, "Task scheduling optimization in cloud computing based on heuristic algorithm," Journal of Networks, vol. 7, pp. 547-553, 2012. [5] P. V. Krishna, "Honey bee behavior inspired load balancing of tasks in cloud computing environments," Applied Soft Computing, vol. 13, pp. 2292-2303, 2013. [6] A. Nakai, E. Madeira, L. E. Buzato, "On the Use of Resource Reservation for Web Services Load Balancing," Journal of Network and Systems Management, vol. 23(3), pp. 502-538, 2014. [7] Daraghmi, E. Y. and S.-M. Yuan. "A small world based overlay network for improving dynamic load-balancing." Journal of Systems and Software, vol. 107, pp. 187-203, 2015. [8] Y. Lu, Q. Xie, G. Kliot, A. Geller, J. Larus, A. Greenberg, "Join-Idle- Queue: A novel load balancing algorithm for dynamically scalable web services," Performance Evaluation, vol. 68(11), pp. 1056-1071, 2011. [9] F. Ramezani, J. Lu, F. K. Hussain, "Task-Based System Load Balancing in Cloud Computing Using Particle Swarm Optimization," International Journal of Parallel Programming, vol. 42(5), pp. 739-754, 2013. [10] Y. Fang, F. Wang, J. Ge, "A Task Scheduling Algorithm Based on Load Balancing in Cloud Computing," Web Information Systems and Mining: International Conference, WISM 2010, Sanya, China, October 23-24, 2010. Proceedings. F. L. Wang, Z. Gong, X. Luo and J. Lei. Berlin, Heidelberg, Springer Berlin Heidelberg, pp. 271-277, 2010. [11] T. Kokilavani, G. Amalarethinam,"Load Balanced Min-Min Algorithm for static Meta-Task Scheduling in Grid Computing," International Journal of Computer Applications (0975-8887), vol. 20(2), 2011. [12] S. Parsa, R. Entezari-Maleki, "RASA: A New Grid Task Scheduling Algorithm," International Journal of Digital Content Technology and its Applications, vol. 3(4), pp. 91-99, 2009. [13] G. Amalarethinam, V. Kfatheen, "Max-min Average Algorithm for Scheduling Tasks in Grid Computing Systems," International Journal of Computer Science and Information Technologies, vol. 3(2), 3659-3663, 2012. [14] O. M. Elzeki, M. Z. Reshad, M. A. Elsoud, "Improved Max-Min Algorithm in Cloud Computing", International Journal of Computer Applications (0975 – 8887), vol. 50(12), pp. 22-27, 2012. 95