=Paper=
{{Paper
|id=Vol-1852/p15
|storemode=property
|title=Dynamic task scheduling in cloud computing
based on naïve bayesian classifier
|pdfUrl=https://ceur-ws.org/Vol-1852/p15.pdf
|volume=Vol-1852
|authors=Fatemeh Ebadifard,Seyed Morteza Babamir
}}
==Dynamic task scheduling in cloud computing
based on naïve bayesian classifier==
Dynamic task scheduling in cloud computing
based on Naïve Bayesian classifier
Fatemeh Ebadifard Seyed Morteza Babamir
Department of Computer Engineering Department of Computer Engineering
University of Kashan University of Kashan
Kashan, Iran Kashan, Iran
e-mail: ebadifard.fatemeh@gmail.com e-mail: babamir@kashanu.ac.ir
Abstract—the issue of task scheduling in a cloud environment time, increases the efficiency of resources. The advantage of the
is one of the most important issues that must be considered proposed algorithm in the above method is to reduce the
by the cloud platform providers in data centers. The use of the overload and increase resource utilization. Simulation results
right solution to solve this problem enables cloud platform show that this method performs well in completion time of the
providers to have the most use of available resources; and also
increase the customer satisfaction by providing quality
longest task and increasing the level of load balancing. In
of service parameters. In this paper it has been tried to provide summary it can be said that our main focus in this article
a dynamic scheduling algorithm using machine learning includes the following:
techniques and naïve-Bayes classifier in a cloud 1. Providing an appropriate method for scheduling the requests
environment. The proposed method is one of the dynamic task using data mining techniques to reduce makespan time and
scheduling methods and load distribution at any moment is increase resource utilization;
conducted according to the latest information from previous 2. The use of classification techniques for equitable distribution
and current server status. The distinction of this method of requests;
with previous studies is the use of data mining techniques 3. Targeted analysis to show the effectiveness of the proposed
(classification) in load distribution. Since this classification
method has higher accuracy and speed compared with
algorithm, compared to previous algorithms;
other methods, therefore this classifier helps us to achieve the The remainder of this paper consists of the following sections:
optimal solution in less time. Simulation results show that Section 2 reviews related work, in section 3 after the
the proposed method has a good improvement in terms of introduction of Naïve Bayesian classifier and formulation of the
Makespan time and load balancing degree. problem, details of the proposed method is described in detail.
In Section 4, simulation and evaluation of the proposed method
Keywords—task scheduling; cloud computing; naïve- is presented and finally, in Section 5, conclusions and future
Bayes classifier; Virtual machine; Makespan works are expressed.
I. INTRODUCTION II. RELATED WORK
Cloud environment provides a huge context of servers in the Different studies are carried out in conjunction with load
data center, so that when users request resources, provides balancing in cloud environment, load balancing algorithms are
them in shared mode. To enable the applications to use the generally divided into two categories: static and dynamic: In the
resources in accordance with their requirements, an static method, the allocation of tasks to virtual machines is based
appropriate mechanism to distribute requests to virtual on the functionality of virtual machine and initial condition of
machines is required. That is why the scheduling of the each machine; in other words, this process is only based on the
requests is one of the most important issues in cloud data of the nodes and their features. This information includes
environment and in recent years it has attracted much the amount of processing power, internal memory and storage
attention of researchers. Task scheduling in cloud capabilities and the power of communications between other
computing means optimal allocation of requests to the virtual machines. An important feature of static algorithms is
computing resources in the data center [1]. In scheduling, that, these algorithms do not consider changes occurred
tasks are allocated to different types of virtual dynamically on virtual machines at any moment. In addition,
machines with regard to the limitations specified by the they do not have the ability to adapt to changing workloads on
user and service provider. One of the major challenges in each virtual machine over time. Some static algorithms are
task scheduling is the equitable distribution requests on Round Robin (RR) algorithms or weighted RR or load balancing
the resources according to application requirements. algorithm using ant colony algorithm [2] or procedures based on
Providing scheduling algorithm with the aim of load the amount of the resources of physical machines [3].
balancing can reduce makespan time and increase Unlike static algorithms, dynamic method of distribution in
productivity of machines. addition to the basic functionality of each virtual machine,
In this paper, a dynamic task scheduling algorithm has
been proposed to increase the load balancing in the
cloud environment. Using Naïve Bayesian classifier
technique, the proposed algorithm has tried to put the requests
on the machines in a balanced way; as in addition to the
reduction of makespan
Copyright © 2017 held by the authors
91
assigns tasks to virtual machines based on the current status of In Eq. (3-11) probability is calculated using Eq. (3-2).
the machine and the workload on it. Such algorithms are
required to review each moment of machines and based on the P(X|Yi)×P(Yi)
P(Yi|X) = (3-2)
results of this review, the requests are transferred from one P(X)
machine to another. These methods, however, are of greater
complexity than the static method, but they are more efficient In the Eq. (3-2), P (X) is constant for all classes and only the
[4]. A lot of related work has been done on dynamic load values P (X | Yi) × P (Yi) should be highest value. To calculate
balancing and each has been presented with different objective P (X | Yi) under the assumptions of Naïve Bayes, it is assumed
such as response time[5,6,7], scalability[6,8], reducing that class conditional is independent and based on previous
migration time in requests [9,10] and etc. since our objective in training is obtained according to Eq. (3-3).
load balancing is reducing the response time, we mention some
new studies in this field. Nakai et al. have presented a load P(X|Yi) = ∏nk=1 P(xk |Yi) (3-3)
balancing mechanism in 2014 [6] based on reserve policy to
distribute requests between replicated servers. This allowed
overload servers to reserve a part of the capacity of remote In Eq. (3-3) if the values of P (X | Yi) × P (Yi) is more than most
servers before receiving a new request and if requests were other classes in Yi class, this class is selected.
higher than the amount of shared capacity of remote server,
some of requests were discarded. Simulation results show that B. Details of the proposed method
the proposed method reduces the response time and increases
Suppose VM = {VM1, VM2… VMm} is a set of virtual
load balancing. Even though the proposed method reduced the
machines used to host user requests. Also Task = {T1, T2... Tn}
response time, still some of the requests would be discarded and
is a set of tasks that are intended to be run on virtual machines.
that is why this method was not appropriate for our work.
Details of the proposed method for assigning requests are as
Yuan et al. [7] tried to improve the performance of load
follows:
balancing algorithm in 2015. They considered the network
1. First for initial allocation of the requests on virtual machines
structure in addition to technical factors of load balancing. Thus,
MAX-MIN method is used. As from the request queue the
they provided a method which was efficiently applicable in
request with the highest runtime is selected and put on the
network and reduced the level of overhead in network in
machine where the runtime of this request is less. Then the
addition to reducing response time. This proposed algorithm was
requests in the queue waiting and the runtime of requests on
also not suitable for cloud environment due to low productivity
each machine are updated. And it will continue until the
and lack of scalability.
completion of all requests.
Mittal et al. [16] provided a method for scheduling requests in
2. After the initial allocation of the requests to the method by
2016 with the objective of reducing makespan time using load
Max-Min, in this step, the load balancing status of the
balancing algorithm in cloud platform. They compared their
system is investigated. To do this, the standard deviation of
algorithm with some of the most well-known algorithms of load
the system load is obtained and thereby the load balancing
balancing such as MIN-MIN [13], MAX-MIN [11] and
of the system will be evaluated. To calculate the standard
improved algorithms of MAX_MIN [14, 15] and RASA [12].
deviation of the load in the system the Eq.(3-4) can be used:
Simulation results show that their algorithm has better
1
performance than other listed algorithms. We have also σ = √ ∑m
i=1(PTi − PT)
2 (3-4)
m
compared our proposed algorithm with this algorithm.
Where m is the number of machines in the system and PTi
III. PROPOSED METHOD is the processing time of virtual machine i and PT is the
processing time of the system. If the standard deviation is
A. Naïve Bayesian Classification Method greater than the desired threshold, the system is unbalanced
Naïve Bayesian is a statistical method for classification. and requires a request transfer from overloaded machines to
Research shows that although this method compared with other under-loaded machines. For this, the machines are divided
methods such as classification decision tree and selected neural into three categories: overload, balanced and under load and
network classifiers have equal efficiency; in contrast has higher the request is then removed from the overload set and then
accuracy and speed than other methods [17]. in accordance with the following procedure they will
Assume that, there is m classes called {Y1, Y2, … , Ym} and tuple transfer to the members of the set of under-load machines.
X have been given as input. Using the Classifier it is predicted 3. To transfer the requests from the overload class to the
that X belongs to the class with the highest posterior probability, under-load class, Naïve Bayesian Classification Method is
in other words, X belongs to the class Yi if and only if the Eq. used. For this purpose, each of the machines of under-load
(3-1) is true. class is considered as a class. Request tk is selected from the
overload class, and then Eq. (3-5) is calculated for it.
P(Yi|X) ≥ P(Yj|X) j ∈ 1, . . m (3-1)
92
P(tk = cmpatible|VMi) × P(VMi) platform, as it has the minimum makespan time and maximum
P(VMi|tk = compatible) =
P(t k=compatible ) load balancing degree with equitable distribution requests on the
(3-5) virtual machines.
Input :list of tasks output : list of Vm Id for running any task;
For all virtual machines P(t k=compatible ) is constant and is 1. Allocate Tasks to VMs Base on MAX-MIN Algorithm.
obtained according to Eq. (3-6). 2.Calculate Standard deviation for load of all VMs
If (Standard deviation>threshold)
if(Global Capacity on all VM > Requested Capacity for Task) Group VMs based on load as UVM, BVM and OVM
P(t k=compatible ) = 1 if (OVM≠ 𝜑)
Select VM by maximum processing Time in OVM
else
TaskID: Select task by minimum processing time in Selected
P(t k=compatible ) =
−(Global Capacity on all VM−Requested Capacity for Task)
VM in OVM
Requested Capacity for Task Calculate Probablity value Of any VM in UVM based on
(3-6) equation (3-5)
Sort Probablity value any VMs in UVM by ascending order.
P(VMi) For each machine is based on the utilization of the Select VM by maximum Probablity value
machine. Since the utilization of each machine is less, it is better VMID=selected VM id
to be selected, to achieve this goal, the probability of the Allocate selected task to VMID
Else
machine with less utilization increases; therefore P(VMi) is
break;
obtained according to the Eq. (3-7):
3. Repeat step 2 for load of all VMs
(3-7)
P(VMi) = 1 − Ucpu
Figure 1. pseudo code of the proposed method
used CPU Capacity
Where Ucpu =
ALL CPU Capacity TABLE I. HOST SPECIFICATION
The amount of P(tk = cmpatible|VMi) is also calculated
according to Eq. (3-3) based on the product of the probability of Number of
Processing Ram Hard Bandwidth
HostId processing
previous requests put on that machine. speed(MIPS) (MB) (MB) (Mbps)
Cores
As a result, using Naïve Bayesian Classification Method for
transfer of requests from overload machine to under-load
1 4 50000 204800 1048576 102400
machines, a machine from under-load machines is selected
which its compatibility with the considered request is higher 2 2 25000 102400 1048576 102400
than other machines and it has less utilization. This trend has 3 1 10000 51200 1048576 102400
continued until load balancing in the whole system. Figure (1)
shows the pseudo-code of the proposed algorithm.
IV. RESULT EVALUATION If Makespan is calculated according to the Eq. (4-1); Figure (2)
In this section we will discuss the details of the simulation of the indicates a comparison between the Makespan in the Round
algorithm presented in the previous section. Then through the Robin method and scheduling of paper [16] and improved
charts the proposed method are evaluated. This fact that, the algorithm MAX-MIN [18] and the proposed algorithm.
environment is software as a service and also our tools for
simulation is CloudSim software [18], would be useful. This Makespan = max{CTij |i ∈ T, i = 1,2, … , n and j ∈ VM, j =
simulator allows us to create a virtualized environment and 1,2, … , m} (4-1)
supports the allocation of resources based on the request. In fact,
the core of the simulator for modeling our method is extended. Figure. (2) Shows the use of proposed method has relatively
Cloud computing has been simulated to assess this sector of a better Ref. [16] at Makespan time. Since in ref. [16] the
data center consisting 3 hosts with virtualization capabilities. In combination of algorithms MIN-MIN and RASA are used and
fact, it is assumed that the virtual instruments such as Xen have in these algorithms only the runtime of the request on the virtual
been installed, which can share resources. The properties of each machine is considered but in the proposed algorithm in addition
host are according to Table (1). to the selection of algorithm MAX-MIN, load balancing
16 virtual machines with different characteristics are put on this algorithm is also used for suitable primary distribution and use
data center. Each virtual machine runs some applications with of its benefits; and with the displacement of requests from
variable number of instructions between 500 and 4500. As overload machines and putting them on the under-load
mentioned earlier, we aim to provide comprehensive and machines, based on the utilization of the machine and
appropriate algorithms for request scheduling in the cloud compatibility of the request with machine with classification
techniques, the amount of Makespan decreases.
93
20 1.6
1.4
15
Degree Of Iimbalance
Makespan(s)
1.2
10 1
0.8
5
0.6
0 0.4
0 20 40 60 80 100
0.2
Number Of Task
0
0 20 40 60 80 100
RR Improved RASA[16]
Number Of Tasks
Improved Max-MIN[18] Proposed
Improved RASA[16] Proposed
Figure 2. Makespan Comparison
Figure 4. degree of imbalance Comparison
Figure (3) indicate a comparison between the average response
It is clear that the use of load balancing methods reduces the
time for the requests, between the proposed method and Round
degree of imbalance and thus the proposed method has lower
Robin method and task scheduling of paper [16].
load balancing degree than other methods. This is due to the
reduction in the runtime of the longest duration and balancing
10 the requests between virtual machines in the proposed method.
9 As indicated in the Figure (4), if the number of requests is lower,
8 due to fewer machines with overload and equal longest runtime
Response Time(S)
7 in both methods, the degree of load imbalance is close to each
6 other, with the increasing requests of the proposed method, and
5 decreasing runtime of the longest task and balanced distribution
4 of requests, the degree of imbalance decreases. There are many
3 classification methods available including linear classifiers,
2
support vector machines, decision trees and neural networks.
Figure. (5) Shows the comparison between the makespan for
1
proposed method by using support vector machines
0
Classification and proposed method by using Naïve Bayesian
0 20 40 60 80 100
Classification.
Number Of Tasks
15
RR Improved RASA[16]
Makespan
Improved Max-MIN[18] Proposed 10
Figure 3. Response time Comparison 5
Another criterion which is important is the degree of imbalance 0
defined as the Eq. (4-2) [5]. 0 20 40 60 80 100
Tmax −Tmin Number of Tasks
DI = (4-2)
Tavg
In Eq. (4-2), Tmax and Tmin are the highest and lowest runtimes Proposed by using SVM Proposed
between virtual machines and Tavg is the average runtime among
all virtual machines. Figure. (4) Shows the comparison between
Figure 5. Makespan Comparison
the degree of imbalance in the method provided in algorithm of
paper [16] and the proposed algorithm. The horizontal axis
shows the number of requests and vertical axis shows the degree
of imbalance.
94
V. Conclusions and future works [15] U. Bhoi, P. N. Ramanuj, "Enhanced Maxmin Task Scheduling
Algorithm in Cloud Computing," International Journal of Application
In this paper we have provided a task scheduling method with or Innovation in Engineering & Management, ISSN2319-4847, vol.
Naïve Bayesian Classification Method aim at creating load 2(4), 2013.
balancing. This algorithm with classification of virtual machines [16] S. Mittal, A. Katal, "An Optimized Task Scheduling Algorithm
in Cloud Computing," in 2016 IEEE 6th International Conference
and selection of suitable machine for the existing requests on Advanced Computing (IACC), pp. 197-202, 2016.
reduces the makespan time and increases the level of load [17] H. W. Lijuan Zhou, W. Wang, "Parallel Implementation
balancing. In the following, the proposed method is compared of Classification Algorithms Based on Cloud Computing
with the basic Round Robin method and the scheduling method Environment," Indonesian Journal of Electrical Engineering and
[16] in terms of the criteria of Makespan, response time and load Computer Science, vol. 10(5), pp. 1087-1092, 2012.
balancing level. In the future we plan to do this proposed method [18] R. Kuar, P. Luthra, "Load Balancing in Cloud System using Max
Min and Min Min Algorithm", International Journal of
for workflow scheduling; and also consider other criteria such as Computer Applications, National Conference on Emerging
reduced cost for the service providers. Trends in Computer Technology (NCETCT-2014).
REFERENCES
[1] K. A. Nuaimi, N. Mohamed, M. A. Nuaimi, J. Al-Jaroodi, "A survey of
load balancing in cloud computing: Challenges and algorithms," in
Network Cloud Computing and Applications (NCCA), 2012 Second
Symposium on, pp. 137-142, 2012.
[2] S. K. Chaharsooghi, A. H. M. Kermani, "An effective ant colony
optimization algorithm (ACO) for multi-objective resource allocation
problem (MORAP)," Applied Mathematics and Computation, vol.
200, pp. 167-177, 2008.
[3] M. Ajit, G. Vidya, "VM level load balancing in cloud environment," in
2013 Fourth International Conference on Computing, Communications
and Networking Technologies (ICCCNT), pp. 1-5, 2013.
[4] L. Guo, S. Zhao, S. Shen, C. Jiang, "Task scheduling optimization in
cloud computing based on heuristic algorithm," Journal of Networks,
vol. 7, pp. 547-553, 2012.
[5] P. V. Krishna, "Honey bee behavior inspired load balancing of tasks
in cloud computing environments," Applied Soft Computing, vol.
13, pp. 2292-2303, 2013.
[6] A. Nakai, E. Madeira, L. E. Buzato, "On the Use of Resource
Reservation for Web Services Load Balancing," Journal of
Network and Systems Management, vol. 23(3), pp. 502-538, 2014.
[7] Daraghmi, E. Y. and S.-M. Yuan. "A small world based overlay
network for improving dynamic load-balancing." Journal of Systems and
Software, vol. 107, pp. 187-203, 2015.
[8] Y. Lu, Q. Xie, G. Kliot, A. Geller, J. Larus, A. Greenberg, "Join-Idle-
Queue: A novel load balancing algorithm for dynamically scalable web
services," Performance Evaluation, vol. 68(11), pp. 1056-1071, 2011.
[9] F. Ramezani, J. Lu, F. K. Hussain, "Task-Based System Load Balancing
in Cloud Computing Using Particle Swarm Optimization," International
Journal of Parallel Programming, vol. 42(5), pp. 739-754, 2013.
[10] Y. Fang, F. Wang, J. Ge, "A Task Scheduling Algorithm Based on
Load Balancing in Cloud Computing," Web Information Systems and
Mining: International Conference, WISM 2010, Sanya, China,
October 23-24, 2010. Proceedings. F. L. Wang, Z. Gong, X. Luo and
J. Lei. Berlin, Heidelberg, Springer Berlin Heidelberg, pp. 271-277,
2010.
[11] T. Kokilavani, G. Amalarethinam,"Load Balanced Min-Min Algorithm
for static Meta-Task Scheduling in Grid Computing," International
Journal of Computer Applications (0975-8887), vol. 20(2), 2011.
[12] S. Parsa, R. Entezari-Maleki, "RASA: A New Grid Task
Scheduling Algorithm," International Journal of Digital
Content Technology and its Applications, vol. 3(4), pp. 91-99, 2009.
[13] G. Amalarethinam, V. Kfatheen, "Max-min Average Algorithm for
Scheduling Tasks in Grid Computing Systems," International Journal of
Computer Science and Information Technologies, vol. 3(2), 3659-3663,
2012.
[14] O. M. Elzeki, M. Z. Reshad, M. A. Elsoud, "Improved Max-Min
Algorithm in Cloud Computing", International Journal of
Computer Applications (0975 – 8887), vol. 50(12), pp. 22-27, 2012.
95