=Paper= {{Paper |id=Vol-1852/p15 |storemode=property |title=Dynamic task scheduling in cloud computing based on naïve bayesian classifier |pdfUrl=https://ceur-ws.org/Vol-1852/p15.pdf |volume=Vol-1852 |authors=Fatemeh Ebadifard,Seyed Morteza Babamir }} ==Dynamic task scheduling in cloud computing based on naïve bayesian classifier== https://ceur-ws.org/Vol-1852/p15.pdf
       Dynamic task scheduling in cloud computing
            based on Naïve Bayesian classifier
                      Fatemeh Ebadifard                                                        Seyed Morteza Babamir
            Department of Computer Engineering                                           Department of Computer Engineering
                     University of Kashan                                                       University of Kashan
                         Kashan, Iran                                                               Kashan, Iran
            e-mail: ebadifard.fatemeh@gmail.com                                            e-mail: babamir@kashanu.ac.ir



   Abstract—the issue of task scheduling in a cloud environment          time, increases the efficiency of resources. The advantage of the
is one of the most important issues that must be considered              proposed algorithm in the above method is to reduce the
by the cloud platform providers in data centers. The use of the          overload and increase resource utilization. Simulation results
right solution to solve this problem enables cloud platform              show that this method performs well in completion time of the
providers to have the most use of available resources; and also
increase the customer satisfaction by providing quality
                                                                         longest task and increasing the level of load balancing. In
of service parameters. In this paper it has been tried to provide        summary it can be said that our main focus in this article
a dynamic scheduling algorithm using machine learning                    includes the following:
techniques      and naïve-Bayes     classifier   in    a    cloud        1. Providing an appropriate method for scheduling the requests
environment. The proposed method is one of the dynamic task              using data mining techniques to reduce makespan time and
scheduling methods and load distribution at any moment is                increase resource utilization;
conducted according to the latest information from previous              2. The use of classification techniques for equitable distribution
and current server status. The distinction of this method                of requests;
with previous studies is the use of data mining techniques               3. Targeted analysis to show the effectiveness of the proposed
(classification) in load distribution. Since this classification
method has higher accuracy and speed compared with
                                                                         algorithm, compared to previous algorithms;
other methods, therefore this classifier helps us to achieve the         The remainder of this paper consists of the following sections:
optimal solution in less time. Simulation results show that              Section 2 reviews related work, in section 3 after the
the proposed method has a good improvement in terms of                   introduction of Naïve Bayesian classifier and formulation of the
Makespan time and load balancing degree.                                 problem, details of the proposed method is described in detail.
                                                                         In Section 4, simulation and evaluation of the proposed method
   Keywords—task scheduling; cloud computing; naïve-                     is presented and finally, in Section 5, conclusions and future
Bayes classifier; Virtual machine; Makespan                              works are expressed.
                     I.    INTRODUCTION                                                        II.   RELATED WORK
Cloud environment provides a huge context of servers in the              Different studies are carried out in conjunction with load
data center, so that when users request resources, provides              balancing in cloud environment, load balancing algorithms are
them in shared mode. To enable the applications to use the               generally divided into two categories: static and dynamic: In the
resources in accordance with their requirements, an                      static method, the allocation of tasks to virtual machines is based
appropriate mechanism to distribute requests to virtual                  on the functionality of virtual machine and initial condition of
machines is required. That is why the scheduling of the                  each machine; in other words, this process is only based on the
requests is one of the most important issues in cloud                    data of the nodes and their features. This information includes
environment and in recent years it has attracted much                    the amount of processing power, internal memory and storage
attention of researchers. Task    scheduling     in    cloud             capabilities and the power of communications between other
computing means optimal allocation of requests to the                    virtual machines. An important feature of static algorithms is
computing resources in the data center [1]. In scheduling,               that, these algorithms do not consider changes occurred
tasks are allocated to different types of virtual                        dynamically on virtual machines at any moment. In addition,
machines with regard to the limitations specified by the                 they do not have the ability to adapt to changing workloads on
user and service provider. One of the major challenges in                each virtual machine over time. Some static algorithms are
task scheduling is the equitable distribution requests on                Round Robin (RR) algorithms or weighted RR or load balancing
the resources according to         application requirements.             algorithm using ant colony algorithm [2] or procedures based on
Providing scheduling algorithm with the aim of load                      the amount of the resources of physical machines [3].
balancing can reduce makespan time and increase                          Unlike static algorithms, dynamic method of distribution in
productivity of machines.                                                addition to the basic functionality of each virtual machine,
In this paper, a dynamic task scheduling algorithm has
been proposed to increase the load balancing in the
cloud environment. Using Naïve Bayesian classifier
technique, the proposed algorithm has tried to put the requests
on the machines in a balanced way; as in addition to the
reduction of makespan
Copyright © 2017 held by the authors
                                                                    91
assigns tasks to virtual machines based on the current status of           In Eq. (3-11) probability is calculated using Eq. (3-2).
the machine and the workload on it. Such algorithms are
required to review each moment of machines and based on the                                 P(X|Yi)×P(Yi)
                                                                                P(Yi|X) =                   (3-2)
results of this review, the requests are transferred from one                                   P(X)

machine to another. These methods, however, are of greater
complexity than the static method, but they are more efficient             In the Eq. (3-2), P (X) is constant for all classes and only the
[4]. A lot of related work has been done on dynamic load                   values P (X | Yi) × P (Yi) should be highest value. To calculate
balancing and each has been presented with different objective             P (X | Yi) under the assumptions of Naïve Bayes, it is assumed
such as response time[5,6,7], scalability[6,8], reducing                   that class conditional is independent and based on previous
migration time in requests [9,10] and etc. since our objective in          training is obtained according to Eq. (3-3).
load balancing is reducing the response time, we mention some
new studies in this field. Nakai et al. have presented a load                   P(X|Yi) = ∏nk=1 P(xk |Yi) (3-3)
balancing mechanism in 2014 [6] based on reserve policy to
distribute requests between replicated servers. This allowed
overload servers to reserve a part of the capacity of remote               In Eq. (3-3) if the values of P (X | Yi) × P (Yi) is more than most
servers before receiving a new request and if requests were                other classes in Yi class, this class is selected.
higher than the amount of shared capacity of remote server,
some of requests were discarded. Simulation results show that              B. Details of the proposed method
the proposed method reduces the response time and increases
                                                                           Suppose VM = {VM1, VM2… VMm} is a set of virtual
load balancing. Even though the proposed method reduced the
                                                                           machines used to host user requests. Also Task = {T1, T2... Tn}
response time, still some of the requests would be discarded and
                                                                           is a set of tasks that are intended to be run on virtual machines.
that is why this method was not appropriate for our work.
                                                                           Details of the proposed method for assigning requests are as
Yuan et al. [7] tried to improve the performance of load
                                                                           follows:
balancing algorithm in 2015. They considered the network
                                                                           1. First for initial allocation of the requests on virtual machines
structure in addition to technical factors of load balancing. Thus,
                                                                                MAX-MIN method is used. As from the request queue the
they provided a method which was efficiently applicable in
                                                                                request with the highest runtime is selected and put on the
network and reduced the level of overhead in network in
                                                                                machine where the runtime of this request is less. Then the
addition to reducing response time. This proposed algorithm was
                                                                                requests in the queue waiting and the runtime of requests on
also not suitable for cloud environment due to low productivity
                                                                                each machine are updated. And it will continue until the
and lack of scalability.
                                                                                completion of all requests.
Mittal et al. [16] provided a method for scheduling requests in
                                                                           2. After the initial allocation of the requests to the method by
2016 with the objective of reducing makespan time using load
                                                                                Max-Min, in this step, the load balancing status of the
balancing algorithm in cloud platform. They compared their
                                                                                system is investigated. To do this, the standard deviation of
algorithm with some of the most well-known algorithms of load
                                                                                the system load is obtained and thereby the load balancing
balancing such as MIN-MIN [13], MAX-MIN [11] and
                                                                                of the system will be evaluated. To calculate the standard
improved algorithms of MAX_MIN [14, 15] and RASA [12].
                                                                                deviation of the load in the system the Eq.(3-4) can be used:
Simulation results show that their algorithm has better
                                                                                        1
performance than other listed algorithms. We have also                          σ = √ ∑m
                                                                                       i=1(PTi − PT)
                                                                                                     2 (3-4)
                                                                                        m
compared our proposed algorithm with this algorithm.
                                                                                Where m is the number of machines in the system and PTi
                    III.   PROPOSED METHOD                                      is the processing time of virtual machine i and PT is the
                                                                                processing time of the system. If the standard deviation is
A. Naïve Bayesian Classification Method                                         greater than the desired threshold, the system is unbalanced
Naïve Bayesian is a statistical method for classification.                      and requires a request transfer from overloaded machines to
Research shows that although this method compared with other                    under-loaded machines. For this, the machines are divided
methods such as classification decision tree and selected neural                into three categories: overload, balanced and under load and
network classifiers have equal efficiency; in contrast has higher               the request is then removed from the overload set and then
accuracy and speed than other methods [17].                                     in accordance with the following procedure they will
Assume that, there is m classes called {Y1, Y2, … , Ym} and tuple               transfer to the members of the set of under-load machines.
X have been given as input. Using the Classifier it is predicted           3.   To transfer the requests from the overload class to the
that X belongs to the class with the highest posterior probability,             under-load class, Naïve Bayesian Classification Method is
in other words, X belongs to the class Yi if and only if the Eq.                used. For this purpose, each of the machines of under-load
(3-1) is true.                                                                  class is considered as a class. Request tk is selected from the
                                                                                overload class, and then Eq. (3-5) is calculated for it.
    P(Yi|X) ≥ P(Yj|X) j ∈ 1, . . m (3-1)




                                                                      92
                              P(tk = cmpatible|VMi) × P(VMi)                     platform, as it has the minimum makespan time and maximum
P(VMi|tk = compatible) =
                                        P(t k=compatible )                       load balancing degree with equitable distribution requests on the
                                                                    (3-5)        virtual machines.

                                                                                    Input :list of tasks output : list of Vm Id for running any task;
For all virtual machines P(t k=compatible ) is constant and is                      1. Allocate Tasks to VMs Base on MAX-MIN Algorithm.
obtained according to Eq. (3-6).                                                    2.Calculate Standard deviation for load of all VMs
                                                                                      If (Standard deviation>threshold)
if(Global Capacity on all VM > Requested Capacity for Task)                                 Group VMs based on load as UVM, BVM and OVM
          P(t k=compatible ) = 1                                                                if (OVM≠ 𝜑)
                                                                                                Select VM by maximum processing Time in OVM
else
                                                                                                TaskID: Select task by minimum processing time in Selected
 P(t k=compatible ) =
         −(Global Capacity on all VM−Requested Capacity for Task)
                                                                                    VM in OVM
                       Requested Capacity for Task                                              Calculate Probablity value Of any VM in UVM based on
                                                                    (3-6)           equation (3-5)
                                                                                                Sort Probablity value any VMs in UVM by ascending order.
P(VMi) For each machine is based on the utilization of the                                      Select VM by maximum Probablity value
machine. Since the utilization of each machine is less, it is better                            VMID=selected VM id
to be selected, to achieve this goal, the probability of the                                    Allocate selected task to VMID
                                                                                      Else
machine with less utilization increases; therefore P(VMi) is
                                                                                           break;
obtained according to the Eq. (3-7):
                                                                                     3. Repeat step 2 for load of all VMs
                                                               (3-7)
P(VMi) = 1 − Ucpu
                                                                                           Figure 1. pseudo code of the proposed method
                          used CPU Capacity
 Where          Ucpu =
                           ALL CPU Capacity                                                         TABLE I.        HOST SPECIFICATION


The amount of P(tk = cmpatible|VMi) is also calculated
according to Eq. (3-3) based on the product of the probability of                           Number of
                                                                                                         Processing     Ram        Hard        Bandwidth
                                                                                  HostId    processing
previous requests put on that machine.                                                                   speed(MIPS)    (MB)       (MB)        (Mbps)
                                                                                            Cores
As a result, using Naïve Bayesian Classification Method for
transfer of requests from overload machine to under-load
                                                                                     1          4           50000       204800     1048576       102400
machines, a machine from under-load machines is selected
which its compatibility with the considered request is higher                        2          2           25000       102400     1048576       102400
than other machines and it has less utilization. This trend has                      3          1           10000        51200     1048576       102400
continued until load balancing in the whole system. Figure (1)
shows the pseudo-code of the proposed algorithm.
                    IV.     RESULT EVALUATION                                    If Makespan is calculated according to the Eq. (4-1); Figure (2)
In this section we will discuss the details of the simulation of the             indicates a comparison between the Makespan in the Round
algorithm presented in the previous section. Then through the                    Robin method and scheduling of paper [16] and improved
charts the proposed method are evaluated. This fact that, the                    algorithm MAX-MIN [18] and the proposed algorithm.
environment is software as a service and also our tools for
simulation is CloudSim software [18], would be useful. This                       Makespan = max{CTij |i ∈ T, i = 1,2, … , n and j ∈ VM, j =
simulator allows us to create a virtualized environment and                      1,2, … , m} (4-1)
supports the allocation of resources based on the request. In fact,
the core of the simulator for modeling our method is extended.                   Figure. (2) Shows the use of proposed method has relatively
Cloud computing has been simulated to assess this sector of a                    better Ref. [16] at Makespan time. Since in ref. [16] the
data center consisting 3 hosts with virtualization capabilities. In              combination of algorithms MIN-MIN and RASA are used and
fact, it is assumed that the virtual instruments such as Xen have                in these algorithms only the runtime of the request on the virtual
been installed, which can share resources. The properties of each                machine is considered but in the proposed algorithm in addition
host are according to Table (1).                                                 to the selection of algorithm MAX-MIN, load balancing
16 virtual machines with different characteristics are put on this               algorithm is also used for suitable primary distribution and use
data center. Each virtual machine runs some applications with                    of its benefits; and with the displacement of requests from
variable number of instructions between 500 and 4500. As                         overload machines and putting them on the under-load
mentioned earlier, we aim to provide comprehensive and                           machines, based on the utilization of the machine and
appropriate algorithms for request scheduling in the cloud                       compatibility of the request with machine with classification
                                                                                 techniques, the amount of Makespan decreases.




                                                                            93
                       20                                                                                                 1.6
                                                                                                                          1.4
                       15




                                                                                                   Degree Of Iimbalance
   Makespan(s)




                                                                                                                          1.2
                       10                                                                                                  1
                                                                                                                          0.8
                       5
                                                                                                                          0.6
                       0                                                                                                  0.4
                            0       20         40        60           80    100
                                                                                                                          0.2
                                            Number Of Task
                                                                                                                           0
                                                                                                                                0         20      40        60         80        100
                            RR                          Improved RASA[16]
                                                                                                                                               Number Of Tasks
                            Improved Max-MIN[18]        Proposed
                                                                                                                                     Improved RASA[16]           Proposed
                                   Figure 2. Makespan Comparison

                                                                                                                                Figure 4. degree of imbalance Comparison
Figure (3) indicate a comparison between the average response
                                                                                          It is clear that the use of load balancing methods reduces the
time for the requests, between the proposed method and Round
                                                                                          degree of imbalance and thus the proposed method has lower
Robin method and task scheduling of paper [16].
                                                                                          load balancing degree than other methods. This is due to the
                                                                                          reduction in the runtime of the longest duration and balancing
                       10                                                                 the requests between virtual machines in the proposed method.
                        9                                                                 As indicated in the Figure (4), if the number of requests is lower,
                        8                                                                 due to fewer machines with overload and equal longest runtime
    Response Time(S)




                        7                                                                 in both methods, the degree of load imbalance is close to each
                        6                                                                 other, with the increasing requests of the proposed method, and
                        5                                                                 decreasing runtime of the longest task and balanced distribution
                        4                                                                 of requests, the degree of imbalance decreases. There are many
                        3                                                                 classification methods available including linear classifiers,
                        2
                                                                                          support vector machines, decision trees and neural networks.
                                                                                          Figure. (5) Shows the comparison between the makespan for
                        1
                                                                                          proposed method by using support vector machines
                        0
                                                                                          Classification and proposed method by using Naïve Bayesian
                            0        20        40        60           80    100
                                                                                          Classification.
                                            Number Of Tasks
                                                                                                            15
                            RR                          Improved RASA[16]
                                                                                             Makespan




                            Improved Max-MIN[18]        Proposed                                            10

                                 Figure 3. Response time Comparison                                                   5

 Another criterion which is important is the degree of imbalance                                                      0
defined as the Eq. (4-2) [5].                                                                                              0            20       40         60          80        100
           Tmax −Tmin                                                                                                                          Number of Tasks
DI =                                                                         (4-2)
                       Tavg
In Eq. (4-2), Tmax and Tmin are the highest and lowest runtimes                                                                     Proposed by using SVM             Proposed
between virtual machines and Tavg is the average runtime among
all virtual machines. Figure. (4) Shows the comparison between
                                                                                                                                      Figure 5. Makespan Comparison
the degree of imbalance in the method provided in algorithm of
paper [16] and the proposed algorithm. The horizontal axis
shows the number of requests and vertical axis shows the degree
of imbalance.




                                                                                     94
                 V.     Conclusions and future works                                  [15]  U. Bhoi, P. N. Ramanuj, "Enhanced Maxmin Task Scheduling
                                                                                          Algorithm in Cloud Computing," International Journal of Application
In this paper we have provided a task scheduling method with                              or Innovation in Engineering & Management, ISSN2319-4847, vol.
Naïve Bayesian Classification Method aim at creating load                                 2(4), 2013.
balancing. This algorithm with classification of virtual machines                     [16] S. Mittal, A. Katal, "An Optimized Task Scheduling Algorithm
                                                                                          in Cloud Computing," in 2016 IEEE 6th International Conference
and selection of suitable machine for the existing requests                               on Advanced Computing (IACC), pp. 197-202, 2016.
reduces the makespan time and increases the level of load                             [17] H. W. Lijuan Zhou, W. Wang, "Parallel Implementation
balancing. In the following, the proposed method is compared                              of Classification Algorithms Based on Cloud Computing
with the basic Round Robin method and the scheduling method                               Environment," Indonesian Journal of Electrical Engineering and
[16] in terms of the criteria of Makespan, response time and load                         Computer Science, vol. 10(5), pp. 1087-1092, 2012.
balancing level. In the future we plan to do this proposed method                     [18] R. Kuar, P. Luthra, "Load Balancing in Cloud System using Max
                                                                                          Min and Min Min Algorithm", International Journal of
for workflow scheduling; and also consider other criteria such as                         Computer Applications, National        Conference  on    Emerging
reduced cost for the service providers.                                                   Trends in Computer Technology (NCETCT-2014).

                              REFERENCES

[1]   K. A. Nuaimi, N. Mohamed, M. A. Nuaimi, J. Al-Jaroodi, "A survey of
     load balancing in cloud computing: Challenges and algorithms," in
     Network Cloud Computing and Applications (NCCA), 2012 Second
     Symposium on, pp. 137-142, 2012.
[2] S. K. Chaharsooghi, A. H. M. Kermani, "An effective ant colony
     optimization algorithm (ACO) for multi-objective resource allocation
     problem (MORAP)," Applied Mathematics and Computation, vol.
     200, pp. 167-177, 2008.
[3] M. Ajit, G. Vidya, "VM level load balancing in cloud environment," in
     2013 Fourth International Conference on Computing, Communications
     and Networking Technologies (ICCCNT), pp. 1-5, 2013.
[4] L. Guo, S. Zhao, S. Shen, C. Jiang, "Task scheduling optimization in
     cloud computing based on heuristic algorithm," Journal of Networks,
     vol. 7, pp. 547-553, 2012.
[5] P. V. Krishna, "Honey bee behavior inspired load balancing of tasks
     in cloud computing environments," Applied Soft Computing, vol.
     13, pp. 2292-2303, 2013.
[6] A. Nakai, E. Madeira, L. E. Buzato, "On the Use of Resource
     Reservation for Web Services Load Balancing," Journal of
     Network and Systems Management, vol. 23(3), pp. 502-538, 2014.
[7] Daraghmi, E. Y. and S.-M. Yuan. "A small world based overlay
     network for improving dynamic load-balancing." Journal of Systems and
     Software, vol. 107, pp. 187-203, 2015.
[8] Y. Lu, Q. Xie, G. Kliot, A. Geller, J. Larus, A. Greenberg, "Join-Idle-
     Queue: A novel load balancing algorithm for dynamically scalable web
     services," Performance Evaluation, vol. 68(11), pp. 1056-1071, 2011.
[9] F. Ramezani, J. Lu, F. K. Hussain, "Task-Based System Load Balancing
     in Cloud Computing Using Particle Swarm Optimization," International
     Journal of Parallel Programming, vol. 42(5), pp. 739-754, 2013.
[10] Y. Fang, F. Wang, J. Ge, "A Task Scheduling Algorithm Based on
     Load Balancing in Cloud Computing," Web Information Systems and
     Mining: International Conference, WISM 2010, Sanya, China,
     October 23-24, 2010. Proceedings. F. L. Wang, Z. Gong, X. Luo and
     J. Lei. Berlin, Heidelberg, Springer Berlin Heidelberg, pp. 271-277,
     2010.
[11] T. Kokilavani, G. Amalarethinam,"Load Balanced Min-Min Algorithm
     for static Meta-Task Scheduling in Grid Computing," International
     Journal of Computer Applications (0975-8887), vol. 20(2), 2011.
[12] S. Parsa, R. Entezari-Maleki, "RASA: A New Grid Task
     Scheduling      Algorithm,"     International Journal       of    Digital
     Content Technology and its Applications, vol. 3(4), pp. 91-99, 2009.
[13] G. Amalarethinam, V. Kfatheen, "Max-min Average Algorithm for
     Scheduling Tasks in Grid Computing Systems," International Journal of
     Computer Science and Information Technologies, vol. 3(2), 3659-3663,
     2012.
[14] O. M. Elzeki, M. Z. Reshad, M. A. Elsoud, "Improved Max-Min
     Algorithm in Cloud Computing", International Journal of
     Computer Applications (0975 – 8887), vol. 50(12), pp. 22-27, 2012.




                                                                                 95