=Paper= {{Paper |id=Vol-2351/paper6 |storemode=property |title=Economic Data Replication Management in the Cloud |pdfUrl=https://ceur-ws.org/Vol-2351/paper_38.pdf |volume=Vol-2351 |authors=Abdenour Lazeb,Riad Mokadem,Ghalem Belalem |dblpUrl=https://dblp.org/rec/conf/jeri/LazebMB19 }} ==Economic Data Replication Management in the Cloud== https://ceur-ws.org/Vol-2351/paper_38.pdf
    Economic data replication management in the cloud
               Abdenour Lazeb¹, Riad Mokadem² and Ghalem Belalem³
                        ¹ Université Oran1, Ahmed Ben Bella, Algérie
                       lazeb.abdenour@edu.univ-oran1.dz
    ² Institut de Recherche en Informatique de Toulouse (IRIT), Paul Sabatier University,
                                       Toulouse, France
                                Riad.Mokadem@irit.fr
                        ³ Université Oran1, Ahmed Ben Bella, Algérie
                                 ghalem1dz@gmail.com



       Abstract. Applications produce huge volumes of data that are distributed across
       remote and heterogeneous sites, which raises problems of access to and sharing
       of such data. As a result, managing data in large-scale environments is a real
       challenge. In this context, large-scale data management systems often use data
       replication, a well-known technique that addresses these problems by storing
       multiple copies of data, called replicas, across multiple nodes. Most replication
       strategies designed for these environments are difficult to adapt to cloud
       environments: they aim to achieve the best system performance without meeting
       the important objectives of the cloud provider. Our proposed approach generates
       the optimal replication strategy. We show, in theory, that our algorithm
       significantly improves the provider's gain over a wide range of cloud and SLA
       conditions without neglecting customer satisfaction.

       Keywords: Data Management, Cloud Systems, SLA, Provider, Data Replication,
       Cost Model, Business Model, Performance.




1      Introduction

   Cloud computing is a common term used to describe a modern class of network-based
computing that takes place over the Web.
   The platform provides on-demand services that are always on, available anytime and
anywhere, paid for according to use, and elastic (scaling up and down in capacity and
functionality). Hardware and software services are accessible to the general public,
enterprises, organizations, and business markets. But what commitments does the cloud
provider you have chosen make? How long will it take to restart your solution in case
of a problem? Can the provider lose your data? These are classic questions that we are
regularly asked when discussing cloud computing. The answer lies in the Service Level
Agreement (SLA) established between a cloud provider and its tenants, i.e., consumers,
which includes the service level objectives (SLOs) of the tenant, for example
availability and performance, that the provider must meet.
      For this reason, it is important to focus on replication strategies for efficient
and fast exploitation of data in the cloud. These strategies address classic problems
such as (i) which data to replicate, (ii) when to replicate them, and (iii) where to
place the replicas, but also issues specific to the cloud environment, such as (iv)
determining the number of replicas needed so that the tenant's objectives are satisfied
while the cloud provider still makes a profit.
   Several solutions can be brought to this problem:
i. A cost model that allows replication only when it is necessary.
ii. Effective placement of data replicas.
iii. Elastic management of the number of replicas.
iv. An economic model for the cloud provider under which data replication is
profitable, usually by minimizing the penalties paid by the provider, which increases
its economic profit.
     To guarantee fault tolerance, a storage service copies data across different
replicas. These replicas store the same data, so if any replica is lost, the data can
still be retrieved and recovered from the other replicas.
      In this paper, we propose an algorithm that combines all these solutions for
effective replication management.
      The paper is organized as follows: Section 2 reviews related work, Section 3
explains our approach, Section 4 positions our approach with respect to existing work,
and Section 5 presents conclusions and future work.



2      Related Work

  Several studies have been devoted to dynamic replication; we review the most relevant
ones below.
        Xie et al. [1] set three threshold parameters: the dependency among datasets, the
access frequencies of datasets, and the storage capacity of data centers. Dataset
dependency and the access frequency of each dataset are computed as attributes of the
dataset. They use a threshold on storage space to limit data replication, which avoids
overflow problems and guarantees full task completion in the corresponding data center.
They also classify datasets into three categories, fixed datasets, free-flexible
datasets, and constrained-flexible datasets, to build a mapping between datasets and
data centers. With this methodology, they aim to reduce data movement and data transfer
cost. However, we find their approach somewhat more expensive than ours, and it does not
re-evaluate the current state of the system each time.
        Pankowski [2] proposes a replication algorithm for NoSQL data called Lorq. The
main features of Lorq are that (a) data replication is realized by replicating logs that
store update operations, together with so-called heartbeat operations sent by the leader,
and (b) the processing and replication protocols ensure that all operations are
eventually executed in the same order at each replica and that no operation is lost.
Particular attention is paid to the different kinds of consistency that the system can
guarantee: a strategy based on information stored by client services ensures different
consistency levels, thereby implementing SLA functionality. However, the specification,
verification, and optimality of replicated data types [3] are neglected.
      Wu [5] aims at a minimum-cost replica distribution criterion in a practical way.
The work proposes a replica placement procedure model, including a way to identify the
need to create replicas, and designs a replica placement algorithm that can effectively
reduce the total cost in the cloud. It also proposes dataset management cost models,
including storage cost and transfer cost, and presents a novel global replica placement
strategy from a cost-effective view, named MCRP, which is an approximate minimum-cost
solution. Finally, it proposes a cost-effective data replication strategy that considers
access frequency and average response time to decide whether a dataset should be
replicated in a cloud environment.
       Kumar and Lefebvre [6] examine improvements to a consistency protocol called
LibRe, which acts as an intermediate consistency option between the default eventual
consistency and the strong consistency options derived from the intersection property.
The original LibRe protocol used a registry that records the list of replica nodes
holding the most recent version of each data item. Consequently, consulting the registry
at read time makes it possible to forward read requests to a replica node holding the
most recent version of the required data item. On the other hand, the protocol may still
experience brief inconsistency.
          Tos et al. [8] propose the Performance and Profit Oriented Data Replication
Strategy (PEPR), which guarantees SLA requirements, e.g., availability and performance,
to the tenant while maximizing the economic benefit of the cloud provider. For
performance, they consider a response time guarantee as a fundamental part of the SLA.
In PEPR, when a query is evaluated, if its estimated response time exceeds the SLO
response time threshold, a replication process may be triggered. At that point, the
economic benefit, i.e., profitability, of the cloud provider is also estimated. A
replication decision is made only when both the response time and the provider's
economic benefit requirements are satisfied. The number of replicas is dynamically
adjusted depending on whether the SLA objectives are met over time. Additionally, a
minimum number of replicas is always kept to guarantee minimum availability. Response
time estimates for tenant queries are computed when the queries arrive at the cloud. If
the estimation shows that the desired performance cannot be met, data replication is
performed, but only when it is economically feasible for the provider.
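The PEPR replication rule described above reduces to a two-condition test. The sketch below is a simplified reading of that rule, not Tos et al.'s actual cost model: `QueryEstimate`, `should_replicate`, and `replicas_to_keep` are hypothetical names, and profitability is approximated by an assumed expected revenue minus an assumed replication cost.

```python
from dataclasses import dataclass


@dataclass
class QueryEstimate:
    estimated_response_time: float  # estimated when the query arrives (seconds)
    slo_response_time: float        # SLO response-time threshold from the SLA
    expected_revenue: float         # assumption: provider income if the replica is created
    replication_cost: float         # assumption: cost of creating and storing the replica


def should_replicate(q: QueryEstimate) -> bool:
    """Replicate only if the SLO is at risk AND replication pays off for the provider."""
    slo_at_risk = q.estimated_response_time > q.slo_response_time
    profitable = q.expected_revenue - q.replication_cost > 0
    return slo_at_risk and profitable


def replicas_to_keep(target: int, min_replicas: int) -> int:
    """Never drop below the minimum replica count kept for availability."""
    return max(target, min_replicas)
```

Checking the SLO first and profitability second mirrors the order in the text; both must hold, so a slow query that would cost the provider more than it earns does not trigger replication.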
         Mansouri et al. [9] are motivated by these pioneering studies, since none of
them can simultaneously answer questions about the placement and migration times of
objects. To address these questions, they make the following key contributions. First,
exploiting dynamic programming, they formulate an offline cost optimization problem in
which the optimal cost of storage, Get, Put, and migration is calculated, assuming that
the exact future workload is known a priori. Second, they propose two online algorithms
to find near-optimal costs.


3        The Proposed Strategy

   We propose a replication strategy composed of dependent and independent modules,
organized in the architecture shown in Fig. 1. Each module plays a role in the overall
workflow.




                          Fig. 1. Architecture of the proposed approach


3.1      System Model

For our model, we have a Master, a Global System Module, and data centers. The Master is
composed of five modules: Solver, Repair, Springy, SLA-Violation, and DC Mapping.
    The Global System Module contains Min Access Cost and the data matrices used by the
whole system.

     Finally, we have a set of data centers $DC_i \in \{DC_1, DC_2, DC_3, \dots, DC_N\}$.
Each $DC_i$ has two modules, Leader Election and Node Mapping, and many groups
$G_{i.h} \in \{G_{i.1}, G_{i.2}, \dots, G_{i.H}\}$. Each group has a leader
$L_{i.h} \in \{L_{i.1}, L_{i.2}, \dots, L_{i.H}\}$ and contains many nodes
$N_{i.h.k} \in \{N_{i.h.1}, N_{i.h.2}, \dots, N_{i.h.K}\}$. Finally, each node of a group
in a data center stores files $F_{i.h.k.j} \in \{F_{i.h.k.1}, F_{i.h.k.2}, \dots, F_{i.h.k.M}\}$.
   For example, $F_{5.7.2.3}$ denotes file 3 of node 2 of group 7 of data center 5.
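The four-level naming scheme can be captured by a small identifier type. This is a minimal sketch: the `FileId` class, the `F5.7.2.3` textual label format, and the `describe` helper are illustrative assumptions, not part of the proposed system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileId:
    """Position of file F_{i.h.k.j} in the DC -> group -> node -> file hierarchy."""
    dc: int     # i: data center DC_i
    group: int  # h: group G_{i.h}
    node: int   # k: node N_{i.h.k}
    file: int   # j: file index on that node

    @classmethod
    def parse(cls, label: str) -> "FileId":
        """Parse a label such as 'F5.7.2.3'."""
        if not label.startswith("F"):
            raise ValueError(f"not a file label: {label!r}")
        parts = label[1:].split(".")
        if len(parts) != 4:
            raise ValueError(f"expected 4 indices, got {len(parts)}")
        return cls(*(int(p) for p in parts))

    def __str__(self) -> str:
        return f"F{self.dc}.{self.group}.{self.node}.{self.file}"

    def describe(self) -> str:
        return (f"file {self.file} of node {self.node} of group {self.group} "
                f"of data center {self.dc}")
```

Because the dataclass is frozen, identifiers are hashable and can serve directly as keys of the data matrices.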


3.2     Description

Our architecture consists of three modules, described as follows:


Global System Module: a component containing the following elements, which are used
globally throughout the system:

   Min Access Cost: for every transaction, requests must follow the shortest path to the
desired destination (Dijkstra's algorithm) while accounting for the access cost of each
node traversed:

$$\min \left( \sum_{j=1}^{\text{destination}} C_{acc_j} + \sum_{i=1}^{\text{destination}} C_{transfer_i} \right) \tag{1}$$

where $C_{acc_j}$ is the cost of accessing node $j$ and $C_{transfer_i}$ is the cost of
transfer over link $i$.
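Eq. (1) fits into a standard Dijkstra search if each hop is charged the transfer cost of the link plus the access cost of the node it enters. A minimal sketch, assuming an undirected graph, non-negative costs, and that the access cost of the source node is also counted; `min_access_cost_path` and its argument layout are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def min_access_cost_path(access_cost: Dict[str, float],
                         links: Dict[Tuple[str, str], float],
                         source: str, destination: str) -> Tuple[float, List[str]]:
    """Return (total cost, path) minimising node access costs plus link transfer costs."""
    adj: Dict[str, List[Tuple[str, float]]] = {n: [] for n in access_cost}
    for (u, v), transfer in links.items():  # links are undirected
        adj[u].append((v, transfer))
        adj[v].append((u, transfer))

    dist = {source: access_cost[source]}  # the source node is accessed too
    prev: Dict[str, str] = {}
    heap = [(access_cost[source], source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == destination:
            break
        for v, transfer in adj[u]:
            # Entering v costs the link transfer plus the access cost of v.
            nd = d + transfer + access_cost[v]
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if destination not in visited:
        return float("inf"), []
    path = [destination]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[destination], path[::-1]
```

Folding node costs into edge weights keeps every weight non-negative, which is exactly the condition Dijkstra's algorithm needs.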

  Data Matrices: as its name suggests, this element holds the following matrices:
Popularity matrix $P_{ihkj}$: the access frequency of each replica.
Capacity-Node matrix $S_{ihk}$: the storage capacity of each host (node).
Size-Dataset matrix $V_j$: the storage space of each dataset.
Threshold matrix $Td_{u,j}$: a compromise between quality of service, maximum budget,
and minimum response time for each customer, $Td_{u,j} = \alpha_{u,j} + \beta_{u,j} + \gamma_{u,j}$,
where $\alpha_{u,j}$, $\beta_{u,j}$, and $\gamma_{u,j}$ are mark levels (for example,
$\alpha_{u,j} = 1$ for any response time, $\beta_{u,j} = 3$ for average quality, and
$\gamma_{u,j} = 5$ for a very high budget). All these parameters are established in the
SLA contract.
Master Module: it can be regarded as the brain of the system, since it hosts several
components, starting with:
  Solver: since our problem is to determine the number of replicas needed so that the
tenant's objectives are satisfied while ensuring a profit for the cloud provider, we
state it mathematically as follows:

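$$
\begin{aligned}
\text{Minimize} \quad & \sum_{i=1}^{n} \sum_{j=1}^{m} C_{ij}\, x_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{m} V_j\, x_{ij} \le Cap_i, && i = 1, \dots, n \\
& \sum_{i=1}^{n} a_j\, x_{ij} \ge Td_{u,j}, && j = 1, \dots, m, \; u = 1, \dots, r \\
& x_{ij} \in \mathbb{N}, && i = 1, \dots, n, \; j = 1, \dots, m
\end{aligned}
\tag{2}
$$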

  Where:
  $C_{ij}$: cost of replicating dataset $j$ and allocating its space in data center $i$
($F_{i.h.k.j}$), whatever the node or cluster;
  $x_{ij}$: number of replicas of dataset $j$ in data center $i$ ($F_{i.h.k.j}$);
  $V_j$: storage space (file size) of dataset $j$ ($F_{i.h.k.j}$);
  $Cap_i$: storage capacity of $DC_i$, where

$$Cap_i = \sum_{k=1}^{K} \sum_{h=1}^{H} S_{ihk} \tag{3}$$

  $a_j$: importance coefficient of dataset $j$ ($F_{i.h.k.j}$), where

$$a_j = \frac{1}{\sum_{i=1}^{n} \sum_{k=1}^{K} \sum_{h=1}^{H} P_{ihkj}} \tag{4}$$

  $Td_{u,j}$: SLA threshold agreed between the provider and consumer $u$
($u = 1, \dots, r$, where $r$ is the number of consumers) for dataset $j$ ($F_{i.h.k.j}$).
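Eqs. (3) and (4) are plain aggregations over the data matrices. A minimal sketch, assuming nested lists indexed as `S[i][h][k]` and `P[i][h][k][j]`; the helper names `capacity` and `importance` are hypothetical. Since Eq. (4) is undefined for a dataset that has never been accessed, the sketch raises an error in that case rather than guessing a value.

```python
from typing import List


def capacity(S: List[List[List[float]]], i: int) -> float:
    """Eq. (3): Cap_i = sum over groups h and nodes k of S_ihk."""
    return sum(s for group in S[i] for s in group)


def importance(P: List[List[List[List[float]]]], j: int) -> float:
    """Eq. (4): a_j = 1 / sum over i, h, k of P_ihkj."""
    total = sum(node[j] for dc in P for group in dc for node in group)
    if total == 0:
        raise ZeroDivisionError(f"dataset {j} has zero total popularity")
    return 1.0 / total
```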

   The problem is solved with the simplex method. Finally, replicas are created or
deleted to reach the optimal number of replicas.
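To check the model on toy instances, Eq. (2) can be solved exactly by enumeration. This is not the simplex method named above: the simplex method solves the linear relaxation, and an integrality constraint such as $x_{ij} \in \mathbb{N}$ generally also needs branch and bound or rounding. The sketch below is a hypothetical `solve_replicas` helper that tries every assignment with at most `max_rep` replicas per cell, so it is only practical for a handful of variables.

```python
from itertools import product
from typing import List, Optional, Tuple


def solve_replicas(C: List[List[float]], V: List[float], cap: List[float],
                   a: List[float], Td: List[List[float]], max_rep: int = 3
                   ) -> Tuple[Optional[float], Optional[List[List[int]]]]:
    """Exhaustively minimise sum C[i][j] * x[i][j] subject to Eq. (2).

    C[i][j]: cost per replica of dataset j in DC i; V[j]: size of dataset j;
    cap[i]: Cap_i; a[j]: importance coefficient; Td[u][j]: SLA threshold.
    """
    n, m = len(C), len(V)
    # The SLA constraint must hold for every consumer u, so the largest threshold binds.
    need = [max(Td[u][j] for u in range(len(Td))) for j in range(m)]
    best_cost: Optional[float] = None
    best_x: Optional[List[List[int]]] = None
    for flat in product(range(max_rep + 1), repeat=n * m):
        x = [list(flat[i * m:(i + 1) * m]) for i in range(n)]
        if any(sum(V[j] * x[i][j] for j in range(m)) > cap[i] for i in range(n)):
            continue  # storage capacity Cap_i exceeded
        if any(sum(a[j] * x[i][j] for i in range(n)) < need[j] for j in range(m)):
            continue  # threshold Td_{u,j} not reached
        cost = sum(C[i][j] * x[i][j] for i in range(n) for j in range(m))
        if best_cost is None or cost < best_cost:
            best_cost, best_x = cost, x
    return best_cost, best_x
```

For example, with two data centers, one dataset of size 10, capacities 20 and 30, and a threshold of 3, the cheapest feasible plan places two replicas in the cheaper data center and one in the other. For realistic sizes, a linear or integer programming solver would replace the enumeration.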
   Repair: we want the system to account for consistency and fault tolerance, so we
handle both with the quorum principle: a verification request is sent to all replicas,
the correct value is determined by majority vote, and all replicas holding erroneous or
stale values are updated.
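The majority-vote repair can be sketched in a few lines. This assumes each replica's answer to the verification request is already collected in a dictionary and that a strict majority is required. The name `quorum_repair` is hypothetical, and refusing to repair on a tie is our assumption, since the text does not say what happens without a majority.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def quorum_repair(replicas: Dict[str, Hashable]) -> Tuple[Hashable, List[str]]:
    """Majority-vote the replica values and overwrite the dissenting ones.

    `replicas` maps a replica id to the value it returned for the verification
    request. Returns the majority value and the sorted ids that were repaired.
    """
    if not replicas:
        raise ValueError("no replicas to verify")
    value, votes = Counter(replicas.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        # Without a strict majority there is no safe "correct" value.
        raise RuntimeError("no strict majority among replicas")
    repaired = sorted(rid for rid, v in replicas.items() if v != value)
    for rid in repaired:
        replicas[rid] = value
    return value, repaired
```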

   Springy: an elasticity mechanism that increases or decreases resources according to
the popularity of each data item: a data item $F_{ihkj}$ is replicated if its popularity
(access frequency) is greater than a given threshold $T_{replication}$, and deleted if
its popularity is lower than a given threshold $T_{erasure}$.

The popularity of each file is calculated by the following simple formula:

$$P_{ihkj} = \frac{\text{number of requests for } F_{ihkj}}{\text{total number of requests}} \tag{5}$$
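Eq. (5) and the two Springy thresholds combine into a short decision routine. A minimal sketch, assuming the request log is a list of file labels and that files absent from the log have popularity 0; `popularity` and `springy_decisions` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List


def popularity(request_log: List[str], files: Iterable[str]) -> Dict[str, float]:
    """Eq. (5): share of all requests that target each file (0.0 if never requested)."""
    counts = Counter(request_log)
    total = len(request_log)
    return {f: (counts[f] / total if total else 0.0) for f in files}


def springy_decisions(pop: Dict[str, float], t_replication: float,
                      t_erasure: float) -> Dict[str, str]:
    """Replicate files above T_replication, delete files below T_erasure, keep the rest."""
    if t_erasure >= t_replication:
        raise ValueError("T_erasure must be lower than T_replication")
    decisions = {}
    for f, p in pop.items():
        if p > t_replication:
            decisions[f] = "replicate"
        elif p < t_erasure:
            decisions[f] = "delete"
        else:
            decisions[f] = "keep"
    return decisions
```

Keeping a gap between the two thresholds gives the mechanism a hysteresis band, so a file whose popularity hovers near one threshold is not replicated and deleted in alternation.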

   SLA Violation: this module uses quality of service, the maximum budget of each
customer, and the minimum response time as constraints in a cost-minimization algorithm.
In cases of severe violation, the replication mechanism triggers an increase in
resources.
    DC Mapping: a module that routes each request to the relevant data center, using an
index structure over all data centers (fast indexing).


Data-Center Module: each data center contains groups (clusters), each made up of nodes
and a leader, as well as two modules: Node Mapping and Leader Election.

   Leader Election: sometimes the completion of a task requires multiple instances of the
same cloud service. If the service consumer invoking the cloud service instances lacks
the logic to coordinate them, runtime exceptions can occur, leading to data corruption
and failure to complete the task. Leaders therefore act as coordinators: they are the
most available nodes of each group in a data center, chosen by a selection algorithm
that sorts the nodes by availability and takes the first one.
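The selection rule (sort by availability, take the first) is sketched below. The availability scores, the tie-break on node id, and the names `elect_leader` and `elect_leaders` are assumptions. Note that this is a deterministic local choice over known scores, not a distributed consensus protocol such as Raft.

```python
from typing import Dict


def elect_leader(availability: Dict[str, float]) -> str:
    """Sort nodes by availability (descending) and take the first one.

    Ties are broken by node id so that every node computes the same leader.
    """
    if not availability:
        raise ValueError("a group needs at least one node")
    ranked = sorted(availability.items(), key=lambda item: (-item[1], item[0]))
    return ranked[0][0]


def elect_leaders(groups: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Run the election independently in every group G_{i.h}."""
    return {group: elect_leader(nodes) for group, nodes in groups.items()}
```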

Node Mapping: a module that routes each request to the relevant node, using an index
structure over all nodes in the data center (fast indexing). It is linked to the
leaders.


3.3    Functioning of the System
As mentioned above, each module runs its own mechanism. In this section, we describe
them one by one:

   Solver: the following sequence of operations is executed. First, the Master requests
the data matrices from the Global System Module to solve the constrained problem. Then,
the Master sends execution commands to the data centers according to the result obtained
by the solver. Next, the leaders query the concerned nodes and receive the results.
Finally, they execute the commands (replicate or delete).
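The step from the solver's result to the commands sent to the data centers amounts to comparing the current replica counts with the optimal $x_{ij}$. A minimal sketch, assuming both are dictionaries keyed by (data center, dataset); `replication_commands` and the command tuple format are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (data center i, dataset j)


def replication_commands(current: Dict[Key, int],
                         optimal: Dict[Key, int]) -> List[Tuple[int, int, str, int]]:
    """Turn the solver's optimal x_ij into replicate/delete commands per data center."""
    commands = []
    for dc, dataset in sorted(set(current) | set(optimal)):
        delta = optimal.get((dc, dataset), 0) - current.get((dc, dataset), 0)
        if delta > 0:
            commands.append((dc, dataset, "replicate", delta))
        elif delta < 0:
            commands.append((dc, dataset, "delete", -delta))
    return commands
```

Sending only the differences means unchanged (data center, dataset) pairs generate no traffic to the leaders.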

    Response and Repair: we distinguish three cases:
- Case 1: the user requests data from a node and receives the response immediately. If
the data is not found, we switch to the second case.

- Case 2: the user requests data from a node, and the request is transferred to a leader
of the data center. The leader looks up the concerned node and forwards the request to
it. Finally, the user receives the response. If the data is not found locally (in the
data center), we switch to the third case.

- Case 3: the user requests data from a node, and the request is transferred to a leader
of the data center, which queries the concerned node and receives "Not found". The
leader then transfers the request to the Master, which forwards it to the leaders (one
per data center). Each leader requests the value of the data and applies the majority
quorum (the correct value is kept and the others are corrected). The leaders then send
the value to the Master, which creates a copy at the node nearest to the customer
(bringing data closer to the client reduces response time and access cost, and
replication increases availability). Finally, the response is transferred to the user.
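The three cases form a lookup cascade, local node first, then the local data center, then all data centers with a majority quorum. The sketch below models the whole system as a hypothetical in-memory dictionary (`data center -> node -> key -> value`) and simplifies "the node nearest to the customer" to the node the client contacted; the `read` function and the returned case number are illustrative only.

```python
from collections import Counter
from typing import Dict, Hashable, Tuple

Store = Dict[str, Hashable]           # key -> value held by one node
System = Dict[int, Dict[int, Store]]  # data center id -> node id -> store


def read(key: str, client: Tuple[int, int], dcs: System) -> Tuple[Hashable, int]:
    """Resolve a read through the three cases; return (value, case number)."""
    dc_id, node_id = client
    local = dcs[dc_id][node_id]
    # Case 1: the contacted node already holds the data.
    if key in local:
        return local[key], 1
    # Case 2: the data-center leader looks for the data on another node of the same DC.
    for store in dcs[dc_id].values():
        if key in store:
            return store[key], 2
    # Case 3: the Master asks the leaders of all data centers for the value.
    holders = [s for dc in dcs.values() for s in dc.values() if key in s]
    if not holders:
        raise KeyError(key)
    value, votes = Counter(s[key] for s in holders).most_common(1)[0]
    if votes * 2 <= len(holders):
        raise RuntimeError("no majority among replicas")
    for store in holders:
        store[key] = value  # correct replicas holding a divergent value
    local[key] = value      # new copy close to the client (here: the client's own node)
    return value, 3
```

After a Case 3 read, later requests from the same node are served by Case 1, and requests from other nodes of that data center by Case 2.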

   Springy: the procedure runs as follows. First, the Master requests the Popularity
Matrix from the Global System Module and runs the Springy algorithm (described in
Section 3.2).
   Next, the Master sends execution commands, according to the result obtained by the
Springy module, to the data centers (leaders).
   The leaders query the concerned nodes and receive the results. Finally, they execute
the commands (replicate or delete).

   SLA Violation: the operations proceed as follows. The user triggers a violation alert;
the Master then sends a command to increase resources, and the leaders replicate the
data on the node closest to the client.
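The reaction to a violation alert can be sketched as choosing a target node for the extra replica. This assumes that an alert corresponds to a measured response time above the SLO and that "closest" means lowest latency to the client among nodes that do not yet hold the data. The name `handle_violation` and its inputs are hypothetical.

```python
from typing import Dict, Optional, Set


def handle_violation(measured_rt: float, slo_rt: float,
                     latency_to_client: Dict[str, float],
                     holders: Set[str]) -> Optional[str]:
    """Return the node that should receive a new replica, or None.

    A violation is assumed when the measured response time exceeds the SLO.
    The new replica goes to the closest node (lowest latency to the client)
    that does not already hold the data.
    """
    if measured_rt <= slo_rt:
        return None  # no violation, no extra resources
    candidates = {n: lat for n, lat in latency_to_client.items() if n not in holders}
    if not candidates:
        return None  # every node already holds a replica
    return min(candidates, key=lambda n: (candidates[n], n))
```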




4          Positioning of Our Approach

   We position our approach relative to other work in the state of the art, taking
into consideration several characteristics, as follows:
                        Fig. 2. Comparison table of different approaches


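| Approach | Cost | Latency | Availability | Throughput / bandwidth | Placement | Replica size | Consistency | Fault tolerance | Popularity | Leader election |
|---|---|---|---|---|---|---|---|---|---|---|
| Fei Xie (2017) [1] | Yes | Yes | Yes | | Yes | Yes | | | Yes | |
| Hussam Abu-Libdeh (2013) [12] | | | | | | | Yes | Yes | | |
| Guy Laden (2011) [10] | | Yes | | Yes | | | | | | |
| Jayalakshmi D.S. (2015) [14] | | Yes | Yes | Yes | | | | | Yes | |
| Ismaeel Al Ridhawi (2015) [13] | Yes | Yes | Yes | Yes | Yes | | | | | |
| Yaser Mansouri (2017) [9] | Yes | Yes | | | Yes | | | | | |
| Zhendong Cheng (2012) [4] | | | Yes | Yes | Yes | | | | | |
| Najme Mansouri (2015) [11] | Yes | Yes | Yes | | Yes | Yes | | Yes | | |
| Mohammad Bsoul (2013) [7] | | Yes | Yes | Yes | | | | | Yes | |
| Ilir Fetai (2017) [15] | Yes | Yes | | | | | Yes | | | |
| Farouk Bouharaoua (2017) [16] | Yes | Yes | Yes | | | | Yes | | | Yes |
| Our approach | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |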



    Fig. 2 compares the different approaches with respect to the features and services
studied in this work: cost, latency, availability, throughput/bandwidth, placement,
replica size, consistency, fault tolerance, popularity, and leader election.
   For example, comparing the work of Mohammad Bsoul (2013) [7] with ours, we find that
it studies latency, availability, and popularity but neglects cost, consistency, and
other parameters that our approach includes.
   On the other hand, Ilir Fetai (2017) [15] omits availability and replica placement,
two parameters that are important in our study.
   The approach of Najme Mansouri (2015) [11] is very interesting since it covers several
criteria, but it neglects file popularity, which affects when and how much to replicate
and delete.
   Finally, Hussam Abu-Libdeh (2013) [12] is hard to compare with our approach, since it
focuses on only two parameters, consistency and fault tolerance, out of the ten studied
in our approach.


5      Conclusion and Perspectives

   In this research work, we have presented dataset replication together with existing
data placement strategies. It is impossible to satisfy all the conditions for placing
datasets at positions where all tasks can access the data with the lowest data transfer
cost while meeting SLA objectives, even when using most of the parameters, such as cost,
latency, response time, and popularity, together with different methods such as the
simplex method and quorums. In future work, we plan to apply this method in a simulation
environment such as CloudSim to better evaluate the results of the proposed approach.


References

1.   Fei Xie, Jun Yan, Jun Shen, "Towards Cost Reduction in Cloud-Based Workflow
     Management," Fifth International Conference on Advanced Cloud and Big Data, (2017).
2.   T. Pankowski, "Lorq: A System for Replicated NoSQL Data," Evaluation of Novel
     Approaches to Software Engineering, ENASE 2015, pp. 62-79, (2016).
3.   Sebastian Burckhardt, "Replicated data types: specification, verification,
     optimality," ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
     (2014).
4.   Zhendong Cheng, Zhongzhi Luan, You Meng, Yijing Xu, Depei Qian, Alain Roy, Ning
     Zhang, Gang Guan, "ERMS: An Elastic Replication Management System for HDFS," IEEE
     International Conference on Cluster Computing Workshops, pp. 32-40, (2012).
5.   Xiuguo Wu, "Data Sets Replicas Placements Strategy from Cost-Effective View in the
     Cloud," Hindawi Publishing Corporation, pp. 1-14, (2016).
6.   Sathiya Prabhu Kumar, Sylvain Lefebvre, "CaLibRe: A better Consistency-Latency
     Tradeoff for Quorum based Replication systems," Database and Expert Systems
     Applications, Globe (2015), DEXA 2015, Lecture Notes in Computer Science, pp. 2-14,
     (2015).
7.   Mohammad Bsoul, Alaa E. Abdallah, Khaled Almakadmeh, "A Round-based Data
     Replication Strategy," IEEE Transactions on Parallel and Distributed Systems,
     pp. 2-18, (2013).
8.   Uras Tos, Riad Mokadem, Abdelkader Hameurlain, Tolga Ayav, Sebnem Bora, "A
     Performance and Profit Oriented Data Replication Strategy for Cloud Systems," IEEE
     Conferences on Ubiquitous Intelligence & Computing, (2016).
9.   Yaser Mansouri, Adel Nadjaran Toosi, Rajkumar Buyya, "Cost Optimization for Dynamic
     Replication and Migration of Data in Cloud Data Centers," IEEE Transactions on Cloud
     Computing, pp. 1-16, (2017).
10.  Guy Laden, Roie Melamed, Ymir Vigfusson, "Adaptive and Dynamic Funnel Replication in
     Clouds," ACM SIGOPS Operating Systems Review, vol. 46, pp. 40-46, (2011).
11.  N. Mansouri, "Adaptive data replication strategy in cloud computing for performance
     improvement," Frontiers of Computer Science, pp. 1-11, (2015).
12.  H. Abu-Libdeh, "Elastic Replication for Scalable Consistent Services," (2012).
13.  I. Al Ridhawi, Nour Mostafa, Wassim Masri, "Location-Aware Data Replication in Cloud
     Computing Systems," Eighth International Workshop on Selected Topics in Mobile and
     Wireless Computing, (2015).
14.  Jayalakshmi D. S., Rashmi Ranjana T. P., R. Srinivasan, "Dynamic Data Replication
     Strategy in Cloud Environments," Fifth International Conference on Advances in
     Computing and Communications, (2015).
15.  Ilir Fetai, Alexander Stiemer, Heiko Schuldt, "QuAD: A quorum protocol for adaptive
     data management in the cloud," IEEE International Conference on Big Data (Big Data),
     pp. 405-414, (2017).
16.  Farouk Bouharaoua, Ghalem Belalem, "A quorum-based intelligent replicas management
     in data grids to improve performances," Multiagent and Grid Systems, vol. 13,
     pp. 143-161, (2017).