<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A new distributed anonymization protocol to satisfy multiple data provider's privacy requirements</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Salheddine kabou</string-name>
          <email>salheddine.kabou@univ-sba.dz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sidi Mohamed Benslimane</string-name>
          <email>benslimane@univ-sba.dz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>International Conference on Advanced Aspects of Software Engineering</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Sciences Department, Djillali Liabes University of Sidi Bel Abbes</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>EEDIS Laboratory, Djillali Liabes University of Sidi Bel Abbes</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>ICAASE</institution>
          ,
          <addr-line>November, 2-4, 2014, Constantine</addr-line>
          ,
          <country country="DZ">Algeria.</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>4</lpage>
      <abstract>
        <p>-Privacy and security concerns are among the main obstacles facing the widespread adoption of this new technology. Data anonymization makes data worthless to anyone except the owner of the data. It is one of the methods for transforming the data in such a way that it prevents identification of key information from an unauthorized person. Most of the existing works use a k-anonymat model for preserving privacy for data subject that offers lower utility. Motivated by this, we develop a new distributed anonymization protocol to satisfy multiple data providers privacy requirements based on a k-concealment model that offers a higher utility.</p>
      </abstract>
      <kwd-group>
        <kwd>- Privacy</kwd>
        <kwd>Concealment</kwd>
        <kwd>Anonymization</kwd>
        <kwd>Cloud Computing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The confidentiality of this data must be preserved
before outsourcing to the commercial public
cloud, i.e. any sensitive information should not be
disclosed. Data anonymization is one of the
privacy preserving techniques that translate the
information, making the original data worthless to
anybody except the owners [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It has been widely
discussed in the literature such as k-anonymity
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], l-diversity [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], k-concealment [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
objective of our work is to adopt a new distributed
anonymization protocol over cloud servers, which
can satisfy different data provider’s personal
needs and maximize the utility of the anonymous
data still remains a challenging problem using an
alternative model. In this paper, we offer the
following contributions:
2. As the output of the protocol, each
private dataset produces a local
anonymized dataset that satisfies each
data provider’s privacy constraints and
their union forms a global virtual
database that meets a global
anonymization principle (k-conealment).
The rest of the paper is organized as follows. In
Section II, we give we discuss related works to
our research. In Section III, presents our
distributed anonymization protocol, and Section
IV explain a R*-tree generalization principal to
achieve the protocol. Finally, we conclude our
discussion in Section V.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Privacy preserving data publishing for single databases</title>
      <p>
        In recent years, Privacy preserving data
publishing for centralized databases has been
studied extensively [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ](.K-anonymity,l-diversity)
K-concealment proposes to generalize the table
records so that each one of them becomes
computationally - indistinguishable from at least
k−1 others. In this study, our distributed
anonymization protocol is built on top if the
kconcealment principles.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Privacy preserving data publishing for multiple databases</title>
      <p>There are a number of possible solutions that can
be applied for anonymizing distributed data</p>
      <p>
        The integrate-and-anonymize [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] solution is
the simplest approach that assumes an existence
of third party that can be trusted by each of the
data owners. Here data owners send their data
to the trusted third party where data integration
and anonymization are performed. This approach
is not feasible for many scenarios [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Finding
such a trusted third party is not always feasible.
Compromise of the server by hackers could lead
to a complete privacy loss for all the participating
parties and data subjects.
      </p>
      <p>
        The basic idea of an alternative approach
anonymize-and-integrate is for each data
provider to perform data anonymization
independently. In case of horizontal data
distribution [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], this tends to have a negative
impact on data quality. For example, enforcing
local k-anonymity can cause the data utility to
suffer more than enforcing global k-anonymity. If
data is distributed vertically [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the integration of
the locally anonymous datasets can result in
global non-anonymous datasets.
      </p>
      <p>
        In the Virtual anonymization [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] data
providers participate in distributed protocols to
produce a virtual integrated and anonymized
database. Important to note is that the
anonymized data still resides at individual
databases and the integration and anonymization
of the data is performed through the secure
distributed protocols.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] the authors propose an adaptation of the
Mondrian generalization-based algorithm for
producing a k-anonymous release of the union of
the datasets by using a set of secure multi-party
computation protocols (i.e., secure computing
sum, secure median protocols). The execution
sequence is under the control of a single leading
site and assumes that participants are fully
available. Ding et al [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] presented a distributed
anonymization algorithm using R-tree index
structure [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Their algorithm uses a greedy
approach to recursively insert the data objects
into the quasi-identifier domain space.
      </p>
      <p>
        The above works have used a k-anonymity model
for the privacy of data subjects. They choose it
because it’s useful in several practical
applications; also it suffers in data utility. Here,
our work is aimed at outsourcing data provider’s
private dataset to cloud servers for data sharing.
More importantly, our anonymization protocol
aims to achieve anonymity for both (1) data
subjects: by using a k-concealment model which
ensures the same level of security and offers
higher utility than others models that have used
in the above works, and (2) data providers: by
designing a new distributed anonymization
algorithm that uses a R*-tree structure [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Its
offers a best choosing of appropriate insertion of
data objects into the quasi-identifier domain
space, and it find an adequate partitioning of the
quasi-identifier domain space.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. DISTRIBUTED ANONYMIZATION</title>
    </sec>
    <sec id="sec-6">
      <title>PROTOCOL</title>
    </sec>
    <sec id="sec-7">
      <title>3.1 Algorithm</title>
      <p>There are a large number of algorithms proposed
to achieve k-anonymity. These algorithms have
been successfully used to enhance anonymity in
distributed environments. Although to our best
knowledge, the model k-concealment has not yet
been used for the cloud environment.</p>
      <p>Algorithm.1 Master
Phase 1: Reading data (slave)
Read data (B, num) from each slave node into a set r
Phase 2: Insertion on R*-tree (Generalisation)
For each rectangle B ϵ r, do</p>
      <p>Finds on every level of the tree, the most suitable subtree
to accommodate the new entry (see process of R*-tree
generalisation)</p>
      <p>If split is possible, do split (see process of split)
End for
Phase 3: Modify the local ki-concealment databases
For each equivalence class Ei of the K-concealment table, do
Get rectangles set P contained in R* of Ei</p>
      <p>For each rectangle B ϵ P, do send B and R* to slave
node</p>
      <p>End for
End for
Algorithm.2 Slave node i (i&gt;0)
Phase 1: Sending data to the master
For each equivalence class Ej of ki concealment table, do</p>
      <p>Send information of Ej to the Master in form (B, num)
end For
Phase 2: Receive modification from the master
Read the data B and R* from the Master
For each equivalence class Ej of ki-concealment table, do
if rectangle of Ej equals B</p>
      <p>Enlarge the rectangle of Ej to R*
end if
end For
We assume that a master node is selected from
cloud server for the main protocol, and all the
other local databases are consider as slave
nodes. The protocols for the master node and
other slave nodes are presented in Algorithm1
and 2. In the first, the slave node sends a data
that is abstract information of each equivalence
class to the Master node in form (B, num). Where
B refers a d-dimensional rectangle which is the
bounding box of the equivalence class’s spacial
QI values [l1,u1], ...,[ld,ud], and num is the total
number of data subjects in the equivalence class.
In Phase 1, the master node reads data (B, num)
of each equivalence class that had sent from
each slave node into a set r. In phase 2, the
algorithm inserts each rectangle B from r into an
R*-tree: for the first time, it finds on every level of
the tree, the most suitable subtree to
accommodate the new entry (minimum
overlapping), and if split is possible, do split.
In phase 3, the master node modifies all the initial
local ki-concealment databases by traversing
each equivalence class Ei of the K-concealment
table, to find the rectangles set P contained in it.
For each rectangle B ϵ P that contained in the
bounding box R* of Ei, we send data B and R*
back to a corresponding slave node. When
receive the data B and R* from the master, the
slave node finds out the equivalence class whose
rectangle equals B and then enlarges its
rectangle to R*.</p>
    </sec>
    <sec id="sec-8">
      <title>3.2. Split process</title>
      <p>The objective is to split the data as much as
possible while satisfying the privacy constraints to
maximize the utility of anonymized data.
The R*-tree uses the following steps to find a
better split. In a first step, the split axis has to be
chosen. Along each axis, the entries are first
sorted by the lower value, and then sorted by the
upper value of their rectangles (rectangles of
equivalence classes). For each sort M-2m+2
distributions of the M+1 (node a) entries into two
groups are determined (M is the maximum
number of entries that will fit in one node and m
is a minimum number of entries), where the k-th
distribution (k=1, …., M-2m+2) is determined as
follows: The first group contains the first (m–1) +
k entries, the second group contains the
remaining entries.</p>
      <p>
        For each of the M–2m+2 distributions, the
margin-value is determined by summarizing the
margin length of the two minimum bounding
boxes (rectangles of equivalence classes) of both
distributions. Finally, the axis that returns the
minimum margin value is selected a split axis. In
the next step, an adequate partitioning of the
entries along the split axis determined. The
overlap-value for each of the 2(M–2m+2)
distributions is considered where the
overlapvalue denotes the size of the area of the overlap
between the two minimum bounding boxes of
both partitions. Here, the secure sum protocols
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] can be used to securely compute the
minimum area-value denoting the sum of the size
of the areas covered by both minimum bounding
boxes of both partitions [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-9">
      <title>4. R*-TREE GENERALIZATION</title>
      <p>Our algorithm uses an R*-tree. It generalizes the
data by inserting it into the minimum overlapping
quasi-identifier domain space and when overflow
occurs, the quasi-identifier domain space will be
split into two parts along the best axis chosen. It
recursively chooses the best branch to inset the
data objects for gathering the closest data
objects. When all the data tuples are inserted into
the R*-tree, the generalization table was built.
A non-leaf node contains entries of the form (cp,
B) where cp is the address of the child node in the
R*-tree and B is the minimum bounding rectangle
of all rectangles which are entries in that child
node. A leaf node contains entries of the form
(Oid, B) where Oid refers to a record in the
database, and B = (B1, B2,…, Bd) is a
ddimensional rectangle which is the bounding box
of one equivalence class’s spacial QI values. at
this point, d is the number of dimensions and Bi is
a bounded interval [x,y] describing the QI value
along dimension.</p>
      <p>The R*-tree construction is based on the insertion
algorithm
which
focuses
on:
(1)
Choossubtree: Beginning in the root, descending
to a leaf, it finds on every level the most suitable
subtree to accommodate the new object. (2) The
split: If Choosesubtree ends in a node filled with
the maximum number of objects M, split should
distribute M+ 1 rectangles into two nodes in the
most appropriate manner.</p>
      <p>
        We create the R*-tree by inserting every object
into the indexing structure. Given a new object,
the authors in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] consider only the area. Our
algorithm tests the parameters area and overlap
in different combinations. The overlap of an entry
is defined as follows. Let E1,...,Ep be the entries
in the current node. Then,
For choosing the best non-leaf node, we can use
the method used in R-tree [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. For the leaf
nodes, minimizing the overlap performed slightly
better.
      </p>
    </sec>
    <sec id="sec-10">
      <title>5. CONCLUSION</title>
      <p>
        We have presented a distributed anonymization
protocol for privacy-preserving data publishing
from
multiple
data
providers
in
a
cloud
environment. Our work addresses two important
issues, privacy of data subjects and privacy of
data providers. For the privacy of data subjects
(individuals), we have used a k-concealment
model that offers a
higher utility
with less
generalisation than that which is required by
kanonymity used in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. For the privacy of
data providers, we have adopted a bottom-up
algorithm instead of top-down approach used in
while
using
      </p>
      <p>R*-tree
index
for
better
generalization. We also illustrated that the R*-tree
strategy leads to a more effective insertions and
splitting than that of the R-tree. We are now
working
on implementing this algorithm
204</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Sedayao</surname>
          </string-name>
          , “
          <article-title>Enhancing cloud security using Data Anonymization”, Intel white paper</article-title>
          ,
          <year>June 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Samarati</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Sweeney</surname>
          </string-name>
          . “
          <article-title>Generalizing data to provide anonymity when disclosing information”</article-title>
          .
          <source>In ACM-SIGMOD Symposium on Principles of Database Systems (PODS)</source>
          ,
          <source>page 188</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Machanavajjhala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kifer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Venkitasubramaniam</surname>
          </string-name>
          .” l-diversity:
          <article-title>Privacy beyond k-anonymity”</article-title>
          .
          <source>In International Conference on Data Engineering (ICDE)</source>
          ,
          <source>page 24</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tassa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mazza</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          .Gionis, “
          <article-title>kConcealment: An Alternative Model of k-Type Anonymity”</article-title>
          ,
          <source>Transactions on Data Privacy</source>
          <volume>5</volume>
          ,
          <fpage>pp189</fpage>
          -
          <lpage>222</lpage>
          ,
          <year>2012</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chen</surname>
          </string-name>
          , “
          <article-title>Privacypreserving data publishing”, A survey of recent developments</article-title>
          .
          <source>ACM Computing Surveys, CSUR</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Jurczyk</surname>
          </string-name>
          , , L. Xiong, “Distributed Anonymization:
          <article-title>Achieving Privacy for Both Data Subjects and Data Providers”</article-title>
          . In: Gudes,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Vaidya</surname>
          </string-name>
          ,
          <string-name>
            <surname>J</surname>
          </string-name>
          . (eds.)
          <source>Data and Applications Security</source>
          <year>2009</year>
          . LNCS, vol.
          <volume>5645</volume>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>207</lpage>
          . Springer, Heidelberg (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Kohlmayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Prasser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Eckert</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Kuhn, “
          <article-title>A flexible approach to distributed data anonymization”</article-title>
          ,
          <source>In Journal of Biomedical Informatics</source>
          ,
          <year>August 2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mohammed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fung</surname>
          </string-name>
          , “
          <article-title>Centralized and distributed anonymization for highdimensional healthcare data”</article-title>
          .
          <source>ACM Trans Knowl Discovery Data</source>
          <year>2010</year>
          ;
          <volume>4</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , , C.Clifton, “
          <article-title>A secure distributed framework for achieving k-anonymity”</article-title>
          .
          <source>VLDB Journal</source>
          <volume>15</volume>
          (
          <issue>4</issue>
          ),
          <fpage>316</fpage>
          -
          <lpage>333</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.LI</given-names>
            ,
            <surname>J.Liu</surname>
          </string-name>
          , H.Jin, “
          <article-title>Distributed Anonymization for Multiple Data Providers in a Cloud System”</article-title>
          , In: W. Meng et al. (Eds.):
          <source>DASFAA</source>
          <year>2013</year>
          ,
          <string-name>
            <surname>Part</surname>
            <given-names>I</given-names>
          </string-name>
          , LNCS
          <volume>7825</volume>
          , pp.
          <fpage>346</fpage>
          -
          <lpage>360</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A .</given-names>
            <surname>Guttman</surname>
          </string-name>
          ,.
          <article-title>“R-trees: a dynamic index structure for spatial searching”</article-title>
          .
          <source>In: SIGMOD</source>
          (
          <year>1984</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Beckmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Seeger</surname>
          </string-name>
          , “The R*
          <article-title>-tree: An efficient and robust access method for points and rectan-gles”</article-title>
          .
          <source>In: Proceedings of the International Conference on Manage-ment of Data (SIGMOD'90)</source>
          , Atlantic City, NJ (
          <year>1990</year>
          ) pp.
          <fpage>322</fpage>
          -
          <lpage>331</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B</given-names>
            <surname>¨ ottcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Obermeier</surname>
          </string-name>
          ,
          <string-name>
            <surname>S</surname>
          </string-name>
          , “
          <article-title>Secure set union and bag union computation for guaranteeing anonymity of distrustful participants”</article-title>
          ,
          <source>JSW</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <fpage>9</fpage>
          -
          <lpage>17</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>