<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>K-Means SubClustering: A Differentially Private Algorithm with Better Clustering Quality</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Devvrat Joshi</string-name>
          <email>devvrat.joshi@iitgn.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Janvi Thakkar</string-name>
          <email>janvi.thakkar@iitgn.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Indian Institute of Technology Gandhinagar</institution>, <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>In today's data-driven world, the sensitivity of information has been a significant concern. With this data and additional information on a person's background, one can easily infer an individual's private data. Many differentially private iterative algorithms have been proposed in interactive settings to protect an individual's privacy from these inference attacks. Existing approaches compute differentially private (DP) centroids with iterative Lloyd's algorithm and perturb the centroids with various DP mechanisms. These DP mechanisms do not guarantee convergence of the differentially private iterative algorithm and degrade the quality of the clusters. Thus, in this work, we extend the previous work 'Differentially Private K-Means Clustering With Convergence Guarantee', taking it as our baseline. The novelty of our approach is to sub-cluster the clusters and then select the centroid that has a higher probability of moving in the direction of the future centroid. At every Lloyd's step, the centroids are injected with noise using the exponential DP mechanism. The results of the experiments indicate that our approach outperforms the current state-of-the-art method, i.e., the baseline algorithm, in terms of clustering quality while maintaining the same differential privacy requirements. The clustering quality improved significantly, by 4.13 and 2.83 times over the baseline for the Wine and Breast_Cancer datasets, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>differential privacy</kwd>
        <kwd>k-means clustering</kwd>
        <kwd>convergence guarantee</kwd>
      </kwd-group>
      <conference>
        <conf-name>CIKM-PAS'22: Privacy Algorithms in Systems (PAS) Workshop</conf-name>
      </conference>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Achieving extraordinary results is dependent on the data
on which machine learning models are trained. Data
curators have a responsibility to provide datasets such
that the privacy of the data is not compromised. However,
attackers use other public datasets to perform inference
and adversarial attacks to get information about an
individual in the dataset. Differential privacy is a potential
technique for giving customers a mathematical guarantee
of privacy. There are two fundamental settings in which
differential privacy is used on data: in the interactive
setting, the data curator holds the data and returns
responses based on the queries requested by third parties,
while in the non-interactive setting the curator sanitizes
the data before publishing [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Iterative clustering algorithms provide important
insights about the dataset, which helps in a large number of
applications. They are prone to privacy threats because
they can reveal information about an individual with
additional knowledge. Existing approaches obtain the set
of centroids using Lloyd's K-means algorithm and then
perturb them with a differentially private mechanism to
add noise. In contrast, our approach perturbs each centroid
at every iteration to move in the direction where there is
a higher number of data points, yielding better clustering
quality than the existing approaches. We have tested our
approach on four real-world datasets.
      </p>
      <sec id="sec-1-1">
        <title>Our main contribution includes: 1. We proposed SubClustering approach which has better clustering quality than the baseline (which is the current SOTA in terms of clustering qual</title>
        <p>Iterative clustering algorithms provide important in- in the direction where there is a higher number of data
they can reveal information about an individual with ad- than the existing approaches. We have tested our
apity). For the Wine and Breastcancer dataset, the
clustering quality improved by 4.13 and 2.83 times
respectively.
2. In addition to improving the clustering quality,
our algorithm used same privacy budget as that
of the existing work.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The concept of differential privacy has inspired a plethora
of studies, particularly in the area of differentially
private k-means clustering [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ][
        <xref ref-type="bibr" rid="ref7">7</xref>
        ][
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] in an interactive setting.
The important DP mechanisms in the literature include
the Laplace mechanism (LapDP) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the exponential
mechanism (ExpDP) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and the sample-and-aggregate
framework [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. To achieve differential privacy, many
implementations infuse Laplace noise into each
iteration of Lloyd's algorithm, with the proportion of
noise based on a fixed privacy budget. One strategy for
allocating the privacy budget is to split the overall budget
uniformly across iterations
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. However, this requires calculating the number of
iterations needed for convergence prior to executing the
algorithm, which increases the computational cost.
Later work overcame this weakness with a
theoretically guaranteed optimal allocation method [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
but the major assumption in this approach was that
every cluster has the same size, which does not align
with real-world datasets. In another work, Mohan
et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed GUPT, which uses Lloyd's algorithm
for local clustering of each bucket, where items are
uniformly sampled into different buckets. The final result
is the mean of the locally computed cluster centers across
buckets, with added Laplace noise. However, the clustering
quality of GUPT was unsatisfying because a large amount
of noise was added in the aggregation stage.
      </p>
      <p>
        Based on the study of past literature on differentially
private k-means clustering, Lu et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] concluded
that convergence of an iterative algorithm is important for
clustering quality. To address this, they introduced the
concepts of the convergent zone and the orientation
controller, which together define a sampling zone for selecting
a potential centroid at the t-th iteration. The approach
iteratively adds noise with an exponential mechanism
(ExpDP) by using prior and future knowledge of the
potential centroid at every step of Lloyd's algorithm. It
maintains the same DP requirements as the existing
literature, with guaranteed convergence and
improved clustering quality. However, their algorithm
perturbs the centroids in a random direction from the
center of the cluster, degrading the quality of clustering.
Thus, in this work, we build upon this approach
and significantly improve the clustering quality under the
same epsilon privacy budget.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminaries</title>
      <p>The definitions used in this work are briefly discussed
in this section. The following is a formal definition of
Differential Privacy:</p>
      <p>
        Definition 1 ($\varepsilon$-DP [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). A randomised mechanism $T$
is $\varepsilon$-differentially private if for all neighbouring datasets
$X$ and $X'$ and for an arbitrary answer $x \in \mathrm{Range}(T)$, $T$
satisfies
$\Pr[T(X) = x] \le \exp(\varepsilon) \cdot \Pr[T(X') = x]$,
where $\varepsilon$ is the privacy budget.
      </p>
      <p>Here, $X$ and $X'$ differ by only one item. Smaller
values of $\varepsilon$ imply a better privacy guarantee, because
the difference between the two neighboring datasets is
bounded through the privacy budget. In this work, we use
ExpDP and LapDP. For non-numeric computation,
exponential DP introduces the concept of a scoring
function $q(X, x)$, which represents the effectiveness of the
pair $(X, x)$. Here $X$ is the dataset and $x$ is the response to
a query on $X$.</p>
      <p>The Exponential DP mechanism is formally defined
as follows:</p>
      <p>
        Definition 2 (Exponential Mechanism [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]).
Given a scoring function $q(X, x)$ of a dataset $X$, which
reflects the quality of query response $x$, the
exponential mechanism $T$ provides $\varepsilon$-differential privacy
if $T(X) = \{x : \Pr[x] \propto \exp\!\big(\tfrac{\varepsilon \cdot q(X, x)}{2\Delta q}\big)\}$, where $\Delta q$ is the
sensitivity of the scoring function $q(X, x)$ and $\varepsilon$ is the
privacy budget.
      </p>
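      <p>For concreteness, the following is a minimal Python sketch of the two mechanisms used in this work, LapDP and ExpDP. The function names and the use of a discrete candidate set are our illustrative assumptions; the algorithm in Section 4 samples from a continuous sampling zone.</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(value, sensitivity, epsilon):
    # LapDP: perturb a numeric answer with Laplace(sensitivity / epsilon) noise.
    return value + rng.laplace(scale=sensitivity / epsilon, size=np.shape(value))

def exponential_mechanism(candidates, scores, sensitivity, epsilon):
    # ExpDP: sample candidate x with Pr[x] proportional to
    # exp(epsilon * q(X, x) / (2 * sensitivity)), per Definition 2.
    logits = epsilon * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()              # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
      </preformat>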
      <p>
        Definition 3 (Convergent &amp; Sampling Zones [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
The region whose points satisfy the condition
$\{s : \|s - C_i^{(t)}\| &lt; \|c_i^{(t-1)} - C_i^{(t)}\|\}$ is the convergent zone,
where $C_i^{(t)}$ is defined as the mean of cluster $O_i^{(t)}$ and
$c_i^{(t-1)}$ is the previous differentially private centroid. A
sub-region inside the convergent zone is defined as a
sampling zone.
      </p>
      <p>
        Definition 4 (Orientation Controller [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). $\theta_i^{(t)}$ is a
direction from the center of the convergent zone to a point
on its circumference. This is the direction along which the
center of the sampling zone will be sampled, and it is
defined as the orientation controller.
      </p>
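      <p>A minimal NumPy sketch of these geometric objects (the helper names are ours, not from [3]): the convergent zone is the ball centred at the current Lloyd mean whose radius reaches the previous noisy centroid, and any point inside it satisfies the convergence invariant used later in Lemma 1.</p>
      <preformat>
import numpy as np

def convergent_zone(c_prev, C_curr):
    # Definition 3: open ball centred at the Lloyd mean C_curr with
    # radius ||c_prev - C_curr|| (c_prev lies on its boundary).
    center = np.asarray(C_curr, dtype=float)
    radius = np.linalg.norm(np.asarray(c_prev, dtype=float) - center)
    return center, radius

def satisfies_invariant(c_hat, c_prev, C_curr):
    # Lemma 1 invariant: ||c_hat - C_curr|| is strictly smaller
    # than ||c_prev - C_curr||.
    return np.less(np.linalg.norm(c_hat - C_curr),
                   np.linalg.norm(c_prev - C_curr))
      </preformat>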
    </sec>
    <sec id="sec-4">
      <title>4. Approach</title>
      <p>In this section, we explain our proposed approach and
the baseline approach.</p>
      <sec id="sec-4-1">
        <title>4.1. Overview - KMeans Guarantee (Baseline)</title>
        <p>
          We took "Differentially Private K-Means Clustering with
Convergence Guarantee" [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] as our baseline and
improved upon it. The baseline proceeds as follows:
        </p>
        <p>1. Let the differentially private centroid at iteration
t-1 be $c_i^{(t-1)}$. Using this centroid, run one iteration of
Lloyd's algorithm to get the current Lloyd's centroid
$C_i^{(t)}$ for each cluster $i$.
2. Using $C_i^{(t)}$ and $c_i^{(t-1)}$, generate a convergent zone
for each cluster $i$ as described in Definition 3, and an
orientation controller $\theta_i^{(t)}$ for each cluster, as
defined in Definitions 3 and 4.
3. Generate a sampling zone in the convergent zone.
4. Sample a differentially private $\hat{c}_i^{(t)}$ with ExpDP
in the sampling zone generated in step 3.</p>
        <p>The convergent zone (for the convergence
guarantee) and the sampling zone (for centroid updating) are
defined in Definition 3. A toy sketch of the baseline's
random orientation follows.</p>
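        <p>The following 2-D sketch illustrates the baseline's randomly oriented sampling direction; the rotation-based construction is our reading of Definition 4 and Figure 1 (the paper works in arbitrary dimension), and the function name is ours.</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(0)

def baseline_sampling_direction(c_prev, C_curr):
    # Toy 2-D sketch: take the reference direction from the Lloyd mean
    # C_curr towards the previous noisy centroid c_prev and rotate it
    # by a random angle theta drawn from [-pi/2, pi/2].
    ref = np.asarray(c_prev, dtype=float) - np.asarray(C_curr, dtype=float)
    theta = rng.uniform(-np.pi / 2, np.pi / 2)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return rot @ ref  # direction along which the sampling zone is placed
      </preformat>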
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Overview - SubCluster Guarantee</title>
        <p>We build upon the KMeans Guarantee algorithm to
achieve better clustering quality. Our idea differs from
the baseline in how the sampling zone is created. For
each cluster, we execute Lloyd's algorithm over its
convergent zone to generate its sub-clustering. Further, we
assign each sub-cluster a probability linearly
proportional to the number of points it contains. Finally, we
sample a sub-cluster based on the assigned probability
and define it as the sampling zone of the convergent zone.</p>
        <p>Drawing an analogy from the KMeans Guarantee
algorithm, our approach proceeds as follows:
1. Let the differentially private centroid at iteration
t-1 be $c_i^{(t-1)}$. Using this centroid, run one iteration of
Lloyd's algorithm to get the current Lloyd's centroid
$C_i^{(t)}$ for each cluster $i$.
2. Using $C_i^{(t)}$ and $c_i^{(t-1)}$, generate a convergent zone
for each cluster $i$.
3. SubCluster the convergent zone and sample one of the
sub-clusters as our sampling zone, based on the probability
assigned to each sub-cluster; the probability assignment is
directly proportional to the number of points in each
sub-cluster.
4. Sample a differentially private $\hat{c}_i^{(t)}$ with ExpDP
in the sampling zone generated in step 3.</p>
        <sec id="sec-4-2-1">
          <title>SubClustering Algorithm</title>
          <p>Algorithm 1: Differentially Private K-Means
SubClustering.
Input: $X = \{x_1, x_2, \ldots, x_N\}$: dataset with N data
points; $k$: number of clusters; $\varepsilon$: ExpDP privacy
budget; $\varepsilon_{lap}$: Laplacian privacy budget for the
converged centroids; internalK: number of sub-clusters per
cluster; iters: number of iterations to run the algorithm.
Output: S: final clustering centroids.
1 Select $k$ centroids $S^{(0)} = (c_1^{(0)}, c_2^{(0)}, \ldots, c_k^{(0)})$
uniformly from X;
2 for iteration t in iters do
3   for each cluster i at iteration t do
4     $O_i^{(t)}$ ← assign each $x_j$ to its closest centroid;
5     ConvergentZone$_i^{(t)}$ ← points of $O_i^{(t)}$ lying
inside the spherical region having $C_i^{(t)}$ and $c_i^{(t-1)}$
as the endpoints of its radius;
6     SamplingZone$_i^{(t)}$ ← run Algorithm 2 using
ConvergentZone$_i^{(t)}$ and internalK;
7     $\hat{c}_i^{(t)}$ ← sample from SamplingZone$_i^{(t)}$
using ExpDP with $\varepsilon$ and $q(\cdot)$;
8     $c_i^{(t)}$ ← $\hat{c}_i^{(t)}$;
9 S ← add Laplace noise with $\varepsilon_{lap}$ to $S^{(t)}$;
10 Publish: S</p>
          <p>Algorithm 2: SubClusterSampling.
Input: ConvergentZone$_i^{(t)}$: convergent zone;
internalK: sub-clustering K.
Output: SamplingZone$_i^{(t)}$.
1 $C_i^{(t)}$: mean of $O_i^{(t)}$;
2 ConvergentZoneClusters ← cluster ConvergentZone$_i^{(t)}$
using Lloyd's algorithm and internalK;
3 ConvergentZoneProbability ← assign probabilities to the
ConvergentZoneClusters proportional to the number of
points inside each cluster;
4 SamplingZone$_i^{(t)}$ ← sample a cluster from the
ConvergentZoneClusters using ConvergentZoneProbability;
5 Return: SamplingZone$_i^{(t)}$</p>
          <p>Our approach surpasses the baseline approach in terms
of clustering quality while maintaining the same DP
requirements as the KMeans Guarantee approach,
as is evident from the results obtained (Figure
3). The better clustering quality is a result of our
subclustering strategy, which perturbs the centroid towards
the direction of the actual centroid generated by Lloyd's
algorithm with a higher probability than the baseline
approach does.</p>
          <p>The pseudo-code of our approach is shown in
Algorithm 1 and Algorithm 2.</p>
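          <p>As a companion to the pseudo-code, here is a minimal Python sketch of one sampling step (Algorithm 2 plus the ExpDP draw). The scoring function that favours points near the Lloyd mean, and all function names, are our illustrative assumptions; scikit-learn's KMeans stands in for Lloyd's algorithm.</p>
          <preformat>
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def subcluster_sampling_zone(zone_points, internal_k):
    # Algorithm 2 sketch: sub-cluster the convergent zone, then pick one
    # sub-cluster with probability proportional to its size.
    labels = KMeans(n_clusters=internal_k, n_init=10,
                    random_state=0).fit_predict(zone_points)
    sizes = np.bincount(labels, minlength=internal_k).astype(float)
    chosen = rng.choice(internal_k, p=sizes / sizes.sum())
    return zone_points[labels == chosen]

def sample_dp_centroid(sampling_zone, C_curr, epsilon, sensitivity=1.0):
    # ExpDP draw: candidates are the points of the sampling zone, scored
    # by negative distance to the current Lloyd mean C_curr.
    scores = -np.linalg.norm(sampling_zone - C_curr, axis=1)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()
    probs = np.exp(logits)
    probs /= probs.sum()
    return sampling_zone[rng.choice(len(sampling_zone), p=probs)]
          </preformat>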
          <p>
            Lemma 1 [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]: A randomised iterative algorithm $A$ is
convergent if, in $O_i^{(t)}$ (cluster $i$ at iteration $t$),
$\hat{c}_i^{(t)}$ (centroid sampled using $A$), $C_i^{(t)}$
(centroid of $O_i^{(t)}$) and $c_i^{(t-1)}$ (centroid before
re-centering) satisfy the invariant
$\|\hat{c}_i^{(t)} - C_i^{(t)}\| &lt; \|c_i^{(t-1)} - C_i^{(t)}\|$
in Euclidean distance, $\forall i, t$.
          </p>
          <p>
            We reproduce this lemma from our baseline approach
[
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. Lemma 1 and Lemma 2 together provide the
completeness and the proof of convergence of our approach.
If the distance of the sampled centroid $\hat{c}_i^{(t)}$ from
$C_i^{(t)}$ exceeds $\|c_i^{(t-1)} - C_i^{(t)}\|$, then the loss
will increase. However, if we can ensure that any sampled
point from SamplingZone$_i^{(t)}$ fulfills the condition, it
will lead to a lower loss than $c_i^{(t-1)}$, thus resulting
in convergence of the randomised iterative algorithm. For
the mathematical proof, refer to [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
          </p>
          <p>Lemma 2: The Differentially Private K-Means
SubClustering approach (SubClustering) is a randomised
iterative algorithm that satisfies the invariant
$\|\hat{c}_i^{(t)} - C_i^{(t)}\| &lt; \|c_i^{(t-1)} - C_i^{(t)}\|$.
Proof: SubClustering is an iterative algorithm that
samples a set of centroids for each iteration with the
ExpDP mechanism, thus making it a randomised
iterative algorithm. It subclusters the points lying inside
ConvergentZone$_i^{(t)}$. After subclustering, it samples one
subcluster (the sampling zone) with the assigned
probabilities (linearly proportional to the number of data
points in each subcluster). Finally, it samples a datapoint
from the sampling zone using ExpDP. Therefore, the sampled
point lies inside the convergent zone and satisfies the
invariant.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Setup</title>
      <sec id="sec-5-1">
        <title>5.1. Dataset Used</title>
        <p>
          We used the following four datasets to test our
SubCluster Guarantee approach against the baseline:
1. Iris [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] comprises a total of 150 datapoints
with four features and three classes.
2. Wine [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] comprises a total of 178
datapoints with 13 features and three classes.
3. Breast Cancer [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] comprises a total of
569 datapoints with 30 features and two classes.
4. Digits [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] comprises 1797 datapoints
with 64 dimensions and 10 classes.
A loading sketch follows this list.
        </p>
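        <p>A minimal loading sketch; using scikit-learn's bundled copies of these UCI datasets is our assumption (the paper cites the UCI repository [13]):</p>
        <preformat>
from sklearn import datasets

# Each loader returns (X, y); shapes match the counts quoted above.
loaders = {
    "Iris": datasets.load_iris,                    # 150 x 4, 3 classes
    "Wine": datasets.load_wine,                    # 178 x 13, 3 classes
    "Breast_Cancer": datasets.load_breast_cancer,  # 569 x 30, 2 classes
    "Digits": datasets.load_digits,                # 1797 x 64, 10 classes
}
for name, load in loaders.items():
    X, y = load(return_X_y=True)
    print(name, X.shape, len(set(y)))
        </preformat>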
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Metric for Clustering Quality</title>
        <p>
          To evaluate the clustering quality, we used the
following equation to calculate the normalised difference
between the costs of the differentially private algorithm
(here, SubCluster Guarantee or KMeans Guarantee) and
non-private Lloyd's algorithm:
$\mathrm{CostGap} = \frac{|\,\mathrm{Cost}_{DP} - \mathrm{Cost}_{Lloyd}\,|}{\mathrm{Cost}_{Lloyd}}$ (1)
A smaller CostGap [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] represents a better quality of
clustering. In the experiments, we compare the clustering
quality of SubCluster Guarantee with KMeans Guarantee.
        </p>
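        <p>A short sketch of this metric (the helper names are ours; the cost is the usual k-means objective):</p>
        <preformat>
import numpy as np

def kmeans_cost(X, centroids):
    # Sum of squared distances from each point to its nearest centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def cost_gap(X, dp_centroids, lloyd_centroids):
    # Equation (1): normalised difference between DP and non-private costs.
    c_dp = kmeans_cost(X, np.asarray(dp_centroids))
    c_np = kmeans_cost(X, np.asarray(lloyd_centroids))
    return abs(c_dp - c_np) / c_np
        </preformat>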
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results and Discussion</title>
      <p>We tested our algorithm on four datasets. The datasets
have different dimensionalities, ranging from 4 to 64, and
training-set sizes ranging from 150 to about 1800. As
defined in the metric, a smaller gap represents better
clustering quality. From (Figure 3) we can observe
that the cost gap for every dataset is smaller than or equal
to the baseline's. Thus, it is evident that our algorithm has
better clustering quality than the existing work on all the
datasets experimented with. We varied internalK (the
parameter for the number of sub-clusters) from 2 to 5.</p>
      <p>Each experiment was conducted 30 times for the
Iris, Wine, and Breast_Cancer datasets, and 10 times
for the Digits dataset due to computational constraints.
Finally, for each dataset, we took the average over all
runs as our final result for plotting the graphs.</p>
      <p>Comparing the SubCluster Guarantee (proposed
approach) and the KMeans Guarantee approach (baseline) by
averaging the cost gaps over the varied epsilon values
and taking the ratio between the KMeans and SubCluster
approaches:
1. For the Iris dataset, the cost gap is 1.1 times
smaller than the baseline algorithm's.
2. For the Wine dataset, the cost gap is 4.13 times
smaller than the baseline algorithm's.
3. For the Breast_Cancer dataset, the cost gap is
2.83 times smaller than the baseline algorithm's.
4. For the Digits dataset, the cost gap is almost the
same as that of the baseline algorithm.</p>
      <sec id="sec-6-1">
        <title>6.1. Detailed Analysis</title>
        <p>
          1. Iris: The Iris dataset has four dimensions and a very
small training set of 150 data points. Our
algorithm achieves better clustering quality than
the baseline algorithm for smaller epsilon values.
Since the number of data points in Iris is small, the
impact of sub-clustering is reduced, making its
performance similar to that of the baseline
approach. From (Figure 4), we can observe that
changing the value of internalK has a small impact
on the costGap due to the small number of points
in each sub-cluster. This is because a sub-cluster
may contain no data points when internalK is
increased, causing zero-probability sub-cluster regions.
2. Wine: The Wine dataset has 13 dimensions and
178 data points in the training set. Our algorithm
performs significantly better than the baseline, as
observed in (Figure 3). This is because the
baseline algorithm is constrained to choose a theta in
an arbitrary direction in the range [-π/2, π/2], as
shown in (Figure 1). In contrast, our algorithm
shifts the centroids in the direction where the
future centroid of Lloyd's algorithm is more likely
to move (in the expected case). From (Figure 4),
it is evident that internalK=4 performs better for
the Wine dataset than the other internalK values.
Here, the number of dimensions is larger than
Iris's, so the spatial arrangement lies in an
n-sphere, which allows better sub-clustering.
3. Breast_Cancer: The Breast_Cancer dataset has 569
data points in its training set and 30 dimensions.
Our algorithm performs substantially better than
the baseline, with internalK equal to 4. From
(Figure 3), we can observe that there is no
monotonic trend in the costGap. Trends are
visible in the other datasets due to their larger
number of classification classes, whereas this dataset
has only two classes. Thus, adding Laplace noise
does not have a clear relation to the clustering quality.
Increasing internalK improves the clustering
quality, with internalK of 4 having the lowest
loss. This is because the dataset has a high number
of dimensions and more training points than the
other datasets.
4. Digits: Digits has 64 dimensions and 1797 data
points in the training set. Although it has a large
number of dimensions, our algorithm shows only a
very small improvement over the baseline algorithm,
as seen in (Figure 3). Because of the higher time
complexity of our algorithm, it is hard to tune
the internalK parameter. As the number of
samples in a dataset increases, internalK should
increase, because a single cluster can contain a
large number of data points. But, due to limited
computational resources, we were not able to
experiment with this further. We took internalK to
be 5 for our experiments, as it performed best in
the range [2, 5], as shown in (Figure 4). One
intriguing finding in this dataset's results is that
the curves for different internalK values show a
clearly evident trend, which is a result of the large
number of training data points.
        </p>
        <p>Our proposed algorithm significantly improves over the
baseline in terms of clustering quality, especially for the
Wine and Breast_Cancer datasets. In addition, our
algorithm maintains the same DP requirements as the
existing works.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This work presents a novel method for improving the
clustering quality of differentially private k-means
algorithms while ensuring convergence. The novelty of
our approach is the sub-clustering of each cluster to select
the differentially private centroid, which has a higher
probability of moving in the direction of the next
centroid. We demonstrated that our work surpasses the
current state-of-the-art algorithms in terms of clustering
quality. In particular, for the Wine and Breast_Cancer
datasets, the clustering quality improved by 4.13 and
2.83 times over the baseline. In addition, we maintain
the same DP requirements as the baseline and other
existing approaches.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Future Work</title>
      <p>
        • In this work, we supported our claim with empirical
results. We further plan to validate the results
by providing mathematical bounds for the
convergence degree and rate of the SubClustering
Lloyd's algorithm. In terms of clustering quality,
the proposed algorithm is compared with KMeans
Guarantee clustering only; to prove the
effectiveness of our work, we plan to experiment
with other algorithms in the literature, including
PrivGene [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], GUPT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
Dwork [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
• The DP requirements in this work are the same
as in past literature, but in the future we plan
to explore ways to improve the current DP
guarantees while maintaining the same clustering
quality as in this work.
• We used the Exponential and Laplace mechanisms
of DP in the proposed approach; we further plan
to explore a third mechanism, i.e., the
sample-and-aggregate framework, by integrating it
with the current algorithm.
• In our algorithm, the number of data points inside
a cluster is variable. Thus, we plan to choose an
internalK custom to the size of each cluster to
improve the clustering quality.
      </p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgement</title>
      <p>We would like to thank Prof. Anirban Dasgupta
(IIT Gandhinagar) for his continuous support and
guidance throughout the research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <article-title>Differential privacy: A survey of results</article-title>
          ,
          <source>in: International conference on theory and applications of models of computation</source>
          , Springer,
          <year>2008</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <article-title>Data privacy: The noninteractive setting</article-title>
          , The University of Texas at Austin,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Differentially private k-means clustering with convergence guarantee</article-title>
          ,
          <source>IEEE Transactions on Dependable and Secure Computing</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bertino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <article-title>Differentially private k-means clustering</article-title>
          ,
          <source>in: Proceedings of the sixth ACM conference on data and application security and privacy</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <article-title>Differentially private m-estimators</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>24</volume>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bertino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <article-title>Differentially private k-means clustering and a hybrid approach to private optimization</article-title>
          ,
          <source>ACM Transactions on Privacy and Security (TOPS) 20</source>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <article-title>A firm foundation for private data analysis</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>54</volume>
          (
          <year>2011</year>
          )
          <fpage>86</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thakurta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Culler</surname>
          </string-name>
          ,
          <article-title>Gupt: privacy preserving data analysis made easy</article-title>
          ,
          <source>in: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>349</fpage>
          -
          <lpage>360</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>McSherry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Calibrating noise to sensitivity in private data analysis</article-title>
          ,
          <source>in: Theory of cryptography conference</source>
          , Springer,
          <year>2006</year>
          , pp.
          <fpage>265</fpage>
          -
          <lpage>284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>McSherry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Talwar</surname>
          </string-name>
          ,
          <article-title>Mechanism design via differential privacy</article-title>
          ,
          <source>in: 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07)</source>
          , IEEE,
          <year>2007</year>
          , pp.
          <fpage>94</fpage>
          -
          <lpage>103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Raskhodnikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Smooth sensitivity and sampling in private data analysis</article-title>
          ,
          <source>in: Proceedings of the thirtyninth annual ACM symposium on Theory of computing</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Blum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>McSherry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <article-title>Practical privacy: the sulq framework</article-title>
          ,
          <source>in: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>128</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Asuncion</surname>
          </string-name>
          ,
          <article-title>UCI machine learning repository</article-title>
          , University of California, Irvine, School of Information and Computer Sciences, http://www.ics.uci.edu/~mlearn/MLRepository.html (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Winslett</surname>
          </string-name>
          ,
          <article-title>Privgene: diferentially private model fitting using genetic algorithms</article-title>
          ,
          <source>in: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>665</fpage>
          -
          <lpage>676</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>