=Paper=
{{Paper
|id=Vol-3163/paper7
|storemode=property
|title=Applying the Shuffle Model of Differential Privacy to Vector Aggregation
|pdfUrl=https://ceur-ws.org/Vol-3163/BICOD21_paper_4.pdf
|volume=Vol-3163
|authors=Mary Scott,Graham Cormode,Carsten Maple
|dblpUrl=https://dblp.org/rec/conf/bncod/ScottCM21
}}
==Applying the Shuffle Model of Differential Privacy to Vector Aggregation==
Applying the Shuffle Model of Differential Privacy to Vector Aggregation Mary Scott1 , Graham Cormode1 and Carsten Maple2 1 Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK 2 WMG, University of Warwick, Coventry, CV4 7AL, UK Abstract In this work we introduce a new protocol for vector aggregation in the context of the Shuffle Model, a recent model within Differential Privacy (DP). It sits between the Centralized Model, which prioritizes the level of accuracy over the secrecy of the data, and the Local Model, for which an improvement in trust is counteracted by a much higher noise requirement. The Shuffle Model was developed to provide a good balance between these two models through the addition of a shuffling step, which unbinds the users from their data whilst maintaining a moderate noise requirement. We provide a single message protocol for the summation of real vectors in the Shuffle Model, using advanced composition results. Our contribution provides a mechanism to enable private aggregation and analysis across more sophisticated structures such as matrices and higher-dimensional tensors, both of which are reliant on the functionality of the vector case. Keywords Differential privacy (DP), single-message shuffle model, local randomizer, randomized response, mean squared error (MSE). 1. Introduction There are many practical applications of the Single- Message Shuffle Model in Federated Learning, where Differential Privacy (DP) [1] is a strong, mathematical multiple users collaboratively solve a Machine Learning definition of privacy that guarantees a measurable level of problem, the results of which simultaneously improves confidentiality for any data subject in the dataset to which the model for the next round [3]. The updates generated it is applied. In this way, useful collective information by the users after each round are high-dimensional vec- can be learned about a population, whilst simultaneously tors, so this data type will prove useful in applications protecting the personal information of each data subject. such as training a Deep Neural Network to predict the In particular, DP guarantees that the impact on any next word that a user types [4]. Additionally, aggrega- particular individual as a result of analysis on a dataset tion is closely related to Secure Aggregation, which can is the same, whether or not the individual is included in be used to compute the outputs of Machine Learning the dataset. This guarantee is quantified by a parameter problems such as the one above [5]. ๐, which represents good privacy if it is small. However, Our contribution is a protocol in the Single-Message finding an algorithm that achieves DP often requires a Shuffle Model for the private summation of vector-valued trade-off between privacy and accuracy, as a smaller ๐ messages, extending an existing result from Balle et al. [2] sacrifices accuracy for better privacy, and vice versa. DP by permitting the ๐ users to each submit a vector of real enables data analyses such as the statistical analysis of numbers instead of a scalar. The resulting estimator is the salaries of a population. This allows useful collec- unbiased and has normalized mean squared error (MSE) tive information to be studied, as long as ๐ is adjusted ๐๐,๐ฟ (๐ 8/3 ๐โ5/3 ), where ๐ is the dimension of each vector. appropriately to satisfy the definition of DP. This vector summation protocol above can be extended In this work we focus on protocols in the Single-Message to produce a similar protocol for the linearization of ma- Shuffle Model [2], a one-time data collection model where trices. It is important to use matrix reduction to en- each of ๐ users is permitted to submit a single message. sure that the constituent vectors are linearly indepen- We have chosen to apply the Single-Message Shuffle dent. This problem can be extended further to higher- Model to the problem of vector aggregation, as there are dimensional tensors, which are useful for the representa- links to Federated Learning and Secure Aggregation. tion of multi-dimensional data in Neural Networks. BICOD21: British International Conference on Databases, March 28, 2022, London, UK Envelope-Open mary.p.scott@warwick.ac.uk (M. Scott); 2. Related Work g.cormode@warwick.ac.uk (G. Cormode); cm@warwick.ac.uk (C. Maple) The earliest attempts at protecting the privacy of users in Orcid 0000-0003-0799-5840 (M. Scott); 0000-0002-0698-0922 a dataset focused on simple ways of suppressing or gen- (G. Cormode); 0000-0002-4715-212X (C. Maple) eralising the data. Examples include ๐-anonymity [6], ๐- ยฉ 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). diversity [7] and ๐ก-closeness [8]. However, such attempts CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) 1 Mary Scott et al. CEUR Workshop Proceedings 1โ10 have been shown to be insufficient, as proved by numer- Multi-Message Shuffle Model, an extension of the Single- ous examples [9]. Message Shuffle Model that permits each of the ๐ users This harmful leakage of sensitive information can be to submit more than one message, using several inde- easily prevented through the use of DP, as this mathe- pendent shufflers to securely compute the sum. In this matically guarantees that the chance of a linkage attack work, Balle et al. contributed a recursive construction on an individual in the dataset is almost identical to that based on the protocol from [2], as well as an alternative on an individual not in the dataset. mechanism which implements a discretized distributed Ever since DP was first conceptualized in 2006 by noise addition technique using the result from Ishai et Dwork et al. [1], the majority of research in the field al. [16]. has focused on two opposing models. In the Centralized Also relevant to our research is the work of Ghazi et Model, users submit their sensitive personal informa- al. [18], which explored the related problems of private tion directly to a trusted central data collector, who adds frequency estimation and selection in a similar context, random noise to the raw data to provide DP, before as- drawing comparisons between the errors achieved in the sembling and analyzing the aggregated results. Single-Message and Multi-Message Shuffle Models. A In the Local Model, DP is guaranteed when each user similar team of authors produced a follow-up paper [19] applies a local randomizer to add random noise to their describing a more efficient protocol for private summa- data before it is submitted. The Local Model differs from tion in the Single-Message Shuffle Model, using the โin- the Centralized Model in that the central entity does not visibility cloakโ technique to facilitate the addition of see the usersโ raw data at any point, and therefore does zero-sum noise without coordination between the users. not have to be trusted. However, the level of noise re- quired per user for the same privacy guarantee is much higher, which limits the usage of Local Differential Pri- 3. Preliminaries vacy (LDP) to major companies such as Google [10], Ap- We consider randomized mechanisms [9] โณ, โ under ple [11] and Microsoft [12]. โ ๐ทโโฒ domains ๐, ๐, and apply them to input datasets ๐ท, Neither of these two extensively studied models can to generate (vector-valued) messages ๐ฅโ๐ , ๐ฅโ๐โฒ . We write provide a good balance between the trust of the central [๐] = {1, โฆ , ๐} and โ for the set of natural numbers. entity and the level of noise required to guarantee DP. Hence, in recent years researchers have tried to create intermediate models that reap the benefits of both. 3.1. Models of Differential Privacy In 2017, Bittau et al. [13] introduced the Encode, Shuf- The essence of Differential Privacy (DP) is the require- fle, Analyze (ESA) model, which provides a general frame- ment that the contribution ๐ฅโ๐ of a user ๐ to a dataset work for the addition of a shuffling step in a private pro- ๐ทโ = (๐ฅโ1 , โฆ , ๐ฅโ๐ ) does not have much effect on the out- tocol. After the data from each user is encoded, it is come of the mechanism applied to that dataset. randomly permuted to unbind each user from their data In the centralized model of DP, random noise is only in- before analysis takes place. In 2019, Cheu et al. [14] for- troduced after the usersโ inputs are gathered by a (trusted) malized the Shuffle Model as a special case of the ESA โ โฒ that differs from ๐ทโ aggregator. Consider a dataset ๐ท model, which connects this additional shuffling step to โ โ only in the contribution of a single user, denoted ๐ท โ ๐ท โฒ . the Local Model. In the Shuffle Model, the local random- Also let ๐ โฅ 0 and ๐ฟ โ (0, 1). We say that a randomized izer applies a randomized mechanism on a per-element mechanism โณ โถ ๐๐ โ ๐ is (๐, ๐ฟ)-differentially private basis, potentially replacing a truthful value with another โ โ๐ท if โ๐ท โ โฒ , โ๐ธ โ ๐: randomly selected domain element. The role of these independent reports is to create what we call a privacy โ โ ๐ธ] โค ๐ ๐ โ Pr[โณ(๐ท Pr[โณ(๐ท) โ โฒ ) โ ๐ธ] + ๐ฟ [9]. blanket, which masks the outputs which are reported truthfully. In this definition, we assume that the trusted aggrega- As well as the result on the private summation of scalar- tor obtains the raw data from all users and introduces valued messages in the Single-Message Shuffle Model the necessary perturbations. that we will be using [2], Balle et al. have published In the local model of DP, each user ๐ independently uses two more recent works that solve related problems. The randomness on their input ๐ฅโ๐ โ ๐ by using a local ran- first paper [15] improved the distributed ๐-party summa- domizer โ โถ ๐ โ ๐ to obtain a perturbed result โ(๐ฅโ๐ ). tion protocol from Ishai et al. [16] in the context of the We say that the local randomizer is (๐, ๐ฟ)-differentially Single-Message Shuffle Model to require ๐(1 + ๐/ log ๐) โ ๐ท private if โ๐ท, โ โฒ , โ๐ธ โ ๐: scalar-valued messages, instead of a logarithmic depen- Pr[โ(๐ฅโ๐ ) โ ๐ธ] โค ๐ ๐ โ Pr[โ(๐ฅโ๐โฒ ) โ ๐ธ] + ๐ฟ [2], dency of ๐(log ๐ + ๐), to achieve statistical security 2โ๐ . The second paper [17] introduced two new protocols for where ๐ฅโ๐โฒ โ ๐ is some other valid input vector that ๐ could the private summation of scalar-valued messages in the hold. The Local Model guarantees that any observer will 2 Mary Scott et al. CEUR Workshop Proceedings 1โ10 not have access to the raw data from any of the users. Algorithm 1: Local Randomizer โ๐พ๐๐ป ,๐,๐ That is, it removes the requirement for trust. The price is that this requires a higher level of noise per user to Public Parameters: achieve the same privacy guarantee. ๐พ โ [0, 1], domain size ๐, and number of parties ๐ Input: ๐ฅ๐ โ [๐] 3.2. Single-Message Shuffle Model Output: ๐ฆ๐ โ [๐] The Single-Message Shuffle Model sits in between the Sample ๐ โ Ber (๐พ ) Centralized and Local Models of DP [2]. Let a protocol if ๐ = 0 then let ๐ฆ๐ โ ๐ฅ๐ ๐ซ in the Single-Message Shuffle Model be of the form else sample ๐ฆ๐ โ Unif ([๐]) ๐ซ = (โ, ๐ ), where โ โถ ๐ โ ๐ is the local randomizer, return ๐ฆ๐ and ๐ โถ ๐๐ โ โค is the analyzer of ๐ซ. Overall, ๐ซ im- plements a mechanism ๐ซ โถ ๐๐ โ โค as follows. Each user ๐ independently applies the local randomizer to their 4.1. Basic Randomizer message ๐ฅโ๐ to obtain a message ๐ฆโ๐ = โ(๐ฅโ๐ ). Subsequently, the messages (โ ๐ฆ1 , โฆ , ๐ฆโ๐ ) are randomly permuted by a First, we describe a basic local randomizer applied by each trusted shuffler ๐ฎ โถ ๐๐ โ ๐๐ . The random permutation user ๐ to an input ๐ฅ๐ โ [๐]. The output of this protocol is a ๐ฎ (โ ๐ฆ1 , โฆ , ๐ฆโ๐ ) is submitted to an untrusted data collector, (private) histogram of shuffled messages over the domain who applies the analyzer ๐ to obtain an output for the [๐]. mechanism. In summary, the output of ๐ซ (๐ฅโ1 , โฆ , ๐ฅโ๐ ) is The Local Randomizer โ๐พ๐๐ป ,๐,๐ , shown in Algorithm 1, given by: applies a generalized randomized response mechanism that returns the true message ๐ฅ๐ with probability 1 โ ๐ โ ๐ฎ โ โ ๐ (๐ฅ) โ = ๐ (๐ฎ (โ(๐ฅโ1 ), โฆ , โ(๐ฅโ๐ ))). ๐พ and a uniformly random message with probability ๐พ. Such a basic randomizer is used by Balle et al. [2] in the Note that the data collector observing the shuffled mes- Single-Message Shuffle Model for scalar-valued messages, sages ๐ฎ (โ ๐ฆ1 , โฆ , ๐ฆโ๐ ) obtains no information about which as well as in several other previous works in the Local user generated each of the messages. Therefore, the pri- Model [20, 21, 22]. In Section 4.3, we find an appropriate vacy of ๐ซ relies on the indistinguishability between the ๐พ to optimize the proportion of random messages that are shuffles ๐ฎ โ โ ๐ (๐ท) โ and ๐ฎ โ โ ๐ (๐ทโ โฒ ) for datasets ๐ท โ โ๐ท โ โฒ. submitted, and therefore guarantee DP. The analyzer can represent the shuffled messages as a We now describe how the presence of these random histogram, which counts the number of occurrences of messages can form a โprivacy blanketโ to protect against the possible outputs of ๐. a difference attack on a particular user. Suppose we apply Algorithm 1 to the messages from all ๐ users. Note that 3.3. Measuring Accuracy a subset ๐ต of approximately ๐พ ๐ of these users returned a uniformly random message, while the remaining users In Section 4 we use the mean squared error to compare returned their true message. Following Balle et al. [2], the overall output of a private summation protocol in the the analyzer can represent the messages sent by users in Single-Message Shuffle Model with the original dataset. ๐ต by a histogram ๐1 of uniformly random messages, and The MSE is used to measure the average squared differ- can form a histogram ๐2 of truthful messages from users ence in the comparison between a fixed input ๐ (๐ท) โ to not in ๐ต. As these subsets are mutually exclusive and โ In this the randomized protocol ๐ซ, and its output ๐ซ (๐ท). collectively exhaustive, the information represented by context, MSE(๐ซ , ๐ท)โ = E[(๐ซ (๐ท) โ โ ๐ (๐ท)) โ 2 ], where the the analyzer is equivalent to the histogram ๐ = ๐1 โช ๐2 . expectation is taken over the randomness of ๐ซ. Note Consider two neighbouring datasets, each consisting โ = ๐ (๐ท),โ MSE is equivalent to variance, of ๐ messages from ๐ users, that differ only on the in- when E[๐ซ (๐ท)] put from the ๐th user. To simplify the discussion and i.e.: subsequent proof, we temporarily omit the action of the โ = E[(๐ซ (๐ท) MSE(๐ซ , ๐ท) โ โ E[๐ซ (๐ท)]) โ 2 ] = Var[๐ซ (๐ท)]. โ shuffler. By the post-processing property of DP, this can be reintroduced later on without adversely affecting the privacy guarantees. To achieve DP we need to find an 4. Vector Sum in the Shuffle Model appropriate ๐พ such that when Algorithm 1 is applied, the change in ๐ is appropriately bounded. As the knowledge In this section we introduce our protocol for vector sum- of either the set ๐ต or the messages from the first ๐ โ 1 mation in the Shuffle Model and tune its parameters to users does not affect DP, we can assume that the ana- optimize accuracy. lyzer knows both of these details. This lets the analyzer remove all of the truthful messages associated with the 3 Mary Scott et al. CEUR Workshop Proceedings 1โ10 first ๐ โ 1 users from ๐. Algorithm 2: Local Randomizer โ๐,๐,๐,๐ก If the ๐th user is in ๐ต, this means their submission is Public Parameters: ๐, ๐ก, dimension ๐, and independent of their input, so we trivially satisfy DP. number of parties ๐ Otherwise, the (curious) analyzer knows that the ๐th user (1) (๐) has submitted their true message ๐ฅ๐ . The analyzer can Input: ๐ฅโ๐ = (๐ฅ๐ , โฆ , ๐ฅ๐ ) โ [0, 1]๐ (๐ผ ) (๐ผ ) remove all of the truthful messages associated with the Output: ๐ฆโ๐ = (๐ฆ๐ 1 , โฆ , ๐ฆ๐ ๐ก ) โ {0, 1, โฆ , ๐}๐ก first ๐โ1 users from ๐, and obtain ๐1 โช{๐ฅ๐ }. The subsequent Sample (๐ผ1 , โฆ , ๐ผ๐ก ) โ Unif ([๐]) privacy analysis will argue that this does not reveal ๐ฅ๐ (๐ผ ) (๐ผ ) (๐ผ ) (๐ผ ) if ๐พ is set so that ๐1 , the histogram of random messages, Let ๐ฅ๐ฬ ๐ โ โ๐ฅ๐ ๐ ๐โ + Ber (๐ฅ๐ ๐ ๐ โ โ๐ฅ๐ ๐ ๐โ) (๐ผ๐ ) (๐ผ๐ ) appropriately โhidesโ ๐ฅ๐ . โท ๐ฅ๐ฬ : encoding of ๐ฅ๐ with precision ๐ (๐ผ๐ ) (๐ผ๐ ) โท ๐ฆ๐ : apply Algorithm 1 to each ๐ฅ๐ฬ 4.2. Private Summation of Vectors (๐ผ ) (๐ผ ) return ๐ฆโ๐ = (๐ฆ๐ 1 , โฆ , ๐ฆ๐ ๐ก ) Here, we extend the protocol from Section 4.1 to ad- dress the problem of computing the sum of ๐ real vec- (1) (๐) tors, each of the form ๐ฅโ๐ = (๐ฅ๐ , โฆ , ๐ฅ๐ ) โ [0, 1]๐ , in the Algorithm 3: Analyzer ๐๐,๐,๐ก Single-Message Shuffle Model. Specifically, we analyze Public Parameters: ๐, ๐ก, and dimension ๐ the utility of a protocol ๐ซ๐,๐,๐,๐ก = (โ๐,๐,๐,๐ก , ๐๐,๐,๐ก ) for this Input: Multiset {โ ๐ฆ๐ }๐โ[๐] , with purpose, by using the MSE from Section 3.3 as the accu- (๐ผ ) (๐ผ ) (๐ฆ๐ 1 , โฆ , ๐ฆ๐ ๐ก ) โ {0, 1, โฆ , ๐}๐ก racy measure. In the scalar case, each user applies the Output: ๐งโ = (๐ง (1) , โฆ , ๐ง (๐) ) โ [0, 1]๐ protocol to their entire input [2]. Moving to the vector case, we allow each user to independently sample a set (๐) (๐ผ ) Let ๐ฆ๐ โ ๐ฆ๐ ๐ of 1 โค ๐ก โค ๐ coordinates from their vector to report. Our (๐ผ๐ ) (๐) analysis allows us to optimize the parameter ๐ก. โท ๐ฆ๐ : submission corresponding to ๐ฅ๐ Hence, the first step of the Local Randomizer โ๐,๐,๐,๐ก , (1) (๐) presented in Algorithm 2, is to uniformly sample ๐ก co- Let (๐ง ฬ(1) , โฆ , ๐ง ฬ(๐) ) โ ( 1๐ โ๐ ๐ฆ๐ , โฆ , 1๐ โ๐ ๐ฆ๐ ) ordinates (๐ผ1 , โฆ , ๐ผ๐ก ) โ [๐] (without replacement) from Let each vector ๐ฅโ๐ . To compute a differentially private ap- (๐ง (1) , โฆ , ๐ง (๐) ) โ (DeBias (๐ง ฬ(1) ), โฆ ,DeBias (๐ง ฬ(๐) )) (๐) โท DeBias (๐ง ฬ ) = (๐ง ฬ (๐) โ ๐พ โ |๐ฆ (๐) |)/(1 โ ๐พ ) proximation of โ๐ ๐ฅโ๐ , we fix a quantization level ๐. Then 2 ๐ (๐ผ ) (๐ผ ) we randomly round each ๐ฅ๐ ๐ to obtain ๐ฅ๐ฬ ๐ as either return ๐งโ = (๐ง (1) , โฆ , ๐ง (๐) ) (๐ผ ) (๐ผ ) โ๐ฅ๐ ๐ ๐โ or โ๐ฅ๐ ๐ ๐โ. Next, we apply the randomized re- (๐ผ ) sponse mechanism from Algorithm 1 to each ๐ฅ๐ฬ ๐ , which 4.3. Privacy Analysis of Algorithms (๐ผ ) (๐ผ ) sets each output ๐ฆ๐ ๐ independently to be equal to ๐ฅ๐ฬ ๐ In this section, we will find an appropriate ๐พ that en- with probability 1 โ ๐พ, or a random value in {0, 1, โฆ , ๐} (๐ผ๐ ) sures that the mechanism described in Algorithms 2 and with probability ๐พ. Each ๐ฆ๐ will contribute to a his- 3 satisfies (๐, ๐ฟ)-DP for vector-valued messages in the (๐ผ ) (๐ผ ) Single-Message Shuffle Model. To achieve this, we prove togram of the form (๐ฆ1 ๐ , โฆ , ๐ฆ๐ ๐ ) as in Section 4.1. The Analyzer ๐๐,๐,๐ก , shown in Algorithm 3, aggregates the following theorem, where we initially assume ๐ < 1 the histograms to approximate โ๐ ๐ฅโ๐ by post-processing to simplify our computations. At the end of this section, the vectors coordinate-wise. More precisely, the analyzer we discuss how to cover the additional case 1 โค ๐ < 6 to (๐ผ ) (๐) suit our experimental study. sets each output ๐ฆ๐ ๐ to ๐ฆ๐ , where the new label ๐ is from (๐) its corresponding input ๐ฅ๐ of the original ๐-dimensional Theorem 4.1. The shuffled mechanism โณ = ๐ฎ โ โ๐,๐,๐,๐ก (๐) is (๐, ๐ฟ)-DP for any ๐, ๐, ๐ โ โ, {๐ก โ โ | ๐ก โ [๐]}, ๐ < 6 and vector ๐ฅโ๐ . For all inputs ๐ฅ๐ that were not sampled, we (๐) ๐ฟ โ (0, 1] such that: set ๐ฆ๐ = 0. Subsequently, the analyzer aggregates the 56๐๐ log(1/๐ฟ) log(2๐ก/๐ฟ) sets of outputs from all users corresponding to each of , when ๐ < 1 (๐โ1)๐ 2 those ๐ coordinates in turn, so that a ๐-dimensional vector ๐พ = { 2016๐๐ log(1/๐ฟ) log(2๐ก/๐ฟ) is formed. Finally, a standard debiasing step is applied to (๐โ1)๐ 2 , when 1 โค ๐ < 6. this vector to remove the scaling and rounding applied to โ โโฒ โฒ each submission. DeBias returns an unbiased estimator, Proof. Let ๐ท = (๐ฅโ1 , โฆ , ๐ฅโ๐ ) and ๐ท = (๐ฅโ1 , โฆ , ๐ฅโ๐ ) be the ๐งโ, which calculates an estimate of the true sum of the two neighbouring th datasets differing only in the input of vectors by subtracting the expected uniform noise from the ๐ user, as used in Section 4.1. (1)Here each (๐) vector- the randomized sum of the vectors. valued message ๐ฅโ๐ is of the form (๐ฅ๐ , โฆ , ๐ฅ ๐ ). Recall from Section 4.1 that we assume that the analyzer can 4 Mary Scott et al. CEUR Workshop Proceedings 1โ10 see the users in ๐ต (i.e., the subset of users that returned a sufficient to derive (1) from: uniformly random message), as well as the inputs from (๐ผ ) the first ๐ โ 1 users. โ = V๐ผ ] Pr[Viewโณ๐ (๐ท) ๐ โฒ We now introduce the vector view VViewโณ (๐ท) โ as the Pr (๐ผ๐ ) โ [ โฅ ๐๐ ] โค ๐ฟ โฒ, V๐ผ๐ โผViewโณ (๐ท) (๐ผ ) โฒ collection of information that the analyzer is able to see Pr[Viewโณ๐ (๐ทโ ) = V๐ผ ] ๐ after the mechanism โณ is applied to all vector-valued (2) messages in the dataset ๐ท. โ VViewโณ (๐ท)โ is defined as ฬ = โ ๐ผ V๐ผ , ๐ โฒ = where V ๐ and ๐ฟ โฒ = ๐ฟ๐ก . ๐ ๐ 2โ2๐ก log(1/๐ฟ) โ โ โ โ the tuple (๐ , ๐ทโฉ , ๐), where ๐ is the multiset containing the outputs {โ ๐ฆ1 , โฆ , ๐ฆโ๐ } of the mechanism โณ(๐ท),โ ๐ท โ โฉ is Lemma 4.4. Condition (2) implies condition (1). the vector containing the inputs (๐ฅโ1 , โฆ , ๐ฅโ๐โ1 ) from the Proof. We can express VViewโณ (๐ท) โ as the composition first ๐ โ 1 users, and โ๐ contains binary vectors (โ๐1 , โฆ , โ๐๐ ) (๐ผ1 ) (๐ผ ) which indicate for which coordinates each user reports of the ๐ก scalar views Viewโณ , โฆ , Viewโณ๐ก , as: truthful information. This vector view can be projected to โ = V] Pr[VViewโณ (๐ท) ฬ ๐ก overlapping scalar views by applying Algorithm 2 only to the ๐ th uniformly sampled coordinate ๐ผ๐ โ [๐] from (๐ผ ) โ = V๐ผ โง โฏ โง View ๐ก (๐ท) = Pr[Viewโณ1 (๐ท) โ = V๐ผ ] (๐ผ ) (๐ผ ) 1 โณ ๐ก each user, where ๐ โ [๐ก]. The ๐ th scalar view Viewโณ๐ (๐ท) โ (๐ผ ) โ (๐ผ ) โ (๐ผ ) = Pr[Viewโณ1 (๐ท) = V๐ผ1 ] โ โฏ โ Pr[Viewโณ๐ก (๐ท) = V๐ผ๐ก ]. (๐ผ โ is defined as the tuple (๐โ , ๐ท of VViewโณ (๐ท) ๐ ) ๐ โ โ โฉ , ๐ ) ), (๐ผ๐ where: Our desired result is immediate by applying Corol- lary 4.3, which states that the use of ๐ก overlapping (๐ โฒ , ๐ฟ โฒ )- ๐โ (๐ผ๐ ) = โณ(๐ท โ (๐ผ๐ ) ) = {๐ฆ1(๐ผ๐ ) , โฆ , ๐ฆ๐(๐ผ๐ ) }, DP mechanisms, when taken together, is (๐, ๐ฟ)-DP. This applies in our setting, since we have assumed that โ โฉ(๐ผ๐ ) = (๐ฅ1(๐ผ๐ ) , โฆ , ๐ฅ๐โ1 ๐ท (๐ผ๐ ) ) โ satisfies the requirements of (๐, ๐ฟ)-DP, and VViewโณ (๐ท) (๐ผ๐ ) (๐ผ๐ ) that each of the ๐ก overlapping scalar views is formed iden- and โ๐ ๐ = (๐ , โฆ , ๐๐ ) (๐ผ ) 1 tically but for a different uniformly sampled coordinate โ โฉ and โ๐, but contain- of the vector-valued messages. โ ๐ท are the analogous definitions of ๐, ing only the information referring to the ๐ th uniformly To complete the proof of Theorem 4.1 for ๐ < 1, it sampled coordinate of each vector-valued message. remains to show that for a uniformly sampled coordinate The following advanced composition results will be (๐ผ ) ๐ผ๐ โ [๐], Viewโณ๐ (๐ท) โ satisfies (๐ โฒ , ๐ฟ โฒ )-DP. used in our setting to get a tight upper bound: Theorem 4.2 (Dwork et al. [9]). For all ๐ โฒ , ๐ฟ โฒ , ๐ฟ โฅ 0, Lemma 4.5. Condition (2) holds. the class of (๐ โฒ , ๐ฟ โฒ )-differentially private mechanisms satis- Proof. See Appendix A. fies (๐, ๐๐ฟ โฒ + ๐ฟ)-differential privacy under ๐-fold adaptive composition for: We now show that the above proof can be adjusted to โฒ cover the additional case 1 โค ๐ < 6. This will be sufficient ๐ = โ2๐ log(1/๐ฟ)๐ โฒ + ๐๐ โฒ (๐ ๐ โ 1). to complete the proof of our main Theorem 4.1. First, we scale the setting of ๐ โฒ by a multiple of 6 in Corollary 4.3. Given target privacy parameters 0 < ๐ < 1 Corollary 4.3 so that the advanced composition property and ๐ฟ > 0, to ensure (๐, ๐๐ฟ โฒ +๐ฟ) cumulative privacy loss over holds for all 1 โค ๐ < 6. We now insert ๐ โฒ = ๐ ๐ mechanisms, it suffices that each mechanism is (๐ โฒ , ๐ฟ โฒ )-DP, 12โ2๐ log(1/๐ฟ) into the proof of Theorem 4.1, resulting in a change of where: ๐ constant from 56 to 2016. ๐โฒ = . 2โ2๐ log(1/๐ฟ) โ satisfies (๐, ๐ฟ)-DP it suffices to 4.4. Accuracy Bounds for Shuffled Vector To show that VViewโณ (๐ท) prove that: Sum We now formulate an upper bound for the MSE of our Pr[VViewโณ (๐ท) ฬ โ = V] protocol, and then identify the value(s) of ๐ก that minimize PrVโผVView ฬ โ [ โฅ ๐ ๐ ] โค ๐ฟ. (1) โณ (๐ท) โ โฒ ฬ this upper bound. Pr[VViewโณ (๐ท ) = V] (๐ผ ) (๐ผ ) First, note that encoding the coordinate ๐ฅ๐ ๐ as ๐ฅ๐ฬ ๐ = By considering this vector view as a union of overlap- (๐ผ๐ ) (๐ผ๐ ) (๐ผ๐ ) ping scalar views, and letting ๐ = ๐ก in Corollary 4.3, it is โ๐ฅ๐ ๐โ + Ber (๐ฅ๐ ๐ โ โ๐ฅ๐ ๐โ) in Algorithm 2 ensures (๐ผ ) (๐ผ ) that ๐ผ[๐ฅ๐ฬ ๐ /๐] = ๐ผ[๐ฅ๐ ๐ ]. This means that our protocol is unbiased. For any unbiased random variable ๐ with 5 Mary Scott et al. CEUR Workshop Proceedings 1โ10 ๐ < ๐ < ๐ then Var[๐ ] โค (๐ โ ๐)2 /4, and so the MSE To obtain the error between the estimated average per coordinate due to the fixed-point approximation of vector and the true average vector, we simply take the the true vector in โ๐,๐,๐,๐ก is at most 4๐1 2 . Meanwhile, the square root of the result obtained in Theorem 4.6. MSE when โ๐,๐,๐,๐ก submits a random vector is at most 12 Corollary 4.7. For every statistical query ๐ โถ ๐ณ โฆ per coordinate. [0, 1]๐ , ๐, ๐ โ โ, {๐ก โ โ | ๐ก โ [๐]}, ๐ < 6 and ๐ฟ โ (0, 1], We now use the unbiasedness of our protocol to obtain there is an (๐, ๐ฟ)-DP ๐-party unbiased protocol for estimat- a result for estimating the squared error between the ing ๐๐ โ๐ ๐(๐ฅโ๐ ) in the Single-Message Shuffle Model with estimated average vector and the true average vector. standard deviation When calculating the MSE, each coordinate location is (2๐ก)1/2 ๐ 4/3 (14 log(1/๐ฟ) log(2๐ก/๐ฟ))1/3 used with expectation ๐/๐. Therefore, we define the โง , (1โ๐พ )๐5/6 ๐ 2/3 ฬ as the normalization of the โช โช normalized MSE, or MSE, when ๐ < 1 MSE by a factor of (๐/๐)2 . ๐(๐ซ ฬ ๐,๐,๐,๐ก ) = 1/2 4/3 1/3 โจ (8๐ก) ๐ (63 log(1/๐ฟ) log(2๐ก/๐ฟ)) , โช โช (1โ๐พ )๐ 5/6 ๐ 2/3 Theorem 4.6. For any ๐, ๐ โ โ, {๐ก โ โ | ๐ก โ [๐]}, ๐ < 6 and ๐ฟ โ (0, 1], there exists a parameter ๐ such that ๐ซ๐,๐,๐,๐ก โฉ when 1 โค ๐ < 6, is (๐, ๐ฟ)-DP and where ๐ฬ denotes the error between the estimated average 2๐ก๐ 8/3 (14 log(1/๐ฟ) log(2๐ก/๐ฟ))2/3 vector and the true average vector. โง , (1โ๐พ )2 ๐5/3 ๐ 4/3 To summarize, we have produced an unbiased pro- โช โช when ๐ < 1 ฬ ๐,๐,๐,๐ก ) = MSE(๐ซ tocol for the computation of the sum of ๐ real vectors 8/3 2/3 โจ 8๐ก๐ (63 log(1/๐ฟ) log(2๐ก/๐ฟ)) , in the Single-Message Shuffle Model with normalized โช (1โ๐พ ) 2 ๐5/3 ๐ 4/3 โช MSE ๐๐,๐ฟ (๐ 8/3 ๐ก๐โ5/3 ), using advanced composition re- โฉ when 1 โค ๐ < 6, sults from Dwork et al. [9]. Minimizing this bound as a ฬ denotes the squared error between the estimated function of ๐ก leads us to choose ๐ก = 1, but any choice of ๐ก where MSE average vector and the true average vector. that is small and not dependent on ๐ produces a bound of the same order. In our experimental study, we determine ๐ Proof. We consider the โ๐=1 DeBias (๐ง ฬ(๐) ) of ๐ซ๐,๐,๐,๐ก com- that the best choice of ๐ก in practice is indeed ๐ก = 1. ๐ก ๐ (๐ผ ) pared to the corresponding input โ๐=1 โ๐=1 ๐ฅ๐ ๐ over โ We use the bounds on the variance of the 4.5. Improved bounds for t=1 the dataset ๐ท. randomized response mechanism from Theorem 4.6 to We observe that in the optimal case in which ๐ก = 1, we give us an upper bound for this comparison. can tighten the bounds further, as we do not need to 2 invoke the advanced composition results when each user ๐ ๐ก ๐ (๐) (๐ผ๐ ) samples only a single coordinate. This changes the value MSE(๐ซ๐,๐,๐,๐ก ) = sup E[(โ DeBias (๐ง ฬ ) โ โ โ ๐ฅ๐ ) ] โ ๐ท ๐=1 ๐=1 ๐=1 of ๐พ by a factor of ๐(log(1/๐ฟ)), which propagates through 2 to the expression for the MSE. That is, we can more ๐ก ๐ (๐ผ๐ ) (๐ผ๐ ) simply set ๐ โฒ = ๐ and ๐ฟ โฒ = ๐ฟ in the proof of Theorem 4.1. = sup E[(โ โ(DeBias (๐ฆ๐ /๐) โ ๐ฅ๐ )) ] When ๐ < 1, the computation is straightforward, with โ ๐ท ๐=1 ๐=1 ๐ก ๐ ๐ โฅ ๐14โฒ2 log(2๐ก/๐ฟ) being chosen as before. However, when (๐ผ ) (๐ผ ) 2 = sup โ โ E[(DeBias (๐ฆ๐ ๐ /๐) โ ๐ฅ๐ ๐ ) ] 1 โค ๐ < 6, a tighter ๐ โฅ ๐80โฒ2 log(2๐ก/๐ฟ) must be selected, as โ ๐=1 ๐=1 ๐ท the condition ๐ โฒ < 1 no longer holds. ๐ก ๐ (๐ผ ) Using ๐ โฒ < 6, we have: = sup โ โ Var[DeBias (๐ฆ๐ ๐ /๐)] โ ๐=1 ๐=1 ๐ท 2 ๐โฒ (1 โ exp (โ๐ โฒ /2)) โฅ (1 โ exp (โ )) ๐ โฒ โฅ . ๐ก๐ (๐ผ ) ๐ก๐ 1โ๐พ ๐พ 3โ15 2โ10 = sup Var[๐ฆ1 1 /๐] โค ( + ) (1 โ ๐พ )2 (๐ผ1) (1 โ ๐พ )2 4๐ 2 2 Thus, we have: ๐ฅ1 ๐ก๐ 1 ๐ด๐ ๐๐ log(1/๐ฟ) log(2๐ก/๐ฟ) N๐ โฒ ๐ ๐ ๐โฒ 2 โค ( 2+ ), Pr[ โฅ ๐ ๐ ] โค exp ( โ (๐ โฒ /2)2 ) + exp ( โ ( ) ) 2 (1 โ ๐พ ) 4๐ (๐ โ 1)๐ 2 N๐ 3 2 2โ10 where ๐ด๐ = 28 when ๐ < 1, and ๐ด๐ = 1008 when 1 โค ๐ < 80 ๐ โฒ2 โค 2 exp (โ log(2๐ก/๐ฟ)) โค ๐ฟ/๐ก, 6. In other words, ๐ด๐ is equal to half the constant term 2๐ โฒ2 40 in the expression of ๐พ stated in Theorem 4.1. The choice which yields: (๐โ1)๐ 2 ๐ = 4๐ด ๐ log(1/๐ฟ) log(2๐ก/๐ฟ) minimizes the bracketed sum 14๐๐ log(2/๐ฟ) 27๐๐ ๐ max{ (๐โ1)๐ 2 , (๐โ1)๐ }, when ๐ < 1 above and the bounds in the statement of the theorem ๐พ ={ 80๐๐ log(2/๐ฟ) 36๐๐ follow. max{ (๐โ1)๐ 2 , 11(๐โ1)๐ }, when 1 โค ๐ < 6. 6 Mary Scott et al. CEUR Workshop Proceedings 1โ10 (a) Experimental error by number of coordinates ๐ก re- (b) Experimental error by number of buckets ๐ used tained ฬ for the ECG Figure 1: Bar charts confirming that the choices ๐ก = 1 (a) and ๐ = 3 (b) minimize the total experimental MSE Heartbeat Categorization Dataset. Note that the above expression for ๐พ in the case ๐ < 1 //www.kaggle.com/shayanfazeli/heartbeat. We analyse coincides with the result obtained by Balle et al. in the the effect of changing one key parameter at a time, whilst scalar case [2]. Putting this expression for ๐พ in the proof the others remain the same. Our default settings are of Theorem 4.6, with the choice vector dimension ๐ = 100, rounding parameter ๐ = 3, number of users ๐ = 50000, number of coordinates to 1/3 ๐๐ 2 ๐๐ 1/3 sample ๐ก = 1, and differential privacy parameters ๐ = 0.95 โงmin{( ) , ( ) }, when ๐ < 1 28๐ log(2/๐ฟ) 54๐ and ๐ฟ = 0.5. The ranges of all parameters have been ๐= 1/3 โจ ๐๐ 2 11๐๐ 1/3 adjusted to best display the dependencies, whilst simul- โฉ min{( 160๐ log(2/๐ฟ) ) , ( 72๐ ) }, when 1 โค ๐ < 6, taneously ensuring that the parameter ๐พ of the random- causes the upper bound on the normalized MSE to reduce ized response mechanism is always within its permitted to: range of [0, 1]. The Python code is available at https: //github.com/mary-python/dft/blob/master/shuffle. 981/3 ๐ 8/3 log2/3 (2/๐ฟ) 18๐ 8/3 We first confirm that the choice of ๐ก = 1 is optimal, โงmax{ (1โ๐พ )2 ๐5/3 ๐ 4/3 , (1โ๐พ )2 ๐5/3 (4๐)2/3 }, โช as predicted by the results of Section 4.5. Indeed, Fig. 1 โช when ๐ < 1 ฬ = MSE (a) shows that the total experimental MSE ฬ for the ECG โจmax{ 2๐ 8/3 (20 log(2/๐ฟ))2/3 , 2(92/3 )๐ 8/3 }, โช 2 5/3 (1โ๐พ ) ๐ ๐ 4/3 2 5/3 (1โ๐พ ) ๐ (11๐) 2/3 Heartbeat Categorization Dataset is significantly smaller โช when ๐ก = 1, compared to any other small value of ๐ก. โฉ when 1 โค ๐ < 6. Similarly, Fig. 1 (b) suggests that the total experimental By updating Corollary 4.7 in the same way, we can MSE ฬ is lowest when ๐ = 3, which is sufficiently close to conclude that for the optimal choice ๐ก = 1, the normal- the choice of ๐ selected in the proof of Theorem 4.6, with ized standard deviation of our unbiased protocol can be all other default parameter values substituted in. Observe further tightened to: that the absolute value of the observed MSE is below 0.3 in this case, meaning that the vector is reconstructed to a 981/6 ๐ 4/3 log1/3 (2/๐ฟ) 181/2 ๐ 4/3 high degree of accuracy, sufficient for many applications. โง max{ , }, (1โ๐พ )๐5/6 ๐ 2/3 (1โ๐พ )๐5/6 (4๐)1/3 โช โช Next, we verify the bounds of ๐ 8/3 and ๐โ5/3 from when ๐ < 1 Theorem 4.6. Fig. 2 (a) is plotted with a best fit curve ๐ฬ = โจmax{ 21/2 ๐ 4/3 (20 log(2/๐ฟ))1/3 , 21/2 91/3 ๐ 4/3 }, with equation a multiple of ๐ 8/3 , exactly as desired. Un- โช (1โ๐พ )๐5/6 ๐ 2/3 (1โ๐พ )๐5/6 (11๐)1/3 โช surprisingly, the MSE increases as ๐ goes up according โฉ when 1 โค ๐ < 6. to this superlinear dependence. Meanwhile, Fig. 2 (b) fits a curve dependent on ๐โ7/6 , sufficiently close to the required result. We see the benefit of increasing ๐: as ๐ 5. Experimental Evaluation increases by a factor of 10 across the plot, the error de- creases by more than two orders of magnitude. In Fig. 3, In this section we present and compare the bounds gen- we verify the dependency ๐ โ4/3 in the two ranges ๐ < 1 erated by applying Algorithms 2 and 3 to an ECG Heart- and 1 โค ๐ < 6. The behavior for ๐ < 1 is quite smooth, beat Categorization Dataset in Python, available at https: 7 Mary Scott et al. CEUR Workshop Proceedings 1โ10 (a) Experimental error by vector dimension ๐ (b) Experimental error by number of vectors ๐ used Figure 2: Bar charts with best fit curves confirming the dependencies ๐ 8/3 (a) and ๐โ5/3 (b) from Theorem 4.6. (a) Experimental error by value of ๐ where ๐ < 1 (b) Experimental error by value of ๐ where 1 โค ๐ < 6 Figure 3: Bar charts with best fit curves confirming the dependency ๐ โ4/3 from Theorem 4.6 in the two ranges ๐ < 1 (a) and 1 โค ๐ < 6 (b). but becomes more variable for larger ๐ values. addition of a new dimension ๐ introduces a new depen- In conclusion, these experiments confirm that picking dency for the bound, as well as the possibility of sampling ๐ก = 1 and ๐ = 3 serves to minimize the error. The lines ๐ก coordinates from each ๐-dimensional vector. For this of best fit confirm the dependencies on the other param- extension, we formally defined the vector view as the eters from Section 4 for ๐, ๐ and ๐, by implementing and knowledge of the analyzer upon receiving the random- applying Algorithms 2 and 3 to an ECG Heartbeat Catego- ized vectors, and expressed it as a union of overlapping rization Dataset in Python. The experiments demonstrate scalar views. Through the use of advanced composition that the MSE observed in practice is sufficiently small to results from Dwork et al. [9], we showed that the estima- allow effective reconstruction of average vectors for a tor now has normalized MSE ๐๐,๐ฟ (๐ 8/3 ๐ก๐โ5/3 ) which can suitably large cohort of users. be further improved to ๐๐,๐ฟ (๐ 8/3 ๐โ5/3 ) by setting ๐ก = 1. Our contribution has provided a stepping stone be- tween the summation of the scalar case discussed by 6. Conclusion Balle et al. [2] and the linearization of more sophisticated Our results extend a result from Balle et al. [2] for scalar structures such as matrices and higher-dimensional ten- sums to provide a protocol ๐ซ๐,๐,๐,๐ก in the Single-Message sors, both of which are reliant on the functionality of the Shuffle Model for the private summation of vector-valued vector case. As mentioned in Section 2, there is poten- messages (๐ฅโ1 , โฆ , ๐ฅโ๐ ) โ ([0, 1]๐ )๐ . It is not surprising that tial for further exploration in the Multi-Message Shuffle the normalized MSE of the resulting estimator has a de- Model to gain additional privacy, echoing the follow-up pendence on ๐โ5/3 , as this was the case for scalars, but the paper of Balle et al. [17]. 8 Mary Scott et al. CEUR Workshop Proceedings 1โ10 A. Proof of Lemma 4.5 Thus we have: Lemma 4.5. Condition (2) holds. N๐ โฒ ๐ ๐ Pr[ โฅ ๐ ๐ ] โค exp ( โ (๐ โฒ /2)2 ) + exp ( โ (๐ โฒ /โ7)2 ) N๐ 3 2 Proof. The way in which we split the vector view (i.e., to consider a single uniformly sampled coordinate of each 14 ๐ โฒ2 โค 2 exp (โ โฒ2 log(2๐ก/๐ฟ)) โค ๐ฟ/๐ก. vector-valued message in turn), means that we can apply 2๐ 7 a proof that is analogous to the scalar-valued case [2]. We now apply another Chernoff bound to show that We work through the key steps needed. ๐ โค 2E[๐ ], which can be used to give a bound on ๐พ. Recall from Section 4.1 that the case where the ๐th user The following calculation proves that Pr[๐ โฅ 2E(๐ )] โค submits a uniformly random message independent of exp(โE(๐ )/3), using E(๐ ) = (๐ โ 1)๐ก/๐: their input satisfies DP trivially. Otherwise, the ๐th user submits their true message, and we assume that analyzer ๐โ1 ๐ Pr[๐ โฅ 2E(๐ )] โค exp ( โ ๐ก/๐) โค exp ( โ ) < ๐ฟ/3๐ก, removes from ๐โ (๐ผ๐ ) any truthful messages associated with 3 3 (๐ผ ) the first ๐ โ 1 users. Denote ๐๐ ๐ to be the count of ๐ th for all reasonable values of ๐ฟ. coordinates remaining with a particular value ๐ โ [๐]. If Substituting these bounds on ๐ and ๐ into ๐พ ๐ /๐ = ๐ (๐ผ ) โฒ(๐ผ ) ๐ ๐ฅโ๐ ๐ = ๐ and ๐ฅโ๐ ๐ = ๐, we obtain the relationship along with ๐ โฒ = gives: 2โ2๐ก log(1/๐ฟ) (๐ผ ) (๐ผ ) โ = ๐๐ผ ] Pr[Viewโณ๐ (๐ท) ๐๐ ๐ 112๐๐ก log(1/๐ฟ) log(2๐ก/๐ฟ) 56๐๐ log(1/๐ฟ) log(2๐ก/๐ฟ) ๐ = . ๐พโฅ โฅ . (๐ผ ) (๐ผ ) ๐ ๐ 2 (๐ โ 1)๐ 2 โ โฒ ) = ๐๐ผ ] Pr[Viewโณ๐ (๐ท ๐๐ ๐ ๐ (๐ผ ) (๐ผ ) We observe that the counts ๐๐ ๐ and ๐๐ ๐ follow the bino- ๐พ ๐พ mial distributions N๐ โผ Bin (๐ , ๐ ) + 1 and N๐ โผ Bin (๐ , ๐ ) respectively, where ๐ denotes the number of times that the coordinate ๐ is sampled. In expectation, ๐ = (๐ โ 1)๐ก/๐, and below we will show that it is close to its expectation: (๐ผ ) โ = V๐ผ ] Pr[Viewโณ๐ (๐ท) ๐ โฒ Pr (๐ผ ) ๐ โ [ โฅ ๐๐ ] V๐ผ๐ โผViewโณ (๐ท) (๐ผ ) โฒ Pr[Viewโณ๐ (๐ทโ ) = V๐ผ ] ๐ N๐ โฒ = Pr[ โฅ ๐๐ ] . N๐ ๐พ We define ๐ โถ= E[N๐ ] = ๐ โ ๐ and split this into the union of two events, ๐๐ โฅ ๐๐ ๐ โฒ /2 โฒ and ๐๐ โค ๐๐ โ๐ /2 . Applying a Chernoff bound gives: N๐ โฒ ๐ โฒ 1 2 Pr[ โฅ ๐ ๐ ] โค exp(โ (๐ ๐ /2 โ 1 โ ) ) N๐ 3 ๐ ๐ โ๐ โฒ /2 2 + exp(โ (1 โ ๐ ) ). 2 We will choose ๐ โฅ ๐14โฒ2 log(2๐ก/๐ฟ) so that we have: 1 ๐ โฒ ๐ โฒ2 ๐ โฒ2 ๐โฒ exp (๐ โฒ /2) โ 1 โ โฅ + โ โฅ . ๐ 2 8 14 log(2๐ก/๐ฟ) 2 Using ๐ โฒ < 1, we have: ๐โฒ (1 โ exp (โ๐ โฒ /2)) โฅ (1 โ exp (โ1/2))๐ โฒ โฅ . โ7 9 Mary Scott et al. CEUR Workshop Proceedings 1โ10 References J. Tinnes, B. Seefeld, PROCHLO: Strong privacy for analytics in the crowd, in: Proceedings of the [1] C. Dwork, Differential privacy, in: Proceedings 26th Symposium on Operating Systems Principles, of the 33rd International Colloquium on Automata, ACM, New York City, 2017, pp. 441โ459. Languages and Programming (ICALP), Springer, [14] A. Cheu, A. Smith, J. Ullman, D. Zeber, M. Zhilyaev, Cham, 2006, pp. 1โ12. Distributed differential privacy via shuffling, in: [2] B. Balle, J. Bell, A. Gascรณn, K. Nissim, The privacy Annual International Conference on the Theory blanket of the shuffle model, in: Annual Inter- and Applications of Cryptographic Techniques, national Cryptology Conference, Springer, Cham, Springer, Cham, 2019, pp. 375โ403. 2019, pp. 638โ667. [15] B. Balle, J. Bell, A. Gascรณn, K. Nissim, Im- [3] B. McMahan, E. Moore, D. Ramage, S. Hampson, proved summation from shuffling, arXiv preprint B. A. y Arcas, Communication-efficient learning of arXiv:1909.11225, 2019. deep networks from decentralized data, in: Artifi- [16] Y. Ishai, E. Kushilevitz, R. Ostrovsky, A. Sahai, Cryp- cial Intelligence and Statistics Conference, PMLR, tography from anonymity, in: 47th Annual IEEE New York City, 2017, pp. 1273โ1282. Symposium on Foundations of Computer Science, [4] M. Abadi, A. Chu, I. Goodfellow, Deep learning IEEE, New York City, 2006, pp. 239โ248. with differential privacy, in: Proceedings of the [17] B. Balle, J. Bell, A. Gascรณn, K. Nissim, Private sum- 2016 ACM SIGSAC Conference on Computer and mation in the multi-message shuffle model, in: Communications Security, ACM, New York City, Proceedings of the 2020 ACM SIGSAC Conference 2016, pp. 308โ318. on Computer Communications and Security, ACM, [5] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, New York City, 2020, pp. 657โ676. B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth, [18] B. Ghazi, N. Golowich, R. Kumar, R. Pagh, A. Vel- Practical secure aggregation for privacy-preserving ingker, On the power of multiple anonymous mes- machine learning, in: Proceedings of the 2017 ACM sages, in: Advances in CryptologyโEUROCRYPT SIGSAC Conference on Computer and Communi- 2021, Springer, Cham, 2021, pp. 463โ488. cations Security, ACM, New York City, 2017, pp. [19] B. Ghazi, P. Manurangsi, R. Pagh, A. Velingker, Pri- 1175โ1191. vate aggregation from fewer anonymous messages, [6] L. Sweeney, k-anonymity: A model for protect- in: Advances in CryptologyโEUROCRYPT 2020, ing privacy, International Journal of Uncertainty, Springer, Cham, 2020, pp. 798โ827. Fuzziness and Knowledge-Based Systems 10 (2002) [20] P. Kairouz, S. Oh, P. Viswanath, Extremal mecha- 557โ570. nisms for local differential privacy, The Journal of [7] A. Machanavajjhala, D. Kifer, J. Gehrke, M. Venki- Machine Learning Research 17 (2016) 492โ542. tasubramaniam, l-diversity: Privacy beyond k- [21] P. Kairouz, K. Bonawitz, D. Ramage, Discrete dis- anonymity, in: ACM Transactions on Knowledge tribution estimation under local privacy, in: Pro- Discovery from Data (TKDD), ACM, New York City, ceedings of the 33rd International Conference on 2007, pp. 3โes. Machine Learning, volume 48, ACM, New York City, [8] N. Li, T. Li, S. Venkatasubramanian, t-closeness: Pri- 2016, pp. 2436โ2444. vacy beyond k-anonymity and l-diversity, in: 2007 [22] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, IEEE 23rd International Conference on Data Engi- R. Rogers, Protection against reconstruction and neering, IEEE, New York City, 2007, pp. 106โ115. its applications in private federated learning, arXiv [9] C. Dwork, A. Roth, The algorithmic foundations preprint arXiv:1812.00984, 2018. of differential privacy, Foundations and Trends in Theoretical Computer Science 9 (2014) 211โ407. [10] ร. Erlingsson, V. Pihur, A. Korolova, RAPPOR: Ran- domized aggregatable privacy-preserving ordinal response, in: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Se- curity, ACM, New York City, 2014, pp. 1054โ1067. [11] A. D. P. Team, Learning with privacy at scale, 2017. [12] B. Ding, J. Kulkarni, S. Yekhanin, Collecting teleme- try data privately, in: Advances in Neural Infor- mation Processing Systems, ACM, New York City, 2017, pp. 3571โ3580. [13] A. Bittau, ร. Erlingsson, P. Maniatis, I. Mironov, A. Raghunathan, D. Lie, M. Rudominer, U. Kode, 10