Applying the Shuffle Model of Differential Privacy to Vector
Aggregation
Mary Scott1 , Graham Cormode1 and Carsten Maple2
1 Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK
2 WMG, University of Warwick, Coventry, CV4 7AL, UK


                                             Abstract
                                             In this work we introduce a new protocol for vector aggregation in the context of the Shuffle Model, a recent model within
                                             Differential Privacy (DP). It sits between the Centralized Model, which prioritizes the level of accuracy over the secrecy of the
                                             data, and the Local Model, for which an improvement in trust is counteracted by a much higher noise requirement. The
                                             Shuffle Model was developed to provide a good balance between these two models through the addition of a shuffling step,
                                             which unbinds the users from their data whilst maintaining a moderate noise requirement. We provide a single message
                                             protocol for the summation of real vectors in the Shuffle Model, using advanced composition results. Our contribution
                                             provides a mechanism to enable private aggregation and analysis across more sophisticated structures such as matrices and
                                             higher-dimensional tensors, both of which are reliant on the functionality of the vector case.

                                             Keywords
                                             Differential privacy (DP), single-message shuffle model, local randomizer, randomized response, mean squared error (MSE).



1. Introduction

Differential Privacy (DP) [1] is a strong, mathematical definition of privacy that guarantees a measurable level of confidentiality for any data subject in the dataset to which it is applied. In this way, useful collective information can be learned about a population, whilst simultaneously protecting the personal information of each data subject.
   In particular, DP guarantees that the impact on any particular individual as a result of analysis on a dataset is the same, whether or not the individual is included in the dataset. This guarantee is quantified by a parameter ε, which represents good privacy if it is small. However, finding an algorithm that achieves DP often requires a trade-off between privacy and accuracy, as a smaller ε sacrifices accuracy for better privacy, and vice versa. DP enables data analyses such as the statistical analysis of the salaries of a population. This allows useful collective information to be studied, as long as ε is adjusted appropriately to satisfy the definition of DP.
   In this work we focus on protocols in the Single-Message Shuffle Model [2], a one-time data collection model where each of n users is permitted to submit a single message. We have chosen to apply the Single-Message Shuffle Model to the problem of vector aggregation, as there are links to Federated Learning and Secure Aggregation.
   There are many practical applications of the Single-Message Shuffle Model in Federated Learning, where multiple users collaboratively solve a Machine Learning problem, and the results simultaneously improve the model for the next round [3]. The updates generated by the users after each round are high-dimensional vectors, so this data type will prove useful in applications such as training a Deep Neural Network to predict the next word that a user types [4]. Additionally, aggregation is closely related to Secure Aggregation, which can be used to compute the outputs of Machine Learning problems such as the one above [5].
   Our contribution is a protocol in the Single-Message Shuffle Model for the private summation of vector-valued messages, extending an existing result from Balle et al. [2] by permitting the n users to each submit a vector of real numbers instead of a scalar. The resulting estimator is unbiased and has normalized mean squared error (MSE) O_{ε,δ}(d^(8/3) n^(−5/3)), where d is the dimension of each vector.
   This vector summation protocol can be extended to produce a similar protocol for the linearization of matrices. It is important to use matrix reduction to ensure that the constituent vectors are linearly independent. This problem can be extended further to higher-dimensional tensors, which are useful for the representation of multi-dimensional data in Neural Networks.

BICOD21: British International Conference on Databases, March 28, 2022, London, UK
Email: mary.p.scott@warwick.ac.uk (M. Scott); g.cormode@warwick.ac.uk (G. Cormode); cm@warwick.ac.uk (C. Maple)
ORCID: 0000-0003-0799-5840 (M. Scott); 0000-0002-0698-0922 (G. Cormode); 0000-0002-4715-212X (C. Maple)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

2. Related Work

The earliest attempts at protecting the privacy of users in a dataset focused on simple ways of suppressing or generalising the data. Examples include k-anonymity [6], l-diversity [7] and t-closeness [8].
Mary Scott et al. CEUR Workshop Proceedings                                                                                   1โ€“10



However, such attempts have been shown to be insufficient, as proved by numerous examples [9].
   This harmful leakage of sensitive information can be easily prevented through the use of DP, as this mathematically guarantees that the chance of a linkage attack on an individual in the dataset is almost identical to that on an individual not in the dataset.
   Ever since DP was first conceptualized in 2006 by Dwork et al. [1], the majority of research in the field has focused on two opposing models. In the Centralized Model, users submit their sensitive personal information directly to a trusted central data collector, who adds random noise to the raw data to provide DP, before assembling and analyzing the aggregated results.
   In the Local Model, DP is guaranteed when each user applies a local randomizer to add random noise to their data before it is submitted. The Local Model differs from the Centralized Model in that the central entity does not see the users' raw data at any point, and therefore does not have to be trusted. However, the level of noise required per user for the same privacy guarantee is much higher, which limits the usage of Local Differential Privacy (LDP) to major companies such as Google [10], Apple [11] and Microsoft [12].
   Neither of these two extensively studied models can provide a good balance between the trust of the central entity and the level of noise required to guarantee DP. Hence, in recent years researchers have tried to create intermediate models that reap the benefits of both.
   In 2017, Bittau et al. [13] introduced the Encode, Shuffle, Analyze (ESA) model, which provides a general framework for the addition of a shuffling step in a private protocol. After the data from each user is encoded, it is randomly permuted to unbind each user from their data before analysis takes place. In 2019, Cheu et al. [14] formalized the Shuffle Model as a special case of the ESA model, which connects this additional shuffling step to the Local Model. In the Shuffle Model, the local randomizer applies a randomized mechanism on a per-element basis, potentially replacing a truthful value with another randomly selected domain element. The role of these independent reports is to create what we call a privacy blanket, which masks the outputs which are reported truthfully.
   As well as the result on the private summation of scalar-valued messages in the Single-Message Shuffle Model that we will be using [2], Balle et al. have published two more recent works that solve related problems. The first paper [15] improved the distributed n-party summation protocol from Ishai et al. [16] in the context of the Single-Message Shuffle Model to require O(1 + π/log n) scalar-valued messages, instead of a logarithmic dependency of O(log n + π), to achieve statistical security 2^(−π). The second paper [17] introduced two new protocols for the private summation of scalar-valued messages in the Multi-Message Shuffle Model, an extension of the Single-Message Shuffle Model that permits each of the n users to submit more than one message, using several independent shufflers to securely compute the sum. In this work, Balle et al. contributed a recursive construction based on the protocol from [2], as well as an alternative mechanism which implements a discretized distributed noise addition technique using the result from Ishai et al. [16].
   Also relevant to our research is the work of Ghazi et al. [18], which explored the related problems of private frequency estimation and selection in a similar context, drawing comparisons between the errors achieved in the Single-Message and Multi-Message Shuffle Models. A similar team of authors produced a follow-up paper [19] describing a more efficient protocol for private summation in the Single-Message Shuffle Model, using the 'invisibility cloak' technique to facilitate the addition of zero-sum noise without coordination between the users.

3. Preliminaries

We consider randomized mechanisms ℳ, ℛ [9] over domains 𝕏, 𝕐, and apply them to input datasets D⃗, D⃗′ to generate (vector-valued) messages x⃗_i, x⃗′_i. We write [k] = {1, …, k} and ℕ for the set of natural numbers.

3.1. Models of Differential Privacy

The essence of Differential Privacy (DP) is the requirement that the contribution x⃗_i of a user i to a dataset D⃗ = (x⃗_1, …, x⃗_n) does not have much effect on the outcome of the mechanism applied to that dataset.
   In the centralized model of DP, random noise is only introduced after the users' inputs are gathered by a (trusted) aggregator. Consider a dataset D⃗′ that differs from D⃗ only in the contribution of a single user, denoted D⃗ ≃ D⃗′. Also let ε ≥ 0 and δ ∈ (0, 1). We say that a randomized mechanism ℳ : 𝕏^n → 𝕐 is (ε, δ)-differentially private if, for all D⃗ ≃ D⃗′ and all E ⊆ 𝕐:

      Pr[ℳ(D⃗) ∈ E] ≤ e^ε · Pr[ℳ(D⃗′) ∈ E] + δ   [9].

In this definition, we assume that the trusted aggregator obtains the raw data from all users and introduces the necessary perturbations.
   In the local model of DP, each user i independently uses randomness on their input x⃗_i ∈ 𝕏 by applying a local randomizer ℛ : 𝕏 → 𝕐 to obtain a perturbed result ℛ(x⃗_i). We say that the local randomizer is (ε, δ)-differentially private if, for all pairs of inputs x⃗_i, x⃗′_i and all E ⊆ 𝕐:

      Pr[ℛ(x⃗_i) ∈ E] ≤ e^ε · Pr[ℛ(x⃗′_i) ∈ E] + δ   [2],

where x⃗′_i ∈ 𝕏 is some other valid input vector that i could hold. The Local Model guarantees that any observer will not have access to the raw data from any of the users. That is, it removes the requirement for trust. The price is that this requires a higher level of noise per user to achieve the same privacy guarantee.
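To make the local model concrete, the following minimal sketch (our own illustration, not code from the paper; the helper names are ours) implements a k-ary randomized response randomizer and computes the ε it satisfies. Reporting the held value with probability 1 − γ + γ/k and any other fixed value with probability γ/k gives a worst-case likelihood ratio of 1 + k(1 − γ)/γ, hence ε = ln(1 + k(1 − γ)/γ).

```python
import math
import random

def randomized_response(x, k, gamma, rng=random):
    """Report x faithfully w.p. 1 - gamma, else a uniform value in {1, ..., k}."""
    if rng.random() < gamma:
        return rng.randint(1, k)
    return x

def local_epsilon(k, gamma):
    """Worst-case log-likelihood ratio Pr[y | x] / Pr[y | x'] of the randomizer."""
    p_true = 1 - gamma + gamma / k   # probability of reporting the held value
    p_other = gamma / k              # probability of reporting any other fixed value
    return math.log(p_true / p_other)

k, gamma = 10, 0.25
print(local_epsilon(k, gamma))  # ln(1 + k(1 - gamma)/gamma) = ln 31 ≈ 3.434
```

Note the trade-off the sketch makes visible: a smaller γ (more truthful reports) yields a larger ε, i.e. a weaker guarantee, which is why strong privacy in the Local Model demands heavy noise per user.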



3.2. Single-Message Shuffle Model

The Single-Message Shuffle Model sits between the Centralized and Local Models of DP [2]. Let a protocol 𝒫 in the Single-Message Shuffle Model be of the form 𝒫 = (ℛ, 𝒜), where ℛ : 𝕏 → 𝕐 is the local randomizer, and 𝒜 : 𝕐^n → ℤ is the analyzer of 𝒫. Overall, 𝒫 implements a mechanism 𝒫 : 𝕏^n → ℤ as follows. Each user i independently applies the local randomizer to their message x⃗_i to obtain a message y⃗_i = ℛ(x⃗_i). Subsequently, the messages (y⃗_1, …, y⃗_n) are randomly permuted by a trusted shuffler 𝒮 : 𝕐^n → 𝕐^n. The random permutation 𝒮(y⃗_1, …, y⃗_n) is submitted to an untrusted data collector, who applies the analyzer 𝒜 to obtain an output for the mechanism. In summary, the output of 𝒫(x⃗_1, …, x⃗_n) is given by:

      𝒜 ∘ 𝒮 ∘ ℛ^n(x⃗) = 𝒜(𝒮(ℛ(x⃗_1), …, ℛ(x⃗_n))).

   Note that the data collector observing the shuffled messages 𝒮(y⃗_1, …, y⃗_n) obtains no information about which user generated each of the messages. Therefore, the privacy of 𝒫 relies on the indistinguishability between the shuffles 𝒮 ∘ ℛ^n(D⃗) and 𝒮 ∘ ℛ^n(D⃗′) for datasets D⃗ ≃ D⃗′. The analyzer can represent the shuffled messages as a histogram, which counts the number of occurrences of the possible outputs of 𝕐.

3.3. Measuring Accuracy

In Section 4 we use the mean squared error to compare the overall output of a private summation protocol in the Single-Message Shuffle Model with the original dataset. The MSE measures the average squared difference between a fixed input f(D⃗) to the randomized protocol 𝒫 and its output 𝒫(D⃗). In this context, MSE(𝒫, D⃗) = E[(𝒫(D⃗) − f(D⃗))²], where the expectation is taken over the randomness of 𝒫. When E[𝒫(D⃗)] = f(D⃗), MSE is equivalent to variance, i.e.:

      MSE(𝒫, D⃗) = E[(𝒫(D⃗) − E[𝒫(D⃗)])²] = Var[𝒫(D⃗)].

4. Vector Sum in the Shuffle Model

In this section we introduce our protocol for vector summation in the Shuffle Model and tune its parameters to optimize accuracy.

Algorithm 1: Local Randomizer ℛ^PH_{γ,k,n}
   Public Parameters: γ ∈ [0, 1], domain size k, and number of parties n
   Input: x_i ∈ [k]
   Output: y_i ∈ [k]
   Sample b ← Ber(γ)
   if b = 0 then let y_i ← x_i
   else sample y_i ← Unif([k])
   return y_i

4.1. Basic Randomizer

First, we describe a basic local randomizer applied by each user i to an input x_i ∈ [k]. The output of this protocol is a (private) histogram of shuffled messages over the domain [k].
   The Local Randomizer ℛ^PH_{γ,k,n}, shown in Algorithm 1, applies a generalized randomized response mechanism that returns the true message x_i with probability 1 − γ and a uniformly random message with probability γ. Such a basic randomizer is used by Balle et al. [2] in the Single-Message Shuffle Model for scalar-valued messages, as well as in several other previous works in the Local Model [20, 21, 22]. In Section 4.3, we find an appropriate γ to optimize the proportion of random messages that are submitted, and therefore guarantee DP.
   We now describe how the presence of these random messages can form a 'privacy blanket' to protect against a difference attack on a particular user. Suppose we apply Algorithm 1 to the messages from all n users. Note that a subset B of approximately γn of these users returned a uniformly random message, while the remaining users returned their true message. Following Balle et al. [2], the analyzer can represent the messages sent by users in B by a histogram Y_1 of uniformly random messages, and can form a histogram Y_2 of truthful messages from users not in B. As these subsets are mutually exclusive and collectively exhaustive, the information represented by the analyzer is equivalent to the histogram Y = Y_1 ∪ Y_2.
   Consider two neighbouring datasets, each consisting of n messages from n users, that differ only on the input from the nth user. To simplify the discussion and subsequent proof, we temporarily omit the action of the shuffler. By the post-processing property of DP, this can be reintroduced later on without adversely affecting the privacy guarantees. To achieve DP we need to find an appropriate γ such that when Algorithm 1 is applied, the change in Y is appropriately bounded. As the knowledge of either the set B or the messages from the first n − 1 users does not affect DP, we can assume that the analyzer knows both of these details. This lets the analyzer remove all of the truthful messages associated with the first n − 1 users from Y.
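Composing Algorithm 1 with the shuffler from Section 3.2 gives the following minimal sketch (our own illustration; names such as `shuffle_protocol` are ours, not the paper's). The analyzer sees only the histogram of shuffled messages, in which the roughly γn uniformly random reports form the privacy blanket.

```python
import random
from collections import Counter

def randomizer(x, k, gamma, rng):
    """Algorithm 1: report x w.p. 1 - gamma, else a uniform value in [k]."""
    return x if rng.random() >= gamma else rng.randint(1, k)

def shuffle_protocol(inputs, k, gamma, seed=0):
    """A ∘ S ∘ R^n: randomize every input, shuffle away user identities,
    and return the histogram the analyzer works with."""
    rng = random.Random(seed)
    messages = [randomizer(x, k, gamma, rng) for x in inputs]  # R per user
    rng.shuffle(messages)           # S: unbinds each user from their message
    return Counter(messages)        # the analyzer's histogram over [k]

hist = shuffle_protocol([1, 1, 2, 3] * 25, k=5, gamma=0.2)
```

Since the histogram is a deterministic function of the shuffled multiset, analyzing it directly (as in Section 4.1) loses no information relative to the shuffler's output.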



first ๐‘› โˆ’ 1 users from ๐‘Œ.                                                Algorithm 2: Local Randomizer โ„›๐‘‘,๐‘˜,๐‘›,๐‘ก
   If the ๐‘›th user is in ๐ต, this means their submission is
                                                                          Public Parameters: ๐‘˜, ๐‘ก, dimension ๐‘‘, and
independent of their input, so we trivially satisfy DP.
                                                                           number of parties ๐‘›
Otherwise, the (curious) analyzer knows that the ๐‘›th user                                (1)      (๐‘‘)
has submitted their true message ๐‘ฅ๐‘› . The analyzer can                    Input: ๐‘ฅโƒ—๐‘– = (๐‘ฅ๐‘– , โ€ฆ , ๐‘ฅ๐‘– ) โˆˆ [0, 1]๐‘‘
                                                                                                 (๐›ผ )       (๐›ผ )
remove all of the truthful messages associated with the                   Output: ๐‘ฆโƒ—๐‘– = (๐‘ฆ๐‘– 1 , โ€ฆ , ๐‘ฆ๐‘– ๐‘ก ) โˆˆ {0, 1, โ€ฆ , ๐‘˜}๐‘ก
first ๐‘›โˆ’1 users from ๐‘Œ, and obtain ๐‘Œ1 โˆช{๐‘ฅ๐‘› }. The subsequent
                                                                          Sample (๐›ผ1 , โ€ฆ , ๐›ผ๐‘ก ) โ† Unif ([๐‘‘])
privacy analysis will argue that this does not reveal ๐‘ฅ๐‘›                        (๐›ผ )          (๐›ผ )             (๐›ผ )      (๐›ผ )
if ๐›พ is set so that ๐‘Œ1 , the histogram of random messages,                Let ๐‘ฅ๐‘–ฬ„ ๐‘— โ† โŒŠ๐‘ฅ๐‘– ๐‘— ๐‘˜โŒ‹ + Ber (๐‘ฅ๐‘– ๐‘— ๐‘˜ โˆ’ โŒŠ๐‘ฅ๐‘– ๐‘— ๐‘˜โŒ‹)
                                                                                (๐›ผ๐‘— )                   (๐›ผ๐‘— )
appropriately โ€˜hidesโ€™ ๐‘ฅ๐‘› .                                                 โ–ท ๐‘ฅ๐‘–ฬ„      : encoding of ๐‘ฅ๐‘–        with precision ๐‘˜
                                                                                (๐›ผ๐‘— )                                    (๐›ผ๐‘— )
                                                                           โ–ท ๐‘ฆ๐‘–       : apply Algorithm 1 to each ๐‘ฅ๐‘–ฬ„
4.2. Private Summation of Vectors                                                           (๐›ผ )     (๐›ผ )
                                                                          return ๐‘ฆโƒ—๐‘– = (๐‘ฆ๐‘– 1 , โ€ฆ , ๐‘ฆ๐‘– ๐‘ก )
Here, we extend the protocol from Section 4.1 to ad-
dress the problem of computing the sum of ๐‘› real vec-
                                 (1)    (๐‘‘)
tors, each of the form ๐‘ฅโƒ—๐‘– = (๐‘ฅ๐‘– , โ€ฆ , ๐‘ฅ๐‘– ) โˆˆ [0, 1]๐‘‘ , in the           Algorithm 3: Analyzer ๐’œ๐‘‘,๐‘˜,๐‘ก
Single-Message Shuffle Model. Specifically, we analyze                    Public Parameters: ๐‘˜, ๐‘ก, and dimension ๐‘‘
the utility of a protocol ๐’ซ๐‘‘,๐‘˜,๐‘›,๐‘ก = (โ„›๐‘‘,๐‘˜,๐‘›,๐‘ก , ๐’œ๐‘‘,๐‘˜,๐‘ก ) for this        Input: Multiset {โƒ—
                                                                                           ๐‘ฆ๐‘– }๐‘–โˆˆ[๐‘›] , with
purpose, by using the MSE from Section 3.3 as the accu-                                (๐›ผ )          (๐›ผ )
                                                                                (๐‘ฆ๐‘– 1 , โ€ฆ , ๐‘ฆ๐‘– ๐‘ก ) โˆˆ {0, 1, โ€ฆ , ๐‘˜}๐‘ก
racy measure. In the scalar case, each user applies the
                                                                          Output: ๐‘งโƒ— = (๐‘ง (1) , โ€ฆ , ๐‘ง (๐‘‘) ) โˆˆ [0, 1]๐‘‘
protocol to their entire input [2]. Moving to the vector
case, we allow each user to independently sample a set                           (๐‘™)      (๐›ผ )
                                                                          Let ๐‘ฆ๐‘–      โ† ๐‘ฆ๐‘– ๐‘—
of 1 โ‰ค ๐‘ก โ‰ค ๐‘‘ coordinates from their vector to report. Our                       (๐›ผ๐‘— )                                (๐‘™)
analysis allows us to optimize the parameter ๐‘ก.                             โ–ท ๐‘ฆ๐‘–      : submission corresponding to ๐‘ฅ๐‘–
   Hence, the first step of the Local Randomizer โ„›๐‘‘,๐‘˜,๐‘›,๐‘ก ,                                                        (1)          (๐‘‘)
presented in Algorithm 2, is to uniformly sample ๐‘ก co-                    Let (๐‘ง ฬ‚(1) , โ€ฆ , ๐‘ง ฬ‚(๐‘‘) ) โ† ( 1๐‘˜ โˆ‘๐‘– ๐‘ฆ๐‘– , โ€ฆ , 1๐‘˜ โˆ‘๐‘– ๐‘ฆ๐‘– )
ordinates (๐›ผ1 , โ€ฆ , ๐›ผ๐‘ก ) โˆˆ [๐‘‘] (without replacement) from                 Let
each vector ๐‘ฅโƒ—๐‘– . To compute a differentially private ap-                  (๐‘ง (1) , โ€ฆ , ๐‘ง (๐‘‘) ) โ† (DeBias (๐‘ง ฬ‚(1) ), โ€ฆ ,DeBias (๐‘ง ฬ‚(๐‘‘) ))
                                                                                            (๐‘™)
                                                                           โ–ท DeBias (๐‘ง ฬ‚ ) = (๐‘ง ฬ‚
                                                                                                      (๐‘™) โˆ’ ๐›พ โ‹… |๐‘ฆ (๐‘™) |)/(1 โˆ’ ๐›พ )
proximation of โˆ‘๐‘– ๐‘ฅโƒ—๐‘– , we fix a quantization level ๐‘˜. Then                                                  2    ๐‘–
                                (๐›ผ )              (๐›ผ )
we randomly round each ๐‘ฅ๐‘– ๐‘— to obtain ๐‘ฅ๐‘–ฬ„ ๐‘— as either                     return ๐‘งโƒ— = (๐‘ง (1) , โ€ฆ , ๐‘ง (๐‘‘) )
  (๐›ผ )         (๐›ผ )
โŒŠ๐‘ฅ๐‘– ๐‘— ๐‘˜โŒ‹ or โŒˆ๐‘ฅ๐‘– ๐‘— ๐‘˜โŒ‰. Next, we apply the randomized re-
                                                     (๐›ผ )
sponse mechanism from Algorithm 1 to each ๐‘ฅ๐‘–ฬ„ ๐‘— , which                  4.3. Privacy Analysis of Algorithms
                      (๐›ผ )                                    (๐›ผ )
sets each output ๐‘ฆ๐‘– ๐‘— independently to be equal to ๐‘ฅ๐‘–ฬ„ ๐‘—
                                                            In this section, we will find an appropriate ๐›พ that en-
with probability 1 โˆ’ ๐›พ, or a random value in {0, 1, โ€ฆ , ๐‘˜}
                                    (๐›ผ๐‘— )                   sures that the mechanism described in Algorithms 2 and
with probability ๐›พ. Each ๐‘ฆ๐‘– will contribute to a his- 3 satisfies (๐œ€, ๐›ฟ)-DP for vector-valued messages in the
                          (๐›ผ )           (๐›ผ )               Single-Message Shuffle Model. To achieve this, we prove
togram of the form $(y_1^{(\alpha_j)}, \dots, y_n^{(\alpha_j)})$ as in Section 4.1.

The Analyzer $\mathcal{A}_{d,k,t}$, shown in Algorithm 3, aggregates the histograms to approximate $\sum_i \vec{x}_i$ by post-processing the vectors coordinate-wise. More precisely, the analyzer sets each output $y_i^{(\alpha_j)}$ to $y_i^{(l)}$, where the new label $l$ is from its corresponding input $x_i^{(l)}$ of the original $d$-dimensional vector $\vec{x}_i$. For all inputs $x_i^{(l)}$ that were not sampled, we set $y_i^{(l)} = 0$. Subsequently, the analyzer aggregates the sets of outputs from all users corresponding to each of those $l$ coordinates in turn, so that a $d$-dimensional vector is formed. Finally, a standard debiasing step is applied to this vector to remove the scaling and rounding applied to each submission. DeBias returns an unbiased estimator, $\vec{z}$, which calculates an estimate of the true sum of the vectors by subtracting the expected uniform noise from the randomized sum of the vectors.

We state the privacy guarantee of this protocol in the following theorem, where we initially assume $\varepsilon < 1$ to simplify our computations. At the end of this section, we discuss how to cover the additional case $1 \leq \varepsilon < 6$ to suit our experimental study.

Theorem 4.1. The shuffled mechanism $\mathcal{M} = \mathcal{S} \circ \mathcal{R}_{d,k,n,t}$ is $(\varepsilon, \delta)$-DP for any $d, k, n \in \mathbb{N}$, $\{t \in \mathbb{N} \mid t \in [d]\}$, $\varepsilon < 6$ and $\delta \in (0, 1]$ such that:

$$\gamma = \begin{cases} \dfrac{56dk\log(1/\delta)\log(2t/\delta)}{(n-1)\varepsilon^2}, & \text{when } \varepsilon < 1 \\[6pt] \dfrac{2016dk\log(1/\delta)\log(2t/\delta)}{(n-1)\varepsilon^2}, & \text{when } 1 \leq \varepsilon < 6. \end{cases}$$

Proof. Let $\vec{D} = (\vec{x}_1, \dots, \vec{x}_n)$ and $\vec{D}' = (\vec{x}_1, \dots, \vec{x}_n')$ be the two neighbouring datasets differing only in the input of the $n$th user, as used in Section 4.1. Here each vector-valued message $\vec{x}_i$ is of the form $(x_i^{(1)}, \dots, x_i^{(d)})$. Recall from Section 4.1 that we assume that the analyzer can



Mary Scott et al. CEUR Workshop Proceedings                                                                                                       1โ€“10



see the users in $B$ (i.e., the subset of users that returned a uniformly random message), as well as the inputs from the first $n-1$ users.

We now introduce the vector view $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ as the collection of information that the analyzer is able to see after the mechanism $\mathcal{M}$ is applied to all vector-valued messages in the dataset $\vec{D}$. $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ is defined as the tuple $(\vec{Y}, \vec{D}_\cap, \vec{b})$, where $\vec{Y}$ is the multiset containing the outputs $\{\vec{y}_1, \dots, \vec{y}_n\}$ of the mechanism $\mathcal{M}(\vec{D})$, $\vec{D}_\cap$ is the vector containing the inputs $(\vec{x}_1, \dots, \vec{x}_{n-1})$ from the first $n-1$ users, and $\vec{b}$ contains binary vectors $(\vec{b}_1, \dots, \vec{b}_n)$ which indicate for which coordinates each user reports truthful information. This vector view can be projected to $t$ overlapping scalar views by applying Algorithm 2 only to the $j$th uniformly sampled coordinate $\alpha_j \in [d]$ from each user, where $j \in [t]$. The $j$th scalar view $\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D})$ of $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ is defined as the tuple $(\vec{Y}^{(\alpha_j)}, \vec{D}_\cap^{(\alpha_j)}, \vec{b}^{(\alpha_j)})$, where:

$$\vec{Y}^{(\alpha_j)} = \mathcal{M}(\vec{D}^{(\alpha_j)}) = \{y_1^{(\alpha_j)}, \dots, y_n^{(\alpha_j)}\},$$
$$\vec{D}_\cap^{(\alpha_j)} = (x_1^{(\alpha_j)}, \dots, x_{n-1}^{(\alpha_j)})$$
$$\text{and} \quad \vec{b}^{(\alpha_j)} = (b_1^{(\alpha_j)}, \dots, b_n^{(\alpha_j)})$$

are the analogous definitions of $\vec{Y}$, $\vec{D}_\cap$ and $\vec{b}$, but containing only the information referring to the $j$th uniformly sampled coordinate of each vector-valued message.

The following advanced composition results will be used in our setting to get a tight upper bound:

Theorem 4.2 (Dwork et al. [9]). For all $\varepsilon', \delta', \delta \geq 0$, the class of $(\varepsilon', \delta')$-differentially private mechanisms satisfies $(\varepsilon, r\delta' + \delta)$-differential privacy under $r$-fold adaptive composition for:

$$\varepsilon = \sqrt{2r\log(1/\delta)}\,\varepsilon' + r\varepsilon'(e^{\varepsilon'} - 1).$$

Corollary 4.3. Given target privacy parameters $0 < \varepsilon < 1$ and $\delta > 0$, to ensure $(\varepsilon, r\delta' + \delta)$ cumulative privacy loss over $r$ mechanisms, it suffices that each mechanism is $(\varepsilon', \delta')$-DP, where:

$$\varepsilon' = \frac{\varepsilon}{2\sqrt{2r\log(1/\delta)}}.$$

To show that $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ satisfies $(\varepsilon, \delta)$-DP it suffices to prove that:

$$\Pr_{\tilde{V} \sim \mathrm{VView}_{\mathcal{M}}(\vec{D})}\left[\frac{\Pr[\mathrm{VView}_{\mathcal{M}}(\vec{D}) = \tilde{V}]}{\Pr[\mathrm{VView}_{\mathcal{M}}(\vec{D}') = \tilde{V}]} \geq e^{\varepsilon}\right] \leq \delta. \quad (1)$$

By considering this vector view as a union of overlapping scalar views, and letting $r = t$ in Corollary 4.3, it is sufficient to derive (1) from:

$$\Pr_{V_{\alpha_j} \sim \mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D})}\left[\frac{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}) = V_{\alpha_j}]}{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}') = V_{\alpha_j}]} \geq e^{\varepsilon'}\right] \leq \delta', \quad (2)$$

where $\tilde{V} = \bigcup_{\alpha_j} V_{\alpha_j}$, $\varepsilon' = \frac{\varepsilon}{2\sqrt{2t\log(1/\delta)}}$ and $\delta' = \frac{\delta}{t}$.

Lemma 4.4. Condition (2) implies condition (1).

Proof. We can express $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ as the composition of the $t$ scalar views $\mathrm{View}_{\mathcal{M}}^{(\alpha_1)}, \dots, \mathrm{View}_{\mathcal{M}}^{(\alpha_t)}$, as:

$$\Pr[\mathrm{VView}_{\mathcal{M}}(\vec{D}) = \tilde{V}] = \Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_1)}(\vec{D}) = V_{\alpha_1} \wedge \dots \wedge \mathrm{View}_{\mathcal{M}}^{(\alpha_t)}(\vec{D}) = V_{\alpha_t}] = \Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_1)}(\vec{D}) = V_{\alpha_1}] \cdot \dots \cdot \Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_t)}(\vec{D}) = V_{\alpha_t}].$$

Our desired result is immediate by applying Corollary 4.3, which states that the use of $t$ overlapping $(\varepsilon', \delta')$-DP mechanisms, when taken together, is $(\varepsilon, \delta)$-DP. This applies in our setting, since we have assumed that $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ satisfies the requirements of $(\varepsilon, \delta)$-DP, and that each of the $t$ overlapping scalar views is formed identically but for a different uniformly sampled coordinate of the vector-valued messages.

To complete the proof of Theorem 4.1 for $\varepsilon < 1$, it remains to show that for a uniformly sampled coordinate $\alpha_j \in [d]$, $\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D})$ satisfies $(\varepsilon', \delta')$-DP.

Lemma 4.5. Condition (2) holds.

Proof. See Appendix A.

We now show that the above proof can be adjusted to cover the additional case $1 \leq \varepsilon < 6$. This will be sufficient to complete the proof of our main Theorem 4.1. First, we scale the setting of $\varepsilon'$ by a multiple of 6 in Corollary 4.3 so that the advanced composition property holds for all $1 \leq \varepsilon < 6$. We now insert $\varepsilon' = \frac{\varepsilon}{12\sqrt{2r\log(1/\delta)}}$ into the proof of Theorem 4.1, resulting in a change of constant from 56 to 2016.

4.4. Accuracy Bounds for Shuffled Vector Sum

We now formulate an upper bound for the MSE of our protocol, and then identify the value(s) of $t$ that minimize this upper bound.

First, note that encoding the coordinate $x_i^{(\alpha_j)}$ as $\bar{x}_i^{(\alpha_j)} = \lfloor x_i^{(\alpha_j)} k \rfloor + \mathrm{Ber}(x_i^{(\alpha_j)} k - \lfloor x_i^{(\alpha_j)} k \rfloor)$ in Algorithm 2 ensures that $\mathbb{E}[\bar{x}_i^{(\alpha_j)}/k] = \mathbb{E}[x_i^{(\alpha_j)}]$. This means that our protocol is unbiased. For any unbiased random variable $X$ with






๐‘Ž < ๐‘‹ < ๐‘ then Var[๐‘‹ ] โ‰ค (๐‘ โˆ’ ๐‘Ž)2 /4, and so the MSE                     To obtain the error between the estimated average
per coordinate due to the fixed-point approximation of                 vector and the true average vector, we simply take the
the true vector in โ„›๐‘‘,๐‘˜,๐‘›,๐‘ก is at most 4๐‘˜1 2 . Meanwhile, the          square root of the result obtained in Theorem 4.6.
MSE when โ„›๐‘‘,๐‘˜,๐‘›,๐‘ก submits a random vector is at most 12                Corollary 4.7. For every statistical query ๐‘ž โˆถ ๐’ณ โ†ฆ
per coordinate.                                                        [0, 1]๐‘‘ , ๐‘‘, ๐‘› โˆˆ โ„•, {๐‘ก โˆˆ โ„• | ๐‘ก โˆˆ [๐‘‘]}, ๐œ€ < 6 and ๐›ฟ โˆˆ (0, 1],
   We now use the unbiasedness of our protocol to obtain               there is an (๐œ€, ๐›ฟ)-DP ๐‘›-party unbiased protocol for estimat-
a result for estimating the squared error between the                  ing ๐‘›๐‘‘ โˆ‘๐‘– ๐‘ž(๐‘ฅโƒ—๐‘– ) in the Single-Message Shuffle Model with
estimated average vector and the true average vector.                  standard deviation
When calculating the MSE, each coordinate location is                                             (2๐‘ก)1/2 ๐‘‘ 4/3 (14 log(1/๐›ฟ) log(2๐‘ก/๐›ฟ))1/3
used with expectation ๐‘›/๐‘‘. Therefore, we define the                                           โŽง                                            ,
                                                                                                                 (1โˆ’๐›พ )๐‘›5/6 ๐œ€ 2/3
                      ฬ‚ as the normalization of the                                           โŽช
                                                                                              โŽช
normalized MSE, or MSE,                                                                            when ๐œ€ < 1
MSE by a factor of (๐‘›/๐‘‘)2 .                                                  ๐œŽ(๐’ซ
                                                                              ฬ‚ ๐‘‘,๐‘˜,๐‘›,๐‘ก ) =         1/2 4/3                   1/3
                                                                                              โŽจ (8๐‘ก) ๐‘‘ (63 log(1/๐›ฟ) log(2๐‘ก/๐›ฟ)) ,
                                                                                              โŽช
                                                                                              โŽช             (1โˆ’๐›พ )๐‘› 5/6 ๐œ€ 2/3
Theorem 4.6. For any ๐‘‘, ๐‘› โˆˆ โ„•, {๐‘ก โˆˆ โ„• | ๐‘ก โˆˆ [๐‘‘]}, ๐œ€ < 6
and ๐›ฟ โˆˆ (0, 1], there exists a parameter ๐‘˜ such that ๐’ซ๐‘‘,๐‘˜,๐‘›,๐‘ก                                 โŽฉ when 1 โ‰ค ๐œ€ < 6,
is (๐œ€, ๐›ฟ)-DP and                                                       where ๐œŽฬ‚ denotes the error between the estimated average
                         2๐‘ก๐‘‘ 8/3 (14 log(1/๐›ฟ) log(2๐‘ก/๐›ฟ))2/3            vector and the true average vector.
                       โŽง                                    ,
                                   (1โˆ’๐›พ )2 ๐‘›5/3 ๐œ€ 4/3            To summarize, we have produced an unbiased pro-
                       โŽช
                       โŽช when ๐œ€ < 1
     ฬ‚ ๐‘‘,๐‘˜,๐‘›,๐‘ก ) =
     MSE(๐’ซ                                                    tocol for the computation of the sum of ๐‘› real vectors
                            8/3                     2/3
                       โŽจ 8๐‘ก๐‘‘ (63 log(1/๐›ฟ) log(2๐‘ก/๐›ฟ)) ,        in the Single-Message Shuffle Model with normalized
                       โŽช        (1โˆ’๐›พ ) 2 ๐‘›5/3 ๐œ€ 4/3
                       โŽช                                      MSE ๐‘‚๐œ€,๐›ฟ (๐‘‘ 8/3 ๐‘ก๐‘›โˆ’5/3 ), using advanced composition re-
                       โŽฉ when 1 โ‰ค ๐œ€ < 6,                      sults from Dwork et al. [9]. Minimizing this bound as a
       ฬ‚ denotes the squared error between the estimated function of ๐‘ก leads us to choose ๐‘ก = 1, but any choice of ๐‘ก
where MSE
average vector and the true average vector.                   that is small and not dependent on ๐‘‘ produces a bound of
                                                              the same order. In our experimental study, we determine
                          ๐‘‘
Proof. We consider the โˆ‘๐‘™=1 DeBias (๐‘ง ฬ‚(๐‘™) ) of ๐’ซ๐‘‘,๐‘˜,๐‘›,๐‘ก com- that the best choice of ๐‘ก in practice is indeed ๐‘ก = 1.
                                              ๐‘ก      ๐‘›      (๐›ผ )
pared to the corresponding input โˆ‘๐‘—=1 โˆ‘๐‘–=1 ๐‘ฅ๐‘– ๐‘— over
            โƒ— We use the bounds on the variance of the                 4.5. Improved bounds for t=1
the dataset ๐ท.
randomized response mechanism from Theorem 4.6 to           We observe that in the optimal case in which ๐‘ก = 1, we
give us an upper bound for this comparison.                 can tighten the bounds further, as we do not need to
                                                          2 invoke the advanced composition results when each user
                         ๐‘‘                  ๐‘ก ๐‘›
                                     (๐‘™)            (๐›ผ๐‘— )   samples only a single coordinate. This changes the value
MSE(๐’ซ๐‘‘,๐‘˜,๐‘›,๐‘ก ) = sup E[(โˆ‘ DeBias (๐‘ง ฬ‚ ) โˆ’ โˆ‘ โˆ‘ ๐‘ฅ๐‘– ) ]
                   โƒ—
                   ๐ท    ๐‘™=1               ๐‘—=1 ๐‘–=1           of ๐›พ by a factor of ๐‘‚(log(1/๐›ฟ)), which propagates through
                                                  2
                                                            to the expression for the MSE. That is, we can more
                  ๐‘ก ๐‘›
                              (๐›ผ๐‘— )        (๐›ผ๐‘— )            simply set ๐œ€ โ€ฒ = ๐œ€ and ๐›ฟ โ€ฒ = ๐›ฟ in the proof of Theorem 4.1.
    = sup E[(โˆ‘ โˆ‘(DeBias (๐‘ฆ๐‘– /๐‘˜) โˆ’ ๐‘ฅ๐‘– )) ]                   When ๐œ€ < 1, the computation is straightforward, with
       โƒ—
       ๐ท        ๐‘—=1 ๐‘–=1
              ๐‘ก ๐‘›                                           ๐‘ โ‰ฅ ๐œ€14โ€ฒ2 log(2๐‘ก/๐›ฟ) being chosen as before. However, when
                                     (๐›ผ )          (๐›ผ ) 2
     = sup โˆ‘ โˆ‘ E[(DeBias (๐‘ฆ๐‘– ๐‘— /๐‘˜) โˆ’ ๐‘ฅ๐‘– ๐‘— ) ]                          1 โ‰ค ๐œ€ < 6, a tighter ๐‘ โ‰ฅ ๐œ€80โ€ฒ2 log(2๐‘ก/๐›ฟ) must be selected, as
         โƒ— ๐‘—=1 ๐‘–=1
         ๐ท                                                             the condition ๐œ€ โ€ฒ < 1 no longer holds.
            ๐‘ก ๐‘›
                                       (๐›ผ )                              Using ๐œ€ โ€ฒ < 6, we have:
     = sup โˆ‘ โˆ‘ Var[DeBias (๐‘ฆ๐‘– ๐‘— /๐‘˜)]
         โƒ— ๐‘—=1 ๐‘–=1
         ๐ท                                                                                                                2            ๐œ€โ€ฒ
                                                                         (1 โˆ’ exp (โˆ’๐œ€ โ€ฒ /2)) โ‰ฅ (1 โˆ’ exp (โˆ’                   )) ๐œ€ โ€ฒ โ‰ฅ      .
            ๐‘ก๐‘›             (๐›ผ )         ๐‘ก๐‘›      1โˆ’๐›พ ๐›พ                                                                   3โˆš15          2โˆš10
     =            sup Var[๐‘ฆ1 1 /๐‘˜] โ‰ค          (     + )
         (1 โˆ’ ๐›พ )2 (๐›ผ1)              (1 โˆ’ ๐›พ )2 4๐‘˜ 2  2 Thus, we have:
                  ๐‘ฅ1

          ๐‘ก๐‘›       1  ๐ด๐œ€ ๐‘‘๐‘˜ log(1/๐›ฟ) log(2๐‘ก/๐›ฟ)                               N๐œƒ      โ€ฒ           ๐‘                     ๐‘ ๐œ€โ€ฒ 2
     โ‰ค           ( 2+                          ),                      Pr[      โ‰ฅ ๐‘’ ๐œ€ ] โ‰ค exp ( โˆ’ (๐œ€ โ€ฒ /2)2 ) + exp ( โˆ’ (     ) )
               2
       (1 โˆ’ ๐›พ ) 4๐‘˜            (๐‘› โˆ’ 1)๐œ€ 2                                     N๐œ™                  3                     2 2โˆš10
where ๐ด๐œ€ = 28 when ๐œ€ < 1, and ๐ด๐œ€ = 1008 when 1 โ‰ค ๐œ€ <                                                       80 ๐œ€ โ€ฒ2
                                                                                        โ‰ค 2 exp (โˆ’                 log(2๐‘ก/๐›ฟ)) โ‰ค ๐›ฟ/๐‘ก,
6. In other words, ๐ด๐œ€ is equal to half the constant term                                                  2๐œ€ โ€ฒ2 40
in the expression of ๐›พ stated in Theorem 4.1. The choice               which yields:
            (๐‘›โˆ’1)๐œ€ 2
๐‘˜ = 4๐ด ๐‘‘ log(1/๐›ฟ) log(2๐‘ก/๐›ฟ) minimizes the bracketed sum                                14๐‘‘๐‘˜ log(2/๐›ฟ)
                                                                                               27๐‘‘๐‘˜
        ๐œ€                                                                    max{ (๐‘›โˆ’1)๐œ€ 2 , (๐‘›โˆ’1)๐œ€  },                    when ๐œ€ < 1
above and the bounds in the statement of the theorem                     ๐›พ ={    80๐‘‘๐‘˜ log(2/๐›ฟ)  36๐‘‘๐‘˜
follow.                                                                      max{ (๐‘›โˆ’1)๐œ€ 2 , 11(๐‘›โˆ’1)๐œ€ },                   when 1 โ‰ค ๐œ€ < 6.
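To make the two regimes concrete, the bounds on $\gamma$ can be evaluated numerically. The sketch below is our own transcription of the expression in Theorem 4.1 and of the improved $t = 1$ expression just derived; the function names are ours, not from the paper's codebase.

```python
import math

def gamma_advanced(d, k, n, t, eps, delta):
    """Randomized-response probability bound from Theorem 4.1
    (advanced composition over t sampled coordinates)."""
    const = 56 if eps < 1 else 2016   # the theorem covers 0 < eps < 6
    return (const * d * k * math.log(1 / delta) * math.log(2 * t / delta)
            / ((n - 1) * eps ** 2))

def gamma_single(d, k, n, eps, delta):
    """Improved bound for t = 1, where advanced composition is not invoked."""
    if eps < 1:
        return max(14 * d * k * math.log(2 / delta) / ((n - 1) * eps ** 2),
                   27 * d * k / ((n - 1) * eps))
    return max(80 * d * k * math.log(2 / delta) / ((n - 1) * eps ** 2),
               36 * d * k / (11 * (n - 1) * eps))

# Evaluated at the experimental defaults of Section 5:
# d = 100, k = 3, n = 50000, eps = 0.95, delta = 0.5.
g_adv = gamma_advanced(100, 3, 50000, 1, 0.95, 0.5)
g_one = gamma_single(100, 3, 50000, 0.95, 0.5)
```

With these defaults both bounds lie in the permitted range $[0, 1]$, and the $t = 1$ bound is roughly half the advanced-composition bound.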







Figure 1: Bar charts confirming that the choices $t = 1$ (a) and $k = 3$ (b) minimize the total experimental $\widehat{\mathrm{MSE}}$ for the ECG Heartbeat Categorization Dataset. (a) Experimental error by number of coordinates $t$ retained; (b) experimental error by number of buckets $k$ used.
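The parameters varied in Figure 1 enter through the local randomizer and its matching debias step. The sketch below is our own reconstruction from the descriptions of Algorithms 2 and 3 in Section 4; the function names are ours, and the extra scaling that accounts for sampling $t$ of the $d$ coordinates is omitted from the debias step.

```python
import math
import random

rng = random.Random(0)  # fixed seed for reproducibility

def randomize(x, k, t, gamma):
    """Sketch of the local randomizer: sample t coordinates of
    x in [0,1]^d, apply the unbiased fixed-point encoding
    floor(x*k) + Ber(x*k - floor(x*k)), and with probability
    gamma report a uniform value in {0,...,k} instead."""
    d = len(x)
    out = []
    for j in rng.sample(range(d), t):              # alpha_1, ..., alpha_t
        scaled = x[j] * k
        y = math.floor(scaled) + (1 if rng.random() < scaled - math.floor(scaled) else 0)
        if rng.random() < gamma:                   # randomized response
            y = rng.randint(0, k)
        out.append((j, y))
    return out

def debias_sum(reports, n, k, gamma):
    """Per-coordinate DeBias sketch: subtract the expected uniform
    noise gamma*n/2, then rescale by 1/(1 - gamma)."""
    return (sum(reports) / k - gamma * n / 2) / (1 - gamma)

# Round trip at one coordinate (d = 1, t = 1): the debiased
# average should concentrate around the true value x.
n, k, gamma, x = 20000, 3, 0.25, 0.7
reports = [randomize([x], k, 1, gamma)[0][1] for _ in range(n)]
estimate = debias_sum(reports, n, k, gamma) / n
```

With $\gamma = 0$ the encoding alone already satisfies $\mathbb{E}[y/k] = x^{(j)}$, which is the unbiasedness used in Theorem 4.6; the randomized-response noise is what the subtraction of $\gamma n/2$ removes in expectation.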



Note that the above expression for $\gamma$ in the case $\varepsilon < 1$ coincides with the result obtained by Balle et al. in the scalar case [2]. Putting this expression for $\gamma$ in the proof of Theorem 4.6, with the choice

$$k = \begin{cases} \min\left\{\left(\dfrac{n\varepsilon^2}{28d\log(2/\delta)}\right)^{1/3}, \left(\dfrac{n\varepsilon}{54d}\right)^{1/3}\right\}, & \text{when } \varepsilon < 1 \\[6pt] \min\left\{\left(\dfrac{n\varepsilon^2}{160d\log(2/\delta)}\right)^{1/3}, \left(\dfrac{11n\varepsilon}{72d}\right)^{1/3}\right\}, & \text{when } 1 \leq \varepsilon < 6, \end{cases}$$

causes the upper bound on the normalized MSE to reduce to:

$$\widehat{\mathrm{MSE}} = \begin{cases} \max\left\{\dfrac{98^{1/3}d^{8/3}\log^{2/3}(2/\delta)}{(1-\gamma)^2 n^{5/3}\varepsilon^{4/3}}, \dfrac{18d^{8/3}}{(1-\gamma)^2 n^{5/3}(4\varepsilon)^{2/3}}\right\}, & \text{when } \varepsilon < 1 \\[6pt] \max\left\{\dfrac{2d^{8/3}(20\log(2/\delta))^{2/3}}{(1-\gamma)^2 n^{5/3}\varepsilon^{4/3}}, \dfrac{2(9^{2/3})d^{8/3}}{(1-\gamma)^2 n^{5/3}(11\varepsilon)^{2/3}}\right\}, & \text{when } 1 \leq \varepsilon < 6. \end{cases}$$

By updating Corollary 4.7 in the same way, we can conclude that for the optimal choice $t = 1$, the normalized standard deviation of our unbiased protocol can be further tightened to:

$$\hat{\sigma} = \begin{cases} \max\left\{\dfrac{98^{1/6}d^{4/3}\log^{1/3}(2/\delta)}{(1-\gamma)n^{5/6}\varepsilon^{2/3}}, \dfrac{18^{1/2}d^{4/3}}{(1-\gamma)n^{5/6}(4\varepsilon)^{1/3}}\right\}, & \text{when } \varepsilon < 1 \\[6pt] \max\left\{\dfrac{2^{1/2}d^{4/3}(20\log(2/\delta))^{1/3}}{(1-\gamma)n^{5/6}\varepsilon^{2/3}}, \dfrac{2^{1/2}\,9^{1/3}d^{4/3}}{(1-\gamma)n^{5/6}(11\varepsilon)^{1/3}}\right\}, & \text{when } 1 \leq \varepsilon < 6. \end{cases}$$

5. Experimental Evaluation

In this section we present and compare the bounds generated by applying Algorithms 2 and 3 to an ECG Heartbeat Categorization Dataset in Python, available at https://www.kaggle.com/shayanfazeli/heartbeat. We analyse the effect of changing one key parameter at a time, whilst the others remain the same. Our default settings are: vector dimension $d = 100$, rounding parameter $k = 3$, number of users $n = 50000$, number of coordinates to sample $t = 1$, and differential privacy parameters $\varepsilon = 0.95$ and $\delta = 0.5$. The ranges of all parameters have been adjusted to best display the dependencies, whilst simultaneously ensuring that the parameter $\gamma$ of the randomized response mechanism is always within its permitted range of $[0, 1]$. The Python code is available at https://github.com/mary-python/dft/blob/master/shuffle.

We first confirm that the choice of $t = 1$ is optimal, as predicted by the results of Section 4.5. Indeed, Fig. 1(a) shows that the total experimental $\widehat{\mathrm{MSE}}$ for the ECG Heartbeat Categorization Dataset is significantly smaller when $t = 1$ than for any other small value of $t$.

Similarly, Fig. 1(b) suggests that the total experimental $\widehat{\mathrm{MSE}}$ is lowest when $k = 3$, which is sufficiently close to the choice of $k$ selected in the proof of Theorem 4.6, with all other default parameter values substituted in. Observe that the absolute value of the observed MSE is below 0.3 in this case, meaning that the vector is reconstructed to a high degree of accuracy, sufficient for many applications.

Next, we verify the bounds of $d^{8/3}$ and $n^{-5/3}$ from Theorem 4.6. Fig. 2(a) is plotted with a best-fit curve proportional to $d^{8/3}$, exactly as desired. Unsurprisingly, the MSE increases as $d$ goes up according to this superlinear dependence. Meanwhile, Fig. 2(b) fits a curve dependent on $n^{-7/6}$, sufficiently close to the required result. We see the benefit of increasing $n$: as $n$ increases by a factor of 10 across the plot, the error decreases by more than two orders of magnitude. In Fig. 3, we verify the dependency $\varepsilon^{-4/3}$ in the two ranges $\varepsilon < 1$ and $1 \leq \varepsilon < 6$. The behavior for $\varepsilon < 1$ is quite smooth,







Figure 2: Bar charts with best-fit curves confirming the dependencies $d^{8/3}$ (a) and $n^{-5/3}$ (b) from Theorem 4.6. (a) Experimental error by vector dimension $d$; (b) experimental error by number of vectors $n$ used.

Figure 3: Bar charts with best-fit curves confirming the dependency $\varepsilon^{-4/3}$ from Theorem 4.6 in the two ranges $\varepsilon < 1$ (a) and $1 \leq \varepsilon < 6$ (b). (a) Experimental error by value of $\varepsilon$ where $\varepsilon < 1$; (b) experimental error by value of $\varepsilon$ where $1 \leq \varepsilon < 6$.



but becomes more variable for larger $\varepsilon$ values.

In conclusion, these experiments confirm that picking $t = 1$ and $k = 3$ serves to minimize the error. The lines of best fit confirm the dependencies on the other parameters from Section 4 for $d$, $\varepsilon$ and $n$, by implementing and applying Algorithms 2 and 3 to an ECG Heartbeat Categorization Dataset in Python. The experiments demonstrate that the MSE observed in practice is sufficiently small to allow effective reconstruction of average vectors for a suitably large cohort of users.

6. Conclusion

Our results extend a result from Balle et al. [2] for scalar sums to provide a protocol $\mathcal{P}_{d,k,n,t}$ in the Single-Message Shuffle Model for the private summation of vector-valued messages $(\vec{x}_1, \dots, \vec{x}_n) \in ([0,1]^d)^n$. It is not surprising that the normalized MSE of the resulting estimator has a dependence on $n^{-5/3}$, as this was the case for scalars, but the addition of a new dimension $d$ introduces a new dependency for the bound, as well as the possibility of sampling $t$ coordinates from each $d$-dimensional vector. For this extension, we formally defined the vector view as the knowledge of the analyzer upon receiving the randomized vectors, and expressed it as a union of overlapping scalar views. Through the use of advanced composition results from Dwork et al. [9], we showed that the estimator now has normalized MSE $O_{\varepsilon,\delta}(d^{8/3}tn^{-5/3})$, which can be further improved to $O_{\varepsilon,\delta}(d^{8/3}n^{-5/3})$ by setting $t = 1$.

Our contribution has provided a stepping stone between the summation of the scalar case discussed by Balle et al. [2] and the linearization of more sophisticated structures such as matrices and higher-dimensional tensors, both of which are reliant on the functionality of the vector case. As mentioned in Section 2, there is potential for further exploration in the Multi-Message Shuffle Model to gain additional privacy, echoing the follow-up paper of Balle et al. [17].







A. Proof of Lemma 4.5

Lemma 4.5. Condition (2) holds.

Proof. The way in which we split the vector view (i.e., considering a single uniformly sampled coordinate of each vector-valued message in turn) means that we can apply a proof analogous to the scalar-valued case [2]. We work through the key steps needed.

Recall from Section 4.1 that the case where the $n$th user submits a uniformly random message independent of their input satisfies DP trivially. Otherwise, the $n$th user submits their true message, and we assume that the analyzer removes from $\vec{Y}^{(\alpha_j)}$ any truthful messages associated with the first $n-1$ users. Denote by $n_l^{(\alpha_j)}$ the count of $j$th coordinates remaining with a particular value $l \in [k]$. If $\vec{x}_n^{(\alpha_j)} = \theta$ and $\vec{x}_n'^{(\alpha_j)} = \phi$, we obtain the relationship

$$\frac{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}) = V_{\alpha_j}]}{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}') = V_{\alpha_j}]} = \frac{n_\theta^{(\alpha_j)}}{n_\phi^{(\alpha_j)}}.$$

We observe that the counts $n_\theta^{(\alpha_j)}$ and $n_\phi^{(\alpha_j)}$ follow the binomial distributions $N_\theta \sim \mathrm{Bin}(s, \gamma/k) + 1$ and $N_\phi \sim \mathrm{Bin}(s, \gamma/k)$ respectively, where $s$ denotes the number of times that coordinate $j$ is sampled. In expectation, $s = (n-1)t/d$, and below we will show that it is close to its expectation. Thus,

$$\Pr_{V_{\alpha_j} \sim \mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D})}\!\left[\frac{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}) = V_{\alpha_j}]}{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}') = V_{\alpha_j}]} \ge e^{\varepsilon'}\right] = \Pr\!\left[\frac{N_\theta}{N_\phi} \ge e^{\varepsilon'}\right].$$

We define $c := \mathbb{E}[N_\phi] = \frac{\gamma}{k} \cdot s$ and split this into the union of two events, $N_\theta \ge c\,e^{\varepsilon'/2}$ and $N_\phi \le c\,e^{-\varepsilon'/2}$. Applying a Chernoff bound gives:

$$\Pr\!\left[\frac{N_\theta}{N_\phi} \ge e^{\varepsilon'}\right] \le \exp\!\left(-\frac{c}{3}\Big(e^{\varepsilon'/2} - 1 - \frac{1}{c}\Big)^{2}\right) + \exp\!\left(-\frac{c}{2}\big(1 - e^{-\varepsilon'/2}\big)^{2}\right).$$

We will choose $c \ge \frac{14}{\varepsilon'^2}\log(2t/\delta)$ so that we have:

$$\exp(\varepsilon'/2) - 1 - \frac{1}{c} \ge \frac{\varepsilon'}{2} + \frac{\varepsilon'^2}{8} - \frac{\varepsilon'^2}{14\log(2t/\delta)} \ge \frac{\varepsilon'}{2}.$$

Using $\varepsilon' < 1$, we have:

$$1 - \exp(-\varepsilon'/2) \ge \big(1 - \exp(-1/2)\big)\,\varepsilon' \ge \frac{\varepsilon'}{\sqrt{7}}.$$

Thus we have:

$$\Pr\!\left[\frac{N_\theta}{N_\phi} \ge e^{\varepsilon'}\right] \le \exp\!\left(-\frac{c}{3}(\varepsilon'/2)^2\right) + \exp\!\left(-\frac{c}{2}(\varepsilon'/\sqrt{7})^2\right) \le 2\exp\!\left(-\frac{\varepsilon'^2}{14}\cdot\frac{14}{\varepsilon'^2}\log(2t/\delta)\right) \le \delta/t.$$

We now apply another Chernoff bound to show that $s \le 2\,\mathbb{E}[s]$, which can be used to give a bound on $\gamma$. The following calculation proves that $\Pr[s \ge 2\,\mathbb{E}(s)] \le \exp(-\mathbb{E}(s)/3)$, using $\mathbb{E}(s) = (n-1)t/d$:

$$\Pr[s \ge 2\,\mathbb{E}(s)] \le \exp\!\left(-\frac{n-1}{3}\cdot\frac{t}{d}\right) \le \exp\!\left(-\frac{n}{3}\right) < \delta/3t,$$

for all reasonable values of $\delta$.

Substituting these bounds on $s$ and $c$ into $\gamma s/k = c$, along with $\varepsilon' = \frac{\varepsilon}{2\sqrt{2t\log(1/\delta)}}$, gives:

$$\gamma \ge \frac{112\,kt\log(1/\delta)\log(2t/\delta)}{s\,\varepsilon^2} \ge \frac{56\,dk\log(1/\delta)\log(2t/\delta)}{(n-1)\,\varepsilon^2}.$$
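The elementary inequalities used in the Chernoff step, and the algebra of the final substitution into $\gamma s/k = c$, can be checked numerically. The following Python sketch is illustrative only and is not part of the protocol; the parameter values ($n$, $d$, $k$, $t$, $\varepsilon$, $\delta$) are arbitrary choices made for the check.

```python
import math

# Illustrative parameters: t sampled coordinates per user, failure probability delta.
t, delta = 10, 1e-6

# Check the two elementary inequalities over a grid of eps' in (0, 1):
#   (1) exp(eps'/2) - 1 - 1/c >= eps'/2   whenever c >= (14/eps'^2) log(2t/delta);
#   (2) 1 - exp(-eps'/2)      >= eps'/sqrt(7).
for eps_prime in (0.01, 0.1, 0.5, 0.99):
    c = (14 / eps_prime ** 2) * math.log(2 * t / delta)  # smallest permitted c
    assert math.exp(eps_prime / 2) - 1 - 1 / c >= eps_prime / 2
    assert 1 - math.exp(-eps_prime / 2) >= eps_prime / math.sqrt(7)

# Check the final substitution. With eps' = eps / (2 sqrt(2t log(1/delta))),
# c = (14/eps'^2) log(2t/delta) and s <= 2 E[s] = 2(n-1)t/d, the bound
# gamma = c*k/s should match both closed forms from the last display.
# n is chosen large enough that gamma is a valid probability (gamma <= 1).
n, d, k, eps = 10 ** 8, 50, 16, 1.0
s = 2 * (n - 1) * t / d
eps_prime = eps / (2 * math.sqrt(2 * t * math.log(1 / delta)))
c = (14 / eps_prime ** 2) * math.log(2 * t / delta)
gamma = c * k / s
closed_form = 112 * k * t * math.log(1 / delta) * math.log(2 * t / delta) / (s * eps ** 2)
final_bound = 56 * d * k * math.log(1 / delta) * math.log(2 * t / delta) / ((n - 1) * eps ** 2)
assert math.isclose(gamma, closed_form) and math.isclose(closed_form, final_bound)
assert 0 < gamma < 1
```

The check confirms that the constant 112 arises as $14 \times 8$ from combining $c \ge \frac{14}{\varepsilon'^2}\log(2t/\delta)$ with $\varepsilon'^2 = \varepsilon^2 / (8t\log(1/\delta))$, and that halving to 56 comes from the factor 2 in $s \le 2\,\mathbb{E}[s]$.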






References

[1] C. Dwork, Differential privacy, in: Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP), Springer, Cham, 2006, pp. 1–12.
[2] B. Balle, J. Bell, A. Gascón, K. Nissim, The privacy blanket of the shuffle model, in: Annual International Cryptology Conference, Springer, Cham, 2019, pp. 638–667.
[3] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Artificial Intelligence and Statistics Conference, PMLR, New York City, 2017, pp. 1273–1282.
[4] M. Abadi, A. Chu, I. Goodfellow, Deep learning with differential privacy, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, New York City, 2016, pp. 308–318.
[5] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth, Practical secure aggregation for privacy-preserving machine learning, in: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ACM, New York City, 2017, pp. 1175–1191.
[6] L. Sweeney, k-anonymity: A model for protecting privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (2002) 557–570.
[7] A. Machanavajjhala, D. Kifer, J. Gehrke, M. Venkitasubramaniam, l-diversity: Privacy beyond k-anonymity, in: ACM Transactions on Knowledge Discovery from Data (TKDD), ACM, New York City, 2007, pp. 3–es.
[8] N. Li, T. Li, S. Venkatasubramanian, t-closeness: Privacy beyond k-anonymity and l-diversity, in: 2007 IEEE 23rd International Conference on Data Engineering, IEEE, New York City, 2007, pp. 106–115.
[9] C. Dwork, A. Roth, The algorithmic foundations of differential privacy, Foundations and Trends in Theoretical Computer Science 9 (2014) 211–407.
[10] Ú. Erlingsson, V. Pihur, A. Korolova, RAPPOR: Randomized aggregatable privacy-preserving ordinal response, in: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, ACM, New York City, 2014, pp. 1054–1067.
[11] A. D. P. Team, Learning with privacy at scale, 2017.
[12] B. Ding, J. Kulkarni, S. Yekhanin, Collecting telemetry data privately, in: Advances in Neural Information Processing Systems, ACM, New York City, 2017, pp. 3571–3580.
[13] A. Bittau, Ú. Erlingsson, P. Maniatis, I. Mironov, A. Raghunathan, D. Lie, M. Rudominer, U. Kode, J. Tinnes, B. Seefeld, PROCHLO: Strong privacy for analytics in the crowd, in: Proceedings of the 26th Symposium on Operating Systems Principles, ACM, New York City, 2017, pp. 441–459.
[14] A. Cheu, A. Smith, J. Ullman, D. Zeber, M. Zhilyaev, Distributed differential privacy via shuffling, in: Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, Cham, 2019, pp. 375–403.
[15] B. Balle, J. Bell, A. Gascón, K. Nissim, Improved summation from shuffling, arXiv preprint arXiv:1909.11225, 2019.
[16] Y. Ishai, E. Kushilevitz, R. Ostrovsky, A. Sahai, Cryptography from anonymity, in: 47th Annual IEEE Symposium on Foundations of Computer Science, IEEE, New York City, 2006, pp. 239–248.
[17] B. Balle, J. Bell, A. Gascón, K. Nissim, Private summation in the multi-message shuffle model, in: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, ACM, New York City, 2020, pp. 657–676.
[18] B. Ghazi, N. Golowich, R. Kumar, R. Pagh, A. Velingker, On the power of multiple anonymous messages, in: Advances in Cryptology – EUROCRYPT 2021, Springer, Cham, 2021, pp. 463–488.
[19] B. Ghazi, P. Manurangsi, R. Pagh, A. Velingker, Private aggregation from fewer anonymous messages, in: Advances in Cryptology – EUROCRYPT 2020, Springer, Cham, 2020, pp. 798–827.
[20] P. Kairouz, S. Oh, P. Viswanath, Extremal mechanisms for local differential privacy, The Journal of Machine Learning Research 17 (2016) 492–542.
[21] P. Kairouz, K. Bonawitz, D. Ramage, Discrete distribution estimation under local privacy, in: Proceedings of the 33rd International Conference on Machine Learning, volume 48, ACM, New York City, 2016, pp. 2436–2444.
[22] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, R. Rogers, Protection against reconstruction and its applications in private federated learning, arXiv preprint arXiv:1812.00984, 2018.


