Applying the Shuffle Model of Differential Privacy to Vector
Aggregation
Mary Scott1 , Graham Cormode1 and Carsten Maple2
1 Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK
2 WMG, University of Warwick, Coventry, CV4 7AL, UK


                                             Abstract
                                             In this work we introduce a new protocol for vector aggregation in the context of the Shuffle Model, a recent model within
                                             Differential Privacy (DP). It sits between the Centralized Model, which prioritizes the level of accuracy over the secrecy of the
                                             data, and the Local Model, for which an improvement in trust is counteracted by a much higher noise requirement. The
                                             Shuffle Model was developed to provide a good balance between these two models through the addition of a shuffling step,
                                             which unbinds the users from their data whilst maintaining a moderate noise requirement. We provide a single message
                                             protocol for the summation of real vectors in the Shuffle Model, using advanced composition results. Our contribution
                                             provides a mechanism to enable private aggregation and analysis across more sophisticated structures such as matrices and
                                             higher-dimensional tensors, both of which are reliant on the functionality of the vector case.

                                             Keywords
                                             Differential privacy (DP), single-message shuffle model, local randomizer, randomized response, mean squared error (MSE).



1. Introduction

Differential Privacy (DP) [1] is a strong, mathematical definition of privacy that guarantees a measurable level of confidentiality for any data subject in the dataset to which it is applied. In this way, useful collective information can be learned about a population, whilst simultaneously protecting the personal information of each data subject.
   In particular, DP guarantees that the impact on any particular individual as a result of analysis on a dataset is the same, whether or not the individual is included in the dataset. This guarantee is quantified by a parameter ε, which represents good privacy if it is small. However, finding an algorithm that achieves DP often requires a trade-off between privacy and accuracy, as a smaller ε sacrifices accuracy for better privacy, and vice versa. DP enables data analyses such as the statistical analysis of the salaries of a population. This allows useful collective information to be studied, as long as ε is adjusted appropriately to satisfy the definition of DP.
   In this work we focus on protocols in the Single-Message Shuffle Model [2], a one-time data collection model where each of n users is permitted to submit a single message. We have chosen to apply the Single-Message Shuffle Model to the problem of vector aggregation, as there are links to Federated Learning and Secure Aggregation.
   There are many practical applications of the Single-Message Shuffle Model in Federated Learning, where multiple users collaboratively solve a Machine Learning problem, and the results simultaneously improve the model for the next round [3]. The updates generated by the users after each round are high-dimensional vectors, so this data type will prove useful in applications such as training a Deep Neural Network to predict the next word that a user types [4]. Additionally, aggregation is closely related to Secure Aggregation, which can be used to compute the outputs of Machine Learning problems such as the one above [5].
   Our contribution is a protocol in the Single-Message Shuffle Model for the private summation of vector-valued messages, extending an existing result from Balle et al. [2] by permitting the n users to each submit a vector of real numbers instead of a scalar. The resulting estimator is unbiased and has normalized mean squared error (MSE) O_{ε,δ}(d^(8/3) n^(−5/3)), where d is the dimension of each vector.
   This vector summation protocol can be extended to produce a similar protocol for the linearization of matrices. It is important to use matrix reduction to ensure that the constituent vectors are linearly independent. This problem can be extended further to higher-dimensional tensors, which are useful for the representation of multi-dimensional data in Neural Networks.

BICOD21: British International Conference on Databases, March 28, 2022, London, UK
Email: mary.p.scott@warwick.ac.uk (M. Scott); g.cormode@warwick.ac.uk (G. Cormode); cm@warwick.ac.uk (C. Maple)
ORCID: 0000-0003-0799-5840 (M. Scott); 0000-0002-0698-0922 (G. Cormode); 0000-0002-4715-212X (C. Maple)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

2. Related Work

The earliest attempts at protecting the privacy of users in a dataset focused on simple ways of suppressing or generalising the data. Examples include k-anonymity [6], l-diversity [7] and t-closeness [8].
Mary Scott et al. CEUR Workshop Proceedings                                                                                   1โ€“10



However, such attempts have been shown to be insufficient, as proved by numerous examples [9].
   This harmful leakage of sensitive information can be easily prevented through the use of DP, as this mathematically guarantees that the chance of a linkage attack on an individual in the dataset is almost identical to that on an individual not in the dataset.
   Ever since DP was first conceptualized in 2006 by Dwork et al. [1], the majority of research in the field has focused on two opposing models. In the Centralized Model, users submit their sensitive personal information directly to a trusted central data collector, who adds random noise to the raw data to provide DP, before assembling and analyzing the aggregated results.
   In the Local Model, DP is guaranteed when each user applies a local randomizer to add random noise to their data before it is submitted. The Local Model differs from the Centralized Model in that the central entity does not see the users' raw data at any point, and therefore does not have to be trusted. However, the level of noise required per user for the same privacy guarantee is much higher, which limits the usage of Local Differential Privacy (LDP) to major companies such as Google [10], Apple [11] and Microsoft [12].
   Neither of these two extensively studied models can provide a good balance between the trust of the central entity and the level of noise required to guarantee DP. Hence, in recent years researchers have tried to create intermediate models that reap the benefits of both.
   In 2017, Bittau et al. [13] introduced the Encode, Shuffle, Analyze (ESA) model, which provides a general framework for the addition of a shuffling step in a private protocol. After the data from each user is encoded, it is randomly permuted to unbind each user from their data before analysis takes place. In 2019, Cheu et al. [14] formalized the Shuffle Model as a special case of the ESA model, which connects this additional shuffling step to the Local Model. In the Shuffle Model, the local randomizer applies a randomized mechanism on a per-element basis, potentially replacing a truthful value with another randomly selected domain element. The role of these independent reports is to create what we call a privacy blanket, which masks the outputs which are reported truthfully.
   As well as the result on the private summation of scalar-valued messages in the Single-Message Shuffle Model that we will be using [2], Balle et al. have published two more recent works that solve related problems. The first paper [15] improved the distributed n-party summation protocol from Ishai et al. [16] in the context of the Single-Message Shuffle Model to require O(1 + π/log n) scalar-valued messages, instead of a logarithmic dependency of O(log n + π), to achieve statistical security 2^(−π). The second paper [17] introduced two new protocols for the private summation of scalar-valued messages in the Multi-Message Shuffle Model, an extension of the Single-Message Shuffle Model that permits each of the n users to submit more than one message, using several independent shufflers to securely compute the sum. In this work, Balle et al. contributed a recursive construction based on the protocol from [2], as well as an alternative mechanism which implements a discretized distributed noise addition technique using the result from Ishai et al. [16].
   Also relevant to our research is the work of Ghazi et al. [18], which explored the related problems of private frequency estimation and selection in a similar context, drawing comparisons between the errors achieved in the Single-Message and Multi-Message Shuffle Models. A similar team of authors produced a follow-up paper [19] describing a more efficient protocol for private summation in the Single-Message Shuffle Model, using the 'invisibility cloak' technique to facilitate the addition of zero-sum noise without coordination between the users.

3. Preliminaries

We consider randomized mechanisms ℳ, ℛ [9] over domains 𝕏, 𝕐, and apply them to input datasets D⃗, D⃗′ to generate (vector-valued) messages x⃗_i, x⃗′_i. We write [k] = {1, …, k} and ℕ for the set of natural numbers.

3.1. Models of Differential Privacy

The essence of Differential Privacy (DP) is the requirement that the contribution x⃗_i of a user i to a dataset D⃗ = (x⃗_1, …, x⃗_n) does not have much effect on the outcome of the mechanism applied to that dataset.
   In the centralized model of DP, random noise is only introduced after the users' inputs are gathered by a (trusted) aggregator. Consider a dataset D⃗′ that differs from D⃗ only in the contribution of a single user, denoted D⃗ ≃ D⃗′. Also let ε ≥ 0 and δ ∈ (0, 1). We say that a randomized mechanism ℳ : 𝕏^n → 𝕐 is (ε, δ)-differentially private if, for all D⃗ ≃ D⃗′ and all E ⊆ 𝕐:

      Pr[ℳ(D⃗) ∈ E] ≤ e^ε · Pr[ℳ(D⃗′) ∈ E] + δ   [9].

In this definition, we assume that the trusted aggregator obtains the raw data from all users and introduces the necessary perturbations.
   In the local model of DP, each user i independently uses randomness on their input x⃗_i ∈ 𝕏 by applying a local randomizer ℛ : 𝕏 → 𝕐 to obtain a perturbed result ℛ(x⃗_i). We say that the local randomizer is (ε, δ)-differentially private if, for all pairs of inputs x⃗_i, x⃗′_i and all E ⊆ 𝕐:

      Pr[ℛ(x⃗_i) ∈ E] ≤ e^ε · Pr[ℛ(x⃗′_i) ∈ E] + δ   [2],

where x⃗′_i ∈ 𝕏 is some other valid input vector that i could hold. The Local Model guarantees that any observer will not have access to the raw data from any of the users. That is, it removes the requirement for trust. The price is that this requires a higher level of noise per user to achieve the same privacy guarantee.
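To make the local model concrete, the following minimal sketch (our own illustration, not code from the paper; the helper names are ours) implements a k-ary randomized response randomizer and computes the ε it satisfies. Reporting the held value with probability 1 − γ + γ/k and any other fixed value with probability γ/k gives a worst-case likelihood ratio of 1 + k(1 − γ)/γ, hence ε = ln(1 + k(1 − γ)/γ).

```python
import math
import random

def randomized_response(x, k, gamma, rng=random):
    """Report x faithfully w.p. 1 - gamma, else a uniform value in {1, ..., k}."""
    if rng.random() < gamma:
        return rng.randint(1, k)
    return x

def local_epsilon(k, gamma):
    """Worst-case log-likelihood ratio Pr[y | x] / Pr[y | x'] of the randomizer."""
    p_true = 1 - gamma + gamma / k   # probability of reporting the held value
    p_other = gamma / k              # probability of reporting any other fixed value
    return math.log(p_true / p_other)

k, gamma = 10, 0.25
print(local_epsilon(k, gamma))  # ln(1 + k(1 - gamma)/gamma) = ln 31 ≈ 3.434
```

Note the trade-off the sketch makes visible: a smaller γ (more truthful reports) yields a larger ε, i.e. a weaker guarantee, which is why strong privacy in the Local Model demands heavy noise per user.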



3.2. Single-Message Shuffle Model

The Single-Message Shuffle Model sits between the Centralized and Local Models of DP [2]. Let a protocol 𝒫 in the Single-Message Shuffle Model be of the form 𝒫 = (ℛ, 𝒜), where ℛ : 𝕏 → 𝕐 is the local randomizer, and 𝒜 : 𝕐^n → ℤ is the analyzer of 𝒫. Overall, 𝒫 implements a mechanism 𝒫 : 𝕏^n → ℤ as follows. Each user i independently applies the local randomizer to their message x⃗_i to obtain a message y⃗_i = ℛ(x⃗_i). Subsequently, the messages (y⃗_1, …, y⃗_n) are randomly permuted by a trusted shuffler 𝒮 : 𝕐^n → 𝕐^n. The random permutation 𝒮(y⃗_1, …, y⃗_n) is submitted to an untrusted data collector, who applies the analyzer 𝒜 to obtain an output for the mechanism. In summary, the output of 𝒫(x⃗_1, …, x⃗_n) is given by:

      𝒜 ∘ 𝒮 ∘ ℛ^n(x⃗) = 𝒜(𝒮(ℛ(x⃗_1), …, ℛ(x⃗_n))).

   Note that the data collector observing the shuffled messages 𝒮(y⃗_1, …, y⃗_n) obtains no information about which user generated each of the messages. Therefore, the privacy of 𝒫 relies on the indistinguishability between the shuffles 𝒮 ∘ ℛ^n(D⃗) and 𝒮 ∘ ℛ^n(D⃗′) for datasets D⃗ ≃ D⃗′. The analyzer can represent the shuffled messages as a histogram, which counts the number of occurrences of the possible outputs of 𝕐.

3.3. Measuring Accuracy

In Section 4 we use the mean squared error to compare the overall output of a private summation protocol in the Single-Message Shuffle Model with the original dataset. The MSE measures the average squared difference between a fixed input f(D⃗) to the randomized protocol 𝒫 and its output 𝒫(D⃗). In this context, MSE(𝒫, D⃗) = E[(𝒫(D⃗) − f(D⃗))²], where the expectation is taken over the randomness of 𝒫. When E[𝒫(D⃗)] = f(D⃗), MSE is equivalent to variance, i.e.:

      MSE(𝒫, D⃗) = E[(𝒫(D⃗) − E[𝒫(D⃗)])²] = Var[𝒫(D⃗)].

4. Vector Sum in the Shuffle Model

In this section we introduce our protocol for vector summation in the Shuffle Model and tune its parameters to optimize accuracy.

Algorithm 1: Local Randomizer ℛ^PH_{γ,k,n}
   Public Parameters: γ ∈ [0, 1], domain size k, and number of parties n
   Input: x_i ∈ [k]
   Output: y_i ∈ [k]
   Sample b ← Ber(γ)
   if b = 0 then let y_i ← x_i
   else sample y_i ← Unif([k])
   return y_i

4.1. Basic Randomizer

First, we describe a basic local randomizer applied by each user i to an input x_i ∈ [k]. The output of this protocol is a (private) histogram of shuffled messages over the domain [k].
   The Local Randomizer ℛ^PH_{γ,k,n}, shown in Algorithm 1, applies a generalized randomized response mechanism that returns the true message x_i with probability 1 − γ and a uniformly random message with probability γ. Such a basic randomizer is used by Balle et al. [2] in the Single-Message Shuffle Model for scalar-valued messages, as well as in several other previous works in the Local Model [20, 21, 22]. In Section 4.3, we find an appropriate γ to optimize the proportion of random messages that are submitted, and therefore guarantee DP.
   We now describe how the presence of these random messages can form a 'privacy blanket' to protect against a difference attack on a particular user. Suppose we apply Algorithm 1 to the messages from all n users. Note that a subset B of approximately γn of these users returned a uniformly random message, while the remaining users returned their true message. Following Balle et al. [2], the analyzer can represent the messages sent by users in B by a histogram Y_1 of uniformly random messages, and can form a histogram Y_2 of truthful messages from users not in B. As these subsets are mutually exclusive and collectively exhaustive, the information represented by the analyzer is equivalent to the histogram Y = Y_1 ∪ Y_2.
   Consider two neighbouring datasets, each consisting of n messages from n users, that differ only on the input from the nth user. To simplify the discussion and subsequent proof, we temporarily omit the action of the shuffler. By the post-processing property of DP, this can be reintroduced later on without adversely affecting the privacy guarantees. To achieve DP we need to find an appropriate γ such that when Algorithm 1 is applied, the change in Y is appropriately bounded. As the knowledge of either the set B or the messages from the first n − 1 users does not affect DP, we can assume that the analyzer knows both of these details. This lets the analyzer remove all of the truthful messages associated with the first n − 1 users from Y.
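Composing Algorithm 1 with the shuffler from Section 3.2 gives the following minimal sketch (our own illustration; names such as `shuffle_protocol` are ours, not the paper's). The analyzer sees only the histogram of shuffled messages, in which the roughly γn uniformly random reports form the privacy blanket.

```python
import random
from collections import Counter

def randomizer(x, k, gamma, rng):
    """Algorithm 1: report x w.p. 1 - gamma, else a uniform value in [k]."""
    return x if rng.random() >= gamma else rng.randint(1, k)

def shuffle_protocol(inputs, k, gamma, seed=0):
    """A ∘ S ∘ R^n: randomize every input, shuffle away user identities,
    and return the histogram the analyzer works with."""
    rng = random.Random(seed)
    messages = [randomizer(x, k, gamma, rng) for x in inputs]  # R per user
    rng.shuffle(messages)           # S: unbinds each user from their message
    return Counter(messages)        # the analyzer's histogram over [k]

hist = shuffle_protocol([1, 1, 2, 3] * 25, k=5, gamma=0.2)
```

Since the histogram is a deterministic function of the shuffled multiset, analyzing it directly (as in Section 4.1) loses no information relative to the shuffler's output.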



first ๐‘› โˆ’ 1 users from ๐‘Œ.                                                Algorithm 2: Local Randomizer โ„›๐‘‘,๐‘˜,๐‘›,๐‘ก
   If the ๐‘›th user is in ๐ต, this means their submission is
                                                                          Public Parameters: ๐‘˜, ๐‘ก, dimension ๐‘‘, and
independent of their input, so we trivially satisfy DP.
                                                                           number of parties ๐‘›
Otherwise, the (curious) analyzer knows that the ๐‘›th user                                (1)      (๐‘‘)
has submitted their true message ๐‘ฅ๐‘› . The analyzer can                    Input: ๐‘ฅโƒ—๐‘– = (๐‘ฅ๐‘– , โ€ฆ , ๐‘ฅ๐‘– ) โˆˆ [0, 1]๐‘‘
                                                                                                 (๐›ผ )       (๐›ผ )
remove all of the truthful messages associated with the                   Output: ๐‘ฆโƒ—๐‘– = (๐‘ฆ๐‘– 1 , โ€ฆ , ๐‘ฆ๐‘– ๐‘ก ) โˆˆ {0, 1, โ€ฆ , ๐‘˜}๐‘ก
first ๐‘›โˆ’1 users from ๐‘Œ, and obtain ๐‘Œ1 โˆช{๐‘ฅ๐‘› }. The subsequent
                                                                          Sample (๐›ผ1 , โ€ฆ , ๐›ผ๐‘ก ) โ† Unif ([๐‘‘])
privacy analysis will argue that this does not reveal ๐‘ฅ๐‘›                        (๐›ผ )          (๐›ผ )             (๐›ผ )      (๐›ผ )
if ๐›พ is set so that ๐‘Œ1 , the histogram of random messages,                Let ๐‘ฅ๐‘–ฬ„ ๐‘— โ† โŒŠ๐‘ฅ๐‘– ๐‘— ๐‘˜โŒ‹ + Ber (๐‘ฅ๐‘– ๐‘— ๐‘˜ โˆ’ โŒŠ๐‘ฅ๐‘– ๐‘— ๐‘˜โŒ‹)
                                                                                (๐›ผ๐‘— )                   (๐›ผ๐‘— )
appropriately โ€˜hidesโ€™ ๐‘ฅ๐‘› .                                                 โ–ท ๐‘ฅ๐‘–ฬ„      : encoding of ๐‘ฅ๐‘–        with precision ๐‘˜
                                                                                (๐›ผ๐‘— )                                    (๐›ผ๐‘— )
                                                                           โ–ท ๐‘ฆ๐‘–       : apply Algorithm 1 to each ๐‘ฅ๐‘–ฬ„
4.2. Private Summation of Vectors                                                           (๐›ผ )     (๐›ผ )
                                                                          return ๐‘ฆโƒ—๐‘– = (๐‘ฆ๐‘– 1 , โ€ฆ , ๐‘ฆ๐‘– ๐‘ก )
Here, we extend the protocol from Section 4.1 to ad-
dress the problem of computing the sum of ๐‘› real vec-
                                 (1)    (๐‘‘)
tors, each of the form ๐‘ฅโƒ—๐‘– = (๐‘ฅ๐‘– , โ€ฆ , ๐‘ฅ๐‘– ) โˆˆ [0, 1]๐‘‘ , in the           Algorithm 3: Analyzer ๐’œ๐‘‘,๐‘˜,๐‘ก
Single-Message Shuffle Model. Specifically, we analyze                    Public Parameters: ๐‘˜, ๐‘ก, and dimension ๐‘‘
the utility of a protocol ๐’ซ๐‘‘,๐‘˜,๐‘›,๐‘ก = (โ„›๐‘‘,๐‘˜,๐‘›,๐‘ก , ๐’œ๐‘‘,๐‘˜,๐‘ก ) for this        Input: Multiset {โƒ—
                                                                                           ๐‘ฆ๐‘– }๐‘–โˆˆ[๐‘›] , with
purpose, by using the MSE from Section 3.3 as the accu-                                (๐›ผ )          (๐›ผ )
                                                                                (๐‘ฆ๐‘– 1 , โ€ฆ , ๐‘ฆ๐‘– ๐‘ก ) โˆˆ {0, 1, โ€ฆ , ๐‘˜}๐‘ก
racy measure. In the scalar case, each user applies the
                                                                          Output: ๐‘งโƒ— = (๐‘ง (1) , โ€ฆ , ๐‘ง (๐‘‘) ) โˆˆ [0, 1]๐‘‘
protocol to their entire input [2]. Moving to the vector
case, we allow each user to independently sample a set                           (๐‘™)      (๐›ผ )
                                                                          Let ๐‘ฆ๐‘–      โ† ๐‘ฆ๐‘– ๐‘—
of 1 โ‰ค ๐‘ก โ‰ค ๐‘‘ coordinates from their vector to report. Our                       (๐›ผ๐‘— )                                (๐‘™)
analysis allows us to optimize the parameter ๐‘ก.                             โ–ท ๐‘ฆ๐‘–      : submission corresponding to ๐‘ฅ๐‘–
   Hence, the first step of the Local Randomizer โ„›๐‘‘,๐‘˜,๐‘›,๐‘ก ,                                                        (1)          (๐‘‘)
presented in Algorithm 2, is to uniformly sample ๐‘ก co-                    Let (๐‘ง ฬ‚(1) , โ€ฆ , ๐‘ง ฬ‚(๐‘‘) ) โ† ( 1๐‘˜ โˆ‘๐‘– ๐‘ฆ๐‘– , โ€ฆ , 1๐‘˜ โˆ‘๐‘– ๐‘ฆ๐‘– )
ordinates (๐›ผ1 , โ€ฆ , ๐›ผ๐‘ก ) โˆˆ [๐‘‘] (without replacement) from                 Let
each vector ๐‘ฅโƒ—๐‘– . To compute a differentially private ap-                  (๐‘ง (1) , โ€ฆ , ๐‘ง (๐‘‘) ) โ† (DeBias (๐‘ง ฬ‚(1) ), โ€ฆ ,DeBias (๐‘ง ฬ‚(๐‘‘) ))
                                                                                            (๐‘™)
                                                                           โ–ท DeBias (๐‘ง ฬ‚ ) = (๐‘ง ฬ‚
                                                                                                      (๐‘™) โˆ’ ๐›พ โ‹… |๐‘ฆ (๐‘™) |)/(1 โˆ’ ๐›พ )
proximation of โˆ‘๐‘– ๐‘ฅโƒ—๐‘– , we fix a quantization level ๐‘˜. Then                                                  2    ๐‘–
                                (๐›ผ )              (๐›ผ )
we randomly round each ๐‘ฅ๐‘– ๐‘— to obtain ๐‘ฅ๐‘–ฬ„ ๐‘— as either                     return ๐‘งโƒ— = (๐‘ง (1) , โ€ฆ , ๐‘ง (๐‘‘) )
  (๐›ผ )         (๐›ผ )
โŒŠ๐‘ฅ๐‘– ๐‘— ๐‘˜โŒ‹ or โŒˆ๐‘ฅ๐‘– ๐‘— ๐‘˜โŒ‰. Next, we apply the randomized re-
                                                     (๐›ผ )
sponse mechanism from Algorithm 1 to each ๐‘ฅ๐‘–ฬ„ ๐‘— , which                  4.3. Privacy Analysis of Algorithms
                      (๐›ผ )                                    (๐›ผ )
sets each output ๐‘ฆ๐‘– ๐‘— independently to be equal to ๐‘ฅ๐‘–ฬ„ ๐‘—
                                                            In this section, we will find an appropriate ๐›พ that en-
with probability 1 โˆ’ ๐›พ, or a random value in {0, 1, โ€ฆ , ๐‘˜}
                                    (๐›ผ๐‘— )                   sures that the mechanism described in Algorithms 2 and
with probability ๐›พ. Each ๐‘ฆ๐‘– will contribute to a his- 3 satisfies (๐œ€, ๐›ฟ)-DP for vector-valued messages in the
                          (๐›ผ )           (๐›ผ )               Single-Message Shuffle Model. To achieve this, we prove
togram of the form $(y_1^{(\alpha_j)}, \dots, y_n^{(\alpha_j)})$ as in Section 4.1.

The Analyzer $\mathcal{A}_{d,k,t}$, shown in Algorithm 3, aggregates the histograms to approximate $\sum_i \vec{x}_i$ by post-processing the vectors coordinate-wise. More precisely, the analyzer sets each output $y_i^{(\alpha_j)}$ to $y_i^{(l)}$, where the new label $l$ is from its corresponding input $x_i^{(l)}$ of the original $d$-dimensional vector $\vec{x}_i$. For all inputs $x_i^{(l)}$ that were not sampled, we set $y_i^{(l)} = 0$. Subsequently, the analyzer aggregates the sets of outputs from all users corresponding to each of those $l$ coordinates in turn, so that a $d$-dimensional vector is formed. Finally, a standard debiasing step is applied to this vector to remove the scaling and rounding applied to each submission. DeBias returns an unbiased estimator, $\vec{z}$, which calculates an estimate of the true sum of the vectors by subtracting the expected uniform noise from the randomized sum of the vectors.

We state the privacy guarantee of this protocol in the following theorem, where we initially assume $\varepsilon < 1$ to simplify our computations. At the end of this section, we discuss how to cover the additional case $1 \leq \varepsilon < 6$ to suit our experimental study.

Theorem 4.1. The shuffled mechanism $\mathcal{M} = \mathcal{S} \circ \mathcal{R}_{d,k,n,t}$ is $(\varepsilon, \delta)$-DP for any $d, k, n \in \mathbb{N}$, $\{t \in \mathbb{N} \mid t \in [d]\}$, $\varepsilon < 6$ and $\delta \in (0, 1]$ such that:

$$\gamma = \begin{cases} \dfrac{56dk\log(1/\delta)\log(2t/\delta)}{(n-1)\varepsilon^2}, & \text{when } \varepsilon < 1 \\[6pt] \dfrac{2016dk\log(1/\delta)\log(2t/\delta)}{(n-1)\varepsilon^2}, & \text{when } 1 \leq \varepsilon < 6. \end{cases}$$

Proof. Let $\vec{D} = (\vec{x}_1, \dots, \vec{x}_n)$ and $\vec{D}' = (\vec{x}_1, \dots, \vec{x}_n')$ be the two neighbouring datasets differing only in the input of the $n$th user, as used in Section 4.1. Here each vector-valued message $\vec{x}_i$ is of the form $(x_i^{(1)}, \dots, x_i^{(d)})$. Recall from Section 4.1 that we assume that the analyzer can



Mary Scott et al. CEUR Workshop Proceedings                                                                                                       1โ€“10



see the users in $B$ (i.e., the subset of users that returned a uniformly random message), as well as the inputs from the first $n-1$ users.

We now introduce the vector view $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ as the collection of information that the analyzer is able to see after the mechanism $\mathcal{M}$ is applied to all vector-valued messages in the dataset $\vec{D}$. $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ is defined as the tuple $(\vec{Y}, \vec{D}_\cap, \vec{b})$, where $\vec{Y}$ is the multiset containing the outputs $\{\vec{y}_1, \dots, \vec{y}_n\}$ of the mechanism $\mathcal{M}(\vec{D})$, $\vec{D}_\cap$ is the vector containing the inputs $(\vec{x}_1, \dots, \vec{x}_{n-1})$ from the first $n-1$ users, and $\vec{b}$ contains binary vectors $(\vec{b}_1, \dots, \vec{b}_n)$ which indicate for which coordinates each user reports truthful information. This vector view can be projected to $t$ overlapping scalar views by applying Algorithm 2 only to the $j$th uniformly sampled coordinate $\alpha_j \in [d]$ from each user, where $j \in [t]$. The $j$th scalar view $\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D})$ of $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ is defined as the tuple $(\vec{Y}^{(\alpha_j)}, \vec{D}_\cap^{(\alpha_j)}, \vec{b}^{(\alpha_j)})$, where:

$$\vec{Y}^{(\alpha_j)} = \mathcal{M}(\vec{D}^{(\alpha_j)}) = \{y_1^{(\alpha_j)}, \dots, y_n^{(\alpha_j)}\},$$
$$\vec{D}_\cap^{(\alpha_j)} = (x_1^{(\alpha_j)}, \dots, x_{n-1}^{(\alpha_j)})$$
$$\text{and} \quad \vec{b}^{(\alpha_j)} = (b_1^{(\alpha_j)}, \dots, b_n^{(\alpha_j)})$$

are the analogous definitions of $\vec{Y}$, $\vec{D}_\cap$ and $\vec{b}$, but containing only the information referring to the $j$th uniformly sampled coordinate of each vector-valued message.

The following advanced composition results will be used in our setting to get a tight upper bound:

Theorem 4.2 (Dwork et al. [9]). For all $\varepsilon', \delta', \delta \geq 0$, the class of $(\varepsilon', \delta')$-differentially private mechanisms satisfies $(\varepsilon, r\delta' + \delta)$-differential privacy under $r$-fold adaptive composition for:

$$\varepsilon = \sqrt{2r\log(1/\delta)}\,\varepsilon' + r\varepsilon'(e^{\varepsilon'} - 1).$$

Corollary 4.3. Given target privacy parameters $0 < \varepsilon < 1$ and $\delta > 0$, to ensure $(\varepsilon, r\delta' + \delta)$ cumulative privacy loss over $r$ mechanisms, it suffices that each mechanism is $(\varepsilon', \delta')$-DP, where:

$$\varepsilon' = \frac{\varepsilon}{2\sqrt{2r\log(1/\delta)}}.$$

To show that $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ satisfies $(\varepsilon, \delta)$-DP it suffices to prove that:

$$\Pr_{\tilde{V} \sim \mathrm{VView}_{\mathcal{M}}(\vec{D})}\left[\frac{\Pr[\mathrm{VView}_{\mathcal{M}}(\vec{D}) = \tilde{V}]}{\Pr[\mathrm{VView}_{\mathcal{M}}(\vec{D}') = \tilde{V}]} \geq e^{\varepsilon}\right] \leq \delta. \quad (1)$$

By considering this vector view as a union of overlapping scalar views, and letting $r = t$ in Corollary 4.3, it is sufficient to derive (1) from:

$$\Pr_{V_{\alpha_j} \sim \mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D})}\left[\frac{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}) = V_{\alpha_j}]}{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}') = V_{\alpha_j}]} \geq e^{\varepsilon'}\right] \leq \delta', \quad (2)$$

where $\tilde{V} = \bigcup_{\alpha_j} V_{\alpha_j}$, $\varepsilon' = \frac{\varepsilon}{2\sqrt{2t\log(1/\delta)}}$ and $\delta' = \frac{\delta}{t}$.

Lemma 4.4. Condition (2) implies condition (1).

Proof. We can express $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ as the composition of the $t$ scalar views $\mathrm{View}_{\mathcal{M}}^{(\alpha_1)}, \dots, \mathrm{View}_{\mathcal{M}}^{(\alpha_t)}$, as:

$$\Pr[\mathrm{VView}_{\mathcal{M}}(\vec{D}) = \tilde{V}] = \Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_1)}(\vec{D}) = V_{\alpha_1} \wedge \dots \wedge \mathrm{View}_{\mathcal{M}}^{(\alpha_t)}(\vec{D}) = V_{\alpha_t}] = \Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_1)}(\vec{D}) = V_{\alpha_1}] \cdot \dots \cdot \Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_t)}(\vec{D}) = V_{\alpha_t}].$$

Our desired result is immediate by applying Corollary 4.3, which states that the use of $t$ overlapping $(\varepsilon', \delta')$-DP mechanisms, when taken together, is $(\varepsilon, \delta)$-DP. This applies in our setting, since we have assumed that $\mathrm{VView}_{\mathcal{M}}(\vec{D})$ satisfies the requirements of $(\varepsilon, \delta)$-DP, and that each of the $t$ overlapping scalar views is formed identically but for a different uniformly sampled coordinate of the vector-valued messages.

To complete the proof of Theorem 4.1 for $\varepsilon < 1$, it remains to show that for a uniformly sampled coordinate $\alpha_j \in [d]$, $\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D})$ satisfies $(\varepsilon', \delta')$-DP.

Lemma 4.5. Condition (2) holds.

Proof. See Appendix A.

We now show that the above proof can be adjusted to cover the additional case $1 \leq \varepsilon < 6$. This will be sufficient to complete the proof of our main Theorem 4.1. First, we scale the setting of $\varepsilon'$ by a multiple of 6 in Corollary 4.3 so that the advanced composition property holds for all $1 \leq \varepsilon < 6$. We now insert $\varepsilon' = \frac{\varepsilon}{12\sqrt{2r\log(1/\delta)}}$ into the proof of Theorem 4.1, resulting in a change of constant from 56 to 2016.

4.4. Accuracy Bounds for Shuffled Vector Sum

We now formulate an upper bound for the MSE of our protocol, and then identify the value(s) of $t$ that minimize this upper bound.

First, note that encoding the coordinate $x_i^{(\alpha_j)}$ as $\bar{x}_i^{(\alpha_j)} = \lfloor x_i^{(\alpha_j)} k \rfloor + \mathrm{Ber}(x_i^{(\alpha_j)} k - \lfloor x_i^{(\alpha_j)} k \rfloor)$ in Algorithm 2 ensures that $\mathbb{E}[\bar{x}_i^{(\alpha_j)}/k] = \mathbb{E}[x_i^{(\alpha_j)}]$. This means that our protocol is unbiased. For any unbiased random variable $X$ with






๐‘Ž < ๐‘‹ < ๐‘ then Var[๐‘‹ ] โ‰ค (๐‘ โˆ’ ๐‘Ž)2 /4, and so the MSE                     To obtain the error between the estimated average
per coordinate due to the fixed-point approximation of                 vector and the true average vector, we simply take the
the true vector in โ„›๐‘‘,๐‘˜,๐‘›,๐‘ก is at most 4๐‘˜1 2 . Meanwhile, the          square root of the result obtained in Theorem 4.6.
MSE when โ„›๐‘‘,๐‘˜,๐‘›,๐‘ก submits a random vector is at most 12                Corollary 4.7. For every statistical query ๐‘ž โˆถ ๐’ณ โ†ฆ
per coordinate.                                                        [0, 1]๐‘‘ , ๐‘‘, ๐‘› โˆˆ โ„•, {๐‘ก โˆˆ โ„• | ๐‘ก โˆˆ [๐‘‘]}, ๐œ€ < 6 and ๐›ฟ โˆˆ (0, 1],
   We now use the unbiasedness of our protocol to obtain               there is an (๐œ€, ๐›ฟ)-DP ๐‘›-party unbiased protocol for estimat-
a result for estimating the squared error between the                  ing ๐‘›๐‘‘ โˆ‘๐‘– ๐‘ž(๐‘ฅโƒ—๐‘– ) in the Single-Message Shuffle Model with
estimated average vector and the true average vector.                  standard deviation
When calculating the MSE, each coordinate location is                                             (2๐‘ก)1/2 ๐‘‘ 4/3 (14 log(1/๐›ฟ) log(2๐‘ก/๐›ฟ))1/3
used with expectation ๐‘›/๐‘‘. Therefore, we define the                                           โŽง                                            ,
                                                                                                                 (1โˆ’๐›พ )๐‘›5/6 ๐œ€ 2/3
                      ฬ‚ as the normalization of the                                           โŽช
                                                                                              โŽช
normalized MSE, or MSE,                                                                            when ๐œ€ < 1
MSE by a factor of (๐‘›/๐‘‘)2 .                                                  ๐œŽ(๐’ซ
                                                                              ฬ‚ ๐‘‘,๐‘˜,๐‘›,๐‘ก ) =         1/2 4/3                   1/3
                                                                                              โŽจ (8๐‘ก) ๐‘‘ (63 log(1/๐›ฟ) log(2๐‘ก/๐›ฟ)) ,
                                                                                              โŽช
                                                                                              โŽช             (1โˆ’๐›พ )๐‘› 5/6 ๐œ€ 2/3
Theorem 4.6. For any ๐‘‘, ๐‘› โˆˆ โ„•, {๐‘ก โˆˆ โ„• | ๐‘ก โˆˆ [๐‘‘]}, ๐œ€ < 6
and ๐›ฟ โˆˆ (0, 1], there exists a parameter ๐‘˜ such that ๐’ซ๐‘‘,๐‘˜,๐‘›,๐‘ก                                 โŽฉ when 1 โ‰ค ๐œ€ < 6,
is (๐œ€, ๐›ฟ)-DP and                                                       where ๐œŽฬ‚ denotes the error between the estimated average
                         2๐‘ก๐‘‘ 8/3 (14 log(1/๐›ฟ) log(2๐‘ก/๐›ฟ))2/3            vector and the true average vector.
                       โŽง                                    ,
                                   (1โˆ’๐›พ )2 ๐‘›5/3 ๐œ€ 4/3            To summarize, we have produced an unbiased pro-
                       โŽช
                       โŽช when ๐œ€ < 1
     ฬ‚ ๐‘‘,๐‘˜,๐‘›,๐‘ก ) =
     MSE(๐’ซ                                                    tocol for the computation of the sum of ๐‘› real vectors
                            8/3                     2/3
                       โŽจ 8๐‘ก๐‘‘ (63 log(1/๐›ฟ) log(2๐‘ก/๐›ฟ)) ,        in the Single-Message Shuffle Model with normalized
                       โŽช        (1โˆ’๐›พ ) 2 ๐‘›5/3 ๐œ€ 4/3
                       โŽช                                      MSE ๐‘‚๐œ€,๐›ฟ (๐‘‘ 8/3 ๐‘ก๐‘›โˆ’5/3 ), using advanced composition re-
                       โŽฉ when 1 โ‰ค ๐œ€ < 6,                      sults from Dwork et al. [9]. Minimizing this bound as a
       ฬ‚ denotes the squared error between the estimated function of ๐‘ก leads us to choose ๐‘ก = 1, but any choice of ๐‘ก
where MSE
average vector and the true average vector.                   that is small and not dependent on ๐‘‘ produces a bound of
                                                              the same order. In our experimental study, we determine
                          ๐‘‘
Proof. We consider the โˆ‘๐‘™=1 DeBias (๐‘ง ฬ‚(๐‘™) ) of ๐’ซ๐‘‘,๐‘˜,๐‘›,๐‘ก com- that the best choice of ๐‘ก in practice is indeed ๐‘ก = 1.
                                              ๐‘ก      ๐‘›      (๐›ผ )
pared to the corresponding input โˆ‘๐‘—=1 โˆ‘๐‘–=1 ๐‘ฅ๐‘– ๐‘— over
            โƒ— We use the bounds on the variance of the                 4.5. Improved bounds for t=1
the dataset ๐ท.
randomized response mechanism from Theorem 4.6 to           We observe that in the optimal case in which ๐‘ก = 1, we
give us an upper bound for this comparison.                 can tighten the bounds further, as we do not need to
                                                          2 invoke the advanced composition results when each user
                         ๐‘‘                  ๐‘ก ๐‘›
                                     (๐‘™)            (๐›ผ๐‘— )   samples only a single coordinate. This changes the value
MSE(๐’ซ๐‘‘,๐‘˜,๐‘›,๐‘ก ) = sup E[(โˆ‘ DeBias (๐‘ง ฬ‚ ) โˆ’ โˆ‘ โˆ‘ ๐‘ฅ๐‘– ) ]
                   โƒ—
                   ๐ท    ๐‘™=1               ๐‘—=1 ๐‘–=1           of ๐›พ by a factor of ๐‘‚(log(1/๐›ฟ)), which propagates through
                                                  2
                                                            to the expression for the MSE. That is, we can more
                  ๐‘ก ๐‘›
                              (๐›ผ๐‘— )        (๐›ผ๐‘— )            simply set ๐œ€ โ€ฒ = ๐œ€ and ๐›ฟ โ€ฒ = ๐›ฟ in the proof of Theorem 4.1.
    = sup E[(โˆ‘ โˆ‘(DeBias (๐‘ฆ๐‘– /๐‘˜) โˆ’ ๐‘ฅ๐‘– )) ]                   When ๐œ€ < 1, the computation is straightforward, with
       โƒ—
       ๐ท        ๐‘—=1 ๐‘–=1
              ๐‘ก ๐‘›                                           ๐‘ โ‰ฅ ๐œ€14โ€ฒ2 log(2๐‘ก/๐›ฟ) being chosen as before. However, when
                                     (๐›ผ )          (๐›ผ ) 2
     = sup โˆ‘ โˆ‘ E[(DeBias (๐‘ฆ๐‘– ๐‘— /๐‘˜) โˆ’ ๐‘ฅ๐‘– ๐‘— ) ]                          1 โ‰ค ๐œ€ < 6, a tighter ๐‘ โ‰ฅ ๐œ€80โ€ฒ2 log(2๐‘ก/๐›ฟ) must be selected, as
         โƒ— ๐‘—=1 ๐‘–=1
         ๐ท                                                             the condition ๐œ€ โ€ฒ < 1 no longer holds.
            ๐‘ก ๐‘›
                                       (๐›ผ )                              Using ๐œ€ โ€ฒ < 6, we have:
     = sup โˆ‘ โˆ‘ Var[DeBias (๐‘ฆ๐‘– ๐‘— /๐‘˜)]
         โƒ— ๐‘—=1 ๐‘–=1
         ๐ท                                                                                                                2            ๐œ€โ€ฒ
                                                                         (1 โˆ’ exp (โˆ’๐œ€ โ€ฒ /2)) โ‰ฅ (1 โˆ’ exp (โˆ’                   )) ๐œ€ โ€ฒ โ‰ฅ      .
            ๐‘ก๐‘›             (๐›ผ )         ๐‘ก๐‘›      1โˆ’๐›พ ๐›พ                                                                   3โˆš15          2โˆš10
     =            sup Var[๐‘ฆ1 1 /๐‘˜] โ‰ค          (     + )
         (1 โˆ’ ๐›พ )2 (๐›ผ1)              (1 โˆ’ ๐›พ )2 4๐‘˜ 2  2 Thus, we have:
                  ๐‘ฅ1

          ๐‘ก๐‘›       1  ๐ด๐œ€ ๐‘‘๐‘˜ log(1/๐›ฟ) log(2๐‘ก/๐›ฟ)                               N๐œƒ      โ€ฒ           ๐‘                     ๐‘ ๐œ€โ€ฒ 2
     โ‰ค           ( 2+                          ),                      Pr[      โ‰ฅ ๐‘’ ๐œ€ ] โ‰ค exp ( โˆ’ (๐œ€ โ€ฒ /2)2 ) + exp ( โˆ’ (     ) )
               2
       (1 โˆ’ ๐›พ ) 4๐‘˜            (๐‘› โˆ’ 1)๐œ€ 2                                     N๐œ™                  3                     2 2โˆš10
where ๐ด๐œ€ = 28 when ๐œ€ < 1, and ๐ด๐œ€ = 1008 when 1 โ‰ค ๐œ€ <                                                       80 ๐œ€ โ€ฒ2
                                                                                        โ‰ค 2 exp (โˆ’                 log(2๐‘ก/๐›ฟ)) โ‰ค ๐›ฟ/๐‘ก,
6. In other words, ๐ด๐œ€ is equal to half the constant term                                                  2๐œ€ โ€ฒ2 40
in the expression of ๐›พ stated in Theorem 4.1. The choice               which yields:
            (๐‘›โˆ’1)๐œ€ 2
๐‘˜ = 4๐ด ๐‘‘ log(1/๐›ฟ) log(2๐‘ก/๐›ฟ) minimizes the bracketed sum                                14๐‘‘๐‘˜ log(2/๐›ฟ)
                                                                                               27๐‘‘๐‘˜
        ๐œ€                                                                    max{ (๐‘›โˆ’1)๐œ€ 2 , (๐‘›โˆ’1)๐œ€  },                    when ๐œ€ < 1
above and the bounds in the statement of the theorem                     ๐›พ ={    80๐‘‘๐‘˜ log(2/๐›ฟ)  36๐‘‘๐‘˜
follow.                                                                      max{ (๐‘›โˆ’1)๐œ€ 2 , 11(๐‘›โˆ’1)๐œ€ },                   when 1 โ‰ค ๐œ€ < 6.
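To make the two regimes concrete, the bounds on $\gamma$ can be evaluated numerically. The sketch below is our own transcription of the expression in Theorem 4.1 and of the improved $t = 1$ expression just derived; the function names are ours, not from the paper's codebase.

```python
import math

def gamma_advanced(d, k, n, t, eps, delta):
    """Randomized-response probability bound from Theorem 4.1
    (advanced composition over t sampled coordinates)."""
    const = 56 if eps < 1 else 2016   # the theorem covers 0 < eps < 6
    return (const * d * k * math.log(1 / delta) * math.log(2 * t / delta)
            / ((n - 1) * eps ** 2))

def gamma_single(d, k, n, eps, delta):
    """Improved bound for t = 1, where advanced composition is not invoked."""
    if eps < 1:
        return max(14 * d * k * math.log(2 / delta) / ((n - 1) * eps ** 2),
                   27 * d * k / ((n - 1) * eps))
    return max(80 * d * k * math.log(2 / delta) / ((n - 1) * eps ** 2),
               36 * d * k / (11 * (n - 1) * eps))

# Evaluated at the experimental defaults of Section 5:
# d = 100, k = 3, n = 50000, eps = 0.95, delta = 0.5.
g_adv = gamma_advanced(100, 3, 50000, 1, 0.95, 0.5)
g_one = gamma_single(100, 3, 50000, 0.95, 0.5)
```

With these defaults both bounds lie in the permitted range $[0, 1]$, and the $t = 1$ bound is roughly half the advanced-composition bound.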







Figure 1: Bar charts confirming that the choices $t = 1$ (a) and $k = 3$ (b) minimize the total experimental $\widehat{\mathrm{MSE}}$ for the ECG Heartbeat Categorization Dataset. (a) Experimental error by number of coordinates $t$ retained; (b) experimental error by number of buckets $k$ used.
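The parameters varied in Figure 1 enter through the local randomizer and its matching debias step. The sketch below is our own reconstruction from the descriptions of Algorithms 2 and 3 in Section 4; the function names are ours, and the extra scaling that accounts for sampling $t$ of the $d$ coordinates is omitted from the debias step.

```python
import math
import random

rng = random.Random(0)  # fixed seed for reproducibility

def randomize(x, k, t, gamma):
    """Sketch of the local randomizer: sample t coordinates of
    x in [0,1]^d, apply the unbiased fixed-point encoding
    floor(x*k) + Ber(x*k - floor(x*k)), and with probability
    gamma report a uniform value in {0,...,k} instead."""
    d = len(x)
    out = []
    for j in rng.sample(range(d), t):              # alpha_1, ..., alpha_t
        scaled = x[j] * k
        y = math.floor(scaled) + (1 if rng.random() < scaled - math.floor(scaled) else 0)
        if rng.random() < gamma:                   # randomized response
            y = rng.randint(0, k)
        out.append((j, y))
    return out

def debias_sum(reports, n, k, gamma):
    """Per-coordinate DeBias sketch: subtract the expected uniform
    noise gamma*n/2, then rescale by 1/(1 - gamma)."""
    return (sum(reports) / k - gamma * n / 2) / (1 - gamma)

# Round trip at one coordinate (d = 1, t = 1): the debiased
# average should concentrate around the true value x.
n, k, gamma, x = 20000, 3, 0.25, 0.7
reports = [randomize([x], k, 1, gamma)[0][1] for _ in range(n)]
estimate = debias_sum(reports, n, k, gamma) / n
```

With $\gamma = 0$ the encoding alone already satisfies $\mathbb{E}[y/k] = x^{(j)}$, which is the unbiasedness used in Theorem 4.6; the randomized-response noise is what the subtraction of $\gamma n/2$ removes in expectation.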



Note that the above expression for $\gamma$ in the case $\varepsilon < 1$ coincides with the result obtained by Balle et al. in the scalar case [2]. Putting this expression for $\gamma$ in the proof of Theorem 4.6, with the choice

$$k = \begin{cases} \min\left\{\left(\dfrac{n\varepsilon^2}{28d\log(2/\delta)}\right)^{1/3}, \left(\dfrac{n\varepsilon}{54d}\right)^{1/3}\right\}, & \text{when } \varepsilon < 1 \\[6pt] \min\left\{\left(\dfrac{n\varepsilon^2}{160d\log(2/\delta)}\right)^{1/3}, \left(\dfrac{11n\varepsilon}{72d}\right)^{1/3}\right\}, & \text{when } 1 \leq \varepsilon < 6, \end{cases}$$

causes the upper bound on the normalized MSE to reduce to:

$$\widehat{\mathrm{MSE}} = \begin{cases} \max\left\{\dfrac{98^{1/3}d^{8/3}\log^{2/3}(2/\delta)}{(1-\gamma)^2 n^{5/3}\varepsilon^{4/3}}, \dfrac{18d^{8/3}}{(1-\gamma)^2 n^{5/3}(4\varepsilon)^{2/3}}\right\}, & \text{when } \varepsilon < 1 \\[6pt] \max\left\{\dfrac{2d^{8/3}(20\log(2/\delta))^{2/3}}{(1-\gamma)^2 n^{5/3}\varepsilon^{4/3}}, \dfrac{2(9^{2/3})d^{8/3}}{(1-\gamma)^2 n^{5/3}(11\varepsilon)^{2/3}}\right\}, & \text{when } 1 \leq \varepsilon < 6. \end{cases}$$

By updating Corollary 4.7 in the same way, we can conclude that for the optimal choice $t = 1$, the normalized standard deviation of our unbiased protocol can be further tightened to:

$$\hat{\sigma} = \begin{cases} \max\left\{\dfrac{98^{1/6}d^{4/3}\log^{1/3}(2/\delta)}{(1-\gamma)n^{5/6}\varepsilon^{2/3}}, \dfrac{18^{1/2}d^{4/3}}{(1-\gamma)n^{5/6}(4\varepsilon)^{1/3}}\right\}, & \text{when } \varepsilon < 1 \\[6pt] \max\left\{\dfrac{2^{1/2}d^{4/3}(20\log(2/\delta))^{1/3}}{(1-\gamma)n^{5/6}\varepsilon^{2/3}}, \dfrac{2^{1/2}\,9^{1/3}d^{4/3}}{(1-\gamma)n^{5/6}(11\varepsilon)^{1/3}}\right\}, & \text{when } 1 \leq \varepsilon < 6. \end{cases}$$

5. Experimental Evaluation

In this section we present and compare the bounds generated by applying Algorithms 2 and 3 to an ECG Heartbeat Categorization Dataset in Python, available at https://www.kaggle.com/shayanfazeli/heartbeat. We analyse the effect of changing one key parameter at a time, whilst the others remain the same. Our default settings are: vector dimension $d = 100$, rounding parameter $k = 3$, number of users $n = 50000$, number of coordinates to sample $t = 1$, and differential privacy parameters $\varepsilon = 0.95$ and $\delta = 0.5$. The ranges of all parameters have been adjusted to best display the dependencies, whilst simultaneously ensuring that the parameter $\gamma$ of the randomized response mechanism is always within its permitted range of $[0, 1]$. The Python code is available at https://github.com/mary-python/dft/blob/master/shuffle.

We first confirm that the choice of $t = 1$ is optimal, as predicted by the results of Section 4.5. Indeed, Fig. 1(a) shows that the total experimental $\widehat{\mathrm{MSE}}$ for the ECG Heartbeat Categorization Dataset is significantly smaller when $t = 1$ than for any other small value of $t$.

Similarly, Fig. 1(b) suggests that the total experimental $\widehat{\mathrm{MSE}}$ is lowest when $k = 3$, which is sufficiently close to the choice of $k$ selected in the proof of Theorem 4.6, with all other default parameter values substituted in. Observe that the absolute value of the observed MSE is below 0.3 in this case, meaning that the vector is reconstructed to a high degree of accuracy, sufficient for many applications.

Next, we verify the bounds of $d^{8/3}$ and $n^{-5/3}$ from Theorem 4.6. Fig. 2(a) is plotted with a best-fit curve proportional to $d^{8/3}$, exactly as desired. Unsurprisingly, the MSE increases as $d$ goes up according to this superlinear dependence. Meanwhile, Fig. 2(b) fits a curve dependent on $n^{-7/6}$, sufficiently close to the required result. We see the benefit of increasing $n$: as $n$ increases by a factor of 10 across the plot, the error decreases by more than two orders of magnitude. In Fig. 3, we verify the dependency $\varepsilon^{-4/3}$ in the two ranges $\varepsilon < 1$ and $1 \leq \varepsilon < 6$. The behavior for $\varepsilon < 1$ is quite smooth,







Figure 2: Bar charts with best-fit curves confirming the dependencies $d^{8/3}$ (a) and $n^{-5/3}$ (b) from Theorem 4.6. (a) Experimental error by vector dimension $d$; (b) experimental error by number of vectors $n$ used.

Figure 3: Bar charts with best-fit curves confirming the dependency $\varepsilon^{-4/3}$ from Theorem 4.6 in the two ranges $\varepsilon < 1$ (a) and $1 \leq \varepsilon < 6$ (b). (a) Experimental error by value of $\varepsilon$ where $\varepsilon < 1$; (b) experimental error by value of $\varepsilon$ where $1 \leq \varepsilon < 6$.



but becomes more variable for larger $\varepsilon$ values.

In conclusion, these experiments confirm that picking $t = 1$ and $k = 3$ serves to minimize the error. The lines of best fit confirm the dependencies on the other parameters from Section 4 for $d$, $\varepsilon$ and $n$, by implementing and applying Algorithms 2 and 3 to an ECG Heartbeat Categorization Dataset in Python. The experiments demonstrate that the MSE observed in practice is sufficiently small to allow effective reconstruction of average vectors for a suitably large cohort of users.

6. Conclusion

Our results extend a result from Balle et al. [2] for scalar sums to provide a protocol $\mathcal{P}_{d,k,n,t}$ in the Single-Message Shuffle Model for the private summation of vector-valued messages $(\vec{x}_1, \dots, \vec{x}_n) \in ([0,1]^d)^n$. It is not surprising that the normalized MSE of the resulting estimator has a dependence on $n^{-5/3}$, as this was the case for scalars, but the addition of a new dimension $d$ introduces a new dependency for the bound, as well as the possibility of sampling $t$ coordinates from each $d$-dimensional vector. For this extension, we formally defined the vector view as the knowledge of the analyzer upon receiving the randomized vectors, and expressed it as a union of overlapping scalar views. Through the use of advanced composition results from Dwork et al. [9], we showed that the estimator now has normalized MSE $O_{\varepsilon,\delta}(d^{8/3}tn^{-5/3})$, which can be further improved to $O_{\varepsilon,\delta}(d^{8/3}n^{-5/3})$ by setting $t = 1$.

Our contribution has provided a stepping stone between the summation of the scalar case discussed by Balle et al. [2] and the linearization of more sophisticated structures such as matrices and higher-dimensional tensors, both of which are reliant on the functionality of the vector case. As mentioned in Section 2, there is potential for further exploration in the Multi-Message Shuffle Model to gain additional privacy, echoing the follow-up paper of Balle et al. [17].







A. Proof of Lemma 4.5

Lemma 4.5. Condition (2) holds.

Proof. The way in which we split the vector view (i.e., considering a single uniformly sampled coordinate of each vector-valued message in turn) means that we can apply a proof analogous to the scalar-valued case [2]. We work through the key steps needed.

Recall from Section 4.1 that the case where the $n$th user submits a uniformly random message independent of their input satisfies DP trivially. Otherwise, the $n$th user submits their true message, and we assume that the analyzer removes from $\vec{Y}^{(\alpha_j)}$ any truthful messages associated with the first $n-1$ users. Denote by $n_l^{(\alpha_j)}$ the count of $j$th coordinates remaining with a particular value $l \in [k]$. If $\vec{x}_n^{(\alpha_j)} = \theta$ and $\vec{x}_n'^{(\alpha_j)} = \phi$, we obtain the relationship

$$\frac{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}) = V_{\alpha_j}]}{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}') = V_{\alpha_j}]} = \frac{n_\theta^{(\alpha_j)}}{n_\phi^{(\alpha_j)}}.$$

We observe that the counts $n_\theta^{(\alpha_j)}$ and $n_\phi^{(\alpha_j)}$ follow the binomial distributions $N_\theta \sim \mathrm{Bin}(s, \gamma/k) + 1$ and $N_\phi \sim \mathrm{Bin}(s, \gamma/k)$ respectively, where $s$ denotes the number of times that coordinate $j$ is sampled. In expectation, $s = (n-1)t/d$, and below we will show that it is close to its expectation. Thus,

$$\Pr_{V_{\alpha_j} \sim \mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D})}\!\left[\frac{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}) = V_{\alpha_j}]}{\Pr[\mathrm{View}_{\mathcal{M}}^{(\alpha_j)}(\vec{D}') = V_{\alpha_j}]} \ge e^{\varepsilon'}\right] = \Pr\!\left[\frac{N_\theta}{N_\phi} \ge e^{\varepsilon'}\right].$$

We define $c := \mathbb{E}[N_\phi] = \frac{\gamma}{k} \cdot s$ and split this into the union of two events, $N_\theta \ge c\,e^{\varepsilon'/2}$ and $N_\phi \le c\,e^{-\varepsilon'/2}$. Applying a Chernoff bound gives:

$$\Pr\!\left[\frac{N_\theta}{N_\phi} \ge e^{\varepsilon'}\right] \le \exp\!\left(-\frac{c}{3}\Big(e^{\varepsilon'/2} - 1 - \frac{1}{c}\Big)^{2}\right) + \exp\!\left(-\frac{c}{2}\big(1 - e^{-\varepsilon'/2}\big)^{2}\right).$$

We will choose $c \ge \frac{14}{\varepsilon'^2}\log(2t/\delta)$ so that we have:

$$\exp(\varepsilon'/2) - 1 - \frac{1}{c} \ge \frac{\varepsilon'}{2} + \frac{\varepsilon'^2}{8} - \frac{\varepsilon'^2}{14\log(2t/\delta)} \ge \frac{\varepsilon'}{2}.$$

Using $\varepsilon' < 1$, we have:

$$1 - \exp(-\varepsilon'/2) \ge \big(1 - \exp(-1/2)\big)\,\varepsilon' \ge \frac{\varepsilon'}{\sqrt{7}}.$$

Thus we have:

$$\Pr\!\left[\frac{N_\theta}{N_\phi} \ge e^{\varepsilon'}\right] \le \exp\!\left(-\frac{c}{3}(\varepsilon'/2)^2\right) + \exp\!\left(-\frac{c}{2}(\varepsilon'/\sqrt{7})^2\right) \le 2\exp\!\left(-\frac{\varepsilon'^2}{14}\cdot\frac{14}{\varepsilon'^2}\log(2t/\delta)\right) \le \delta/t.$$

We now apply another Chernoff bound to show that $s \le 2\,\mathbb{E}[s]$, which can be used to give a bound on $\gamma$. The following calculation proves that $\Pr[s \ge 2\,\mathbb{E}(s)] \le \exp(-\mathbb{E}(s)/3)$, using $\mathbb{E}(s) = (n-1)t/d$:

$$\Pr[s \ge 2\,\mathbb{E}(s)] \le \exp\!\left(-\frac{n-1}{3}\cdot\frac{t}{d}\right) \le \exp\!\left(-\frac{n}{3}\right) < \delta/3t,$$

for all reasonable values of $\delta$.

Substituting these bounds on $s$ and $c$ into $\gamma s/k = c$, along with $\varepsilon' = \frac{\varepsilon}{2\sqrt{2t\log(1/\delta)}}$, gives:

$$\gamma \ge \frac{112\,kt\log(1/\delta)\log(2t/\delta)}{s\,\varepsilon^2} \ge \frac{56\,dk\log(1/\delta)\log(2t/\delta)}{(n-1)\,\varepsilon^2}.$$
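The elementary inequalities used in the Chernoff step, and the algebra of the final substitution into $\gamma s/k = c$, can be checked numerically. The following Python sketch is illustrative only and is not part of the protocol; the parameter values ($n$, $d$, $k$, $t$, $\varepsilon$, $\delta$) are arbitrary choices made for the check.

```python
import math

# Illustrative parameters: t sampled coordinates per user, failure probability delta.
t, delta = 10, 1e-6

# Check the two elementary inequalities over a grid of eps' in (0, 1):
#   (1) exp(eps'/2) - 1 - 1/c >= eps'/2   whenever c >= (14/eps'^2) log(2t/delta);
#   (2) 1 - exp(-eps'/2)      >= eps'/sqrt(7).
for eps_prime in (0.01, 0.1, 0.5, 0.99):
    c = (14 / eps_prime ** 2) * math.log(2 * t / delta)  # smallest permitted c
    assert math.exp(eps_prime / 2) - 1 - 1 / c >= eps_prime / 2
    assert 1 - math.exp(-eps_prime / 2) >= eps_prime / math.sqrt(7)

# Check the final substitution. With eps' = eps / (2 sqrt(2t log(1/delta))),
# c = (14/eps'^2) log(2t/delta) and s <= 2 E[s] = 2(n-1)t/d, the bound
# gamma = c*k/s should match both closed forms from the last display.
# n is chosen large enough that gamma is a valid probability (gamma <= 1).
n, d, k, eps = 10 ** 8, 50, 16, 1.0
s = 2 * (n - 1) * t / d
eps_prime = eps / (2 * math.sqrt(2 * t * math.log(1 / delta)))
c = (14 / eps_prime ** 2) * math.log(2 * t / delta)
gamma = c * k / s
closed_form = 112 * k * t * math.log(1 / delta) * math.log(2 * t / delta) / (s * eps ** 2)
final_bound = 56 * d * k * math.log(1 / delta) * math.log(2 * t / delta) / ((n - 1) * eps ** 2)
assert math.isclose(gamma, closed_form) and math.isclose(closed_form, final_bound)
assert 0 < gamma < 1
```

The check confirms that the constant 112 arises as $14 \times 8$ from combining $c \ge \frac{14}{\varepsilon'^2}\log(2t/\delta)$ with $\varepsilon'^2 = \varepsilon^2 / (8t\log(1/\delta))$, and that halving to 56 comes from the factor 2 in $s \le 2\,\mathbb{E}[s]$.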






References

[1] C. Dwork, Differential privacy, in: Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP), Springer, Cham, 2006, pp. 1–12.
[2] B. Balle, J. Bell, A. Gascón, K. Nissim, The privacy blanket of the shuffle model, in: Annual International Cryptology Conference, Springer, Cham, 2019, pp. 638–667.
[3] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Artificial Intelligence and Statistics Conference, PMLR, New York City, 2017, pp. 1273–1282.
[4] M. Abadi, A. Chu, I. Goodfellow, Deep learning with differential privacy, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, New York City, 2016, pp. 308–318.
[5] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth, Practical secure aggregation for privacy-preserving machine learning, in: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ACM, New York City, 2017, pp. 1175–1191.
[6] L. Sweeney, k-anonymity: A model for protecting privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (2002) 557–570.
[7] A. Machanavajjhala, D. Kifer, J. Gehrke, M. Venkitasubramaniam, l-diversity: Privacy beyond k-anonymity, in: ACM Transactions on Knowledge Discovery from Data (TKDD), ACM, New York City, 2007, pp. 3–es.
[8] N. Li, T. Li, S. Venkatasubramanian, t-closeness: Privacy beyond k-anonymity and l-diversity, in: 2007 IEEE 23rd International Conference on Data Engineering, IEEE, New York City, 2007, pp. 106–115.
[9] C. Dwork, A. Roth, The algorithmic foundations of differential privacy, Foundations and Trends in Theoretical Computer Science 9 (2014) 211–407.
[10] Ú. Erlingsson, V. Pihur, A. Korolova, RAPPOR: Randomized aggregatable privacy-preserving ordinal response, in: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, ACM, New York City, 2014, pp. 1054–1067.
[11] A. D. P. Team, Learning with privacy at scale, 2017.
[12] B. Ding, J. Kulkarni, S. Yekhanin, Collecting telemetry data privately, in: Advances in Neural Information Processing Systems, ACM, New York City, 2017, pp. 3571–3580.
[13] A. Bittau, Ú. Erlingsson, P. Maniatis, I. Mironov, A. Raghunathan, D. Lie, M. Rudominer, U. Kode, J. Tinnes, B. Seefeld, PROCHLO: Strong privacy for analytics in the crowd, in: Proceedings of the 26th Symposium on Operating Systems Principles, ACM, New York City, 2017, pp. 441–459.
[14] A. Cheu, A. Smith, J. Ullman, D. Zeber, M. Zhilyaev, Distributed differential privacy via shuffling, in: Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, Cham, 2019, pp. 375–403.
[15] B. Balle, J. Bell, A. Gascón, K. Nissim, Improved summation from shuffling, arXiv preprint arXiv:1909.11225, 2019.
[16] Y. Ishai, E. Kushilevitz, R. Ostrovsky, A. Sahai, Cryptography from anonymity, in: 47th Annual IEEE Symposium on Foundations of Computer Science, IEEE, New York City, 2006, pp. 239–248.
[17] B. Balle, J. Bell, A. Gascón, K. Nissim, Private summation in the multi-message shuffle model, in: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, ACM, New York City, 2020, pp. 657–676.
[18] B. Ghazi, N. Golowich, R. Kumar, R. Pagh, A. Velingker, On the power of multiple anonymous messages, in: Advances in Cryptology – EUROCRYPT 2021, Springer, Cham, 2021, pp. 463–488.
[19] B. Ghazi, P. Manurangsi, R. Pagh, A. Velingker, Private aggregation from fewer anonymous messages, in: Advances in Cryptology – EUROCRYPT 2020, Springer, Cham, 2020, pp. 798–827.
[20] P. Kairouz, S. Oh, P. Viswanath, Extremal mechanisms for local differential privacy, The Journal of Machine Learning Research 17 (2016) 492–542.
[21] P. Kairouz, K. Bonawitz, D. Ramage, Discrete distribution estimation under local privacy, in: Proceedings of the 33rd International Conference on Machine Learning, volume 48, ACM, New York City, 2016, pp. 2436–2444.
[22] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, R. Rogers, Protection against reconstruction and its applications in private federated learning, arXiv preprint arXiv:1812.00984, 2018.


