Privacy Amplification for Episodic Training Methods

Vandy Tombs¹, Olivera Kotevska¹ and Steven Young¹

¹ Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN 37830

Abstract

It has been shown that differential privacy bounds improve when subsampling within a randomized mechanism. Episodic training, utilized in many standard machine learning techniques, uses a multistage subsampling procedure that has not previously been analyzed for privacy bound amplification. In this paper, we focus on improving the calculation of privacy bounds in episodic training by thoroughly analyzing the privacy amplification due to subsampling with a multistage subsampling procedure. The newly developed bound can be incorporated into existing privacy accounting methods.

Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). Research sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy.

1. Introduction

As more data is utilized by algorithms and machine learning techniques, rigorously maintaining the privacy of this data has become important. Cyber security, health, and census data collection are all examples of fields seeing increased scrutiny for ensuring the privacy of data, and it is well known that merely anonymizing data by removing features such as names is not sufficient to guarantee privacy, due to vulnerabilities such as re-identification attacks, especially when an adversary has access to auxiliary knowledge or data (see e.g. [1, 2]).

Differential privacy, first introduced by Dwork, is one technical definition of privacy that has been studied widely in the literature [3, 4]. This definition provides rigorous guarantees for the privacy of data utilized by an algorithm and has several desirable properties, such as robustness to post-processing and strong composition theorems.

Machine learning practitioners initially integrated differential privacy by naively applying these composition theorems, assuming that the algorithm accessed the entire training set on each step of training. Abadi et al. [5] noticed that the data is subsampled into batches, so only a subset of the data is utilized at each step of training. This observation allowed for improved privacy bounds; however, it assumed that batches were created using Poisson sampling.
Later authors showed improved bounds for batches created using simple random sampling without replacement [6]. Most recently, Balle et al. [7] provided a fully unified theory for determining the privacy amplification due to subsampling, together with a complete analysis of Poisson subsampling and of simple random sampling both with and without replacement.

The subsampling methods analyzed previously include many of those utilized in machine learning; however, they do not capture batches formed by algorithms that use episodic training. Episodic training methods are utilized by a variety of machine learning algorithms, such as meta-learning (e.g., [8, 9]) and metric learning (e.g., [10, 11]) algorithms. Domain generalization algorithms have also frequently utilized episodic training [12].

In this paper, we analyze the privacy amplification due to the subsampling method utilized in an episodic training regime. Specifically, we observe that forming batches in episodic training is a multistage subsampling method, and we provide a complete analysis of the improved differential privacy bounds obtained when applying a mechanism to a sample drawn using multistage subsampling. The resulting theorem can be easily applied to episodic training methods and integrated with privacy accounting methods such as the moments accountant [5]. The bound can also be utilized by practitioners in other domains that use multistage subsampling within their algorithms.

2. Background and Related Works

2.1. Multistage Subsampling

In a multistage sampling procedure, the universe from which samples are drawn is partitioned. These partitions may contain the examples we are ultimately interested in sampling, or they may contain one or several further levels of partitions. The subsampling procedure samples partitions at each level until examples are sampled. For example, if we are interested in the demographics of students at a school, we could partition students by teacher, sample some number of teachers, and then sample students from each sampled teacher.

To see that episodic training is a multistage subsampling procedure, consider how training batches are formed in Algorithm 2 of [8]. In that work, a subset of tasks is sampled from a collection of tasks, and then examples are sampled from each selected task and provided to the training algorithm. This is a 2-stage sampling procedure, since the training data is partitioned into only two levels: tasks and examples. In multistage subsampling, the first level of partitions are called the primary sampling units and the final level the ultimate sampling units; this final level contains the examples we are ultimately interested in sampling. For more details on multistage subsampling, see e.g. [13].
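To make the school example concrete, the following is a minimal Python sketch of the 2-stage case. It assumes students are stored in a dictionary keyed by teacher; the function name two_stage_sample, the data, and the stage sizes n1 and n2 are our own illustrative choices, not constructs from the paper.

```python
import random

def two_stage_sample(universe, n1, n2, seed=None):
    """2-stage sampling: draw n1 partitions (teachers) without replacement,
    then draw n2 examples (students) without replacement from each one."""
    rng = random.Random(seed)
    batch = []
    for teacher in rng.sample(list(universe), n1):
        batch.extend(rng.sample(universe[teacher], n2))
    return batch

# Hypothetical universe: students partitioned by teacher.
school = {
    "teacher_a": ["s1", "s2", "s3", "s4"],
    "teacher_b": ["s5", "s6", "s7"],
    "teacher_c": ["s8", "s9", "s10", "s11"],
}

print(two_stage_sample(school, n1=2, n2=2, seed=0))
```

Each additional level of partitions simply adds another round of sampling of the same form.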
2.2. Differential Privacy

Since our analysis utilizes the tools of Balle et al. [7], we introduce the necessary notation and definitions from that work. Let 𝒰 be an input space equipped with a binary symmetric relation ≃𝒰 that describes the concept of neighboring inputs. For our purposes, 𝒰 is the universe from which the training data is drawn, and ≃𝒰 will be the add-one/remove-one relation: two training sets are related if they differ by the addition or removal of one element.

Given a randomized algorithm or mechanism β„³ : 𝒰 β†’ P(𝑍), where P(𝑍) is the set of probability measures on the output space 𝑍, β„³ is (πœ€, 𝛿)-differentially private w.r.t. ≃𝒰 if for every pair 𝒯 ≃𝒰 𝒯′ and every measurable subset 𝐸 βŠ† 𝑍,

\[ \Pr[\mathcal{M}(\mathcal{T}) \in E] \le e^{\varepsilon} \Pr[\mathcal{M}(\mathcal{T}') \in E] + \delta. \]

Utilizing the tools from [7] requires expressing differential privacy in terms of the 𝛼-divergence

\[ D_{\alpha}(\mu \,\|\, \mu') := \sup_{E} \big( \mu(E) - \alpha\, \mu'(E) \big) \]

of two probability measures πœ‡, πœ‡β€² ∈ P(𝑍), where 𝐸 ranges over all measurable subsets of 𝑍. Differential privacy can then be stated in terms of the 𝛼-divergence; specifically, a mechanism β„³ is (πœ€, 𝛿)-differentially private if and only if $D_{e^{\varepsilon}}(\mathcal{M}(\mathcal{T}) \,\|\, \mathcal{M}(\mathcal{T}')) \le \delta$ for every pair of adjacent datasets 𝒯 ≃𝒰 𝒯′.

We can now define the privacy profile of a mechanism β„³ as

\[ \delta_{\mathcal{M}}(\alpha) = \sup_{\mathcal{T} \simeq_{\mathcal{U}} \mathcal{T}'} D_{\alpha}\big(\mathcal{M}(\mathcal{T}) \,\|\, \mathcal{M}(\mathcal{T}')\big), \]

which associates with each privacy parameter 𝛼 = e^πœ€ a bound on the 𝛼-divergence between the results of the mechanism on two adjacent datasets.

Two theorems from [7] are important in our analysis. The first is Advanced Joint Convexity, which we restate in terms of 𝛼 = e^πœ€, since we are interested in applying it to improve the privacy bounds due to multistage subsampling.

Theorem 1 ([7], Advanced Joint Convexity of $D_{e^{\varepsilon}}$). Let πœ‡, πœ‡β€² ∈ P(𝑍) be measures satisfying πœ‡ = (1 βˆ’ πœ‚)πœ‡β‚€ + πœ‚πœ‡β‚ and πœ‡β€² = (1 βˆ’ πœ‚)πœ‡β‚€ + πœ‚πœ‡β€²β‚ for some πœ‚, πœ‡β‚€, πœ‡β‚, πœ‡β€²β‚. Given πœ€ > 0, let πœ€β€² = log(1 + πœ‚(e^πœ€ βˆ’ 1)) and 𝛽 = e^{πœ€β€²}/e^πœ€. Then the following holds:

\[ D_{e^{\varepsilon'}}(\mu \,\|\, \mu') = \eta\, D_{e^{\varepsilon}}\big(\mu_1 \,\|\, (1-\beta)\mu_0 + \beta \mu_1'\big). \]

The final theorem provides the concrete privacy amplification that we need for our analysis. Before presenting it, we must define when two distributions 𝜐, 𝜐′ ∈ P(π‘Œ) are $d_Y$-compatible. Let πœ‹ be a coupling of 𝜐 and 𝜐′, and define $d_Y(y, y') = d_Y(y, \operatorname{supp}(\upsilon'))$ for $(y, y') \in \operatorname{supp}(\pi)$, where the distance between a point $y$ and $\operatorname{supp}(\upsilon')$ is defined to be the distance between $y$ and the closest point in $\operatorname{supp}(\upsilon')$.

Theorem 2. Let $C(\upsilon, \upsilon')$ be the set of all couplings between 𝜐 and 𝜐′, and for $k \ge 1$ let $Y_k = \{y \in \operatorname{supp}(\upsilon) : d_Y(y, \operatorname{supp}(\upsilon')) = k\}$. If 𝜐 and 𝜐′ are $d_Y$-compatible, then the following holds:

\[ \min_{\pi \in C(\upsilon, \upsilon')} \sum_{y, y'} \pi_{y,y'}\, \delta_{\mathcal{M}, d_Y(y,y')}(\varepsilon) \;=\; \sum_{k \ge 1} \upsilon(Y_k)\, \delta_{\mathcal{M}, k}(\varepsilon). \]
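The quantity driving the amplification in Theorem 1 (and in our main result below) is the map $\varepsilon \mapsto \varepsilon' = \log(1 + \eta(e^{\varepsilon} - 1))$. A minimal sketch of this computation follows; the function name is our own.

```python
import math

def amplified_epsilon(eta: float, eps: float) -> float:
    """Amplified privacy parameter from Theorem 1: eps' = log(1 + eta*(e^eps - 1))."""
    assert 0.0 < eta <= 1.0 and eps > 0.0
    return math.log(1.0 + eta * (math.exp(eps) - 1.0))

# A sampling fraction of 1% shrinks eps = 1.0 to roughly eps' = 0.017.
print(amplified_epsilon(eta=0.01, eps=1.0))
```

Note that $\varepsilon' < \varepsilon$ whenever $\eta < 1$, which is the amplification effect.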
We are now equipped to begin the analysis of the privacy amplification due to multistage subsampling.

3. Our Approach: Privacy Bounds for Multistage Sampling Analysis

We begin the analysis with an example, through which we introduce the notation needed for the general analysis.

Example 3.1. Let 𝒰 be a universe of 18 examples from which the database or training data is drawn. Suppose we can categorize the data from the universe at 3 different levels, so we will perform 3-stage sampling. Let

\[
\begin{aligned}
\mathcal{U} &= U_1 \cup U_2 \\
&= (U_{11} \cup U_{12} \cup U_{13}) \cup (U_{21} \cup U_{22}) \\
&= \big(\{u_{111}, u_{112}, u_{113}, u_{114}\} \cup \{u_{121}, u_{122}\} \cup \{u_{131}, u_{132}, u_{133}\}\big) \\
&\quad \cup \big(\{u_{211}, u_{212}, u_{213}, u_{214}\} \cup \{u_{221}, u_{222}, u_{223}, u_{224}, u_{225}\}\big).
\end{aligned}
\]

In this example, the $U_{i_1}$ for $i_1 \in \{1, 2\}$ are the primary sampling units, the $U_{i_1 i_2}$ are the ultimate sampling units, and the $u_{i_1 i_2 i_3}$ are the examples that would be provided to a training algorithm.

In general, let 𝒰 be a universe from which the training data is drawn, and suppose a finite number of levels, $N_L$, partition this universe. Define the $U_{i_1}$ to be the primary sampling units, and let $U_{i_1 i_2 \cdots i_{N_L-1}}$ be the sampling units of the unit $U_{i_1 i_2 \cdots i_{N_L-2}}$. Each $U_{i_1 i_2 \cdots i_{N_L-1}}$ is an ultimate sampling unit, containing the examples we are interested in sampling. Note that we require each sampling unit to be of finite size except the ultimate sampling units, which may be infinite.

The multistage sampling procedure is described by Algorithm 1. Most episodic training procedures use only 2- or 3-stage sampling, but we analyze the general case, which may have applications in other scientific domains (e.g., medical domains) where multistage sampling may have more levels.

Algorithm 1: Multistage Sampling
    Given $n_j$, the number of units to be sampled at each level ($1 \le j \le N_L$)
    Set PrevLevel := {𝒰}, regarding 𝒰 as the collection of its primary sampling units $U_i$
    Set CurrentLevel := βˆ…
    for $j \in \{1, \ldots, N_L\}$ do
        for $S \in$ PrevLevel do
            sample $n_j$ units without replacement from $S$
            add the sampled units to CurrentLevel
        end
        PrevLevel := CurrentLevel
        CurrentLevel := βˆ…
    end

Now, let 𝒯 βŠ‚ 𝒰 be the training data or database we are analyzing. We require that the training data contain at least one element from each sampling unit described above. Thus, the ultimate sampling units of the training data, $T_{i_1 i_2 \cdots i_{N_L-1}} \subset U_{i_1 i_2 \cdots i_{N_L-1}}$, must be non-empty finite subsets of the ultimate sampling units with at least $n_{N_L}$ elements (i.e., at least the number of units that will be sampled from each ultimate sampling unit). All other sampling units defined for the universe remain the same for the training set.
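The following is a minimal runnable sketch of Algorithm 1, assuming the universe is represented as nested Python lists (primary units contain sub-units, down to ultimate units containing examples). The function name multistage_sample and this data layout are our assumptions, not the paper's.

```python
import random

def multistage_sample(units, n, rng=None):
    """Algorithm 1: at each level j, sample n[j] sub-units without replacement
    from every unit sampled at the previous level; return the final sample."""
    rng = rng or random.Random()
    prev_level = [units]  # the universe, viewed as the collection of primary units
    for n_j in n:
        current_level = []
        for s in prev_level:
            current_level.extend(rng.sample(s, n_j))
        prev_level = current_level
    return prev_level

# The 3-level universe of 18 examples from Example 3.1.
U = [
    [["u111", "u112", "u113", "u114"], ["u121", "u122"], ["u131", "u132", "u133"]],
    [["u211", "u212", "u213", "u214"], ["u221", "u222", "u223", "u224", "u225"]],
]

# Draw 1 primary unit, 2 ultimate units from it, and 2 examples from each.
print(multistage_sample(U, n=[1, 2, 2], rng=random.Random(0)))
```

With n = [1, 2, 2], this reproduces the 3-stage sampling of Example 3.1 and returns a batch of four examples.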
We want to analyze the privacy bound for algorithms that apply this multistage subsampling procedure to 𝒯. To do this, we apply the theorems from [7] and analyze the sampling procedure under the add-one/remove-one relation. We begin by defining a probability measure for this sampling procedure. We can do this by simply defining

\[ \mu(t_{i_1 i_2 \cdots i_{N_L}}) = \frac{\prod_{j=1}^{N_L} n_j}{|U_{i_1}|\, |U_{i_1 i_2}| \cdots |T_{i_1 i_2 \cdots i_{N_L-1}}|}, \]

where $t_{i_1 i_2 \cdots i_{N_L}}$ lies in the ultimate unit $T_{i_1 i_2 \cdots i_{N_L-1}}$.

Now consider 𝒯′ created by removing one element from 𝒯; say, without loss of generality, $t_{i_1 i_2 \cdots i_{N_L-1} 1}$ for some $i_1, i_2, \ldots, i_{N_L-1}$. The probability measure πœ‡β€² for sampling from 𝒯′ is defined similarly. We wish to compute the total variation distance between these two measures so that we can apply the Advanced Coupling Theorem from [7]. We just need to compute

\[ TV(\mu, \mu') = 1 - \sum_{u \in \mathcal{U}} \min\big(\mu(u), \mu'(u)\big). \]

Note that we can easily extend the probability measures πœ‡ and πœ‡β€² to the entire universe by setting the inclusion probability to 0 for any element not in 𝒯 or 𝒯′. For all elements $t \in \mathcal{T}' \setminus T_{i_1 i_2 \cdots i_{N_L-1}}$, we have $\min(\mu(t), \mu'(t)) = \mu(t) = \mu'(t)$. Since $t_{i_1 i_2 \cdots i_{N_L-1} 1} \notin \mathcal{T}'$, we also have $\min(\mu(t_{i_1 i_2 \cdots i_{N_L-1} 1}), \mu'(t_{i_1 i_2 \cdots i_{N_L-1} 1})) = 0$. So we just need to consider the elements of the ultimate unit from which we removed an element. Because an element was removed from this unit, $T'_{i_1 i_2 \cdots i_{N_L-1}}$ (the ultimate unit missing an element in 𝒯′) has fewer elements than $T_{i_1 i_2 \cdots i_{N_L-1}}$; therefore, for all $t_{i_1 i_2 \cdots i_{N_L-1} i} \in T'_{i_1 i_2 \cdots i_{N_L-1}}$ with $i \ne 1$, we have $\mu(t_{i_1 i_2 \cdots i_{N_L-1} i}) < \mu'(t_{i_1 i_2 \cdots i_{N_L-1} i})$, where

\[ \mu(t_{i_1 i_2 \cdots i_{N_L-1} i}) = \frac{\prod_{j=1}^{N_L} n_j}{|U_{i_1}|\, |U_{i_1 i_2}| \cdots |T_{i_1 i_2 \cdots i_{N_L-1}}|}, \qquad \mu'(t_{i_1 i_2 \cdots i_{N_L-1} i}) = \frac{\prod_{j=1}^{N_L} n_j}{|U_{i_1}|\, |U_{i_1 i_2}| \cdots |T'_{i_1 i_2 \cdots i_{N_L-1}}|}. \]

Thus

\[ \sum_{u \in \mathcal{U}} \min\big(\mu(u), \mu'(u)\big) = \sum_{t \in \mathcal{T}'} \mu(t) = 1 - \mu(t_{i_1 i_2 \cdots i_{N_L-1} 1}). \]

Hence the total variation distance is exactly the inclusion probability of the element we removed. Determining the total variation distance when adding an element from 𝒰 to 𝒯 follows a similar argument.

We can now provide an amplified privacy bound for multistage subsampling.

Theorem 3. Let β„³β€² be a subsampled mechanism on 𝒯, obtained by applying a base mechanism β„³ to a sample drawn as described by Algorithm 1, and let $m_1 m_2 \ldots m_{N_L-1}$ be the index of the penultimate sampling unit that attains

\[ \min_{i_1, i_2, \ldots, i_{N_L-1}} \big( |U_{i_1}|\, |U_{i_1 i_2}| \cdots |T_{i_1 i_2 \cdots i_{N_L-1}}| \big). \]

Then, for any πœ€ β‰₯ 0, we have $\delta_{\mathcal{M}'}(\varepsilon') \le \eta\, \delta_{\mathcal{M}}(\varepsilon)$ under the add-one/remove-one relation, where

\[ \eta = \frac{\prod_{j=1}^{N_L} n_j}{|U_{m_1}|\, |U_{m_1 m_2}| \cdots |T_{m_1 m_2 \cdots m_{N_L-1}}|} \quad \text{and} \quad \varepsilon' = \log(1 + \eta(e^{\varepsilon} - 1)). \]

To complete the proof, let 𝒯, 𝒯′ be training sets drawn from 𝒰 with $\mathcal{T} \simeq_r \mathcal{T}'$ under the add-one/remove-one relation $\simeq_r$, and let $S_\gamma(\mathcal{T})$ denote the subsampling mechanism described by Algorithm 1, where $\gamma = TV(\mu, \mu')$. Let $\mathcal{T}_0 = \mathcal{T} \cap \mathcal{T}'$; then by definition of $\simeq_r$, $\mathcal{T}_0 = \mathcal{T}$ or $\mathcal{T}_0 = \mathcal{T}'$. Let $\omega_0 = S_\gamma(\mathcal{T}_0)$, $\mu = S_\gamma(\mathcal{T})$ and $\mu' = S_\gamma(\mathcal{T}')$. Then the decompositions of πœ‡ and πœ‡β€² induced by their maximal coupling satisfy $\mu_1 = \omega_0$ when $\mathcal{T}_0 = \mathcal{T}$, or $\mu'_1 = \omega_0$ when $\mathcal{T}_0 = \mathcal{T}'$. We only need to consider $\mathcal{T}_0 = \mathcal{T}'$, since this is when the maximum is obtained in applying advanced joint convexity. Finally, we note that one can easily create a $d_{\simeq_r}$-compatible pair according to the definition provided in [7] by first sampling $y$ from πœ‡ and building $y'$ by adding $v$ (which may be empty) to $y$. Thus, for each dataset pair, Theorem 7 of [7] gives $\delta_{\mathcal{M}'}(\varepsilon') \le \gamma\, \delta_{\mathcal{M}}(\varepsilon)$. To obtain a bound valid over all possible training set pairs, we take $\eta = \max_{\mathcal{T} \simeq_r \mathcal{T}'} \gamma$, the worst-case sampling fraction. This maximum is attained exactly when we remove an element from the penultimate unit with index $m_1 m_2 \cdots m_{N_L-1}$, which completes the proof.

We briefly mention how one might incorporate this new bound into a privacy accounting method. Many accounting methods, like the moments accountant [5], use the moment generating function in conjunction with the Gaussian mechanism to calculate privacy bounds while a machine learning algorithm trains. Using Theorem 4 from [7] with our new bound, one can easily derive a subsampled Gaussian mechanism that can be utilized in algorithms like those described in [5, 14].
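As a numerical sanity check on Theorem 3, the following sketch computes πœ‚ and πœ€β€² for a training set stored in the nested-list layout used in the Algorithm 1 sketch above. It follows the paper's formula as printed; the helper names and example values are ours.

```python
import math

def min_unit_size_product(unit):
    """Smallest product |U_{i1}| * |U_{i1 i2}| * ... * |T_{i1...i_{NL-1}}| over
    all paths below `unit` (nested lists whose leaves are examples)."""
    if not isinstance(unit[0], list):  # `unit` is an ultimate unit T
        return len(unit)
    return len(unit) * min(min_unit_size_product(child) for child in unit)

def theorem3_bound(training_set, n, eps):
    """Amplification factor eta and amplified parameter eps' from Theorem 3."""
    min_denom = min(min_unit_size_product(u) for u in training_set)
    eta = math.prod(n) / min_denom
    return eta, math.log(1.0 + eta * (math.exp(eps) - 1.0))

# Training set equal to the universe of Example 3.1: the smallest product is
# |U_1| * |T_12| = 3 * 2 = 6, so eta = (1*2*2)/6 = 2/3 for n = [1, 2, 2].
T = [
    [["u111", "u112", "u113", "u114"], ["u121", "u122"], ["u131", "u132", "u133"]],
    [["u211", "u212", "u213", "u214"], ["u221", "u222", "u223", "u224", "u225"]],
]
print(theorem3_bound(T, n=[1, 2, 2], eps=1.0))
```

The resulting (πœ‚, πœ€β€²) pair can then be supplied to an accountant in place of the usual Poisson sampling fraction.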
4. Conclusion

This paper completely analyzes the privacy amplification due to multistage subsampling. This provides the correct privacy bounds for any algorithm that utilizes multistage subsampling, such as machine learning algorithms that use episodic training. Our future goal is to perform experiments to better understand privacy in machine learning algorithms that use episodic training, such as meta-learning algorithms. We hope the presented approach and discussion will prove useful to other researchers wanting to apply privacy bounds to multistage sampling in other studies and applications.

References

[1] A. Narayanan, V. Shmatikov, Robust de-anonymization of large sparse datasets, in: 2008 IEEE Symposium on Security and Privacy (SP 2008), 2008, pp. 111–125. doi:10.1109/SP.2008.33.

[2] L. Rocher, J. M. Hendrickx, Y.-A. de Montjoye, Estimating the success of re-identifications in incomplete datasets using generative models, Nature Communications 10 (2019) 3069. URL: https://doi.org/10.1038/s41467-019-10933-3. doi:10.1038/s41467-019-10933-3.

[3] C. Dwork, A. Roth, The algorithmic foundations of differential privacy, Found. Trends Theor. Comput. Sci. 9 (2014) 211–407. URL: https://doi.org/10.1561/0400000042. doi:10.1561/0400000042.

[4] C. Dwork, Differential privacy: A survey of results, in: M. Agrawal, D. Du, Z. Duan, A. Li (Eds.), Theory and Applications of Models of Computation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, pp. 1–19.

[5] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, L. Zhang, Deep learning with differential privacy, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 308–318. URL: https://doi.org/10.1145/2976749.2978318. doi:10.1145/2976749.2978318.

[6] Y.-X. Wang, B. Balle, S. Kasiviswanathan, Subsampled Rényi differential privacy and analytical moments accountant, Journal of Privacy and Confidentiality 10 (2021). URL: https://journalprivacyconfidentiality.org/index.php/jpc/article/view/723. doi:10.29012/jpc.723.

[7] B. Balle, G. Barthe, M. Gaboardi, Privacy amplification by subsampling: Tight analyses via couplings and divergences, in: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, Curran Associates Inc., Red Hook, NY, USA, 2018, pp. 6280–6290.

[8] C. Finn, P. Abbeel, S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, in: D. Precup, Y. W. Teh (Eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, PMLR, 2017, pp. 1126–1135. URL: https://proceedings.mlr.press/v70/finn17a.html.

[9] S. Ravi, H. Larochelle, Optimization as a model for few-shot learning, in: ICLR, 2017.

[10] O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, D. Wierstra, Matching networks for one shot learning, in: D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 29, Curran Associates, Inc., 2016. URL: https://proceedings.neurips.cc/paper/2016/file/90e1357833654983612fb05e3ec9148c-Paper.pdf.

[11] J. Snell, K. Swersky, R. Zemel, Prototypical networks for few-shot learning, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 30, Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper/2017/file/cb8da6767461f2812ae4290eac7cbc42-Paper.pdf.

[12] D. Li, J. Zhang, Y. Yang, C. Liu, Y.-Z. Song, T. Hospedales, Episodic training for domain generalization, in: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 1446–1455. doi:10.1109/ICCV.2019.00153.

[13] C.-E. SΓ€rndal, B. Swensson, J. Wretman, Model Assisted Survey Sampling, Springer-Verlag, 2003, pp. 124–162.

[14] I. Mironov, K. Talwar, L. Zhang, Rényi differential privacy of the sampled Gaussian mechanism, 2019. URL: http://arxiv.org/abs/1908.10530. arXiv:1908.10530.