<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Privacy Amplification for Episodic Training Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vandy Tombs</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivera Kotevska</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven Young</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Oak Ridge National Laboratory</institution>
          ,
          <addr-line>1 Bethel Valley Road, Oak Ridge, TN 37830</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>It has been shown that differential privacy bounds improve when subsampling is used within a randomized mechanism. Episodic training, utilized by many standard machine learning techniques, uses a multistage subsampling procedure that has not previously been analyzed for privacy bound amplification. In this paper, we focus on improving the calculation of privacy bounds in episodic training by thoroughly analyzing the privacy amplification due to subsampling with a multistage subsampling procedure. The newly developed bound can be incorporated into existing privacy accounting methods.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        As more data is utilized by algorithms and machine learning techniques, rigorously maintaining the privacy of this data has become important. Cyber security, health, and census data collection are all examples of fields seeing increased scrutiny over ensuring the privacy of data, and it is well known that simply anonymizing the data by removing features such as names is not sufficient to guarantee privacy, due to vulnerabilities such as re-identification attacks, especially when an adversary has access to auxiliary knowledge or data (see e.g. [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]).
      </p>
      <p>
        Differential privacy, first introduced by Dwork, is one technical definition of privacy that has been studied widely in the literature [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. This definition provides rigorous guarantees for the privacy of data utilized by an algorithm and has several nice properties, such as robustness to post-processing and strong composition theorems.
      </p>
      <p>
        Machine learning practitioners initially integrated differential privacy by naively applying these composition theorems, assuming that the algorithm accessed the entire training set on each step of training. Abadi et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] noticed that the data is subsampled into batches, so only a subset of the data is utilized for each step of training. This allowed for improved privacy bounds; however, they assumed that batches were created using Poisson sampling. Later authors showed improved bounds for batches created using simple random sampling without replacement [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Most recently, Balle et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] provided a fully unified theory for determining the privacy amplification due to subsampling, as well as a complete analysis for Poisson subsampling and simple random sampling, both with and without replacement.
      </p>
      <p>
        The subsampling methods analyzed previously include many of the subsampling methods utilized in machine learning; however, they do not capture batches formed by algorithms that use episodic training. Episodic training methods are utilized by a variety of machine learning algorithms, such as meta-learning (e.g., [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]) or metric learning (e.g., [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]) algorithms. Domain generalization algorithms have also frequently utilized episodic training [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        In this paper, we analyze the privacy amplification due to the subsampling method utilized in an episodic training regime. Specifically, we observe that forming batches in episodic training is a multistage subsampling method, and we provide a complete analysis of the improved differential privacy bounds obtained when applying a mechanism to a sample drawn using multistage subsampling. The resulting theorem can be easily applied to episodic training methods and integrated with privacy accounting methods such as the moments accountant [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This bound can also be utilized by practitioners in other domains that use multistage subsampling within their algorithms.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Works</title>
      <sec id="sec-2-1">
        <title>2.1. Multistage Subsampling</title>
      <p>In a multistage sampling procedure, the universe from
which samples are drawn is partitioned. These partitions
may contain the examples we are ultimately interested
in sampling or may contain one or several levels of
partitions. The subsampling procedure is to sample partitions
at each level until examples are sampled. For example,
if we are interested in the demographics of students at
a school, we could partition students by teacher, sample
some number of teachers and then sample students from
each sampled teacher.</p>
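      <p>To make the two-stage school example concrete, the following is a minimal Python sketch (our illustration; the class sizes and the sample counts n1 = 2 and n2 = 2 are assumptions, not values from the text):</p>
      <preformat>import random

# Hypothetical universe: students partitioned by teacher (two-stage sampling).
universe = {
    "teacher_a": ["s1", "s2", "s3", "s4"],
    "teacher_b": ["s5", "s6"],
    "teacher_c": ["s7", "s8", "s9"],
}

n1, n2 = 2, 2  # sample 2 teachers, then 2 students from each sampled teacher

# Stage 1: sample teachers without replacement.
teachers = random.sample(list(universe), n1)
# Stage 2: sample students without replacement from each sampled teacher.
batch = [s for t in teachers for s in random.sample(universe[t], n2)]
print(batch)</preformat>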
      <p>
        To see that episodic training is a multistage subsampling procedure, consider how training batches are formed in Algorithm 2 of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In this work, a subset of tasks is sampled from a collection of tasks, then examples are sampled from each selected task and provided to the training algorithm. This is a 2-stage sampling procedure, since the training data is partitioned into two levels: tasks and examples. In multistage subsampling, the first level of partitions are called the primary sampling units and the final level is called the ultimate sampling units; this final level contains the examples we are ultimately interested in sampling. For more details on multistage subsampling, see e.g. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Differential Privacy</title>
        <p>
          Since our analysis utilizes the tools of Balle et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], we introduce the necessary notations and definitions from it. Let X be an input space equipped with a binary symmetric relation ≃ that describes when two training sets drawn from the universe are adjacent. For our purposes, the relation will be the add-one/remove-one relation; thus two training sets are related if they differ by the addition or removal of one element.
        </p>
        <p>
          Given a randomized algorithm or mechanism ℳ : X → P(Z), where P(Z) is the set of probability measures on the output space Z, ℳ is (ε, δ)-differentially private w.r.t. ≃ if for every pair X ≃ X′ and every measurable subset E ⊆ Z, Pr[ℳ(X) ∈ E] ≤ e^ε Pr[ℳ(X′) ∈ E] + δ.
        </p>
        <p>
          Utilizing the tools from [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] requires expressing differential privacy in terms of the α-divergence D_α(μ ‖ μ′) := sup_E (μ(E) − α·μ′(E)) of two probability measures μ, μ′ ∈ P(Z), where E ranges over all measurable subsets of Z. Differential privacy can then be stated in terms of α-divergence; specifically, a mechanism ℳ is (ε, δ)-differentially private if and only if D_{e^ε}(ℳ(X) ‖ ℳ(X′)) ≤ δ for every pair of adjacent datasets X ≃ X′.
        </p>
        <p>
          We can now define the privacy profile of a mechanism ℳ as δ_ℳ(α) = sup_{X ≃ X′} D_α(ℳ(X) ‖ ℳ(X′)), which associates each privacy parameter α = e^ε with a bound on the α-divergence between the results of the mechanism on two adjacent datasets.
        </p>
        <p>
          Two theorems from [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] are important in our analysis. The first is Advanced Joint Convexity, which we restate in terms of α = e^ε, since we are interested in applying this theorem to improve the privacy bounds due to multistage subsampling.
        </p>
        <p>
          Theorem 1. ([
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], Advanced Joint Convexity of D_α) Let μ, μ′ ∈ P(Z) be measures satisfying μ = (1 − η)μ₀ + ημ₁ and μ′ = (1 − η)μ₀ + ημ′₁ for some η, μ₀, μ₁, μ′₁. Given α &gt; 0, let α′ = 1 + η(α − 1) and θ = α/α′. Then the following holds: D_{α′}(μ ‖ μ′) = η·D_α(μ₁ ‖ (1 − θ)μ₀ + θμ′₁).
        </p>
        <p>
          The final theorem provides the concrete privacy amplification that we need for our analysis. Before presenting it, we need to define when two distributions μ, μ′ ∈ P(X) are d-compatible. Let π be a coupling of μ and μ′, and for (u, u′) ∈ supp(π) define d(u, u′) = d(u, supp(μ′)), where the distance between a point u and supp(μ′) is defined to be the distance between u and the closest point in supp(μ′).
        </p>
        <p>
          Theorem 2. Let Π(μ, μ′) be the set of all couplings between μ and μ′, and for k ≥ 1 let U_k = {u ∈ supp(μ) : d(u, supp(μ′)) = k}. If μ and μ′ are d-compatible, then the following holds (where δ_{ℳ,k} denotes the group privacy profile of ℳ at distance k): min_{π ∈ Π(μ, μ′)} Σ_{(u,u′)} π(u, u′)·δ_{ℳ,d(u,u′)}(ε) = Σ_{k≥1} μ(U_k)·δ_{ℳ,k}(ε).
        </p>
        <p>We are now equipped to begin an analysis of the privacy amplification due to multistage subsampling.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Our Approach: Privacy Bounds for Multistage Sampling Analysis</title>
      <p>We will begin the analysis with an example. Through this example, we will introduce the notation necessary for the general analysis.</p>
      <p>Example 3.1. Let U be a universe of 18 examples from which the database or training data is drawn. Suppose we can categorize the data from the universe at 3 different levels, so we will perform a 3-stage sampling. Let
U = U₁ ∪ U₂
  = (U₁₁ ∪ U₁₂ ∪ U₁₃) ∪ (U₂₁ ∪ U₂₂)
  = ({u₁₁₁, u₁₁₂, u₁₁₃, u₁₁₄} ∪ {u₁₂₁, u₁₂₂} ∪ {u₁₃₁, u₁₃₂, u₁₃₃}) ∪ ({u₂₁₁, u₂₁₂, u₂₁₃, u₂₁₄} ∪ {u₂₂₁, u₂₂₂, u₂₂₃, u₂₂₄, u₂₂₅}).
In this example, the U_i for i ∈ {1, 2} are the primary sampling units, the U_ij are the ultimate sampling units, and the u_ijk are the examples that would be provided to a training algorithm.</p>
      <p>
        In general, let U be a universe from which the training data is drawn and suppose a finite number of levels, L, partition this universe. Define the U_{i1} to be the primary sampling units, and let the U_{i1i2···i(ℓ−1)} be the sampling units of the U_{i1i2···i(ℓ−2)} unit. U_{i1i2···i(L−1)} is an ultimate sampling unit, which contains the examples we are interested in sampling. Note that we require each sampling unit to be of finite size except the ultimate sampling units, which may be infinite. The multistage sampling procedure can be described by Algorithm 1: Multistage Sampling. Most episodic training procedures only use 2- or 3-stage sampling, but we analyze the general case, which may have applications to other scientific domains (e.g. medical domains) where multistage sampling may have more levels.
      </p>
      <sec id="sec-1-1">
        <title>Algorithm 1: Multistage Sampling</title>
        <p>Set   := ⋃︀ 
Set  := ∅
Given  : the number of units to be sampled at
each level (1 ≤  ≤ )
for  ∈ {1, ...., } do
for  ∈ PrevLevel do
sample without replacement  elements
from 
add sampled elements to 
end
end
  = 
 := ∅</p>
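      <p>A minimal runnable rendering of Algorithm 1 in Python (ours, not code from the paper), applied to the Example 3.1 universe encoded as nested lists:</p>
      <preformat>import random

def multistage_sample(universe, counts):
    """Algorithm 1: at each level, sample counts[l] sub-units without
    replacement from every unit sampled at the previous level."""
    prev_level = [universe]   # the whole universe acts as the single level-0 unit
    for n in counts:          # one pass per level l = 1, ..., L
        sample = []
        for unit in prev_level:
            sample.extend(random.sample(unit, n))
        prev_level = sample   # sampled sub-units become the next level's units
    return prev_level         # after the last level: the batch of examples

# Example 3.1 universe: L = 3 levels, 18 examples u_ijk encoded as strings.
U = [  # primary units U_1 and U_2
    [["u111", "u112", "u113", "u114"], ["u121", "u122"], ["u131", "u132", "u133"]],
    [["u211", "u212", "u213", "u214"], ["u221", "u222", "u223", "u224", "u225"]],
]

random.seed(0)
print(multistage_sample(U, counts=[1, 2, 2]))  # n_1=1, n_2=2, n_3=2 (illustrative)</preformat>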
      <p>Now, let X ⊂ U be the training data or database we are analyzing. We will require that the training data have at least one element from each sampling unit described above. Thus we only allow the ultimate sampling units of the training data, X_{i1i2···i(L−1)} ⊂ U_{i1i2···i(L−1)}, to be non-empty finite subsets of the ultimate sampling units with at least n_L elements (i.e. at least the number of units that will be sampled from the ultimate sampling units). All other sampling units defined for the universe will remain the same for the training set.</p>
      <p>
        We want to analyze the privacy bound on algorithms that use a multistage subsampling procedure on X. To do this, we will apply the theorems from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and will analyze this sampling procedure under the add-one/remove-one relation. We begin by defining a probability measure for this sampling procedure. We can do this by simply defining P(u_{i1i2···iL}) = (∏_{ℓ=1}^{L} n_ℓ) / (|X_{i1}| |X_{i1i2}| ··· |X_{i1i2···i(L−1)}|), where u_{i1i2···iL} is in the ultimate unit X_{i1i2···i(L−1)}.
      </p>
      <p>
        Now consider X′ created by removing one element from X, say, without loss of generality, u_{i1i2···i(L−1)1} for some index i1, i2, ..., i(L−1). The probability measure P′ for sampling from X′ can be defined similarly to the above. We wish to compute the total variational distance between these two measures so that we can apply the Advanced Joint Convexity and coupling theorems from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We just need to compute TV(P, P′) = 1 − Σ_{u ∈ X} min(P(u), P′(u)).
      </p>
      <p>Note we can easily extend our probability measures P, P′ to the entire universe by setting the inclusion probability to 0 for any element not in X or X′. For all elements u ∈ X′ ∖ X_{i1i2···i(L−1)}, we have min(P(u), P′(u)) = P(u) = P′(u). Since u_{i1i2···i(L−1)1} ∉ X′, we also have min(P(u_{i1i2···i(L−1)1}), P′(u_{i1i2···i(L−1)1})) = 0. So we just need to consider the elements of the ultimate unit from which we removed an element. Since we removed an element from this unit, we have P′(u) &gt; P(u): X′_{i1i2···i(L−1)} (the ultimate unit missing an element in X′) has fewer elements than X_{i1i2···i(L−1)}. Therefore, for all u_{i1i2···i(L−1)j} ∈ X′_{i1i2···i(L−1)} with j ≠ 1, we have P(u_{i1i2···i(L−1)j}) &lt; P′(u_{i1i2···i(L−1)j}), where P(u_{i1i2···i(L−1)j}) = (∏_{ℓ=1}^{L} n_ℓ)/(|X_{i1}| |X_{i1i2}| ··· |X_{i1i2···i(L−1)}|) and P′(u_{i1i2···i(L−1)j}) = (∏_{ℓ=1}^{L} n_ℓ)/(|X_{i1}| |X_{i1i2}| ··· |X′_{i1i2···i(L−1)}|). Thus Σ_{u ∈ X} min(P(u), P′(u)) = Σ_{u ∈ X′} P(u) = 1 − P(u_{i1i2···i(L−1)1}). Hence the total variational distance is just the inclusion probability of the element we removed. Determining the total variational distance when adding an element from U to X is similar to the above argument.</p>
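      <p>A toy numerical check (ours, for a single ultimate unit of four equally likely elements) confirms that the total variational distance equals the inclusion probability of the removed element:</p>
      <preformat># One ultimate unit {a, b, c, d}; P' results from removing element "a".
P       = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
P_prime = {"b": 1 / 3, "c": 1 / 3, "d": 1 / 3}

support = set(P) | set(P_prime)
tv = 1 - sum(min(P.get(u, 0.0), P_prime.get(u, 0.0)) for u in support)
print(tv, P["a"])  # both 0.25: TV(P, P') equals P(removed element)</preformat>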
        <p>We can now provide an amplified privacy bound for
multistage subsampling.</p>
      <p>Theorem 3. Let ℳ′ = ℳ ∘ S be a subsampled mechanism on X, where S is the sampling procedure described by Algorithm 1, and let i1i2···i(L−1) be the index of the penultimate sampling unit that satisfies min_{i1,i2,...,i(L−1)} (|X_{i1}| |X_{i1i2}| ··· |X_{i1i2···i(L−1)}|). Then, for any ε ≥ 0, we have that δ_{ℳ′}(ε′) ≤ η·δ_ℳ(ε) for η = (∏_{ℓ=1}^{L} n_ℓ)/(|X_{i1}| |X_{i1i2}| ··· |X_{i1i2···i(L−1)}|) and ε′ = log(1 + η(e^ε − 1)), under the add-one/remove-one relation.</p>
      <p>
        To fully complete the proof, let X, X′ be training sets drawn from U with X ≃ X′ under the add-one/remove-one relation ≃, let S(X) denote the subsampling mechanism described by Algorithm 1, and take η = TV(P, P′). Let X₀ = X ∩ X′; then by definition of ≃, X₀ = X or X₀ = X′. Let μ₀ = S(X₀), μ = S(X) and μ′ = S(X′). Then the decompositions of μ and μ′ induced by their maximal coupling have μ₁ = μ₀ when X₀ = X, or μ′₁ = μ₀ when X₀ = X′. We only need to consider X₀ = X′, since this is when the maximum is obtained in applying Advanced Joint Convexity. Finally, we note that one can easily create a ≃-compatible pair according to the definition provided in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] by first sampling Y from X and building Y′ by adding u (which may be empty) to Y. Thus for each dataset pair, by Theorem 7 of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we have δ_{ℳ′}(ε′) ≤ η·δ_ℳ(ε). In order to get a bound over all possible training set pairs, we need to take η = max_{(X,X′) : X ≃ X′} TV(P, P′). This occurs exactly when we remove an element from the penultimate unit with index i1i2···i(L−1), which completes the proof.
      </p>
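      <p>The amplified parameters in Theorem 3 are straightforward to evaluate in code (our sketch; the per-level counts and unit sizes are illustrative stand-ins for the minimizing path):</p>
      <preformat>import math
from math import prod

def amplified_eps(eps, eta):
    """epsilon' = log(1 + eta * (exp(eps) - 1)) from Theorem 3."""
    return math.log(1 + eta * (math.exp(eps) - 1))

n = [1, 2, 2]          # n_1, n_2, n_3: units sampled at each level
unit_sizes = [3, 2]    # |X_{i1}|, |X_{i1 i2}| along the minimizing path
eta = prod(n) / prod(unit_sizes)

for eps in [0.5, 1.0, 2.0]:
    print(eps, amplified_eps(eps, eta))  # amplified eps' stays below eps</preformat>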
      <p>
        We briefly mention how one might incorporate this new bound into a privacy accounting method. Many accounting methods, like the moments accountant [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], use the moment generating function in conjunction with the Gaussian mechanism to calculate the privacy bounds while a machine learning algorithm is training. Using Theorem 4 from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] with our new bound, one can easily derive a subsampled Gaussian mechanism that can be utilized in algorithms like those described in [
        <xref ref-type="bibr" rid="ref5 ref14">5, 14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>4. Conclusion</title>
      <p>This paper completely analyzes the privacy amplification
due to multistage subsampling. This provides the correct
privacy bounds for any algorithm that utilizes multistage
subsampling, such as machine learning algorithms that
use episodic training. Our future goal is to perform experiments to better understand privacy in machine learning algorithms that use episodic training, such as meta-learning algorithms. We hope our presented approach and discussion will prove useful to other researchers wanting to apply privacy bounds to multistage sampling in other studies and applications.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shmatikov</surname>
          </string-name>
          ,
          <article-title>Robust deanonymization of large sparse datasets</article-title>
          ,
          <source>in: 2008 IEEE Symposium on Security and Privacy</source>
          (sp
          <year>2008</year>
          ),
          <year>2008</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>125</lpage>
          . doi:
          <volume>10</volume>
          .1109/SP.
          <year>2008</year>
          .
          <volume>33</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hendrickx</surname>
          </string-name>
          , Y.-A. de Montjoye,
          <article-title>Estimating the success of reidentifications in incomplete datasets using generative models 10 (????) 3069</article-title>
          . URL: https://doi.org/10.1038/s41467-019-10933-3. doi:
          <volume>10</volume>
          .1038/s41467-019-10933-3.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>The algorithmic foundations of diferential privacy</article-title>
          ,
          <source>Found. Trends Theor. Comput. Sci. 9</source>
          (
          <year>2014</year>
          )
          <fpage>211</fpage>
          -
          <lpage>407</lpage>
          . URL: https://doi.org/10.1561/ 0400000042. doi:
          <volume>10</volume>
          .1561/0400000042.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <article-title>Diferential privacy: A survey of results</article-title>
          , in: M.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Duan</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Li (Eds.),
          <source>Theory and Applications of Models of Computation</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2008</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chu</surname>
          </string-name>
          , I. Goodfellow, H. B.
          <string-name>
            <surname>McMahan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <string-name>
            <surname>Mironov</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Talwar</surname>
            ,
            <given-names>L. Zhang,</given-names>
          </string-name>
          <article-title>Deep learning with diferential privacy</article-title>
          ,
          <source>in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security</source>
          , CCS '16,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2016</year>
          , p.
          <fpage>308</fpage>
          -
          <lpage>318</lpage>
          . URL: https://doi.org/10.1145/2976749. 2978318. doi:
          <volume>10</volume>
          .1145/2976749.2978318.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.-X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Balle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kasiviswanathan</surname>
          </string-name>
          ,
          <article-title>Subsampled rényi diferential privacy and analytical moments accountant</article-title>
          ,
          <source>Journal of Privacy and Confidentiality</source>
          <volume>10</volume>
          (
          <year>2021</year>
          ). URL: https: //journalprivacyconfidentiality.org/index.php/jpc/ article/view/723. doi:
          <volume>10</volume>
          .29012/jpc.723.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Balle</surname>
          </string-name>
          , G. Barthe,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaboardi</surname>
          </string-name>
          ,
          <article-title>Privacy amplification by subsampling: Tight analyses via couplings and divergences</article-title>
          ,
          <source>in: Proceedings of the 32nd International Conference on Neural Information Processing Systems</source>
          , NIPS'18, Curran Associates Inc.,
          <string-name>
            <surname>Red</surname>
            <given-names>Hook</given-names>
          </string-name>
          ,
          <string-name>
            <surname>NY</surname>
          </string-name>
          , USA,
          <year>2018</year>
          , p.
          <fpage>6280</fpage>
          -
          <lpage>6290</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Finn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abbeel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <article-title>Model-agnostic metalearning for fast adaptation of deep networks</article-title>
          , in: D.
          <string-name>
            <surname>Precup</surname>
            ,
            <given-names>Y. W.</given-names>
          </string-name>
          <string-name>
            <surname>Teh</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 34th International Conference on Machine Learning</source>
          , volume
          <volume>70</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1126</fpage>
          -
          <lpage>1135</lpage>
          . URL: https://proceedings.mlr.press/v70/finn17a.html.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <article-title>Optimization as a model for few-shot learning</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Blundell</surname>
          </string-name>
          , T. Lillicrap, k. kavukcuoglu, D. Wierstra,
          <article-title>Matching networks for one shot learning</article-title>
          , in: D.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Sugiyama</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          <string-name>
            <surname>Luxburg</surname>
            ,
            <given-names>I. Guyon</given-names>
          </string-name>
          , R. Garnett (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>29</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2016</year>
          . URL: https://proceedings.neurips.cc/paper/2016/file/ 90e1357833654983612fb05e3ec9148c-Paper.pdf .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Snell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Swersky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <article-title>Prototypical networks for few-shot learning</article-title>
          , in: I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , R. Garnett (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>30</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2017</year>
          . URL: https://proceedings.neurips.cc/paper/2017/file/ cb8da6767461f2812ae4290eac7cbc42-Paper.pdf .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          , Y.-
          <string-name>
            <given-names>Z.</given-names>
            <surname>Song</surname>
          </string-name>
          , T. Hospedales,
          <article-title>Episodic training for domain generalization</article-title>
          ,
          <source>in: 2019 IEEE/CVF International Conference on Computer Vision</source>
          (ICCV),
          <year>2019</year>
          , pp.
          <fpage>1446</fpage>
          -
          <lpage>1455</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICCV.
          <year>2019</year>
          .
          <volume>00153</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>