<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning a Multimodal Prior Distribution for Generative Adversarial Nets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Goerttler</string-name>
          <email>thomas.goerttler@ni.tu-berlin.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marius Kloft</string-name>
          <email>kloft@cs.uni-kl.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Technical University of Kaiserslautern</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Neural Information Processing Group, Department of Electrical Engineering and Computer Science, Technical University of Berlin</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Generative adversarial nets (GANs) have shown their potential in various tasks like image generation, 3D object generation, image super-resolution, and video prediction. Nevertheless, they are still considered highly unstable to train and are in danger of missing modes. One problem is that real data is usually discontinuous, whereas the prior distribution is continuous. This circumstance can lead to non-convergence of the GAN and makes it hard for the generator to generate fair results. In this paper, we introduce an approach to directly learn modes in the prior distribution - which map to the modes in the real data - by changing the training procedure of GANs. Our empirical results show that this extension stabilizes the training of GANs and captures discrete uniform distributions more fairly. We use the score of the earth mover's distance as an evaluation metric to underline this effect.</p>
      </abstract>
      <kwd-group>
        <kwd>Generative Models</kwd>
        <kwd>Mode Collapse</kwd>
        <kwd>Learning Latent Distributions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In 2014, generative adversarial nets (GANs) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] were proposed as a novel
generative model, which does not formulate the distribution of the training data explicitly
but instead allows one to sample additional data from that distribution. They
directly achieved state-of-the-art results on many different tasks, from image
generation [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], through image super-resolution [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], 3D object generation [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
anomaly detection [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and video prediction [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Despite their success, training GANs is notoriously unstable, and the
theoretical understanding of why GANs work well is still incomplete [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. One problem
is that the distribution of the data is usually multimodal and discontinuous, whereas
the latent space usually comes from a continuous distribution, e.g., a uniform or Gaussian
distribution with no mode or only one mode, respectively. Therefore, the generator function G
has to learn a transformation from the continuous latent space to the
discontinuous multimodal distribution, which can be seen as a mixture of different simple
distributions. For example, a human either wears glasses or does not. This
transition is discrete and has to be learned by the generator. However, this is quite
difficult, and generators tend to produce ambiguous faces.
      </p>
      <p>
        Additionally, this makes the GAN difficult to train and invites
mode collapse, where the model only captures a single mode and misses the other
ones. Gurumurthy et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] propose to define a multimodal prior distribution
directly; however, this only works if we already know the real data distribution,
which is not the case in practice. If we knew the distribution already, a GAN
would not be required anymore.
      </p>
      <p>Therefore, we propose to learn the modes directly in the latent distribution.
We achieve this by restricting the prior distribution during the training procedure.
Besides, this helps the training procedure to be more stable and, finally, helps
the GAN not to miss modes, as our results show.</p>
    </sec>
    <sec id="sec-2">
      <title>Training generative adversarial nets</title>
      <p>
        The idea of GANs, introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], is to have two adversarial neural nets which
play a two-player minimax game. On the one hand, there is a generator
function G which learns a distribution pg over given data x: it draws noise z
randomly from a prior distribution pz and generates from it an implicit
distribution pg. On the other hand, there is a discriminator function D, which tries to
distinguish accurately between real data x and the generated data G(z; θg). It
returns a single scalar which expresses the probability that a given input comes
from the data x rather than from the generator. Both G and D are differentiable
non-linear mappings, in general expressed by neural nets. In
the training process of GANs, the discriminator D is trained to correctly
discriminate between the real data input x and the generated samples G(z; θg). At
the same time, the generator G is trained to fool the discriminator as much as
possible. Thus, it wants to maximize the function D(G(z)), whereas the
discriminator wants to minimize it and maximize D(x). In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the objective function is
expressed as follows:
min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p_z(z)}[log(1 - D(G(z)))]   (1)
      </p>
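      <p>To make Equation (1) concrete, the following minimal sketch (ours, not the authors' code) estimates V(D, G) by Monte Carlo with toy stand-ins for G and D; the affine generator, the sigmoid discriminator, and the data distribution are purely illustrative assumptions:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def G(z):
    # Toy "generator": an affine map from noise to samples (illustrative only).
    return 2.0 * z + 1.0

def D(x):
    # Toy "discriminator": a sigmoid of the input (illustrative only).
    return 1.0 / (1.0 + np.exp(-x))

m = 1000
x = rng.normal(loc=1.0, scale=2.0, size=m)   # minibatch of "real" data
z = rng.uniform(-1.0, 1.0, size=m)           # minibatch of prior noise z ~ p_z

# Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
V = np.mean(np.log(D(x))) + np.mean(np.log(1.0 - D(G(z))))
```

      <p>In actual training, the expectations are replaced by such minibatch averages, and D ascends this value while G descends it.</p>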
      <p>
        The objective function is trained via gradient descent steps until convergence,
which is reached at a Nash equilibrium. In [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the authors
introduced convolutional and pooling layers into the architecture. Further
extensions of the GAN framework are, e.g., the conditional GAN [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or the unrolled
GAN [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Degenerate prior distribution and manifold problem: A huge problem
of GANs is that the samples from the generator are degenerate when instantiating
the GAN. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] it is remarked that if the dimension k of the prior
distribution pz is smaller than the dimension n of the data distribution, the output of
the generator will always lie on a k-dimensional manifold in the n-dimensional
space. Likewise, the distribution of the real data pdata often lies on an o-dimensional
manifold with o &lt;&lt; n. Having two distributions which lie on lower-dimensional
manifolds results in the situation that the supports of the real data distribution
pdata and the generated distribution pg are often non-overlapping. In such cases,
minimizing divergences is meaningless as they are "maxed out" [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Furthermore, the discriminator can become perfect, which leads to instabilities and also to a
vanishing gradient problem in the generator [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      </p>
      <p>
        As minimizing a divergence is meaningless if pg and pdata are disjoint, the
Wasserstein GAN (WGAN) aims at minimizing the Wasserstein distance
instead of any type of f-divergence [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. A theoretical solution for ensuring that
the degenerate distributions of the generator and the real data, lying on
low-dimensional manifolds, overlap is to add noise to the generated and the real data
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      </p>
      <p>Mode collapse and mode dropping: One common failure of GANs happens
when the generator collapses to a set of parameters such that the GAN always outputs
the same value. This output fools the discriminator so well that the discriminator
cannot distinguish the fake samples from the real data.</p>
      <p>Similar to mode collapse is mode dropping. As there is no communication
between the points in GANs, it can happen that the loss function is close to the
optimum and the scores of the fake samples G(z) are all almost 0.5, which indicates
that the algorithm has almost reached the Nash equilibrium, but some modes
are not captured and are missed out.</p>
      <p>
        The approach we introduce in this paper focuses on manipulating the prior
distribution. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] the prior is also manipulated, but the authors use an associative
memory in the learning process.
      </p>
      <p>Masked and weighted prior distribution for GANs:
In this section, we introduce our novel approaches to stabilize the training of
GANs by finding modes in the prior distribution. We achieve this by masking
and weighting the latent distribution of the GAN during training.</p>
      <p>Using the information of the discriminator:
The standard GAN samples a batch {z(1), ..., z(m)} of size m from the prior
distribution pz(z) and passes it to the generator. The prior distribution has</p>
      <sec id="sec-2-1">
        <title>3 Note that we denote the batches as sets although they are not mathematical sets</title>
        <p>
          but arrays or vectors of samples. We adapted the set notation from [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
a dimension of k, and it is defined once and is constant over time.
In practice, the uniform distribution U_k(-1, 1) or the standard normal
distribution N_k(0, 1) is used. When training a GAN, two steps are iteratively
repeated: in the first step the discriminator is updated, and in the second step
the generator. In both steps, we sample a batch {z(1), ..., z(m)} from the prior
distribution pz(z) independently and identically. We propose to use the
information the discriminator gives us about every fake sample when we pass the noise
through the generator G(z(i)). The information we obtain for a noise sample z(i)
is a score:
        </p>
        <p>s(i) := s(z(i)) := D(G(z(i)))
In the case of the standard GAN, the score s(i) lies in [0, 1] and gives us information
about how likely it is that the generated sample fools the current
parameterization θd of the discriminator D. Having this information, we restrict and
manipulate the prior distribution before we resample again from the manipulated
prior distribution and optimize the generator and the discriminator with a batch
of resampled noise values {zr(1), ..., zr(m)}. We distinguish between two different
approaches:
Masking the prior by restricting it to the portion which has a higher probability
of fooling the discriminator. This gives a hard constraint, and only the part of pz(z)
which falls into this region is processed. The portion being kept
is a hyperparameter r which lies in (0, 1]. Because the rate r determines the
portion of the prior distribution we select from, a rate of 1 means that we constantly draw
from the normal GAN prior, and a rate r close to zero means we only
optimize for a tiny region. In Definition 1, we give the density function of the
masked prior.</p>
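      <p>As a minimal sketch of the masking step, under our own illustrative assumptions (a sigmoid stand-in for the score s(z) = D(G(z)) and a one-dimensional prior), the hard selection of the top r-fraction of a pre-sample batch could look as follows:</p>

```python
import numpy as np

rng = np.random.default_rng(1)

def score(z):
    # Hypothetical stand-in for s(z) = D(G(z)); any map into [0, 1] works here.
    return 1.0 / (1.0 + np.exp(-z))

def masked_resample(z_pre, r, m):
    """Keep the top r-fraction of pre-samples by score, then resample m of them."""
    s = score(z_pre)
    threshold = np.percentile(s, 100.0 * (1.0 - r))  # the (1 - r) percentile of the scores
    kept = z_pre[s > threshold]                      # hard mask: discard low-score noise
    return rng.choice(kept, size=m, replace=True)

n, m, r = 1000, 100, 0.5                  # pre-sample size n larger than minibatch size m
z_pre = rng.uniform(-1.0, 1.0, size=n)    # pre-sample batch from the prior p_z
z_masked = masked_resample(z_pre, r, m)
```

      <p>With a monotone score as in this toy example, the mask simply cuts away the low-score half of the prior's support, which mirrors the percentile condition of the masked prior.</p>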
        <p>Definition 1. The probability density function of the masked prior is defined as
pz,masked(z) = (1/r) pz(z) if P(s(z) &gt; s(x)) &gt; 1 - r for x ~ pz, and 0 otherwise.
Weighting the prior by using the score to define a weight. In this case, we
resample from the prior distribution weighted by the scores, respectively by a function
of the scores. The new density is given in Definition 2.</p>
        <p>Definition 2. The probability density function of the weighted prior is defined
as
pz,weighted(z) = pz(z) · w(s(z))   (2)
where w is chosen such that it holds that
for all s0, s1 in [0, 1]: s0 &lt; s1 implies w(s0) &lt; w(s1)   (3)
and
∫ pz(z) · w(s(z)) dz = 1.   (4)
Equation 3 ensures that a higher score leads to a higher probability of being drawn,
and equation 4 guarantees that pz,weighted is a density, as the weights are
normalized. We additionally propose to define w(s) such that w(s) is proportional to s.</p>
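      <p>Analogously, the weighting variant can be sketched by resampling with probabilities proportional to the scores (our illustration; the score function below is again a hypothetical stand-in for D(G(z))):</p>

```python
import numpy as np

rng = np.random.default_rng(2)

def score(z):
    # Hypothetical stand-in for s(z) = D(G(z)).
    return 1.0 / (1.0 + np.exp(-z))

def weighted_resample(z_pre, m):
    """Resample m points with probability proportional to their score,
    i.e. w(s) proportional to s, normalized so the weights sum to one."""
    s = score(z_pre)
    w = s / s.sum()
    return rng.choice(z_pre, size=m, replace=True, p=w)

n, m = 1000, 100
z_pre = rng.uniform(-1.0, 1.0, size=n)   # pre-sample batch from the prior
z_weighted = weighted_resample(z_pre, m)
# Regions with higher scores are now over-represented relative to the prior.
```
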
        <p>In the following, we call the GAN with a fixed prior distribution the
traditional GAN, or the GAN with a constant prior distribution. In Figure 1, we see
how the theoretical prior distribution changes, looking at an example in the
one-dimensional case. While training a traditional GAN, we draw from a uniform
distribution with a minimum value of -1 and a maximum value of 1. In this
example the scores are determined by the function f(z) = (z - 0.3)^3 + z + 0.0173
for z in [-1, 1] (Figure 1 c)). The theoretical distribution of the weighted prior changes
accordingly if the weighting function is w(z) = f(z). The masked prior for
r = 0.5 can be seen under e). The lower the masking rate is, the higher the
density values are, because the range we draw from decreases.</p>
        <p>As pz(z) is continuous, it is impossible to get the score of every point a priori
because there are uncountably many points. Therefore, we have to find
another way to draw appropriately from the theoretical distributions pz,masked
and pz,weighted. We resample from the batch Z that we retrieve after masking or
weighting. In the case of masking, this means that we keep the samples with
the higher scores and resample from them, as showcased in Figure 1
f). In the case of weighting, this means that we assign a weight to every
sample and resample again from the batch Z weighted with the batch of weights
W = {w(s(1)), ..., w(s(n))} (Figure 1 d)). If we have a minibatch size of m, it
follows that the pre-sample size n has to be higher than m to get enough diversity for
each training step. This is required because the masked region we resample from
only has a size of r·n, and we want that value not to be much smaller than m,
and ideally higher. The distribution we draw our masked and weighted samples
from in the algorithm is not continuous anymore but is based on the pre-sample
batch. Therefore, in the algorithm we do not use densities for the masked and weighted prior
but probability mass functions. Thus, we slightly adjust Definition 1,
leading to Definition 3.</p>
        <p>Definition 3. The probability mass function of the masked prior is defined as
pmz,masked(z) = 1/(r·n) if z ∈ {z(1), ..., z(n)} and s(z) &gt; pct_{1-r}({s(1), ..., s(n)}),
and 0 otherwise,
where pct_{1-r} is the (1 - r) percentile. Note that we assume that the elements
in {z(1), ..., z(n)} are distinct, which is true with probability 1 as they are
drawn from a continuous distribution. The finite sample version of the weighted
prior is defined in Definition 4.</p>
        <p>Definition 4. The probability mass function of the weighted prior is defined as
pmz,weighted(z) = p(z) · w(s(z)) if z ∈ Z = {z(1), ..., z(n)}, and 0 otherwise,
where w is chosen such that it holds that
for all s0, s1 in [0, 1]: s0 &lt; s1 implies w(s0) &lt; w(s1)
and
Σ_{i=1}^{n} w(s(i)) = 1.</p>
        <sec id="sec-2-1-7">
          <title>Experiments</title>
          <p>[Table 1: network architecture. Input z: 500 components sampled from U_2(-1, 1); layers 1 and 2: fully connected, 128 neurons, ReLU; output layer: fully connected, 2 (3) neurons, tanh.]</p>
          <p>In this section, we discuss the results of applying our novel learning algorithm
to GANs on different data sets. The data sets are a synthetic toy data set
and the standard deep learning data sets MNIST and CelebA. We train the GAN
using a multi-layer perceptron as well as the DCGAN. For the DCGAN we use
the implementation of Taehoon Kim4.</p>
          <p>
            To observe the effect of masking and weighting the prior distribution, we
apply our GAN extension to a mixture of eight Gaussians lying in R2. The eight
Gaussian mode data set has been used to show stabilizing effects in [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ], and
[
            <xref ref-type="bibr" rid="ref13">13</xref>
            ].
          </p>
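          <p>A common construction of this toy data set arranges the eight Gaussians evenly on a circle; the following sketch is our illustration, and the radius and standard deviation are assumed values, not necessarily the paper's:</p>

```python
import numpy as np

rng = np.random.default_rng(3)

def eight_gaussians(n, radius=2.0, std=0.02):
    """Sample n points from a mixture of 8 Gaussians arranged evenly on a circle.
    radius and std are illustrative choices, not necessarily the paper's values."""
    angles = 2.0 * np.pi * np.arange(8) / 8.0
    centers = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    modes = rng.integers(0, 8, size=n)                 # pick one of the 8 modes uniformly
    return centers[modes] + rng.normal(scale=std, size=(n, 2))

data = eight_gaussians(500)   # 500 two-dimensional training points
```
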
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>4 https://github.com/carpedm20/DCGAN-tensorflow</title>
        <p>
          Besides applying our modified GANs to a mixture of eight Gaussians, we also
apply them to MNIST [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and CelebA [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], two datasets which are commonly
used in deep learning and image processing tasks.
        </p>
        <p>Mixture of Gaussians</p>
        <p>
          We compare the performance of the different GANs on an example with eight
modes. The parameters of the networks are summarized in Table 1. We use
the Wasserstein GAN [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] with five discriminator steps per training step of the
generator, a minibatch size of 500, a learning rate of 0.0001, and we train the
GAN for 50 epochs. The masking rate is 0.5, and for masking and weighting,
we use a pre-sample size of 1000. In Figure 2, we see the results of the GANs.
The traditional GAN does not capture all modes. It can be observed that
especially two modes have received a lot of mass. Also, some outliers between the
modes are visible. These are no longer visible if we sample masked during
generation. Having a look at the prior distribution in Figure 3, we see that different
areas get a higher score, which leads to the modes; but the traditional GAN does not use this
information during the training of discriminator and generator. Looking at the results
of the masked and the weighted GAN, we see a huge improvement. The prior
distribution is separated into eight modes which correspond to the eight modes
in the resulting distribution. The EMD of the GAN with a traditional prior
distribution is 0.3195. Masking the prior distribution of a traditional GAN only
for the generation, the EMD score is even a little higher (0.3378). Although
the outliers disappear, masking the prior distribution only during
generation does not help in this case. The EMDs of the weighted and the masked GAN,
though, are smaller: 0.0891 and 0.1328, respectively. We repeated this with the
standard GAN with the alternative loss function; the results were similar.
        </p>
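        <p>For one-dimensional samples of equal size, the earth mover's distance reduces to the mean absolute difference of the sorted samples; the following minimal sketch is our illustration of the metric (the evaluation above works on two-dimensional point sets):</p>

```python
import numpy as np

def emd_1d(a, b):
    """Earth mover's distance between two equal-size 1-D samples:
    the mean absolute difference of the sorted values."""
    a, b = np.sort(np.asarray(a)), np.sort(np.asarray(b))
    assert a.shape == b.shape
    return float(np.mean(np.abs(a - b)))

x = np.array([0.0, 1.0, 2.0])
print(emd_1d(x, x))        # prints 0.0: identical samples cost nothing to match
print(emd_1d(x, x + 0.5))  # prints 0.5: shifting every point by 0.5 costs 0.5
```

        <p>A smaller EMD means less probability mass has to be moved to turn the generated distribution into the data distribution, which is why we use it to quantify how fairly the modes are captured.</p>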
        <p>
          We also want to investigate the influence of the masking rate in more detail.
So far, we have only used a rate of 0.5, but in general, every value r in (0, 1] is
possible. We repeated the experiment for several rates from 0 to 1. We used the
same parameter setting but increased the pre-sample size to 2000 so that we have
more points to resample from, which especially helps the low masking rates as
they mask out most of the points. In Figure 4, we see on the x-axis the rate and
on the y-axis the resulting EMD. We plot the average EMD of three different
runs of the experiment. We observe that a low masking rate does not work in
this case as it restricts too much of the prior distribution. A masking rate of 0.5
to 0.9 improves the quality of the GAN as it has a smaller EMD (Figure 4).
MNIST: On MNIST we train a GAN using multilayer perceptron (MLP)
networks only, as well as the DCGAN architecture. The hyperparameters of the MLP
version are based on the parameters in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], but we use a lower learning rate of
0.01. The minibatch size is 100, and SGD is used with a momentum of 0.5 at the
start, which increases to 0.7 at epoch 250. The model is trained for 300 epochs
and several times with different random seeds. In Figure 5, we see the
resulting images of the traditional GAN and the output of the masked GAN
and the weighted GAN. Whereas the quality of the resulting pictures is similar,
i.e., one cannot see a clear difference, we see that the traditional GAN
fails to capture the different modes and only captures the digit 1. Masking the GAN
and weighting the GAN solves this problem and leads to more stable results. In
Figure 6, we show bar charts of the resulting distributions, drawing 2000
samples from the generator and classifying them. They underline that the modes
are captured more fairly by our approach. We also use the DCGAN architecture
to learn MNIST. The architectures of the generator and the discriminator
nets are adapted from [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. For the convolutional and transposed convolutional
layers, we use a filter width and filter height of 5 and strides of 1 and 2, respectively.
The results can be seen in Figure 7. Also in this setting, the fairness of modes
is better when applying masking and weighting.
CelebA: We also apply our newly proposed GAN to the CelebA data set. We
use the DCGAN architecture as well as the parameters of [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. We train the
model for 10 epochs. Besides, we reduce the depth of the convolution layers for
a second experiment. This time we allow it to train for 25 epochs, as we want to
guarantee that it has time to converge. Reducing the depth of the convolutions has
also been done in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] to show stabilization effects.
        </p>
        <p>In Figure 8, the results of the three different GANs are shown. We observe
that for the first parameter setting, the traditional GAN, masked GAN, and weighted</p>
        <p>[Figure 8: a) traditional GAN, b) masked GAN, c) weighted GAN, each with depth convolutions; d) traditional GAN, e) masked GAN, f) weighted GAN, each with non-depth convolutions.]</p>
        <p>GAN produce results of similar quality. If we reduce the depth of the
convolutional layers in the generator and the discriminator, the traditional GAN
captures only a few modes and is not able to replicate the faces properly. However,
the masked and the weighted GAN images also become worse, as we see a lot of
similar faces and not the full variety of the training set. The quality is also
reduced, which is caused by the weaker architecture. Nevertheless, the quality of
the results of the masked and weighted GAN is better than that of the traditional GAN.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion and future work</title>
      <p>In this paper, we propose a new extension of GANs, which focuses on restricting
the prior distribution to particular regions instead of leaving it constant. Our
experiments show the potential of the novel idea, as it decreases the EMD between
the training data and the generated data. In the case of estimating a multimodal
distribution, we noticed that the masked GAN finds the corresponding islands in the
latent space. On MNIST, we observed that the generated distribution is
fairer when applying masking and weighting.</p>
      <p>In the future, we want to tackle the inconvenience that, in our
basic extension, we have to use the discriminator to generate new samples. We think
that it is worthwhile to eliminate this. We have two different ideas in mind:
diminishing the masking and weighting effect by either increasing the masking
rate or smoothing the weights, and optimizing for the regions of the prior distribution which
show higher gradients.</p>
      <p>Acknowledgments: We thank Robert Vandermeulen, Lukas Ruff, and Gregoire
Montavon for fruitful discussions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arici</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Celikyilmaz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Associative adversarial networks</article-title>
          .
          <source>CoRR</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Arjovsky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Towards principled methods for training generative adversarial networks</article-title>
          .
          <source>In: ICLR</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Arjovsky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chintala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Wasserstein generative adversarial networks</article-title>
          .
          <source>In: ICML</source>
          . pp.
          <volume>214</volume>
          –
          <issue>223</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Che</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacob</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Mode regularized generative adversarial networks</article-title>
          .
          <source>In: ICLR</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Deecke</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandermeulen</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruff</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mandt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kloft</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Image anomaly detection with generative adversarial networks</article-title>
          .
          <source>In: ECML PKDD</source>
          . pp.
          <volume>3</volume>
          –
          <issue>17</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The MNIST database of handwritten digit images for machine learning research [best of the web]</article-title>
          .
          <source>IEEE Signal Process. Mag</source>
          .
          <volume>29</volume>
          (
          <issue>6</issue>
          ),
          <volume>141</volume>
          –
          <fpage>142</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>Deep Learning. Adaptive computation and machine learning</article-title>
          , MIT Press (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pouget-Abadie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mirza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warde-Farley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozair</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Generative adversarial nets</article-title>
          .
          <source>In: NIPS</source>
          . pp.
          <fpage>2672</fpage>
          -
          <lpage>2680</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gulrajani</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arjovsky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumoulin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>Improved training of Wasserstein GANs</article-title>
          .
          <source>In: NIPS</source>
          . pp.
          <fpage>5769</fpage>
          -
          <lpage>5779</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gurumurthy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarvadevabhatla</surname>
            ,
            <given-names>R.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Babu</surname>
            ,
            <given-names>R.V.</given-names>
          </string-name>
          :
          <article-title>DeLiGAN: Generative adversarial networks for diverse and limited data</article-title>
          .
          <source>In: CVPR</source>
          . pp.
          <fpage>4941</fpage>
          -
          <lpage>4949</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Deep learning face attributes in the wild</article-title>
          .
          <source>In: ICCV</source>
          . pp.
          <fpage>3730</fpage>
          -
          <lpage>3738</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mathieu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Couprie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Deep multi-scale video prediction beyond mean square error</article-title>
          .
          <source>In: 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Metz</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poole</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sohl-Dickstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Unrolled generative adversarial networks</article-title>
          .
          <source>In: ICLR</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mirza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osindero</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Conditional generative adversarial nets</article-title>
          .
          <source>CoRR</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Metz</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chintala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Unsupervised representation learning with deep convolutional generative adversarial networks</article-title>
          .
          <source>In: ICLR</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nowozin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Stabilizing training of generative adversarial networks through regularization</article-title>
          .
          <source>In: NIPS</source>
          . pp.
          <fpage>2015</fpage>
          -
          <lpage>2025</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Sønderby</surname>
            ,
            <given-names>C.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caballero</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Theis</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huszár</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Amortised MAP inference for image super-resolution</article-title>
          .
          <source>In: ICLR</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freeman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tenenbaum</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling</article-title>
          .
          <source>In: NIPS</source>
          . pp.
          <fpage>82</fpage>
          -
          <lpage>90</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>