<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Learning a Multimodal Prior Distribution for Generative Adversarial Nets</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Thomas</forename><surname>Goerttler</surname></persName>
							<email>thomas.goerttler@ni.tu-berlin.de</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Electrical Engineering and Computer Science</orgName>
								<orgName type="laboratory">Neural Information Processing Group</orgName>
								<orgName type="institution">Technical University of Berlin</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marius</forename><surname>Kloft</surname></persName>
							<email>kloft@cs.uni-kl.de</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Technical University of Kaiserslautern</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Humboldt University of Berlin</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Learning a Multimodal Prior Distribution for Generative Adversarial Nets</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">DAC3251E7FF8D9F064659A31B245F3DA</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Generative Models</term>
					<term>Mode Collapse</term>
					<term>Learning Latent Distributions</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Generative adversarial nets (GANs) have shown their potential in various tasks like image generation, 3D object generation, image super-resolution, and video prediction. Nevertheless, they are still considered highly unstable to train and prone to missing modes. One problem is that real data is usually discontinuous, whereas the prior distribution is continuous. This circumstance can lead to non-convergence of the GAN and makes it hard for the generator to produce fair results. In this paper, we introduce an approach to directly learn modes in the prior distribution, which map to the modes in the real data, by changing the training procedure of GANs. Our empirical results show that this extension stabilizes the training of GANs and captures discrete uniform distributions more fairly. We use the earth mover's distance as an evaluation metric to underline this effect.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In 2014, generative adversarial nets (GANs) <ref type="bibr" target="#b8">[8]</ref> were proposed as a novel generative model that does not model the distribution of the training data explicitly but instead allows sampling additional data from that distribution. They directly achieved state-of-the-art results on many different tasks, including image generation <ref type="bibr" target="#b15">[15]</ref>, image super-resolution <ref type="bibr" target="#b17">[17]</ref>, 3D object generation <ref type="bibr" target="#b18">[18]</ref>, anomaly detection <ref type="bibr" target="#b4">[5]</ref>, and video prediction <ref type="bibr" target="#b12">[12]</ref>.</p><p>Despite their success, training GANs is notoriously unstable, and the theoretical understanding of why GANs work well is still incomplete <ref type="bibr" target="#b7">[7]</ref>. One problem is that the data distribution is usually multimodal and discontinuous, whereas the latent space usually comes from a continuous distribution, e.g., a uniform distribution (no mode) or a Gaussian (a single mode). Therefore, the generator G has to learn a transformation from the continuous latent space to the discontinuous multimodal distribution, which can be seen as a mixture of different simple distributions. For example, a human either wears glasses or does not. This transition is discrete and has to be learned by the generator. However, this is quite difficult, and generators tend to produce ambiguous faces.</p><p>Additionally, this makes training difficult and increases the risk of mode collapse, where the model only captures a single mode and misses others. Gurumurthy et al. <ref type="bibr" target="#b10">[10]</ref> propose to define a multimodal prior distribution directly; however, this only works if the real data distribution is already known, which is not the case in practice. 
If we knew the distribution already, a GAN would not be required in the first place.</p><p>Therefore, we propose to learn the modes directly in the latent distribution. We achieve this by restricting the prior distribution during the training procedure. As our results show, this makes the training procedure more stable and helps the GAN not to miss modes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Training generative adversarial nets</head><p>The idea of GANs, introduced in <ref type="bibr" target="#b8">[8]</ref>, is to have two adversarial neural nets that play a two-player minimax game. On the one hand, there is a generator function G that learns a distribution p g over the given data x: it draws noise z randomly from a prior distribution p z and maps it to an implicit distribution p g . On the other hand, there is a discriminator function D, which tries to distinguish accurately between real data x and generated data G(z, θ g ). It returns a single scalar expressing the probability that a given input comes from the data x rather than from the generator. Both G and D are differentiable non-linear functions, in general expressed by neural nets. In the training process of GANs, the discriminator D is trained to correctly discriminate between the real data x and the generated samples G(z, θ g ), while the generator G is trained to fool the discriminator as much as possible. Thus, the generator wants to maximize D(G(z)), whereas the discriminator wants to minimize it and maximize D(x). In <ref type="bibr" target="#b8">[8]</ref>, the objective function is expressed as follows:</p><formula xml:id="formula_0">min G max D V (D, G) = E x∼p data (x) [log D(x)] + E z∼pz(z) [log(1 − D(G(z)))]. (1)</formula><p>The objective function is trained via gradient steps until convergence, which is reached at a Nash equilibrium. In <ref type="bibr" target="#b15">[15]</ref>, the authors extended the architecture with convolutional layers. 
Further extensions of the GAN framework include, e.g., the conditional GAN <ref type="bibr" target="#b14">[14]</ref> and the unrolled GAN <ref type="bibr" target="#b13">[13]</ref>.</p><p>Degenerate prior distribution and manifold problem: A major problem of GANs is that the generated distribution is degenerate when the GAN is instantiated. In <ref type="bibr" target="#b1">[2]</ref> it is remarked that if the dimension k of the prior distribution p z is smaller than the dimension n of the data space, the output of the generator will always lie on a k-dimensional manifold in the n-dimensional space. Likewise, the real data distribution p data often lies on an o-dimensional manifold with o ≪ n. When both distributions lie on lower-dimensional manifolds, the supports of the real data distribution p data and the generated distribution p g are often non-overlapping. In such cases, minimizing divergences is meaningless as they are "maxed out" <ref type="bibr" target="#b1">[2]</ref>. Furthermore, the discriminator can become perfect, which leads to instabilities and to a vanishing gradient problem in the generator <ref type="bibr" target="#b1">[2]</ref>  <ref type="bibr" target="#b17">[17]</ref>.</p><p>As minimizing a divergence is meaningless if p g and p data are disjoint, the Wasserstein GAN (WGAN) aims at minimizing the Wasserstein distance instead of an f-divergence <ref type="bibr" target="#b2">[3]</ref>. A theoretical remedy for making the two distributions lying on low-dimensional manifolds overlap is to add noise to both the generated and the real data <ref type="bibr" target="#b1">[2]</ref>  <ref type="bibr" target="#b17">[17]</ref>.</p><p>Mode collapse and mode dropping: One common failure of GANs happens when the generator collapses to a parameter setting for which it always outputs the same value. 
This output fools the discriminator so well that the discriminator cannot distinguish the fake samples from the real data.</p><p>Mode dropping is similar to mode collapse. As there is no communication between the sample points in GANs, it can happen that the loss function is close to the optimum and the scores of the fake samples G(z) are all almost 0.5, which indicates that the algorithm has almost reached the Nash equilibrium, yet some modes are not captured and are missed out.</p><p>The approach we introduce in this paper focuses on manipulating the prior distribution. In <ref type="bibr" target="#b0">[1]</ref>, the prior is also manipulated, but via an associative memory in the learning process.</p></div>
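The minimax value function of Eq. (1) can be estimated by Monte Carlo sampling. The following sketch uses toy stand-ins for D and G (a fixed sigmoid and a fixed affine map, both our own hypothetical choices, not the paper's networks) purely to illustrate how the two expectation terms are computed:

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x):
    # toy stand-in for the discriminator: squashes a score into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def G(z):
    # toy stand-in for the generator: a fixed affine map of the noise
    return 2.0 * z + 1.0

x = rng.normal(1.0, 0.5, size=10_000)     # "real" data samples
z = rng.uniform(-1.0, 1.0, size=10_000)   # noise from the prior p_z

# Monte Carlo estimate of V(D, G) in Eq. (1):
# E_x[log D(x)] + E_z[log(1 - D(G(z)))]
V = np.log(D(x)).mean() + np.log(1.0 - D(G(z))).mean()
```

Since both D(x) and 1 − D(G(z)) lie strictly in (0, 1), each term is negative; during training, D takes gradient steps to increase V while G takes steps to decrease it.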
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Masked and weighted prior distribution for GANs</head><p>In this section, we introduce our novel approaches to stabilize the training of GANs by finding modes in the prior distribution. We achieve this by masking and weighting the latent distribution of the GAN during training.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Using the information of the discriminator</head><p>The standard GAN samples a batch {z (1) , . . . , z (m) } 3 of size m from the prior distribution p z (z) and passes it to the generator. The prior distribution has dimension k and is defined once and kept constant over time. In practice, the uniform distribution U k (−1, 1) or the standard normal distribution N k (0, 1) is used. When training a GAN, two steps are iteratively repeated: in the first step the discriminator is updated, and in the second step the generator. In both steps, we sample a batch {z (1) , . . . , z (m) } from the prior distribution p z (z) independently and identically. We propose to use the information the discriminator gives us about every fake sample G(z (i) ). The information we obtain for a noise sample z (i) is a score:</p><formula xml:id="formula_1">s (i) := s(z (i) ) := D(G(z (i) ))<label>(2)</label></formula><p>In the case of the standard GAN, the score s (i) lies in [0, 1] and tells us how likely it is that the generated sample fools the current parameterization of the discriminator D. Having this information, we restrict and manipulate the prior distribution before we resample from the manipulated prior distribution and optimize the generator and the discriminator with a batch of resampled noise values {z r (1) , . . . , z r (m) }. We distinguish between two different approaches:</p><p>Masking the prior by restricting it to the portion which has a higher probability of fooling the discriminator. This gives a hard constraint, and only the part of p z (z) that falls into this region is processed. The portion being kept is a hyperparameter r which lies in (0, 1]. 
Because the rate r determines the portion of the prior distribution we select from, a rate of 1 means that we constantly draw from the normal GAN prior, and a rate r close to zero means that we only optimize over a tiny region. In Definition 1, we give the density function of the masked prior.</p><p>Definition 1. The probability density function of the masked prior is defined as</p><formula xml:id="formula_3">p z,masked (z) = (1/r) • p z (z) if P (s(z) &gt; s(x)) &gt; 1 − r for x ∼ p z , and 0 otherwise<label>(3)</label></formula><p>Weighting the prior by using the score to define a weight. In this case, we resample from the prior distribution weighted by the scores or by a function of the scores. The new density is given in Definition 2.</p><p>Definition 2. The probability density function of the weighted prior is defined as</p><formula xml:id="formula_4">p z,weighted (z) = p z (z) • w(s(z))<label>(4)</label></formula><p>where w is chosen such that</p><formula xml:id="formula_5">∀s 0 , s 1 ∈ [0, 1] : s 0 &lt; s 1 ⇒ w(s 0 ) &lt; w(s 1 )<label>(5)</label></formula><p>and</p><formula xml:id="formula_6">∫ z p z (z) • w(s(z)) dz = 1<label>(6)</label></formula><p>Equation 5 ensures that a higher score leads to a higher probability of being drawn, and equation 6 guarantees that p z,weighted is a density, as the weights are normalized. We propose to define w(s) such that it additionally holds ∀s 0 , s 1 ∈ (0, 1] :</p><formula xml:id="formula_7">s 0 / s 1 = w(s 0 ) / w(s 1 )<label>(7)</label></formula><p>Equation 7 leads to proportionality between s and w(s). In the following, we call the GAN with a fixed prior distribution the traditional GAN or the GAN with a constant prior distribution. In Figure <ref type="figure" target="#fig_0">1</ref>, we see how the theoretical prior distribution changes in a one-dimensional example. While training a traditional GAN, we draw from the uniform distribution on [−1, 1]. 
In this example, the scores are determined by the function f (z) = (z − 0.3) 3 + z + 0.0173 for z ∈ [−1, 1] (Figure <ref type="figure" target="#fig_0">1 c</ref>)). The theoretical weighted prior results from the constant prior if the weighting function is w(z) = f (z). The masked prior for r = 0.5 can be seen in e). The lower the masking rate, the higher the density values, because the range we draw from decreases.</p><p>As p z (z) is continuous, it is impossible to obtain the score of every point a priori because there are uncountably many points. Therefore, we have to find another way to draw appropriately from the theoretical distributions p z,masked and p z,weighted . We resample from the batch Z obtained after masking or weighting it. In the case of masking, this means that we keep the samples with the highest scores and resample from them, Z * ⊆ Z, as showcased in Figure <ref type="figure" target="#fig_0">1</ref> f). In the case of weighting, this means that we assign a weight to every sample and resample from the batch Z weighted with the batch of weights W = {w(s (1) ), . . . , w(s (n) )} (Figure <ref type="figure" target="#fig_0">1 d</ref>)). If we have a minibatch size of m, it follows that the pre-sample size n should be larger than m to get enough diversity for each training step. This is required because the masked region we resample from only contains r • n samples, and we want this value not to be much smaller than m, ideally larger. The distribution we draw our masked and weighted samples from in the algorithm is no longer continuous but is based on the pre-sample batch. Therefore, in the algorithm we do not use densities for the masked and weighted priors but probability mass functions. Thus, we slightly adjust Definition 1, leading to Definition 3.</p><p>Definition 3. The probability mass function of the masked prior is defined as</p><formula xml:id="formula_8">pm z,masked (z) = 1/(r • n) if z ∈ {z (1) , . . . , z (n) } and s(z) &gt; pct 1−r ({s (1) , . . . , s (n) }), and 0 otherwise<label>(8)</label></formula><p>where pct 1−r is the 1 − r percentile. Note that we assume that the elements in {z (1) , . . . , z (n) } are distinct, which holds with probability 1 as they are drawn from a continuous distribution. The finite-sample version of the weighted prior is defined in Definition 4.</p><p>Definition 4. The probability mass function of the weighted prior is defined as</p><formula xml:id="formula_10">pm z,weighted (z) = p(z) • w(s(z)) if z ∈ Z = {z (1) , . . . , z (n) }, and 0 otherwise<label>(9)</label></formula><p>where w is chosen such that</p><formula xml:id="formula_12">∀s 0 , s 1 ∈ [0, 1] : s 0 &lt; s 1 ⇒ w(s 0 ) &lt; w(s 1 )<label>(10)</label></formula><p>and</p><formula xml:id="formula_13">Σ n i=1 w(s (i) ) = 1<label>(11)</label></formula></div>
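The pre-sample-and-resample step of Definitions 3 and 4 can be sketched in a few lines of NumPy. The scores below are random stand-ins for D(G(z)), the weighting uses w(s) = s as one valid choice satisfying the monotonicity and normalization conditions, and the function names are our own:

```python
import numpy as np

def masked_indices(scores, r, m, rng):
    """Definition 3: resample m indices uniformly from the top r-fraction
    of pre-samples ranked by discriminator score."""
    n = len(scores)
    keep = int(np.ceil(r * n))            # size of the masked region, r * n
    top = np.argsort(scores)[-keep:]      # indices above the (1 - r) percentile
    return rng.choice(top, size=m, replace=True)

def weighted_indices(scores, m, rng):
    """Definition 4: resample m indices with probability proportional
    to the score, i.e. w(s) = s after normalization (Eq. 11)."""
    w = scores / scores.sum()             # normalize weights to a pmf
    return rng.choice(len(scores), size=m, replace=True, p=w)

rng = np.random.default_rng(0)
n, m, k = 1000, 500, 2
z = rng.uniform(-1.0, 1.0, size=(n, k))   # pre-sample batch from U_k(-1, 1)
s = rng.uniform(size=n)                   # stand-in for the scores D(G(z))
z_masked = z[masked_indices(s, r=0.5, m=m, rng=rng)]
z_weighted = z[weighted_indices(s, m=m, rng=rng)]
```

In training, these resampled batches would then replace the i.i.d. prior samples in the updates of both the generator and the discriminator.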
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>In this section, we discuss the results of applying our novel learning algorithm to GANs on different data sets: a synthetic toy data set as well as the standard deep learning data sets MNIST and CelebA. We train the GAN using a multi-layer perceptron as well as the DCGAN. For the DCGAN we use the implementation of Taehoon Kim<ref type="foot" target="#foot_1">4</ref>. To observe the effect of masking and weighting the prior distribution, we apply our GAN extension to a mixture of eight Gaussians lying in R 2 . This eight-Gaussian data set has been used to show stabilizing effects in <ref type="bibr" target="#b3">[4]</ref>, <ref type="bibr" target="#b16">[16]</ref>, and <ref type="bibr" target="#b13">[13]</ref>.</p><p>Besides applying our modified GANs to the mixture of eight Gaussians, we also apply them to MNIST <ref type="bibr" target="#b5">[6]</ref> and CelebA <ref type="bibr" target="#b11">[11]</ref>, two data sets which are commonly used in deep learning and image processing tasks. We compare the performance of the different GANs on an example with eight modes. The parameters of the networks are summarized in Table <ref type="table" target="#tab_0">1</ref>. We use the Wasserstein GAN <ref type="bibr" target="#b2">[3]</ref> with five discriminator steps per training step of the generator, a minibatch size of 500, a learning rate of 0.0001, and we train the GAN for 50 epochs. The masking rate is 0.5, and for masking and weighting we use a pre-sample size of 1000. In Figure <ref type="figure" target="#fig_1">2</ref>, we see the results of the GANs. The traditional GAN does not capture all modes. It can be observed that two modes in particular have received a lot of mass. Also, some outliers between the modes are visible. These are no longer visible if we sample masked during generation. 
Having a look at the prior distribution in Figure <ref type="figure" target="#fig_2">3</ref>, we see that different areas get a higher score, which leads to the modes. However, the traditional GAN does not use this information while training the discriminator and the generator. Looking at the results of the masked and the weighted GAN, we see a huge improvement. The prior distribution is separated into eight modes, which correspond to the eight modes in the resulting distribution. The EMD of the GAN with a traditional prior distribution is 0.3195. If we mask the prior distribution of a traditional GAN only during generation, the EMD score is even slightly higher (0.3378). Although the outliers disappear, masking the prior distribution only during generation does not help in this case. The EMDs of the weighted and the masked GAN, however, are smaller: 0.0891 and 0.1328, respectively. We repeated this experiment with the standard GAN and the alternative loss function; the results were similar.</p></div>
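The toy target distribution used in these experiments, a ring of eight Gaussian modes, can be sampled as follows (radius and standard deviation are our own assumed values, since the paper does not state them here):

```python
import numpy as np

def eight_gaussians(n, radius=1.0, std=0.05, rng=None):
    """Sample n points in R^2 from a mixture of eight Gaussian modes
    placed evenly on a circle (parameters assumed, not from the paper)."""
    rng = rng or np.random.default_rng()
    angles = 2.0 * np.pi * rng.integers(0, 8, size=n) / 8.0   # pick a mode uniformly
    centers = radius * np.column_stack([np.cos(angles), np.sin(angles)])
    return centers + rng.normal(0.0, std, size=(n, 2))

data = eight_gaussians(500, rng=np.random.default_rng(0))
```

A GAN that captures all eight modes should place mass near every mode center; mode dropping shows up as empty regions of the ring in the generated samples.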
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Mixture of Gaussians</head><p>We also want to investigate the influence of the masking rate in more detail. So far, we have only used a rate of 0.5, but in general every value r ∈ (0, 1] is possible. We repeated the experiment for several rates between 0 and 1. We used the same parameter setting but increased the pre-sample size to 2000 so that we have more points to resample from, which especially helps low masking rates, as they mask out most of the points. In Figure <ref type="figure">4</ref>, we see the rate on the x-axis and the resulting EMD on the y-axis. We plot the average EMD of three different runs of the experiment. We observe that a low masking rate does not work in this case, as it restricts too much of the prior distribution. A masking rate of 0.5 to 0.9 improves the quality of the GAN, as it yields a smaller EMD (Figure <ref type="figure">4</ref>).</p><p>MNIST On MNIST we train a GAN using only multilayer perceptron (MLP) networks as well as the DCGAN architecture. The hyperparameters of the MLP version are based on the parameters in <ref type="bibr" target="#b8">[8]</ref>, but we use a smaller learning rate of 0.01. The minibatch size is 100, and SGD is used with a momentum of 0.5 at the start, which increases to 0.7 at epoch 250. The model is trained for 300 epochs, several times with different random seeds. In Figure <ref type="figure">5</ref>, we see the resulting images of the traditional GAN and the outputs of the masked GAN and the weighted GAN. Whereas the quality of the resulting pictures is similar, i.e., one cannot see a clear difference, the traditional GAN fails to capture different modes and only captures the digit 1. Masking and weighting solve this problem and lead to more stable results. 
In Figure <ref type="figure" target="#fig_4">6</ref>, we show bar charts of the resulting distributions, obtained by drawing 2000 samples from the generator and classifying them. They underline that the modes are captured more fairly by our approach. We also use the DCGAN architecture to learn MNIST. The architectures of the generator and the discriminator nets are adapted from <ref type="bibr" target="#b15">[15]</ref>. For the convolutional and transposed convolutional layers, we use a filter width and height of 5 and a stride of 1 or 2, respectively.</p><p>The results can be seen in Figure <ref type="figure" target="#fig_5">7</ref>. In this setting, too, the fairness of the modes is better when applying masking and weighting. CelebA We also apply our newly proposed GANs to the CelebA data set. We use the DCGAN architecture as well as the parameters of <ref type="bibr" target="#b15">[15]</ref> and train the model for 10 epochs. For a second experiment, we reduce the depth of the convolutional layers. This time we allow it to train for 25 epochs, as we want to guarantee that it has time to converge. Reducing the depth of the convolutions has also been done in <ref type="bibr" target="#b9">[9]</ref> to show stabilization effects.</p><p>In Figure <ref type="figure" target="#fig_7">8</ref>, the results of the three different GANs are shown. We observe that for the first parameter setting, the traditional GAN, the masked GAN, and the weighted GAN produce results of similar quality. If we reduce the depth of the convolutional layers in the generator and the discriminator, the traditional GAN captures only a few modes and is not able to replicate the faces properly. However, the images of the masked and the weighted GAN also become worse: we see many similar faces and not the full variety of the training set. Also, the overall quality is reduced, which is caused by the weaker architecture. 
Nevertheless, the quality of the results of the masked and the weighted GAN is better than that of the traditional GAN.</p></div>
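The evaluation metric above, the earth mover's distance, can be computed exactly for two equal-size point clouds with uniform weights by solving an optimal assignment problem. A minimal sketch (the `emd` helper and the test data are our own; the paper does not specify how its EMD scores were computed):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd(a, b):
    """Exact earth mover's distance between two equal-size point clouds
    with uniform weights, solved as an optimal assignment problem."""
    # pairwise Euclidean distances between all points of a and b
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return cost[rows, cols].mean()

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 2))
fake = real + np.array([1.0, 0.0])   # the same cloud shifted by one unit
```

For a pure translation, the EMD equals the length of the shift, so `emd(real, fake)` is 1.0; smaller values between generated and training samples indicate a better fit, as reported in the experiments.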
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and future work</head><p>In this paper, we propose a new extension of GANs which focuses on restricting the prior distribution to particular regions instead of leaving it constant. Our experiments show the potential of this idea, as it decreases the EMD between the training data and the generated data. In the case of estimating a multimodal distribution, we noticed that the masked GAN finds the corresponding islands in the latent space. On MNIST, we observed that the generated distribution is fairer when applying masking and weighting.</p><p>In the future, we want to address the inconvenience that, in our basic extension, the discriminator has to be used to generate new samples. We think it is worthwhile to eliminate this dependency. We have two different ideas in mind: diminishing the masking and weighting effect by either increasing the masking rate or blurring the weights, and optimizing for the regions of the prior distribution which show higher gradients.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. In a), c) and e), examples of the theoretical distributions of the constant, weighted, and masked (r = 0.5) priors can be seen. In the right column are their empirical counterparts. Note that d) and f) are based on restricting b) and are not samples from the restricted density functions c) and e).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. This figure shows the heatmap of the generated distribution and the EMDs of GAN experiments on eight modes.</figDesc><graphic coords="7,146.96,452.07,67.17,67.17" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. This figure shows the prior distribution of the GANs after 50 epochs for eight modes. We mask the prior for the traditional GAN as well, although this information is not used during training.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 4 .Fig. 5 .</head><label>45</label><figDesc>Fig. 4. This figure shows the earth mover's distance depends on the hyperparameter masked rate r for WGAN a) and the standard GAN b).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 6 .</head><label>6</label><figDesc>Fig. 6. Barchart of the different distributions of MNIST</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Fig. 7 .</head><label>7</label><figDesc>Fig. 7. This figure shows the results we have on MNIST using DCGAN.</figDesc><graphic coords="10,153.69,380.36,100.80,100.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Fig. 8 .</head><label>8</label><figDesc>Fig. 8. The upper row figures a), b) and c) show the results of a normally trained DCGAN. Figures d), e) and f) are results where the depth of the convolutional layers is reduced.</figDesc><graphic coords="11,146.15,252.80,100.80,100.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>This table shows the parameter setting we use for the eight modes</figDesc><table><row><cell>Layer</cell><cell>Layer type</cell><cell>Hyperparameters</cell></row><row><cell>input</cell><cell>z</cell><cell>500 components sampled from U2(−1, 1)</cell></row><row><cell>1</cell><cell>Fully Connected</cell><cell>128 Neurons, ReLU</cell></row><row><cell>2</cell><cell>Fully Connected</cell><cell>128 Neurons, ReLU</cell></row><row><cell cols="2">output Fully Connected</cell><cell>2 (3) Neurons, tanh</cell></row><row><cell>Layer</cell><cell>Layer type</cell><cell>Hyperparameters</cell></row><row><cell>input</cell><cell>x</cell><cell>(2, 1) or (3, 1)</cell></row><row><cell>1</cell><cell>Fully Connected</cell><cell>128 Neurons, tanh</cell></row><row><cell>2</cell><cell>Fully Connected</cell><cell>128 Neurons, tanh</cell></row><row><cell>3</cell><cell>Fully Connected</cell><cell>128 Neurons, tanh</cell></row><row><cell cols="2">output Fully Connected</cell><cell>1 Neuron, Sigmoid</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">Note that we denote the batches as sets although they are not mathematical sets but arrays or vectors of samples. We adopted the set notation from<ref type="bibr" target="#b8">[8]</ref> </note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">https://github.com/carpedm20/DCGAN-tensorflow</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgments We thank Robert Vandermeulen, Lukas Ruff and Grégoire Montavon for fruitful discussions.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Associative adversarial networks</title>
		<author>
			<persName><forename type="first">T</forename><surname>Arici</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Çelikyilmaz</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>CoRR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Towards principled methods for training generative adversarial networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Arjovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>ICLR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Wasserstein generative adversarial networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Arjovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chintala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICML</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="214" to="223" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Mode regularized generative adversarial networks</title>
		<author>
			<persName><forename type="first">T</forename><surname>Che</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Jacob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>ICLR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Image anomaly detection with generative adversarial networks</title>
		<author>
			<persName><forename type="first">L</forename><surname>Deecke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Vandermeulen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ruff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mandt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kloft</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECML PKDD</title>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="3" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The MNIST database of handwritten digit images for machine learning research [best of the web]</title>
		<author>
			<persName><forename type="first">L</forename><surname>Deng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Signal Process. Mag</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="141" to="142" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">J</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Courville</surname></persName>
		</author>
		<title level="m">Deep Learning. Adaptive computation and machine learning</title>
		<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Generative adversarial nets</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">J</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pouget-Abadie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mirza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warde-Farley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ozair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NIPS</title>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="2672" to="2680" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Improved training of Wasserstein GANs</title>
		<author>
			<persName><forename type="first">I</forename><surname>Gulrajani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ahmed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Arjovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dumoulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Courville</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NIPS</title>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="5769" to="5779" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">DeLiGAN: Generative adversarial networks for diverse and limited data</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gurumurthy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Sarvadevabhatla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">V</forename><surname>Babu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="4941" to="4949" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Deep learning face attributes in the wild</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICCV</title>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="3730" to="3738" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Deep multi-scale video prediction beyond mean square error</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mathieu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Couprie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">4th International Conference on Learning Representations, ICLR 2016</title>
		<meeting><address><addrLine>San Juan, Puerto Rico</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">May 2-4, 2016. 2016</date>
		</imprint>
	</monogr>
	<note>Conference Track Proceedings</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Unrolled generative adversarial networks</title>
		<author>
			<persName><forename type="first">L</forename><surname>Metz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Poole</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pfau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sohl-Dickstein</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>ICLR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Conditional generative adversarial nets</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mirza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Osindero</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>CoRR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Unsupervised representation learning with deep convolutional generative adversarial networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Metz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chintala</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICLR</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Stabilizing training of generative adversarial networks through regularization</title>
		<author>
			<persName><forename type="first">K</forename><surname>Roth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lucchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nowozin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hofmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NIPS</title>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="2015" to="2025" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Amortised MAP inference for image super-resolution</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K</forename><surname>Sønderby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Caballero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Theis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Huszár</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>ICLR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Freeman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tenenbaum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NIPS</title>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="82" to="90" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
