<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Information Technology and Interactions</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Comprehensive Study of Autoencoders' Applications Related to Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Volodymyr Kovenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilona Bogach</string-name>
          <email>ilona.bogach@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DNN</institution>
          ,
          <addr-line>autoencoders, VAE, CNN, generative models</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vinnytsia National Technical University</institution>
          ,
          <addr-line>Khmelnytsky highway 95, Vinnytsia, 21021</addr-line>
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
<p>This article incorporates a comprehensive study of autoencoders' applications related to images. First of all, the vanilla autoencoder is described, along with the details of its architecture and training procedure. Secondly, the main methods for regularizing it are presented, such as dropout and additive Gaussian noise. Applications of autoencoders such as image morphing, reconstruction and search are shown. Then the VAE (variational autoencoder) is highlighted, and its main applications, such as outlier detection and image generation, are described. Finally, it is shown that using warm-up for the VAE with respect to the KL loss gives much more plausible results in terms of image generation.</p>
      </abstract>
      <kwd-group>
        <kwd>DNN</kwd>
        <kwd>autoencoders</kwd>
        <kwd>VAE</kwd>
        <kwd>CNN</kwd>
        <kwd>generative models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nowadays the family of machine learning algorithms called neural networks has become a first-choice solution for a large set of problems, from image classification to voice generation. Advances in the subdomain of neural networks named deep learning have made feature engineering much easier, as the large number of nonlinear transformations in DNNs (deep neural networks) serves as a feature extractor. What is more, DNNs are capable of finding the hidden structure of the data and compressing it while saving the most relevant information. These capabilities make DNNs a good choice for dimensionality reduction tasks. Moreover, comparing dimensionality reduction by DNNs and by PCA[1], the former outperforms the latter, as the nonlinearity of DNNs helps them compress data in a much more complex way. The capability of finding a hidden structure of the data comes with the possibility to reconstruct data from it. This particular feature provides an opportunity to reconstruct corrupted data, generate new data samples and so on. In this article the applications of autoencoders to image data are discussed. As image data is quite complex in terms of the number of features, a convolutional neural network architecture is used in the autoencoder. Despite the fact that the autoencoder is trained for image reconstruction, it can be used to tackle various domain tasks, such as image morphing, searching for similar images and transfer learning. However, the vanilla autoencoder is not capable of image generation, as it is trained only for data reconstruction from corrupted/non-corrupted samples. For the purpose of image generation, the VAE (variational autoencoder)[2] is used. Along with the GAN[3], the VAE has become a very popular choice for generative modeling, as it manages to reconstruct the data with high accuracy and also satisfies the condition that the compressed representations of data samples are as close to each other as possible while still being distinct. The mechanics of the autoencoder and the VAE are exposed. The experiments were conducted on the publicly available dataset “Anime-Face-Dataset”[4]. Up-to-date improvements and applications of generative models for images are left for further discussion.</p>
      <p>Though we did not focus on a comparative analysis of our model against existing implementations, such analysis was conducted in terms of choosing the hyper-parameters, the types of AE model architectures and the training procedures.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Anime-Face-Dataset</title>
      <p>The dataset on which the experiments were performed, called “Anime-Face-Dataset”, consists of 63632 RGB images of anime character faces (Figure 1).</p>
      <p>The images were resized to 128x128 pixels and normalized by max scaling so that their values lie in the range from 0 to 1. Then the data was randomly split into two subsets, train (75%) and validation (15%), which were used in the same way for all the experiments.</p>
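      <p>For illustration, this preprocessing can be sketched in Python as follows (a minimal sketch: the dataset path is a hypothetical placeholder, and the split fraction follows the text):</p>
      <preformat>
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Hypothetical location of the unpacked Anime-Face-Dataset.
dataset = tf.keras.utils.image_dataset_from_directory(
    "data/anime_faces", labels=None, image_size=(128, 128), batch_size=None)

# Stack the images into one array and max-scale pixel values into [0, 1].
images = np.stack(list(dataset.as_numpy_iterator())).astype("float32") / 255.0

# Random split into train and validation subsets.
x_train, x_val = train_test_split(images, test_size=0.15, random_state=42)
      </preformat>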
    </sec>
    <sec id="sec-3">
<title>3. Autoencoder</title>
    </sec>
    <sec id="sec-4">
<title>3.1. General Overview</title>
      <p>An autoencoder is a neural network architecture used to reconstruct data. It consists of two parts: an encoder and a decoder (Figure 2).</p>
      <p>The main goal of the encoder is to compress the data in such a way that its main features are preserved, thus helping the decoder to reconstruct the data from the compressed representation. Autoencoders are trained using the back-propagation[5] algorithm so that the difference between the original and the reconstructed data is minimized. The obvious choice of a loss function for this type of network is MSE (mean squared error) (1) or MAE (mean absolute error) (2):</p>
      <p>$\mathrm{MSE}(x, \hat{x}) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{x}_i)^2$ (1)</p>
      <p>$\mathrm{MAE}(x, \hat{x}) = \frac{1}{n}\sum_{i=1}^{n}|x_i - \hat{x}_i|$ (2)</p>
      <p>where $x$ is an input image, $\hat{x}$ is a reconstructed image and the sum runs over the $n$ pixels.</p>
      <p>Despite the fact that the autoencoder resembles the PCA algorithm, it is much more powerful when dealing with complex and nonlinear data due to the nonlinearity of neural networks. When working with image data, a convolutional architecture of the encoder and decoder is usually used. This choice is justified by the fact that convolutional layers tend to capture spatial dependencies in an image by learning relevant filters. Also, as the number of convolutional layers grows, the receptive field increases, so the last layers of a convolutional neural network contain much more complex features (this will be shown further in the paper). A max pooling layer, whose main goals are to decrease overfitting and the feature map size and to make the model more robust to small shifts of pixels, was used after each convolutional layer. In order to reconstruct the data, the decoder uses transpose convolutions that learn filters to perform the upsampling operation (Figure 3).</p>
      <p>As the data was normalized to the range from 0 to 1, the activation function used in the output layer is the sigmoid. The vanilla autoencoder was trained for 10 epochs using the Adamax[6] optimizer with a batch size of 128 samples. The same batch size and normalization settings are used throughout the paper. The results showed that the vanilla autoencoder did not experience overfitting (Figure 4).</p>
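      <p>For illustration, such a convolutional autoencoder can be sketched in Keras as follows (a minimal sketch: the exact layer sizes are not given in the text, so the ones below are illustrative assumptions):</p>
      <preformat>
import tensorflow as tf
from tensorflow.keras import layers, models

LATENT_DIM = 128  # illustrative size of the compressed code

# Encoder: convolutions, each followed by a max pooling layer.
encoder = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),                      # feature map is now 16x16x128
    layers.Flatten(),
    layers.Dense(LATENT_DIM),                   # compressed representation
], name="encoder")

# Decoder: transpose convolutions perform the upsampling.
decoder = models.Sequential([
    layers.Input(shape=(LATENT_DIM,)),
    layers.Dense(16 * 16 * 128, activation="relu"),
    layers.Reshape((16, 16, 128)),
    layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    # Sigmoid output, because the inputs are normalized to [0, 1].
    layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid"),
], name="decoder")

autoencoder = models.Sequential([encoder, decoder], name="autoencoder")
autoencoder.compile(optimizer="adamax", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128,
#                 validation_data=(x_val, x_val))
      </preformat>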
      <p>Although the model trained in such a way produces quite blurry reconstructions, it still preserves various attributes such as hair color, eye size and so on (Figure 5).</p>
      <p>An interesting and fun application of the vanilla autoencoder is so-called “image morphing”. The gist of it is that one can make an image which has mixed attributes from two different images. To achieve this, one takes a linear combination of the codes of two images produced by the encoder and passes it to the decoder (3) (Figure 6):</p>
      <p>$\hat{x} = \mathrm{decoder}(code_1 \cdot (1 - \alpha) + code_2 \cdot \alpha), \quad \alpha \in [0, 1]$ (3)</p>
      <p>where $code_1$ and $code_2$ are the outputs of the encoder part of the model for the first and the second image.</p>
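      <p>A minimal sketch of the morphing procedure of formula (3), reusing the hypothetical encoder and decoder objects from the sketch above:</p>
      <preformat>
import numpy as np

def morph(img1, img2, alpha):
    """Mix two images by linearly interpolating their latent codes (formula 3)."""
    code1 = encoder.predict(img1[None, ...])  # add a batch dimension
    code2 = encoder.predict(img2[None, ...])
    mixed = code1 * (1.0 - alpha) + code2 * alpha
    return decoder.predict(mixed)[0]

# A sequence of intermediate faces between two validation samples:
frames = [morph(x_val[0], x_val[1], a) for a in np.linspace(0.0, 1.0, 8)]
      </preformat>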
      <p>One key property of the autoencoder is that, by being trained for data reconstruction, its encoder part learns complex feature representations (Figure 7). This property makes the encoder part of the model a nice candidate for applications such as transfer learning[7]. What is more, using the encoder one can search for similar images by calculating the distance between image codes using KNN[8] (Figure 8).</p>
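      <p>Such a search can be sketched with scikit-learn's NearestNeighbors on top of the hypothetical encoder from the sketch above:</p>
      <preformat>
from sklearn.neighbors import NearestNeighbors

# Index the latent codes of the whole training set.
codes = encoder.predict(x_train, batch_size=128)
index = NearestNeighbors(n_neighbors=5).fit(codes)

def find_similar(query_img):
    """Return the five training images closest to the query in latent space."""
    query_code = encoder.predict(query_img[None, ...])
    _, idx = index.kneighbors(query_code)
    return x_train[idx[0]]
      </preformat>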
      <p>Despite the fact that the vanilla autoencoder has lots of applications, it is never used for the goal it was trained for, image reconstruction, as there is no point in reconstructing an original good image into a blurred version of it. However, what one can do is train the autoencoder to reconstruct a corrupted image into a good one. To achieve this goal, the data is usually augmented with noise/rotations/etc. This trick also boosts the regularization of the model, as it now has to deal with a much more difficult task. In this paper the denoising autoencoder is considered. It is trained to reconstruct images corrupted with additive Gaussian noise with a mean of 0 and a standard deviation of 0.3 into the original ones (Figure 9).</p>
      <p>The other possible way to construct a denoising autoencoder[9] is to corrupt the data at the level of the model, using a dropout layer (with a rate of setting units to 0 equal to 0.3) right after the input layer in the encoder part of the network (Figure 10).</p>
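      <p>Both corruption schemes can be sketched as follows (the noise and dropout parameters follow the text; the encoder and decoder objects are the hypothetical ones from the earlier sketch, and clipping is an implementation choice to keep pixels in a valid range):</p>
      <preformat>
import numpy as np
from tensorflow.keras import layers, models

# Variant 1: corrupt the data itself with additive Gaussian noise
# (mean 0, standard deviation 0.3) and train to recover the originals.
x_train_noisy = np.clip(
    x_train + np.random.normal(0.0, 0.3, size=x_train.shape), 0.0, 1.0)
# autoencoder.fit(x_train_noisy, x_train, epochs=10, batch_size=128)

# Variant 2: corrupt at the level of the model with a dropout layer
# (rate 0.3) placed right after the input of the encoder.
denoising_autoencoder = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Dropout(0.3),
    encoder,
    decoder,
], name="denoising_autoencoder")
denoising_autoencoder.compile(optimizer="adamax", loss="mse")
      </preformat>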
      <p>Though the autoencoder can be used to make fascinating things, its reconstructions are not very detailed, and this kind of model is not able to generate new data samples. Further in the paper we consider a probabilistic model which gives an opportunity to produce new data samples from Gaussian noise.</p>
    </sec>
    <sec id="sec-4a">
      <title>4. VAE</title>
    </sec>
    <sec id="sec-5">
      <title>4.1. General Overview</title>
      <p>The task of understanding data is crucial for generative applications. If we think about the generative process in a probabilistic setting, our goal is to model the distribution of our data, namely $p(x)$. Having a model of the data distribution, we can then sample from it to generate new data and detect outliers. However, it turns out that modeling image data is quite a difficult task, as ordinary methods are infeasible, too restrictive or too slow in terms of synthesizing new data. If we represent the model of our data by marginalizing out the latent variable $t$ (where $t$ is some latent variable on which $x$ is conditioned), we arrive at the following integral: $p(x) = \int p(x|t)\,p(t)\,dt$, which is intractable (thus, there is no way we can evaluate the marginal likelihood), and the true posterior $p(t|x) = \frac{p(x|t)\,p(t)}{p(x)}$ is also intractable (thus, we cannot use the EM[10] algorithm). One workaround is to model the image distribution using a VAE. The VAE is mainly a framework for efficient approximation of ML (maximum likelihood) or MAP (maximum a posteriori) estimation for the parameters $\theta$ (the parameters of the decoder part of the model), which gives a possibility to mimic the random process and generate artificial data. The VAE fights the intractability by approximating the intractable true posterior with a probabilistic encoder $q(t|x)$, which produces a distribution over latent variables $t$ given data $x$. In a similar way, the decoder represents $p(x|t)$: given a latent code $t$, it produces the distribution over the corresponding values of $x$. As the VAE's decoder and encoder are neural networks of some architecture, the whole model should be trained using gradient-based methods and the back-propagation algorithm w.r.t. an objective function, which involves gradient computation. However, it turns out that it is infeasible to compute the gradient of $q(t|x, \phi)$, where $\phi$ denotes the parameters of the encoder part of the model. The solution proposed by Kingma is the so-called reparameterization trick[11], which changes sampling $t$ from $q(t|x, \phi)$ to $t = \varepsilon \odot \sigma + \mu$ (where $\varepsilon$ is a random variable from the standard Gaussian distribution $N(0, I)$, and $\sigma$ and $\mu$ are the standard deviation and mean modeled by the encoder), which is now differentiable. The overall architecture of the VAE looks the following way (Figure 11).</p>
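      <p>The reparameterization trick is typically implemented as a sampling layer; a minimal Keras sketch (the tensors mu and log_var are assumed to be the two outputs of the probabilistic encoder; parameterizing via the log-variance is an implementation assumption for numerical stability):</p>
      <preformat>
import tensorflow as tf
from tensorflow.keras import layers

class Sampling(layers.Layer):
    """Reparameterization trick: t = mu + sigma * eps, with eps ~ N(0, I)."""
    def call(self, inputs):
        mu, log_var = inputs
        eps = tf.random.normal(shape=tf.shape(mu))
        # Differentiable w.r.t. mu and log_var; randomness is isolated in eps.
        return mu + tf.exp(0.5 * log_var) * eps
      </preformat>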
      <p>The objective function of the VAE consists of two parts: a reconstruction term and a regularization term (4):</p>
      <p>$\mathcal{L} = E_{q(t|x)}[\log p(x|t, \theta)] - KL(q(t|x)\,||\,p(t))$ (4)</p>
      <p>where $p(x|t, \theta)$ is the probability of image $x$ given latent code $t$ and decoder weights $\theta$, $p(t)$ is the prior distribution and $q(t|x)$ is the posterior approximation.</p>
      <p>The first term of the objective is often replaced with the reconstruction loss from the vanilla autoencoder, MSE or MAE; however, while experimenting with MNIST data, Kingma modeled the reconstruction loss as a Bernoulli distribution (which is actually a convenient choice, as MNIST data is grayscale and is often normalized between 0 and 1). For the case of RGB images, which is considered in this paper, the reconstruction loss is modeled as a Gaussian distribution (5).</p>
      <p>$\log p(x_i\,|\,\mu(t_i), \sigma^2(t_i)) = \log \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2(t_i)}} \exp\left(-\frac{(x_{ij} - \mu(t_i)_j)^2}{2\sigma^2(t_i)}\right) = -\frac{n}{2}\log(2\pi\sigma^2(t_i)) - \sum_{j=1}^{n}\frac{(x_{ij} - \mu(t_i)_j)^2}{2\sigma^2(t_i)}$ (5)</p>
      <p>where $p(x_i\,|\,\mu(t_i), \sigma^2(t_i))$ is the probability of image $x_i$ given $\mu(t_i)$ and $\sigma^2(t_i)$, which depend on the latent code $t_i$.</p>
      <p>Differently from Kingma and following Doersch's approach[12], $\sigma^2(t)$ was not modeled by the decoder, but set as a hyper-parameter which was then tuned in the experiments. The second part of the objective corresponds to the Kullback-Leibler divergence[13], which is a measure of how one probability distribution differs from another (6):</p>
      <p>$KL(q(t)\,||\,p(t)) = \int q(t)\,\log\frac{q(t)}{p(t)}\,dt$ (6)</p>
      <p>Assuming that both the prior $p(t)$ and the posterior approximation $q(t)$ are Gaussian ($p(t) = N(0, I)$, $q(t) = N(t; \mu, \sigma^2)$), the KL term can be integrated analytically (formula 7):</p>
      <p>$KL(q(t)\,||\,p(t)) = \int q(t)\,\log\frac{q(t)}{p(t)}\,dt = \int q(t)\,(\log q(t) - \log p(t))\,dt = \int q(t)\,\log q(t)\,dt - \int q(t)\,\log p(t)\,dt;$</p>
      <p>$\int q(t)\,\log p(t)\,dt = \int N(t; \mu, \sigma^2)\,\log N(0, I)\,dt = -\frac{J}{2}\log(2\pi) - \frac{1}{2}\sum_{j=1}^{J}(\mu_j^2 + \sigma_j^2);$</p>
      <p>$\int q(t)\,\log q(t)\,dt = \int N(t; \mu, \sigma^2)\,\log N(t; \mu, \sigma^2)\,dt = -\frac{J}{2}\log(2\pi) - \frac{1}{2}\sum_{j=1}^{J}(1 + \log\sigma_j^2);$</p>
      <p>$KL(q(t)\,||\,p(t)) = -\frac{1}{2}\sum_{j=1}^{J}(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2)$ (7)</p>
      <p>where $J$ is the dimensionality of the latent code $t$, and $\mu$ and $\sigma$ are the mean and standard deviation produced by the model.</p>
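      <p>Putting the two terms together, the per-batch objective can be sketched as follows (a sketch: $\sigma^2$ is the fixed reconstruction variance treated as a hyper-parameter, as discussed above, and the constant terms of (5) are dropped):</p>
      <preformat>
import tensorflow as tf

def vae_loss(x, x_rec, mu, log_var, sigma2=0.01):
    # Negative Gaussian log-likelihood (formula 5) up to an additive constant:
    # the sum of squared errors scaled by the fixed variance sigma^2.
    rec = tf.reduce_sum(tf.square(x - x_rec), axis=[1, 2, 3]) / (2.0 * sigma2)
    # Analytic KL term for a Gaussian prior and posterior (formula 7).
    kl = -0.5 * tf.reduce_sum(
        1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=1)
    return tf.reduce_mean(rec + kl)
      </preformat>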
      <p>The KL term serves as regularization, as it forces the synthesized data to come from the same distribution (it encourages the approximate posterior to be close to the prior). As will be shown further, over time the KL part of the loss starts to increase, as the VAE generates data samples that differ from each other.</p>
    </sec>
    <sec id="sec-6">
<title>4.2. Experiments and applications</title>
      <p>The experiments were conducted with respect to varying $\sigma^2(t)$. The architecture of the VAE's encoder and decoder is the same as that of the discussed AE, with small changes to fit the framework's mechanics. The results were validated according to the quality of the VAE's generative process. The way to generate new samples using the VAE (sample from $p(x)$) is fairly simple: during inference only the generative part of the model (the decoder that learned the mapping from latent space $t$ to data $x$) is used, and $t$ is modeled as a standard normal distribution. Since in (5) $\sigma^2(t)$ is situated in the denominator, if its value is close to 1, the overall reconstruction loss will be the classical MSE plus a constant, whereas if its value is close to 0, the loss will be infinitely large. According to these considerations, $\sigma^2(t)$ is sometimes referred to as a regularization hyper-parameter. Three different values of $\sigma^2(t)$ were tried out: 0.1, 0.01, 0.001; and each model was trained for 200 epochs (Figure 12).</p>
      <p>From Figure 12 it is obvious that the best value for $\sigma^2$ is 0.01, as it gives the most plausible generation. With $\sigma^2 = 0.1$ the left part of the objective function resembles the classical MSE loss, and the overall training converges to producing a picture which is just the average of the samples in the dataset. With $\sigma^2 = 0.001$ the left side of the objective is too large, and it takes very long to optimize it.</p>
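      <p>Generating new faces then amounts to sampling latent codes from the standard normal prior and passing them through the decoder (a sketch assuming the decoder from the earlier sketches):</p>
      <preformat>
import tensorflow as tf

# Sample latent codes t ~ N(0, I) and decode them into images.
t = tf.random.normal(shape=(16, 128))  # 16 samples, latent dimensionality 128
generated = decoder.predict(t)         # pixel values in [0, 1] due to the sigmoid
      </preformat>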
      <p>The other experiment which was performed is training a model with so-called KL loss warmup, or annealing, suggested by Sønderby et al.[14]. The gist of the approach is adding a new parameter to our objective called $\beta$, which is a multiplier of the KL part (8):</p>
      <p>$\mathcal{L} = E_{q(t|x)}[\log p(x|t, \theta)] - \beta \cdot KL(q(t|x)\,||\,p(t))$ (8)</p>
      <p>where $\beta$ is the annealing parameter.</p>
      <p>In contrast to the approach suggested by Higgins et al.[15], during annealing $\beta$ is not set to a constant, but is increased from 0 to 1 during the warmup epochs. According to Sønderby, the main logic behind such an approach is the fact that the variational regularization term causes some of the latent units to become inactive during training. To tackle this problem, we can smoothly switch from a deterministic autoencoder (with $\beta &lt; 1$) to a variational one. Along with using KL warmup, batch normalization was added after each convolutional layer of both the encoder and the decoder. The warmup was done for 40 epochs, after which the model continued to train for 200 epochs with the original objective function. The comparison was done against the classical approach to training.</p>
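      <p>The warmup schedule can be sketched as a Keras callback that linearly increases $\beta$ from 0 to 1 over the warmup epochs (a sketch; the beta variable is assumed to multiply the KL term inside the model's loss):</p>
      <preformat>
import tensorflow as tf

beta = tf.Variable(0.0, trainable=False)  # multiplier of the KL term

class KLWarmup(tf.keras.callbacks.Callback):
    """Linearly anneal beta from 0 to 1 during the first warmup epochs."""
    def __init__(self, warmup_epochs=40):
        super().__init__()
        self.warmup_epochs = warmup_epochs

    def on_epoch_begin(self, epoch, logs=None):
        beta.assign(min(1.0, epoch / self.warmup_epochs))

# The training loss would then be: rec + beta * kl  (formula 8).
      </preformat>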
      <p>According to Figure 13, the KL loss tends to decrease at first: as $\beta$ increases, its contribution to the overall objective also increases. After $\beta$ approaches 1, the KL loss tends to increase, as now our model tries to generate distinct samples. As shown in Figure 14, the KL loss tends to increase both during classical training and training with warmup. Compared to the reconstruction loss of the classical approach, the one related to training with warmup is lower on both the training and validation subsets.</p>
      <p>It is really hard to tell the difference between the samples generated by the VAE with batch normalization and warmup and the classical VAE, but one can notice that the samples generated by the VAE with warmup and batch normalization are a bit sharper and contain fewer artifacts (Figure 15). Another application of the VAE is outlier detection. By training the VAE, we force it to learn the distribution of images in the observed data. Thus, using the encoder part of the network we can produce distribution parameters (mean and standard deviation) for any image, and then calculate the KL divergence between the distribution of the particular image and the prior one. If the KL divergence is bigger than some predefined threshold, the image is an outlier. To calculate the threshold, the KL divergence was computed for 10k samples from the dataset, and the median of the scores was taken as the outlier threshold. An example of outlier detection with the VAE is shown in Figure 16.</p>
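      <p>A sketch of the outlier-scoring procedure (assuming a hypothetical vae_encoder model that outputs the pair $\mu$, $\log\sigma^2$ for a batch of images):</p>
      <preformat>
import numpy as np

def kl_score(mu, log_var):
    """Analytic KL from N(mu, sigma^2) to the prior N(0, I) (formula 7)."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)

# Threshold: the median KL score over 10k samples from the dataset.
mu, log_var = vae_encoder.predict(x_train[:10000])
threshold = np.median(kl_score(mu, log_var))

def is_outlier(img):
    m, lv = vae_encoder.predict(img[None, ...])
    return kl_score(m, lv)[0] > threshold
      </preformat>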
    </sec>
    <sec id="sec-7">
      <title>5. Discussion and further work</title>
      <p>In this paper the main applications of the autoencoder and the VAE to image data were discussed. The possibility of reconstructing data and modeling its distribution opens doors to lots of interesting applications, such as image morphing, image reconstruction, image generation and outlier detection. Though the generative power of the trained VAE is not brilliant, as the images are still not that sharp and some artifacts persist, training it for longer should enhance the results. It was shown that using a warmup of the KL loss can give better results in terms of training and lead to better generative results in perspective.</p>
      <p>Nevertheless, although a big part of the applications was highlighted, with the improvement of technologies new usages of generative models arise. For example, Ma et al. use a so-called Style-based VAE[16] to tackle the problem of super-resolution, while Larsen et al.[17] mix the VAE and the GAN to replace element-wise errors with feature-wise errors, which leads to better learning of the data distribution. Recently, GANs have attracted lots of attention: Style-GAN[18], proposed by Nvidia, gives a possibility to generate samples that are hard to discriminate from real ones even for a human. Although all the exposed usages of autoencoders are related to image data, this framework is often used with other types of data, such as text and signals.</p>
    </sec>
    <sec id="sec-8">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Jake</given-names>
            <surname>Lever</surname>
          </string-name>
          ,
          <source>Principal component analysis</source>
          ,
          <year>2017</year>
          . URL: https://www.nature.com/articles/nmeth.4346.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Diederik</surname>
            <given-names>P Kingma</given-names>
          </string-name>
          , Max Welling,
          <string-name>
            <surname>Auto-Encoding Variational</surname>
            <given-names>Bayes</given-names>
          </string-name>
          ,
          <year>2014</year>
          . URL: https://arxiv.org/pdf/1312.6114.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Ian</surname>
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Goodfellow</surname>
          </string-name>
          , Jean Pouget-Abadie, Mehdi Mirza, Bing Xu,
          <source>Generative Adversarial Networks</source>
          ,
          <year>2014</year>
          . URL : https://arxiv.org/pdf/1406.2661.pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Brian</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <string-name>
            <surname>Anime-</surname>
          </string-name>
          Face-Dataset,
          <year>2020</year>
          . URL: https://github.com/bchao1/Anime-Face-Dataset.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Yann le Cun</surname>
          </string-name>
          ,
          <article-title>A Theoretical Framework for Back-</article-title>
          <string-name>
            <surname>Propagation</surname>
          </string-name>
          ,
          <year>1988</year>
          . URL: http://yann.lecun.com/exdb/publis/pdf/lecun-88.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Diederik</surname>
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>Jimmy</given-names>
          </string-name>
          <string-name>
            <surname>Ba</surname>
          </string-name>
          .
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          ,
          <year>2014</year>
          . URL: https://arxiv.org/abs/1412.6980.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jason</given-names>
            <surname>Yosinski</surname>
          </string-name>
          , Jeff Clune, Yoshua Bengio, and Hod Lipson,
          <article-title>How transferable are features in deep neural networks</article-title>
          ?,
          <year>2014</year>
          . URL: https://papers.nips.cc/paper/5347-how
          <article-title>-transferable-arefeatures-in-deep-neural-networks</article-title>
          .
          <source>pdf.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Zhonheng</given-names>
            <surname>Gang</surname>
          </string-name>
          ,
          <article-title>Introduction to Machine Learning : K-nearest neighbors</article-title>
          ,
          <year>2016</year>
          . URL: https://www.researchgate.net/publication/303958989_Introduction_to_machine_learning_Knearest_neighbors.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Dominic</given-names>
            <surname>Mon</surname>
          </string-name>
          , Denoising Autoencoders explained,
          <year>2017</year>
          . URL: https://towardsdatascience.com
          <article-title>/denoising-autoencoders-explained-dbb82467fc2.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Jason</given-names>
            <surname>Brownlee</surname>
          </string-name>
          ,
          <article-title>A Gentle Introduction to Expectation-Maximiztion (EM algorithm</article-title>
          ),
          <year>2019</year>
          . URL: https://machinelearningmastery.com/expectation-maximization
          <string-name>
            <surname>-</surname>
          </string-name>
          em-algorithm/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>David</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Blei</surname>
          </string-name>
          ,
          <article-title>Variational inference: A review for statisticians, 2016</article-title>
          . URL: https://www.tandfonline.com/doi/full/10.1080/01621459.
          <year>2017</year>
          .
          <volume>1285773</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Carl</given-names>
            <surname>Doersch</surname>
          </string-name>
          .
          <source>Tutorial on Variational Autoencoders</source>
          ,
          <year>2016</year>
          . URL: https://arxiv.org/pdf/1606.05908.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Anna-Lena</surname>
            <given-names>Popkes</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kullback-Leibler Divergence</surname>
          </string-name>
          ,
          <year>2019</year>
          . URL: http://alpopkes.com/files/kl_divergence.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Casper</given-names>
            <surname>Kaae</surname>
          </string-name>
          <string-name>
            <surname>Sønderby</surname>
          </string-name>
          , Tapani Raiko,
          <article-title>Lars Maaløe and others</article-title>
          ,
          <source>Ladder Variational Autoencoders</source>
          ,
          <year>2016</year>
          . URL: https://arxiv.org/pdf/1602.02282.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Irina</surname>
            <given-names>Higgins</given-names>
          </string-name>
          , Loic Matthey,
          <article-title>Arka Pal and others, beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework</article-title>
          ,
          <year>2017</year>
          . URL: https://openreview.net/references/pdf?id=
          <fpage>Sy2fzU9gl</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Xin</surname>
            <given-names>Ma</given-names>
          </string-name>
          , Yi Li,
          <article-title>Huaibo Huang and others</article-title>
          ,
          <source>Style-based Variational Autoencoder for Real-World Super-Resolution</source>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/pdf/
          <year>1912</year>
          .10227.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Anders</given-names>
            <surname>Boesen Lindbo Larsen</surname>
          </string-name>
          , Søren Kaae Sønderby,
          <article-title>Hugo Larochelle and Ole Winther, Autoencoding beyond pixels using a learned similarity metric</article-title>
          ,
          <year>2016</year>
          . URL: https://arxiv.org/pdf/1512.09300.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Tero</surname>
            <given-names>Karras</given-names>
          </string-name>
          , Samuli Laine,
          <string-name>
            <given-names>Timo</given-names>
            <surname>Aila</surname>
          </string-name>
          ,
          <article-title>A Style-Based Generator Architecture for Generative Adversarial Networks</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/pdf/
          <year>1812</year>
          .04948.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>